[Python] Boolean Indexing

,,🪨,, 2024. 10. 16. 23:57

Extracting the rows where a given value is at its maximum


In [2]:

import pandas as pd
import numpy as np
import seaborn as sns

df = sns.load_dataset('titanic')
df.head()

Out[2]:

boolean indexing


Boolean indexing lets you apply a condition to a Series object and extract the rows for which that condition is true.
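Before moving to the titanic data, here is a minimal sketch of what the "condition" actually is. The `toy` DataFrame and its values are made up for illustration and are not part of the dataset.

```python
import pandas as pd

# Toy data standing in for a few titanic-style rows (values are invented).
toy = pd.DataFrame({
    "sex": ["male", "female", "male", "female"],
    "survived": [0, 1, 1, 1],
})

# Comparing a Series with a value yields a boolean Series (the mask).
mask = toy["survived"] == 1
print(mask.tolist())  # [False, True, True, True]

# Indexing with the mask keeps only the rows where the mask is True.
survivors = toy[mask]
print(survivors)
```

The mask has one True/False entry per row, so it can be inspected or reused before it is applied.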

As an example, let's extract the rows for passengers who are male and survived.

In [19]:

condition = (df.sex == 'male') & (df.survived == 1)  # build the condition

df[condition]

Out[19]:

In [20]:

df.loc[condition]  # using loc works just as well

Out[20]:
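Two details in the cells above are easy to miss: each comparison is wrapped in parentheses, and the conditions are joined with `&` rather than Python's `and`. The sketch below uses a made-up `toy` DataFrame (not the titanic data) to show why.

```python
import pandas as pd

# Invented rows for illustration only.
toy = pd.DataFrame({
    "sex": ["male", "female", "male", "female"],
    "survived": [0, 1, 1, 1],
})

# Element-wise AND. The parentheses matter because & binds more
# tightly than ==.
both = (toy["sex"] == "male") & (toy["survived"] == 1)
print(both.tolist())  # [False, False, True, False]

# df[mask] and df.loc[mask] select the same rows.
same = toy[both].equals(toy.loc[both])
print(same)  # True

# `and` needs a single truth value, which a whole Series cannot provide,
# so pandas raises ValueError.
try:
    (toy["sex"] == "male") and (toy["survived"] == 1)
    and_error = None
except ValueError as exc:
    and_error = exc
print(type(and_error).__name__)  # ValueError
```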

This time, let's use the or operator (`|`) to extract the rows where pclass is 1 or 2.

In [28]:

condition = (df.pclass == 1) | (df.pclass == 2)

df.loc[condition, ['survived', 'sex', 'pclass']]  # keep only selected columns of the filtered rows

Out[28]:
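One thing worth knowing about the filtered result is that it keeps the original index labels. A small sketch on invented data (the `toy` frame and the `fare` values are assumptions for illustration):

```python
import pandas as pd

# Invented rows for illustration only.
toy = pd.DataFrame({
    "survived": [0, 1, 1, 0, 1],
    "sex": ["male", "female", "female", "male", "male"],
    "pclass": [3, 1, 2, 3, 1],
    "fare": [7.25, 71.28, 13.0, 8.05, 26.55],
})

first_or_second = (toy["pclass"] == 1) | (toy["pclass"] == 2)

# The first argument to loc picks rows, the second picks columns.
subset = toy.loc[first_or_second, ["survived", "pclass"]]

# The selected rows keep their original index labels.
print(subset.index.tolist())  # [1, 2, 4]

# reset_index(drop=True) renumbers from 0 if that is preferred.
renumbered = subset.reset_index(drop=True)
print(renumbered.index.tolist())  # [0, 1, 2]
```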

isin()


To extract only the rows where sibsp is 1, 2, or 3, you can use boolean indexing as shown below.

In [31]:

con1 = df['sibsp'] == 1
con2 = df['sibsp'] == 2
con3 = df['sibsp'] == 3

filtered_df = df.loc[con1 | con2 | con3, :]
filtered_df

Out[31]:

When you want the rows whose value is one of a given list of values, as above, the isin function is more convenient.

In [32]:

isin_con = df['sibsp'].isin([1, 2, 3])
df.loc[isin_con]

Out[32]:
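As a quick check that the two approaches agree, the sketch below compares the chained `|` mask with the `isin` mask on a made-up `sibsp` column, and also shows `~` for selecting the opposite set. The data is invented for illustration.

```python
import pandas as pd

# Invented values for illustration only.
toy = pd.DataFrame({"sibsp": [0, 1, 2, 3, 4, 1, 0]})

# Mask built by chaining comparisons with |.
chained = (toy["sibsp"] == 1) | (toy["sibsp"] == 2) | (toy["sibsp"] == 3)

# Same mask built with isin.
with_isin = toy["sibsp"].isin([1, 2, 3])
print(chained.equals(with_isin))  # True

# ~ inverts a boolean mask, giving the rows NOT in the list.
others = toy.loc[~with_isin]
print(others["sibsp"].tolist())  # [0, 4, 0]
```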
