[Python] Restructuring DataFrame Columns
·
Python
Changing column order

In [2]:
    import pandas as pd
    import seaborn as sns
    titanic = sns.load_dataset('titanic')

When you feed in the various features needed to train an ML model, the column index can end up jumbled, which makes it awkward to separate the label from the features. You could of course list every feature by name, but splitting the features and the label by position with iloc means the code does not have to change when the feature set changes, as long as the label stays in a known position.

In [8]:
    # example of separating the label
    titanic.iloc[:, 0]

Out[8]:
    0      0
    1      1
    2      1
    3      1
    4      0
          ..
    886    0
    887    1
    888    0
    889    1
    890    0
    N..
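The position-based split described above can be sketched as follows. This is a minimal illustration, not the post's own code: the small `df` frame and its values are hypothetical stand-ins for the titanic data, and the names `X`, `y`, and `reordered` are assumptions introduced here.

```python
import pandas as pd

# Hypothetical toy frame standing in for the titanic data:
# the label ("survived") is the first column, features follow.
df = pd.DataFrame({
    "survived": [0, 1, 1, 0],
    "pclass": [3, 1, 3, 1],
    "age": [22.0, 38.0, 26.0, 35.0],
    "fare": [7.25, 71.28, 7.92, 53.10],
})

# Split by position: column 0 is the label, every later column is a feature.
y = df.iloc[:, 0]
X = df.iloc[:, 1:]

# Reorder the columns so the label comes last, then split by position again.
feature_cols = [c for c in df.columns if c != "survived"]
reordered = df[feature_cols + ["survived"]]
X_last = reordered.iloc[:, :-1]
y_last = reordered.iloc[:, -1]

print(list(X.columns))
print(list(reordered.columns))
```

Because the split relies only on position, adding or removing feature columns leaves the two `iloc` lines unchanged, provided the label keeps its first (or last) position.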
[Python] Merging DataFrames
·
Python
Merging DataFrames

During data preprocessing you often end up splitting a DataFrame into several pieces and later combining them back into one. This post covers the DataFrame merging methods you can use in those cases.

pd.concat()

As the name suggests, concat is a function that concatenates DataFrames. First, let's create two DataFrames to merge.

In [5]:
    import pandas as pd
    df1 = pd.DataFrame({'a':['a0','a1','a2','a3'], 'b':['b0','b1','b2','b3'], 'c':['c0','c1','c2','c3']}, index..
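As a rough sketch of how pd.concat() behaves, the example below joins two small frames along rows and along columns. The frames `df1` and `df2` here are hypothetical and deliberately smaller than the ones in the post, whose full definition is cut off above.

```python
import pandas as pd

# Two hypothetical frames that share column "a" and index label 1.
df1 = pd.DataFrame({"a": ["a0", "a1"], "b": ["b0", "b1"]}, index=[0, 1])
df2 = pd.DataFrame({"a": ["a2", "a3"], "c": ["c2", "c3"]}, index=[1, 2])

# Default axis=0 stacks rows; columns missing from one frame are filled with NaN.
rows = pd.concat([df1, df2])

# ignore_index=True discards the original index and renumbers from 0.
rows_reset = pd.concat([df1, df2], ignore_index=True)

# axis=1 places the frames side by side, aligned on the index (outer join by default).
cols = pd.concat([df1, df2], axis=1)

# join="inner" keeps only the index labels present in both frames.
inner = pd.concat([df1, df2], axis=1, join="inner")

print(rows)
print(inner)
```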
[Python] Comparing pd.cut and pd.qcut
·
Python
With the pd.cut() and pd.qcut() functions you can turn a numeric variable into a categorical label by dividing it into intervals. Once the values are binned this way, you can compute group-wise statistics for each interval.

The difference between the two functions can be understood from the chart below. cut divides the data into intervals of equal length, while qcut divides it into intervals that hold an equal number of observations.

pd.cut()

First, load the Titanic data used in the examples.

In [1]:
    import pandas as pd
    import seaborn as sns
    import matplotlib.pyplot as plt
    titanic = sns.load_dataset('titanic')
    titanic.head()

Among the numeric variables in this data is the age column. This age column..
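The equal-length versus equal-count distinction is easiest to see on skewed data. The sketch below uses a hypothetical `ages` Series with one extreme value rather than the titanic age column; the variable names are assumptions made for this illustration.

```python
import pandas as pd

# Hypothetical skewed data: nine small values and one outlier.
ages = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])

# pd.cut: two bins of equal width across the range 1..100.
equal_width = pd.cut(ages, bins=2)

# pd.qcut: two bins holding (roughly) the same number of observations.
equal_count = pd.qcut(ages, q=2)

# Count the rows per bin, ordered from the lowest interval to the highest.
width_counts = equal_width.value_counts().sort_index().tolist()
count_counts = equal_count.value_counts().sort_index().tolist()

print(width_counts)  # [9, 1]
print(count_counts)  # [5, 5]
```

With cut, the outlier stretches the range so that nearly every row lands in the first bin; with qcut, the bin edges move so that each bin holds half of the rows.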
[Python] Boolean Indexing
·
Python
Extracting the rows where a particular value is the maximum

In [2]:
    import pandas as pd
    import numpy as np
    import seaborn as sns
    df = sns.load_dataset('titanic')
    df.head()

boolean indexing

To apply a condition to a Series object and extract the rows for which that condition is true, you can use boolean indexing.

As an example, let's extract the rows for passengers who are male and survived.

In [19]:
    condition = (df.sex == 'male') & (df.survived == 1)  # write the condition
    df[condition]

In [20]:
    df.loc[condition]  # using loc works just as well

Out[20]..
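The post's title topic, extracting the rows that hold a column's maximum, can be expressed with the same boolean indexing pattern. The following is a sketch under assumptions: the small `df` below is hypothetical data using titanic-like column names, not the real dataset.

```python
import pandas as pd

# Hypothetical rows with titanic-like columns.
df = pd.DataFrame({
    "sex": ["male", "female", "male", "female", "male"],
    "survived": [1, 1, 0, 0, 1],
    "age": [80.0, 35.0, 80.0, 22.0, 4.0],
})

# Boolean mask: True where age equals the column maximum (ties are all kept).
is_oldest = df["age"] == df["age"].max()
oldest = df[is_oldest]

# Masks combine with & (and), | (or), ~ (not); each comparison needs parentheses.
oldest_male_survivors = df.loc[is_oldest & (df["sex"] == "male") & (df["survived"] == 1)]

print(oldest)
print(oldest_male_survivors)
```

Unlike `df["age"].idxmax()`, which returns a single index label, the mask keeps every row tied for the maximum.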
[Python] lambda
·
Python
lambda functions are handy for data handling when combined with pandas-related methods such as map and apply. You can also define a function with the def keyword, but for simple preprocessing a throwaway lambda function is what is mostly used.

Comparing a regular function with a lambda function

A regular function created with the def keyword looks like this.

In [2]:
    def plus(left, right):
        result = left + right
        return result
    plus(10, 20)

Out[2]:
    30

Now let's build the same function as a lambda.

In [6]:
    f = lambda x, y: x + y
    f(10, 20)

Out[6]:
    30

As shown above, you specify the parameters after lambda and, after the : (colon), the return value..
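To connect the comparison above to the pandas use case the post mentions, here is a small sketch that passes first a def function and then an equivalent lambda to Series.apply. The Series `s`, the helper `add_one`, and the threshold of 20 are all hypothetical choices for illustration.

```python
import pandas as pd

# A named helper defined with def (hypothetical, for comparison only).
def add_one(x):
    return x + 1

s = pd.Series([10, 20, 30])

# The same element-wise operation, once with the named function and once inline.
with_def = s.apply(add_one)
with_lambda = s.apply(lambda x: x + 1)

# A lambda body is a single expression, so a conditional expression fits too.
labels = s.apply(lambda x: "high" if x >= 20 else "low")

print(with_lambda.tolist())
print(labels.tolist())
```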
[Python] map, apply, applymap
·
Python
map()

The map function takes a Series of a DataFrame as its input. So with map you can apply whatever function you like to each element of a Series.

In [1]:
    import pandas as pd

In [2]:
    df = pd.DataFrame({"age":[30,25,25,12,40], "height":[178,180,160,140,176], "weight":[80,100,55,40,70]})
    df

Out[2]:
       age  height  weight
    0   30     178      80
    1   25     180     100
    2   25     160      55
    3   12     140      40
    4   40     176      70

In [4]:
    df['weight(pound)'] = df['weight'].map(lambda x: x*2.205)
    df

Out..
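To place map next to apply from the post title, the sketch below reuses the same small frame and contrasts Series.map (element by element) with DataFrame.apply (one column or one row at a time). The age threshold of 19 and the BMI formula are assumptions added for illustration and do not come from the post.

```python
import pandas as pd

df = pd.DataFrame({"age": [30, 25, 25, 12, 40],
                   "height": [178, 180, 160, 140, 176],
                   "weight": [80, 100, 55, 40, 70]})

# Series.map: the lambda receives one element of the column at a time.
# The cutoff of 19 is a hypothetical threshold.
age_group = df["age"].map(lambda a: "adult" if a >= 19 else "minor")

# DataFrame.apply with the default axis=0: the lambda receives one column (a Series).
col_range = df.apply(lambda col: col.max() - col.min())

# DataFrame.apply with axis=1: the lambda receives one row (a Series).
# BMI = weight in kg / (height in m) squared, used here only as an example.
bmi = df.apply(lambda row: row["weight"] / (row["height"] / 100) ** 2, axis=1)

print(age_group.tolist())
print(col_range.tolist())
```

Row-wise apply is convenient but calls the Python function once per row; for plain arithmetic like the BMI above, the vectorized form `df["weight"] / (df["height"] / 100) ** 2` gives the same result and is usually faster.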
[Python] Pivot Tables
·
Python
What is a pivot table?

A pivot table is a technique for pulling only the data you need out of a large dataset and producing a meaningful table. The feature is used often in Excel, and with pandas you can reshape the data however you like.

data.pivot() | Creating a pivot table with Python

    # load the libraries and data for the examples
    import numpy as np
    import pandas as pd
    import seaborn as sns
    tips = sns.load_dataset('tips')
    titanic = sns.load_dataset('titanic')
    iris = sns.load_dataset('iris')

For the examples, I loaded the tips (tipping) and iris (petal) datasets provided by the seaborn library. The tips data..
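As a minimal sketch of the idea, the example below builds a pivot table from a hypothetical five-row frame shaped like the tips data (the values are made up, and the real dataset has more columns). It also shows pd.pivot_table alongside the pivot() method named in the heading, since the two treat duplicate entries differently.

```python
import pandas as pd

# Hypothetical miniature of the tips data; ("Fri", "Dinner") appears twice.
tips = pd.DataFrame({
    "day": ["Thur", "Thur", "Fri", "Fri", "Fri"],
    "time": ["Lunch", "Dinner", "Lunch", "Dinner", "Dinner"],
    "tip": [2.0, 3.0, 1.5, 4.0, 2.0],
})

# pivot_table aggregates duplicate (day, time) pairs, here with the mean.
table = pd.pivot_table(tips, index="day", columns="time", values="tip", aggfunc="mean")

# pivot() only reshapes and refuses duplicate (index, column) pairs,
# so duplicates are dropped first (the first occurrence is kept).
unique_pairs = tips.drop_duplicates(subset=["day", "time"])
reshaped = unique_pairs.pivot(index="day", columns="time", values="tip")

print(table)
print(reshaped)
```

In `table`, the Fri/Dinner cell is the mean of 4.0 and 2.0, whereas in `reshaped` it is simply the first value kept, 4.0.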