[Python] Restructuring DataFrame Columns
·
Python
Changing column order

In [2]:
import pandas as pd
import seaborn as sns
titanic = sns.load_dataset('titanic')

When you feed in the many features needed to train an ML model, the column index often ends up jumbled, which makes it awkward to separate the label from the features. You can of course list the features one by one, but splitting features and label by position with iloc means the code does not have to change when the feature set changes, as long as the label stays in the same position.

In [8]:
# example of separating the label
titanic.iloc[:,0]

Out[8]:
0      0
1      1
2      1
3      1
4      0
      ..
886    0
887    1
888    0
889    1
890    0
N..
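The positional split described in the excerpt above can be sketched roughly as follows. This is only an illustration: it uses a tiny hand-made DataFrame with made-up values instead of the real titanic data (so it runs without a download), and it assumes the label is the first column.

```python
import pandas as pd

# Small stand-in for the titanic data (values are made up for illustration).
df = pd.DataFrame({
    "survived": [0, 1, 1, 0],
    "pclass": [3, 1, 3, 2],
    "age": [22.0, 38.0, 26.0, 35.0],
    "fare": [7.25, 71.28, 7.92, 13.0],
})

# Assumption: the label is the first column, so split by position.
y = df.iloc[:, 0]    # label (Series)
X = df.iloc[:, 1:]   # every remaining column is a feature

# If the label is not first, reorder the columns once so that it is.
shuffled = df[["age", "survived", "fare", "pclass"]]
ordered = shuffled[["survived"] + [c for c in shuffled.columns if c != "survived"]]

print(list(X.columns))
print(list(ordered.columns))
```

Reordering once up front keeps every later `iloc` split valid even when features are added or removed behind the label.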
[Python] Merging DataFrames
·
Python
Merging DataFrames
During data preprocessing you often end up splitting a DataFrame into several pieces and then combining them back into one. This post covers the DataFrame merging methods you can use in that situation.

pd.concat()
As the name suggests, concat is a function that concatenates DataFrames. First, let's create two DataFrames to merge.

In [5]:
import pandas as pd
df1 = pd.DataFrame({'a':['a0','a1','a2','a3'], 'b':['b0','b1','b2','b3'], 'c':['c0','c1','c2','c3']}, index..
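Since the excerpt is cut off before the actual concat call, here is a minimal sketch of how pd.concat behaves. The two frames and their index values are made up for illustration and are not the ones from the original post.

```python
import pandas as pd

# Two small frames with the same columns but different indexes (made-up values).
df1 = pd.DataFrame({"a": ["a0", "a1"], "b": ["b0", "b1"]}, index=[0, 1])
df2 = pd.DataFrame({"a": ["a2", "a3"], "b": ["b2", "b3"]}, index=[2, 3])

# axis=0 (the default) stacks rows on top of each other.
rows = pd.concat([df1, df2])

# ignore_index=True discards the original labels and renumbers from 0.
renumbered = pd.concat([df1, df1], ignore_index=True)

# axis=1 places frames side by side, aligning on the index;
# labels missing from one side are filled with NaN.
side_by_side = pd.concat([df1, df2], axis=1)

print(rows.shape, renumbered.index.tolist(), side_by_side.shape)
```

Note that the side-by-side result keeps the union of both indexes, which is why it ends up with NaN cells when the indexes do not overlap.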
[Python] Comparing pd.cut and pd.qcut
·
Python
With the pd.cut() and pd.qcut() functions you can turn a numeric variable into a categorical label by splitting it into intervals. That in turn makes it possible to compute group-wise statistics for each interval. The difference between the two functions is this: cut splits the data into bins of equal width, while qcut splits it into bins that hold (roughly) the same number of observations.

pd.cut()
First, load the Titanic data used in the examples.

In [1]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
titanic = sns.load_dataset('titanic')
titanic.head()

This data has a numeric variable, the age column. This age column..
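The equal-width versus equal-count difference is easiest to see on skewed data. The sketch below uses a handful of made-up ages rather than the titanic age column, so treat it as an illustration only.

```python
import pandas as pd

# Made-up ages, skewed toward younger values.
ages = pd.Series([1, 2, 3, 4, 5, 6, 7, 80])

# pd.cut: 4 bins of equal width across the min..max range.
by_width = pd.cut(ages, bins=4)

# pd.qcut: 4 bins placed at quantiles, so each holds about the same count.
by_count = pd.qcut(ages, q=4)

print(by_width.value_counts(sort=False))
print(by_count.value_counts(sort=False))
```

With these values, cut puts seven of the eight ages into the first bin and leaves two bins empty, while qcut puts two ages in every bin.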
[Python] Boolean Indexing
·
Python
Extracting the row where a given value is the maximum

In [2]:
import pandas as pd
import numpy as np
import seaborn as sns
df = sns.load_dataset('titanic')
df.head()

Out[2]:

boolean indexing
To extract the rows for which a condition applied to a Series is true, you can use boolean indexing. As an example, let's extract the rows for passengers who are male and survived.

In [19]:
condition = (df.sex == 'male') & (df.survived == 1)  # build the condition
df[condition]

Out[19]:

In [20]:
df.loc[condition]  # using loc works just as well

Out[20]..
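Because the outputs are missing from the excerpt, here is a small self-contained sketch of the same idea, including the "row with the maximum value" case from the heading. The rows are made up and only mimic a few titanic columns.

```python
import pandas as pd

# Made-up stand-in for a few titanic rows.
df = pd.DataFrame({
    "sex": ["male", "female", "male", "male"],
    "survived": [0, 1, 1, 1],
    "fare": [7.25, 71.28, 8.05, 26.55],
})

# Each comparison yields a boolean Series; combine them with & (and) or | (or).
# The parentheses are required because & binds tighter than ==.
condition = (df["sex"] == "male") & (df["survived"] == 1)
male_survivors = df[condition]

# Rows where a column equals its maximum (keeps ties, unlike idxmax).
max_fare_rows = df[df["fare"] == df["fare"].max()]

print(male_survivors.index.tolist())
print(max_fare_rows.index.tolist())
```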
[Python] lambda
·
Python
lambda functions are handy for data handling together with pandas methods such as map and apply. You can also define a function with the def keyword, but for simple preprocessing a throwaway lambda function is what is usually used.

Comparing a regular function and a lambda function
A regular function created with the def keyword looks like this.

In [2]:
def plus(left, right):
    result = left + right
    return result
plus(10, 20)

Out[2]:
30

Now let's build the same function as a lambda.

In [6]:
f = lambda x, y: x + y
f(10, 20)

Out[6]:
30

As shown above, you specify the parameters for lambda and, after the : (colon), the return value..
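To connect the lambda syntax above with the pandas use the post mentions, here is a short sketch. The Series values and the age threshold of 18 are made up for illustration.

```python
import pandas as pd

# A named function and the equivalent lambda expression.
def plus(left, right):
    return left + right

plus_lambda = lambda x, y: x + y

# The more typical use: pass the lambda directly, without naming it.
names = pd.Series(["braund", "cumings", "heikkinen"])  # made-up values
capitalized = names.map(lambda s: s.capitalize())

# A lambda holds a single expression; a conditional expression still fits.
ages = pd.Series([4, 35, 17])
group = ages.apply(lambda a: "child" if a < 18 else "adult")

print(plus(10, 20), plus_lambda(10, 20))
print(capitalized.tolist(), group.tolist())
```

Once the logic needs more than one expression, a def function is usually the more readable choice.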
[Python] map, apply, applymap
·
Python
map()
The map function is called on a Series, for example a single column of a DataFrame. So with map you can apply any function you like to each value of a Series.

In [1]:
import pandas as pd

In [2]:
df = pd.DataFrame({"age":[30,25,25,12,40], "height":[178,180,160,140,176], "weight":[80,100,55,40,70]})
df

Out[2]:
   age  height  weight
0   30     178      80
1   25     180     100
2   25     160      55
3   12     140      40
4   40     176      70

In [4]:
df['weight(pound)'] = df['weight'].map(lambda x: x*2.205)
df

Out..
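The excerpt stops at map, so the following is a rough sketch of how the three tools named in the title differ. It reuses a shortened version of the frame above; the BMI formula is just a convenient example of a row-wise calculation and is not from the original post.

```python
import pandas as pd

df = pd.DataFrame({"age": [30, 25, 12], "height": [178, 180, 140], "weight": [80, 100, 40]})

# Series.map: one function applied to each value of a single column.
pounds = df["weight"].map(lambda kg: kg * 2.205)

# DataFrame.apply with axis=1: the function receives one row at a time,
# so it can combine several columns (here, BMI from weight and height).
bmi = df.apply(lambda row: row["weight"] / (row["height"] / 100) ** 2, axis=1)

# DataFrame.apply with axis=0 (default): the function receives one column at a time.
col_range = df.apply(lambda col: col.max() - col.min())

# Element-wise over the whole frame: applymap in older pandas; pandas 2.1+
# renamed it to DataFrame.map. Mapping each column works on either version.
doubled = df.apply(lambda col: col.map(lambda v: v * 2))

print(pounds.round(1).tolist())
print(col_range.to_dict())
```

Mapping column by column is used here only to avoid depending on a particular pandas version; on a known version, calling the element-wise method directly is simpler.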
[Python] Pivot Tables
·
Python
What is a pivot table?
A pivot table is a technique for pulling just the data you need out of a large volume of data and building a meaningful table from it. The feature is widely used in Excel, and with Pandas you can handle the data however you like.

data.pivot() | creating a pivot table with Python

# load the libraries and data for the examples
import numpy as np
import pandas as pd
import seaborn as sns
tips = sns.load_dataset('tips')
titanic = sns.load_dataset('titanic')
iris = sns.load_dataset('iris')

For the examples, the tips and iris (petal) datasets provided by the seaborn library were loaded. The tips da..
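As a rough illustration of what building such a table looks like, the sketch below uses pd.pivot_table on a few made-up rows shaped like the tips data (the real dataset is not downloaded here, and the original post may use different columns).

```python
import pandas as pd

# Made-up rows shaped like the seaborn tips data.
tips = pd.DataFrame({
    "day": ["Thur", "Thur", "Fri", "Fri", "Fri"],
    "sex": ["Male", "Female", "Male", "Female", "Female"],
    "tip": [2.0, 3.0, 4.0, 1.0, 3.0],
})

# index -> row labels, columns -> column labels, values -> what to aggregate,
# aggfunc -> how to combine values that land in the same cell.
table = pd.pivot_table(tips, index="day", columns="sex", values="tip", aggfunc="mean")

print(table)
```

DataFrame.pivot only reshapes and raises an error when an index/column pair appears more than once, so pivot_table with an explicit aggfunc is the safer choice for data with repeated combinations like this one.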
[Airflow] Creating Your First DAG
·
Airflow
🔨 Creating your first DAG
To create a DAG, first make a dags folder and create tutorials.py, the Python file in which the DAG will be written. The first DAG we build consists of 4 Tasks.

Task1 : print the current date and time as a string
Task2 : wait for 5 seconds
Task3 : print the string "Hello Airflow!" 5 times
Task4 : send a job-completion email via Gmail

When the DAG is triggered, Task1 runs first, Task2 and Task3 then run in parallel, and Task4 runs once both have finished. Because the webserver provides a Graph view, you can check visually the order in which the DAG you built runs.

🪄 Importing Modules
The DAG object..
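The execution order described above (Task1, then Task2 and Task3 in parallel, then Task4) can be checked with a small plain-Python sketch. This is not Airflow code: it only models the dependency graph with the standard library's graphlib (Python 3.9+), and the names task1 to task4 are placeholders. In an Airflow DAG file, the same dependencies are typically declared with the bitshift syntax, along the lines of `task1 >> [task2, task3] >> task4`.

```python
from graphlib import TopologicalSorter

# Map each task to the set of tasks it depends on (plain Python, not Airflow).
dependencies = {
    "task1": set(),
    "task2": {"task1"},
    "task3": {"task1"},
    "task4": {"task2", "task3"},
}

sorter = TopologicalSorter(dependencies)
sorter.prepare()

# Each batch holds the tasks whose dependencies are all finished,
# i.e. the tasks that could run in parallel at that point.
batches = []
while sorter.is_active():
    ready = sorted(sorter.get_ready())
    batches.append(ready)
    sorter.done(*ready)

print(batches)
```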
[Airflow] Setting Up Airflow
·
Airflow
⚙️ Installing Airflow
To install Airflow, first set up a virtual environment. I personally find poetry convenient so I used poetry, but venv works just as well.

poetry add apache-airflow

🛢️ Initializing the database
After installing apache-airflow in the virtual environment, initialize the database.

airflow db init

🤠 Creating an Admin account
Create a test Admin account for logging in to the webserver. The account creation command has the following form.

airflow users create \
    --username admin \
    --firstname FIRST_NAME \
    --lastname LAST_NAME \
    --role Admin ..
[Airflow] What Is Airflow?
·
Airflow
What is Airflow?
Apache Airflow is an open-source platform for developing, scheduling, and monitoring batch-oriented workflows. Workflows are built in Python, and their status can be monitored through a web interface.
Source : https://airflow.apache.org/docs/apache-airflow/stable/index.html

What is a Workflow?
A workflow is a sequence of tasks triggered by a specific event or schedule. Workflows are mainly used in big-data processing pipelines such as ETL.

A typical Workflow
A typical workflow looks like this.
Data collection
Data preprocessing
Monitoring
Report generation
Report delivery
Source : https://www.youtube.com/watch?v=AHMm1w..