[kaggle] Learn Tutorial_Pandas (Summary) _2

Ykl 2022. 11. 24. 01:00

< 5 > Data Types and Missing Values

This section covers how to investigate data types within a DataFrame or Series.

It also covers how to find and replace entries.

Dtypes

What is a dtype? The data type of a column in a DataFrame, or of a Series

1. dtype, dtypes, astype

- dtype: returns the dtype of a single column (a Series)

- dtypes: returns the dtype of every column in the DataFrame

- keep in mind: columns consisting entirely of strings do not get their own type; they are given the object type

reviews.price.dtype    # dtype('float64')
reviews.dtypes         # shows each column's dtype

'''
country                   object
description               object
designation               object
points                     int64
price                    float64
province                  object
region_1                  object
region_2                  object
taster_name               object
taster_twitter_handle     object
title                     object
variety                   object
winery                    object
dtype: object
'''

- astype

: converts a column from one type to another; it returns a new Series and leaves the original column unchanged

reviews.points.astype('float64')

- The index of a DataFrame or Series also has its own dtype

reviews.index.dtype			# dtype('int64')
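
To make the three tools above concrete, here is a minimal sketch. The small `reviews` DataFrame is a hypothetical stand-in for the wine reviews dataset (its values are made up for illustration), so the printed dtypes apply only to this toy data.

```python
import pandas as pd

# Hypothetical stand-in for the wine reviews data; values are illustrative only.
reviews = pd.DataFrame({
    "country": ["Italy", "Portugal", "US"],
    "points": [87, 87, 86],
    "price": [float("nan"), 15.0, 14.0],
})

points_dtype = reviews.points.dtype                   # dtype of a single column
all_dtypes = reviews.dtypes                           # Series: column name -> dtype
points_as_float = reviews.points.astype("float64")    # new Series, original untouched
index_dtype = reviews.index.dtype                     # the index has a dtype too

print(points_dtype)
print(all_dtypes)
print(points_as_float.dtype)
print(index_dtype)
```

Note that `astype` does not modify `reviews`; to keep the converted column, assign the result back, for example `reviews["points"] = reviews.points.astype("float64")`.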

Missing Data

NaN: short for "Not a Number"; entries with missing values are given the value NaN, which is always of the float64 dtype

1. pd.isnull

: selects entries that are NaN

reviews[pd.isnull(reviews.country)]     # rows where 'country' is NaN

2. pd.notnull

: selects entries that are not NaN

reviews[pd.notnull(reviews.country)]

3. fillna

: replaces missing values with the given value and returns the result as a new object

reviews.region_2.fillna("Unknown")
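
The three functions so far (pd.isnull, pd.notnull, fillna) can be tried together on a toy example. This is only a sketch: the `reviews` DataFrame below is a hypothetical miniature of the wine data with made-up values.

```python
import pandas as pd

# Hypothetical miniature of the wine reviews data; values are illustrative only.
reviews = pd.DataFrame({
    "country": ["Italy", None, "US"],
    "region_2": [None, None, "Willamette Valley"],
})

# Boolean masks from pd.isnull / pd.notnull are used to filter rows.
missing_country = reviews[pd.isnull(reviews.country)]
known_country = reviews[pd.notnull(reviews.country)]

# fillna returns a new Series; reviews.region_2 itself is left unchanged.
filled = reviews.region_2.fillna("Unknown")

print(missing_country)
print(known_country)
print(filled)
```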

4. replace("A", "B")   # replace "A" with "B"

: used for a non-null value that we would like to replace

reviews.taster_twitter_handle.replace("@kerinokeefe", "@kerino")
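
As a small sketch of how replace behaves, the example below uses a hypothetical Series of Twitter handles (the second handle and the missing entry are made up for illustration). Only exact matches of the first argument are swapped; other values and NaN entries pass through unchanged.

```python
import pandas as pd

# Hypothetical handles; only the first and last entries match the value to replace.
handles = pd.Series(["@kerinokeefe", "@vossroger", None, "@kerinokeefe"])

# replace returns a new Series; `handles` itself is not modified.
renamed = handles.replace("@kerinokeefe", "@kerino")

print(renamed)
```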

< 6 > Renaming and Combining

Renaming

(1) rename(columns={'A': 'B'})   # rename column A to B

: lets you change index names and/or column names

Rename index or column values by specifying the index or columns keyword parameter, respectively.

reviews.rename(columns={'points': 'score'})

(2) rename(index={0: 'firstEntry', 1: 'secondEntry'})   # change numeric row index labels to strings

reviews.rename(index={0:'firstEntry', 1:'secondEntry'})

(3) rename_axis("name_you_want", axis='rows' or 'columns')

: both the row index and the column index can have their own name attribute, and rename_axis sets it

reviews.rename_axis("fields", axis='columns').rename_axis("wines", axis='rows')
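
The three renaming calls above can be compared side by side in a minimal sketch. The two-row `reviews` DataFrame is hypothetical, and `axis="index"` is used here to refer to the row axis.

```python
import pandas as pd

# Hypothetical two-row frame; values are illustrative only.
reviews = pd.DataFrame({"points": [87, 86], "price": [15.0, 14.0]})

# Rename a column label.
scored = reviews.rename(columns={"points": "score"})

# Rename individual row labels.
labeled = reviews.rename(index={0: "firstEntry", 1: "secondEntry"})

# Name the axes themselves (not the labels on them).
named = reviews.rename_axis("wines", axis="index").rename_axis("fields", axis="columns")

print(scored)
print(labeled)
print(named)
```

Each call returns a new DataFrame, so `reviews` keeps its original labels unless the result is assigned back.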

Combining

1. concat

: the simplest combining method

Given a list of elements, it stacks them together along an axis. This is useful when the data sits in different DataFrame or Series objects that share the same fields (columns).

canadian_youtube = pd.read_csv("/content/drive/MyDrive/Kaggle/project0/CAvideos.csv")
british_youtube = pd.read_csv("/content/drive/MyDrive/Kaggle/project0/GBvideos.csv")

pd.concat([canadian_youtube, british_youtube])
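
Since the CSV files above are not available outside the author's Drive, here is a minimal sketch of the same idea with two hypothetical in-memory frames (column names and values are made up). It also shows `ignore_index=True`, which is one way to avoid repeated row labels after stacking.

```python
import pandas as pd

# Hypothetical stand-ins for the two YouTube datasets; same columns in both.
canadian = pd.DataFrame({"title": ["A", "B"], "views": [100, 200]})
british = pd.DataFrame({"title": ["C"], "views": [300]})

# Rows are stacked; each frame keeps its own original row labels.
stacked = pd.concat([canadian, british])

# ignore_index=True discards the old labels and numbers the rows 0..n-1.
reindexed = pd.concat([canadian, british], ignore_index=True)

print(stacked)
print(reindexed)
```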

2. join

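: the middle option in terms of complexity (between concat and merge)

Combines different DataFrame objects that have an index in common.

parameters

lsuffix : suffix appended to the left DataFrame's column names when both DataFrames have columns with the same name

rsuffix : suffix appended to the right DataFrame's column names when both DataFrames have columns with the same name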

left = canadian_youtube.set_index(['title', 'trending_date'])
right = british_youtube.set_index(['title', 'trending_date'])

left.join(right, lsuffix='_CAN', rsuffix='_UK')
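
To show what the suffixes do, the sketch below repeats the join on two hypothetical in-memory frames (titles, dates, and view counts are made up). With the default settings, join keeps every row of the left frame and fills in NaN where the right frame has no matching index entry.

```python
import pandas as pd

# Hypothetical stand-ins for the Canadian and British datasets.
canadian = pd.DataFrame({
    "title": ["A", "B"],
    "trending_date": ["17.14.11", "17.14.11"],
    "views": [100, 200],
})
british = pd.DataFrame({
    "title": ["B", "C"],
    "trending_date": ["17.14.11", "17.14.11"],
    "views": [250, 300],
})

# Both frames get the same two-level index so that join can align on it.
left = canadian.set_index(["title", "trending_date"])
right = british.set_index(["title", "trending_date"])

# Both frames have a "views" column, so the suffixes are needed to tell them apart.
joined = left.join(right, lsuffix="_CAN", rsuffix="_UK")

print(joined)
```

Without lsuffix and rsuffix, this join would raise an error because the overlapping column name `views` could not be disambiguated.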
