
Sparta AI 8th Cohort TIL (10/5)

kimjunki-8 2024. 10. 5. 21:19

Missing Data -> '결측치' (the Korean term for missing values)

How to check if there is missing data?

1. isna()

2. isnull()

Both return True wherever a value is missing (None/NaN); isnull() is just an alias of isna(). Chaining sum() onto either one gives the number of missing values in each column.
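A quick sketch of the check, to make it concrete. The DataFrame and its column names are made up for illustration; only isna(), isnull() and sum() come from the notes.

```python
import pandas as pd

# Hypothetical sample data; None becomes NaN in the numeric column.
data = pd.DataFrame({"name": ["a", "b", "c", "d"], "age": [3, 2, None, 4]})

missing_mask = data.isna()            # True wherever a value is missing
missing_counts = data.isna().sum()    # number of missing values per column
same_result = data.isnull().equals(missing_mask)  # isnull() gives the same mask

print(missing_counts)
```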

Then I can delete the rows that contain a missing value with dropna().

To delete the columns that contain missing values instead, I must pass axis=1.
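Roughly, the two dropna() variants look like this. The sample data is hypothetical. Note that dropna() returns a new DataFrame and leaves the original untouched unless the result is assigned back.

```python
import pandas as pd

# Hypothetical sample data with one missing age.
data = pd.DataFrame({"name": ["a", "b", "c", "d"], "age": [3, 2, None, 4]})

rows_dropped = data.dropna()        # drops the row whose age is missing
cols_dropped = data.dropna(axis=1)  # drops the whole 'age' column instead

print(rows_dropped)
print(cols_dropped)
```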

Or I can replace missing values with another value using fillna(); the replacement value goes inside the parentheses.

I can also fill in missing values with the mean, median, mode, etc.

It's quite confusing, but that's okay.

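Remember: say data['age'] holds 3, 2, None, 4.

data['age'].mean() -> 3, because the missing value is skipped: (3 + 2 + 4) / 3 = 9 / 3 = 3.

-> data['age'].fillna(data['age'].mean()) is the same as data['age'].fillna(3): it replaces only the missing value with 3 and leaves the other values unchanged.

Result (after assigning it back to data['age']): 3, 2, 3, 4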
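The same example written out as a small sketch. The variable name mean_age is my own; the values are the ones from the notes.

```python
import pandas as pd

data = pd.DataFrame({"age": [3, 2, None, 4]})

# mean() skips the missing value: (3 + 2 + 4) / 3
mean_age = data["age"].mean()

# fillna() returns a new Series, so assign it back to keep the change.
data["age"] = data["age"].fillna(mean_age)

print(data["age"].tolist())
```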

Interpolation -> a method that estimates an appropriate value from the data before and after the position where a missing value occurs (interpolate() in pandas). -> Useful when dealing with date (time series) data.

 

Additionally, periods -> a parameter of pd.date_range() that specifies how many dates to generate. The example uses periods=5 to generate a total of 5 dates starting from '2023-01-01'.

Alternatively, I can handle missing values based on certain conditions, for example, filling in missing values based on the values in another column.
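A minimal sketch of interpolation on date data. The numbers in the series are invented for illustration; the date range matches the one described above. With the default linear method, each gap is filled from the values on either side of it.

```python
import pandas as pd

# 5 daily dates starting from 2023-01-01
dates = pd.date_range("2023-01-01", periods=5)

# Hypothetical measurements with two gaps.
series = pd.Series([10.0, None, 30.0, None, 50.0], index=dates)

# Default method is linear: a gap between 10 and 30 becomes 20, and so on.
filled = series.interpolate()

print(filled)
```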

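Remember!!!!!! loc[condition, location], where the location is the column to change

condition

(data1['type'] == 'aust') & (data1['age'].isna())

meaning

data1['type'] == 'aust' -> True for the rows whose 'type' is 'aust'

data1['age'].isna() -> True for the rows whose 'age' is missing

& -> both must be True. Each comparison needs its own parentheses, because & binds more tightly than ==.

For the rows where the condition is True, loc selects the 'age' column and the assignment changes the value to 1.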
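Put together, it might look like the sketch below. The rows in data1 are hypothetical; the column names, the 'aust' value and the replacement value 1 follow the notes.

```python
import pandas as pd

# Hypothetical data: two 'aust' rows and two other rows, some ages missing.
data1 = pd.DataFrame({
    "type": ["aust", "aust", "other", "other"],
    "age": [None, 5, None, 7],
})

# Both parts must be True: type is 'aust' AND age is missing.
condition = (data1["type"] == "aust") & (data1["age"].isna())

# loc[condition, column]: only the matching rows of 'age' are changed.
data1.loc[condition, "age"] = 1

print(data1)
```

Only the first row is changed. The missing age in the 'other' row stays missing because it fails the first half of the condition.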

 

apply() -> applies a user-defined function, here one that handles missing values.

apply() calls that function once for every value in the age column, so a missing entry reaches the function as NaN. If the argument it receives is a missing value, the function returns 1.
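A small sketch of that idea. The function name fill_missing_age is made up, and returning the value unchanged when it is not missing is my assumption, since the notes only say what happens for a missing value.

```python
import pandas as pd

def fill_missing_age(value):
    # Return 1 for a missing value; otherwise keep the value (assumed behavior).
    return 1 if pd.isna(value) else value

data = pd.DataFrame({"age": [3, 2, None, 4]})

# apply() calls the function once per value in the column.
data["age"] = data["age"].apply(fill_missing_age)

print(data["age"].tolist())
```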

Good night