What if there is specific data in the column(Null or Not Given)
1. Just exclude data.
select restaurant_name,
avg(rating) avg_rating,
avg(if(ratiing<>'Not given', rating, null))
from food_orders fo
if(rating<>'Not given', rating, null -> if there is 'Not Given' data on the column, change it to null(means there is no data.), or just show the data itself(name of table = rating)
or
can use "is not null"
select a.order_id,
a.customer_id,
a.restaurant_name,
a.price,
b.name,
b.age,
b.gender
from food_orders a left join customers b on a.customer_id=b.customer_id
where b.customer_id is not null
is not null -> means to call all the data that doesn't contain null from b.customer.id
or
can replace the data with another data
coalesce(data table, replacement data)
select a.order_id,
a.customer_id,
a.restaurant_name,
a.price,
b.name,
b.age,
coalesce(b.age, 20) "null 제거",
b.gender
from food_orders a left join customers b on a.customer_id=b.customer_id
where b.age is null
coalesce -> means if there is a null data, it will replace to 20
2. What if there is an abnormal data?
1. set the exact data range by IF statement
As I see, there is abnormal data in there(like age 6, or more than 80) so, I can change it with IF
Here, If there is an age over 80, it will be changed to 80 or lower than 15, it will be changed to 15
Pivot table
-> Aggregating data by two or more criteria means displaying it in an easily viewable arrangement.
Example.
select restaurant_name,
max(if(hh='15', cnt_order, 0)) "15",
max(if(hh='16', cnt_order, 0)) "16",
max(if(hh='17', cnt_order, 0)) "17",
max(if(hh='18', cnt_order, 0)) "18",
max(if(hh='19', cnt_order, 0)) "19",
max(if(hh='20', cnt_order, 0)) "20"
from
(
select a.restaurant_name,
substring(b.time, 1, 2) hh,
count(1) cnt_order
from food_orders a inner join payments b on a.order_id=b.order_id
where substring(b.time, 1, 2) between 15 and 20
group by 1, 2
) a
group by 1
order by 7 desc
From here,
max(if(hh='15', cnt_order, 0)) "15",
max(if(hh='16', cnt_order, 0)) "16",
max(if(hh='17', cnt_order, 0)) "17",
max(if(hh='18', cnt_order, 0)) "18",
max(if(hh='19', cnt_order, 0)) "19",
max(if(hh='20', cnt_order, 0)) "20"
by max-> just consider it as one of the frames to make a column.
Window function
frame->
1. window_function(argument) over (partition by (group standard column) order by (sort by)
1.1 Rank -> the name itself. it will rank certain columns
How to use rank?
select cuisine_type,
restaurant_name,
cnt_order,
rank() over (PARTITION by cuisine_type order by cnt_order desc) ranking
from
(
select cuisine_type,
restaurant_name,
count(1) cnt_order #based on the first statement(cuisine_type) it will count all the columns
from food_orders fo
group by 1, 2
) a
rank() over (partition by cuisine_type order by cnt_order desc) ranking
1. rank() over -> these are pair
2. after that I must write 2 things 2.1 partition by 2.2 order by
2.1 partition by -> It will separate the data based on the name of the table
2.2 order by -> after partition by, must specify how to order the table
Based on this picture, after the American column, It will restart ranking on Chinese columns.
Because these data have been partitioned by partition by
Date function(date)
Turn the column into a date-type
select date,
date(date) -> remember date is a function
from payments
format -> format the data type
select date(date) date_type,
date_format(date(date), '%Y') "년", -> format the date(date) data to %Y form(Y -> year)
date_format(date(date), '%m') "월", -> (m -> month)
date_format(date(date), '%d') "일", -> (d -> day)
date_format(date(date), '%w') "요일" (w -> weekend)
from payments
remember!
from the data,
0 -> Sunday
1 -> Monday
2 -> Tuesday
3 -> Wednesday
4 -> Thursday
5 -> Friday
6 -> Saturday
H.W
1. Create a pivot view of the number of orders by food type and age (age is between 10 and 59 years old)
select cuisine_type,
max(if(age = 10, cnt_orders, 0)) '10대',
max(if(age = 20, cnt_orders, 0)) '20대',
max(if(age = 30, cnt_orders, 0)) '30대',
max(if(age = 40, cnt_orders, 0)) '40대',
max(if(age = 50, cnt_orders, 0)) '50대'
from
(
select cuisine_type,
case when age between 10 and 19 then 10
when age between 20 and 29 then 20
when age between 30 and 39 then 30
when age between 40 and 49 then 40
else '50대' end age,
count(1) cnt_orders
from food_orders f inner join customers c on f.customer_id = c.customer_id
group by 1, 2
) a
group by 1

Done. Good night.