joyful-pandas (Part 2) Study Notes: Chapter 10, Comprehensive Exercises
1. Taobao Zongzi Sales During the Dragon Boat Festival
First, read the data.
import pandas as pd
data_Zongzi = pd.read_csv('Pandas(下)综合练习数据集/端午粽子数据.csv')
data_Zongzi
data_Zongzi.columns
Index(['标题', ' 价格', '付款人数', '店铺', '发货地址 '], dtype='object')
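Note that two of the labels carry stray whitespace (' 价格' has a leading space, '发货地址 ' a trailing one), which is why the code in these notes spells them with the spaces. As an optional alternative, the labels could be stripped once up front. The following is a minimal sketch on a made-up one-row frame, not the real dataset; the notes themselves keep the original labels.

```python
import pandas as pd

# Toy frame whose headers mimic the stray spaces in the real CSV (the row is made up).
df = pd.DataFrame({'标题': ['粽子A'], ' 价格': ['45'], '发货地址 ': ['浙江 杭州']})

# Strip leading/trailing whitespace from every column label.
df.columns = df.columns.str.strip()

print(list(df.columns))
```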
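(1) Drop the rows whose last column (the shipping address) is missing, then compute the mean unit price of all items shipped from Hangzhou (杭州).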
# Inspect the rows with a missing shipping address, then drop them
data_Zongzi[data_Zongzi['发货地址 '].isna()]
data_Zongzi = data_Zongzi.dropna(subset=['发货地址 '], axis=0)
data_Zongzi.info()
# Keep the Hangzhou rows; .copy() avoids assigning into a view of data_Zongzi
data_hangzhou = data_Zongzi[data_Zongzi['发货地址 '].str.contains('杭州')].copy()
data_hangzhou.info()
# The price column is text: pull out the numeric part and convert it to float
data_hangzhou[' 价格'] = data_hangzhou[' 价格'].str.findall(r'\d+\.?\d*')
data_hangzhou[' 价格'] = data_hangzhou[' 价格'].apply(lambda x: x[0]).astype('float')
data_hangzhou[' 价格']
data_hangzhou[' 价格'].mean()
80.90088888888877
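The findall-then-apply step above can also be written with `str.extract` plus `pd.to_numeric`, which leaves NaN instead of raising when a string contains no number. This is only a sketch of that alternative: the rows below are made up, the column labels are written without the stray spaces for readability, and the real price strings may look different.

```python
import pandas as pd

# Made-up rows; the real dataset is not used here.
toy = pd.DataFrame({
    '发货地址': ['浙江 杭州', '浙江 嘉兴', '浙江 杭州', None],
    '价格': ['45.9', '¥30', '12_', '20'],
})

# Drop rows with a missing address, then keep the Hangzhou rows as an independent copy.
toy = toy.dropna(subset=['发货地址'])
hangzhou = toy[toy['发货地址'].str.contains('杭州')].copy()

# Extract the first number in each string; strings without a match become NaN.
number_text = hangzhou['价格'].str.extract(r'(\d+\.?\d*)', expand=False)
hangzhou['价格'] = pd.to_numeric(number_text)

print(hangzhou['价格'].mean())
```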
(2) How many records have "Jiaxing" (嘉兴) in the product title but are not shipped from Jiaxing?
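# Rows whose title mentions 嘉兴
data_jiaxing = data_Zongzi[data_Zongzi['标题'].str.contains('嘉兴')]
data_jiaxing
# Of those, keep the rows whose shipping address does not mention 嘉兴
data_jiaxing = data_jiaxing[~data_jiaxing['发货地址 '].str.contains('嘉兴')]
data_jiaxing
len(data_jiaxing)  # number of matching records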
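The two filters can also be combined as boolean masks in a single selection. The sketch below uses made-up rows and space-free column labels; passing `na=False` is an assumption about how missing values should be treated (here, a missing address counts as "not Jiaxing"), and it is unnecessary if the missing rows were already dropped as in task (1).

```python
import pandas as pd

# Made-up rows, including one with a missing shipping address.
toy = pd.DataFrame({
    '标题': ['嘉兴肉粽', '嘉兴蛋黄粽', '五芳斋粽子', '嘉兴豆沙粽'],
    '发货地址': ['浙江 嘉兴', '上海', '浙江 嘉兴', None],
})

# na=False turns missing values into False instead of propagating NaN into the mask.
in_title = toy['标题'].str.contains('嘉兴', na=False)
from_jiaxing = toy['发货地址'].str.contains('嘉兴', na=False)

# Title mentions 嘉兴 AND the address does not.
mismatch = toy[in_title & ~from_jiaxing]
print(len(mismatch))
```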
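(3) Split the prices by quantile into five categories, "high, fairly high, medium, fairly low, low" (高, 较高, 中, 较低, 低), insert the category column right after the title column, and finally sort by the category column in descending order.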
data_Zongzi[' 价格'] = data_Zongzi[' 价格'].str.findall(r'\d+\.?\d*')
data_Zongzi[' 价格'] = data_Zongzi[' 价格'].apply(lambda x: x[0]).astype('float')
data_Zongzi[' 价格']
price_02 = data_Zongzi[' 价格'].quantile(q=0.2)
print(price_02)
price_04 = data_Zongzi[' 价格'].quantile(q=0.4)
print(price_04)
price_06 = data_Zongzi[' 价格'].quantile(q=0.6)
print(price_06)
price_08 = data_Zongzi[' 价格'].quantile(q=0.8)
print(price_08)
price_1 = data_Zongzi[' 价格'].quantile(q=1)
print(price_1)
22.8
39.8
60.0
106.12000000000008
1780.0
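The five separate `quantile` calls can be collapsed into one, since `Series.quantile` also accepts a list of levels and then returns a Series indexed by those levels. A minimal sketch on made-up prices (not the real data):

```python
import pandas as pd

# Made-up prices for illustration only.
prices = pd.Series([10.0, 20.0, 30.0, 40.0, 50.0, 60.0])

# One call returns every requested quantile at once.
edges = prices.quantile([0.2, 0.4, 0.6, 0.8, 1.0])
print(edges)

# The values can be turned into bin edges for pd.cut, e.g. [0] + edges.tolist().
bin_edges = [0] + edges.tolist()
```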
data_Zongzi['类别结果'] = pd.cut(data_Zongzi[' 价格'], [0, price_02, price_04, price_06, price_08, price_1], labels=['低', '较低', '中', '较高', '高'])
data_Zongzi = data_Zongzi.reindex(columns=['标题', '类别结果', ' 价格', '付款人数', '店铺', '发货地址 '])
data_Zongzi
data_Zongzi = data_Zongzi.sort_values(by='类别结果', ascending=False)
data_Zongzi
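For comparison, the same three steps can be sketched with `pd.qcut`, which computes the quantile edges itself, and `DataFrame.insert`, which places a column at a given position without listing every column. This is a different route from the `pd.cut` plus `reindex` approach above, shown on made-up prices with space-free labels. One caveat: `qcut` raises an error when quantile edges coincide unless `duplicates='drop'` is passed, so whether it works unchanged on the real data is an assumption.

```python
import pandas as pd

# Made-up titles and prices for illustration only.
toy = pd.DataFrame({
    '标题': list('abcdefghij'),
    '价格': [5.0, 12.0, 18.0, 25.0, 33.0, 41.0, 58.0, 76.0, 99.0, 150.0],
})

labels = ['低', '较低', '中', '较高', '高']

# qcut splits into 5 equal-frequency bins and includes the minimum in the first bin.
category = pd.qcut(toy['价格'], q=5, labels=labels)

# insert() adds the new column at position 1, right after '标题', in place.
toy.insert(1, '类别结果', category)

# The result is an ordered categorical, so descending order runs from '高' down to '低'.
toy = toy.sort_values(by='类别结果', ascending=False)
print(toy)
```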
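These notes record my reflections while working through the joyful-pandas tutorial. Thanks to GYH for sharing it, and to Datawhale for organizing the study group!
joyful-pandas tutorial: http://dwz.date/aZCT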