1. Null values, drop them all: data = data.dropna(axis=0)
Code:
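import pandas as pd

if __name__ == '__main__':
    data = pd.read_csv("titanic_train.csv")
    # Keep only the columns used in these notes
    cols = ["PassengerId", "Pclass", "Fare", "Survived", "Sex", "Age"]
    data = data[cols]
    # Drop every row that has a NaN in any of the remaining columns
    data = data.dropna(axis=0)
    print(data)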
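To see what axis=0 does without the CSV at hand, here is a minimal sketch on a few made-up rows. Only the column names come from the article; the values and the names toy, all_clean and age_clean are assumptions for illustration. It also shows the subset= parameter, which limits the NaN check to chosen columns.

```python
import numpy as np
import pandas as pd

# Made-up rows standing in for titanic_train.csv (values are invented).
toy = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4],
    "Fare": [7.25, 71.28, np.nan, 8.05],
    "Age": [22.0, np.nan, 26.0, 35.0],
})

# axis=0 drops rows: a row is removed as soon as any of its columns is NaN.
all_clean = toy.dropna(axis=0)

# subset= restricts the check to the listed columns, so a missing Fare is tolerated.
age_clean = toy.dropna(axis=0, subset=["Age"])

print(len(toy), len(all_clean), len(age_clean))
```

Dropping on every column at once is simple but can discard rows that are only missing a value you never use, which is why subset= is often the safer choice.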
2. Deduplication: data["Pclass"].to_frame().drop_duplicates()
Goal: find out how many distinct passenger classes there are
Code:
import pandas as pd

if __name__ == '__main__':
    data = pd.read_csv("titanic_train.csv")
    cols = ["PassengerId", "Pclass", "Fare", "Survived", "Sex", "Age"]
    data = data[cols]
    data = data.dropna(axis=0)
    # Turn the column into a one-column DataFrame and keep the first row of each value
    res = data["Pclass"].to_frame().drop_duplicates()
    print(res)
===============================
   Pclass
0       3
1       1
9       2
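The index values 0, 1 and 9 in that output are the row labels of the first occurrence of each class, because drop_duplicates keeps the original index. A small sketch on invented data (the frame toy and its values are assumptions) compares this with unique() and nunique(), which may be more direct when only the values or their number are needed.

```python
import pandas as pd

# Invented class column; only the column name comes from the article.
toy = pd.DataFrame({"Pclass": [3, 1, 3, 1, 2, 3]})

# Keeps the first row of each distinct value, along with its original index label.
distinct_rows = toy["Pclass"].to_frame().drop_duplicates()

# Distinct values as a NumPy array, in order of first appearance.
distinct_values = toy["Pclass"].unique()

# Just the number of distinct values.
n_classes = toy["Pclass"].nunique()

print(distinct_rows)
print(distinct_values, n_classes)
```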
3. Aggregation, mean: data.pivot_table(index="x", values="x", aggfunc=np.mean)
Goal: survival rate for men and for women
Code:
import numpy as np
import pandas as pd

if __name__ == '__main__':
    data = pd.read_csv("titanic_train.csv")
    cols = ["PassengerId", "Pclass", "Fare", "Survived", "Sex", "Age"]
    data = data[cols]
    data = data.dropna(axis=0)
    # Survived is 0 or 1, so its mean within each sex is the survival rate
    res = data.pivot_table(index="Sex", values="Survived", aggfunc=np.mean)
    print(res)
    print(type(res))
=====================================
        Survived
Sex             
female  0.754789
male    0.205298
<class 'pandas.core.frame.DataFrame'>
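The reason a mean gives a rate is that Survived holds only 0 and 1. The sketch below, on made-up rows (toy and its values are assumptions), checks this by hand and shows that a groupby produces the same numbers as pivot_table. It passes the string "mean" rather than np.mean, which pandas also accepts.

```python
import pandas as pd

# Invented passengers: 3 women (2 survived) and 3 men (1 survived).
toy = pd.DataFrame({
    "Sex": ["female", "female", "male", "male", "male", "female"],
    "Survived": [1, 1, 0, 1, 0, 0],
})

# pivot_table returns a DataFrame indexed by Sex with one column, Survived.
by_pivot = toy.pivot_table(index="Sex", values="Survived", aggfunc="mean")

# The same computation as a groupby returns a Series instead.
by_group = toy.groupby("Sex")["Survived"].mean()

print(by_pivot)
print(by_group)
```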
The result is a DataFrame. The next step takes a single value out of it: the female survival rate.
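4. Aggregation, extracting a value from the result: res.loc["female", "Survived"]
Goal: pull one specific element out of the result
Code:
import numpy as np
import pandas as pd

if __name__ == '__main__':
    data = pd.read_csv("titanic_train.csv")
    cols = ["PassengerId", "Pclass", "Fare", "Survived", "Sex", "Age"]
    data = data[cols]
    data = data.dropna(axis=0)
    res = data.pivot_table(index="Sex", values="Survived", aggfunc=np.mean)
    print(res)
    print("=========================")
    # Select by row label and column label. The shorter res.loc["female"][0]
    # relies on positional lookup in a label-indexed Series, which recent
    # pandas releases deprecate.
    res = res.loc["female", "Survived"]
    print(res)
    print(type(res))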
====================================
        Survived
Sex             
female  0.754789
male    0.205298
=========================
0.7547892720306514
<class 'numpy.float64'>
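A short sketch of the label-based ways to reach one cell, built on a hand-written result frame so it runs without the CSV. The numbers 0.75 and 0.25 are invented, and the variable names are assumptions. It uses .loc with a row and column label, .at for fast scalar access, and float() to turn the NumPy scalar into a plain Python float.

```python
import pandas as pd

# Hand-written stand-in for the pivot_table result (values are invented).
res = pd.DataFrame(
    {"Survived": [0.75, 0.25]},
    index=pd.Index(["female", "male"], name="Sex"),
)

# Row label and column label together return a scalar.
by_loc = res.loc["female", "Survived"]

# .at does the same for a single cell and is meant for scalar access.
by_at = res.at["female", "Survived"]

# Convert to a built-in float if NumPy types are not wanted downstream.
as_float = float(by_loc)

print(by_loc, by_at, type(as_float))
```

Naming the column keeps the lookup correct even if more value columns are added to the pivot table later.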
5. Aggregation, count: groupby
Goal: number of passengers in each class
Key line:
data.groupby(by="Pclass").size()
Code:
import pandas as pd

if __name__ == '__main__':
    data = pd.read_csv("titanic_train.csv")
    cols = ["PassengerId", "Pclass", "Fare", "Survived", "Sex", "Age"]
    data = data[cols]
    data = data.dropna(axis=0)
    # size() returns a Series: one row count per Pclass value
    res = data.groupby(by="Pclass").size()
    print(res)
=========================================================
Pclass
1    186
2    173
3    355
dtype: int64
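For a single column, value_counts() is a common alternative to groupby(...).size(). The sketch below uses invented data (toy is an assumption) and sorts the value_counts result by index so that both outputs list the classes in the same order.

```python
import pandas as pd

# Invented class column; only the column name comes from the article.
toy = pd.DataFrame({"Pclass": [3, 1, 3, 2, 3, 1]})

# Rows per class, ordered by class.
by_group = toy.groupby(by="Pclass").size()

# value_counts orders by frequency by default, so sort by the class label.
by_counts = toy["Pclass"].value_counts().sort_index()

print(by_group)
print(by_counts)
```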
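6. size() versus count() in groupby
They mean different things:
1. size: the total number of rows in each group, NaN included
2. count: the number of rows that are not NaN
They apply to different things:
1. size: the group as a whole
2. count: each column separately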
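The difference only shows up when a column has missing values, so here is a minimal sketch on invented rows with some NaN entries (toy and its values are assumptions). size() gives one number per group, while count() gives one number per group and per column.

```python
import numpy as np
import pandas as pd

# Invented rows: class 1 has 2 passengers, class 2 has 3; some values are missing.
toy = pd.DataFrame({
    "Pclass": [1, 1, 2, 2, 2],
    "Age": [38.0, np.nan, 27.0, np.nan, np.nan],
    "Fare": [71.28, 53.10, 13.00, 10.50, np.nan],
})

# One count per group, NaN rows included: a Series.
sizes = toy.groupby("Pclass").size()

# Non-NaN entries per group and per column: a DataFrame.
counts = toy.groupby("Pclass").count()

print(sizes)
print(counts)
```

In this data class 2 has size 3 but an Age count of 1, because two of its three ages are missing.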
7. Aggregation, sum: pivot_table
Goal: total fare paid in each class
Key line:
res = data.pivot_table(index="Pclass", values="Fare", aggfunc=[np.sum, np.mean])
Code:
import numpy as np
import pandas as pd

if __name__ == '__main__':
    data = pd.read_csv("titanic_train.csv")
    cols = ["PassengerId", "Pclass", "Fare", "Survived", "Sex", "Age"]
    data = data[cols]
    # A list of functions yields one result column per function
    res = data.pivot_table(index="Pclass", values="Fare", aggfunc=[np.sum, np.mean])
    print(res)
===========================================
               sum       mean
              Fare       Fare
Pclass                       
1       18177.4125  84.154687
2        3801.8417  20.662183
3        6714.6951  13.675550
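The two header rows in that output are a two-level column index: the function name on top and the value column below. The sketch below, on invented fares (toy and the variable names are assumptions), shows one way to read a cell from such a frame with a tuple, and one way to flatten the header with droplevel.

```python
import pandas as pd

# Invented fares; only the column names come from the article.
toy = pd.DataFrame({
    "Pclass": [1, 1, 2, 3, 3],
    "Fare": [80.0, 60.0, 20.0, 8.0, 12.0],
})

res = toy.pivot_table(index="Pclass", values="Fare", aggfunc=["sum", "mean"])

# Columns are pairs such as ("sum", "Fare"), so a cell is addressed with a tuple.
total_first = res.loc[1, ("sum", "Fare")]

# Drop the inner "Fare" level to get plain column names "sum" and "mean".
flat = res.droplevel(1, axis=1)

print(res)
print(total_first)
print(flat)
```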
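8. groupby combined exercise: data.groupby(by="Pclass").agg(gz)
Goal:
1. Group by passenger class
2. Number of passengers
3. Survival rate
4. Total fare paid
5. Mean fare
6. Once that is done, rename the columns
Full code: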
import numpy as np
import pandas as pd

if __name__ == '__main__':
    data = pd.read_csv("titanic_train.csv")
    cols = ["PassengerId", "Pclass", "Fare", "Survived", "Sex", "Age"]
    data = data[cols]
    # One aggregation per column: head count, survival rate, total fare
    gz = {"PassengerId": np.size, "Survived": np.mean, "Fare": np.sum}
    res = data.groupby(by="Pclass").agg(gz)
    res.rename(columns={"PassengerId": "all_people", "Fare": "all_money", "Survived": "sur_people"}, inplace=True)
    print(res)
===============================================
         all_money  all_people  sur_people
Pclass                                    
1       18177.4125         216    0.629630
2        3801.8417         184    0.472826
3        6714.6951         491    0.242363
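The dict above maps each column to a single function, so the mean fare from item 5 of the goal list does not appear in the output. One possible way to cover all the items in one call is pandas named aggregation, sketched here on invented rows. The frame toy and the output names passengers, survival_rate, total_fare and mean_fare are assumptions, not names from the article. Each keyword sets the output column name directly, so no rename step is needed.

```python
import pandas as pd

# Invented passengers; only the input column names come from the article.
toy = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4, 5],
    "Pclass": [1, 1, 2, 3, 3],
    "Survived": [1, 0, 1, 0, 0],
    "Fare": [80.0, 60.0, 20.0, 8.0, 12.0],
})

# Each keyword is an output column: (input column, aggregation name).
res = toy.groupby("Pclass").agg(
    passengers=("PassengerId", "size"),
    survival_rate=("Survived", "mean"),
    total_fare=("Fare", "sum"),
    mean_fare=("Fare", "mean"),
)

print(res)
```

The same input column (Fare here) can be used in several keywords, which a plain column-to-function dict cannot express without producing a two-level header.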
9. About pivot_table:
Commonly used functions:
np.size: number of elements, null values included
np.mean: mean
np.sum: sum
np.max: maximum
np.min: minimum
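pandas also accepts these aggregations by name as strings, and to my knowledge recent releases recommend the strings over the NumPy callables. The sketch below is written under that assumption, on invented fares with one missing value (toy is hypothetical). It uses "count" rather than a size function, and shows that mean, sum, max and min skip the NaN.

```python
import numpy as np
import pandas as pd

# Invented fares; class 1 has three passengers, one with a missing fare.
toy = pd.DataFrame({
    "Pclass": [1, 1, 1, 2],
    "Fare": [80.0, np.nan, 60.0, 20.0],
})

# String names select the built-in pandas aggregations.
res = toy.pivot_table(
    index="Pclass",
    values="Fare",
    aggfunc=["count", "mean", "sum", "max", "min"],
)

print(res)
```

For class 1 the count is 2 and the mean is 70.0: the missing fare is ignored rather than treated as zero.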