08, DataFrame column operations: column names, dtypes, a column-manipulation example, column arithmetic, max, min, mean

    Technology  2025-03-31

    1, Column names: data.columns

    Code:

        import pandas as pd

        if __name__ == '__main__':
            # Read the CSV file
            data = pd.read_csv("titanic_train.csv")
            # All column names
            cols = data.columns
            print(cols)
        ==================================
        Index(['PassengerId', 'Survived', 'Pclass', 'Name', 'Sex', 'Age', 'SibSp',
               'Parch', 'Ticket', 'Fare', 'Cabin', 'Embarked'],
              dtype='object')

    2, Column types, inspecting: data.dtypes

    Code:

        import pandas as pd

        if __name__ == '__main__':
            # Read the CSV file
            data = pd.read_csv("titanic_train.csv")
            # Data type of every column
            res = data.dtypes
            print(res)
        ================================
        PassengerId      int64
        Survived         int64
        Pclass           int64
        Name            object
        Sex             object
        Age            float64
        SibSp            int64
        Parch            int64
        Ticket          object
        Fare           float64
        Cabin           object
        Embarked        object
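Because `data.dtypes` is itself a Series indexed by column name, it can be queried like any other Series. The sketch below uses a small hypothetical in-memory frame as a stand-in for `titanic_train.csv` (the values are made up); `select_dtypes` is a standard pandas method that filters columns by type.

```python
import pandas as pd

# Hypothetical miniature stand-in for titanic_train.csv (values are made up)
data = pd.DataFrame({
    "PassengerId": [1, 2, 3],
    "Name": ["Passenger A", "Passenger B", "Passenger C"],
    "Age": [22.0, None, 26.0],
})

# dtypes is a Series: index = column name, value = dtype
types = data.dtypes
print(types)

# dtype of one column
age_type = data["Age"].dtype
print(age_type)

# Names of the numeric columns only
numeric_cols = data.select_dtypes(include="number").columns.tolist()
print(numeric_cols)
```

Note that a column containing missing values is stored as a float type, which is why `Age` shows up as `float64` in the listing above.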

    3, Column types, changing: data["PassengerId"].astype("object")

    Code:

        import pandas as pd

        if __name__ == '__main__':
            # Read the CSV file
            data = pd.read_csv("titanic_train.csv")
            # Column types before the conversion
            print(data.dtypes)
            # astype returns a new Series, so assign it back to the column
            data["PassengerId"] = data["PassengerId"].astype("object")
            # Column types after the conversion
            print(data.dtypes)
        =======================================================
        PassengerId      int64
        Survived         int64
        Pclass           int64
        Name            object
        Sex             object
        Age            float64
        SibSp            int64
        Parch            int64
        Ticket          object
        Fare           float64
        Cabin           object
        Embarked        object
        =======================================================
        PassengerId     object
        Survived         int64
        Pclass           int64
        Name            object
        Sex             object
        Age            float64
        SibSp            int64
        Parch            int64
        Ticket          object
        Fare           float64
        Cabin           object
        Embarked        object
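One detail that is easy to miss: `astype` does not modify the column in place. A minimal sketch, assuming a small hypothetical frame with made-up values, showing the difference between calling `astype` and assigning its result, plus the dict form of `DataFrame.astype` for converting several columns at once:

```python
import pandas as pd

# Hypothetical sample data (not the real Titanic file)
data = pd.DataFrame({"PassengerId": [1, 2, 3], "Fare": [7.25, 71.28, 7.92]})

# Calling astype without assignment leaves the stored column untouched
data["PassengerId"].astype("object")
unchanged = data["PassengerId"].dtype

# Assigning the result back replaces the column
data["PassengerId"] = data["PassengerId"].astype("object")
changed = data["PassengerId"].dtype

# A dict maps column names to target types and converts them in one call
converted = data.astype({"PassengerId": "int64", "Fare": "float32"})

print(unchanged, changed)
print(converted.dtypes)
```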

    4, Example: manipulating columns

    Working with column names:

    1, Locate: find every column whose name ends with "d" and select those columns.
    2, Operate: multiply those columns by 2 to produce new columns.
    3, Replace: drop the original columns and add the new ones.

    Key code:

        # 3, Build the new DataFrame
        new_df = data[new_cols] * 2
        # 4, Drop the old columns
        res_data = data.drop(new_cols, axis=1)
        # 5, Add the new columns
        res_data[["new01", "new02", "new03"]] = new_df

    Full code:

        import pandas as pd

        if __name__ == '__main__':
            # Read the CSV file
            data = pd.read_csv("titanic_train.csv")
            # 1, All column names
            old_cols = data.columns.tolist()
            # 2, Collect every column whose name ends with "d"
            new_cols = []
            for e in old_cols:
                if str(e).endswith("d"):
                    new_cols.append(e)
            print(old_cols)
            print(new_cols)
            # 3, Build the new DataFrame (string columns are repeated, e.g. "S" -> "SS")
            new_df = data[new_cols] * 2
            # 4, Drop the old columns
            res_data = data.drop(new_cols, axis=1)
            # 5, Add the new columns (three names, because three columns matched)
            res_data[["new01", "new02", "new03"]] = new_df
            print(res_data)
        ================================================================
        ['PassengerId', 'Survived', 'Pclass', 'Name', 'Sex', 'Age', 'SibSp', 'Parch', 'Ticket', 'Fare', 'Cabin', 'Embarked']
        ['PassengerId', 'Survived', 'Embarked']
           Pclass                                               Name  ...  new02  new03
        0       3                            Braund, Mr. Owen Harris  ...      0     SS
        1       1  Cumings, Mrs. John Bradley (Florence Briggs Th...  ...      2     CC
        ...
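The hard-coded list `["new01", "new02", "new03"]` only works when exactly three columns match. One possible variation, sketched here on a small hypothetical frame, derives the new names from the old ones with `add_suffix` and joins the result with `pd.concat`. The `_x2` suffix is an arbitrary choice for illustration, not something from the original code.

```python
import pandas as pd

# Hypothetical miniature stand-in for titanic_train.csv
data = pd.DataFrame({
    "PassengerId": [1, 2],
    "Survived": [0, 1],
    "Pclass": [3, 1],
    "Embarked": ["S", "C"],
})

# 1. Locate: every column whose name ends with "d"
d_cols = [c for c in data.columns if str(c).endswith("d")]

# 2. Operate: multiply by 2 (numbers are doubled, strings are repeated)
doubled = data[d_cols] * 2

# 3. Replace: name the new columns after the old ones, then swap them in
doubled = doubled.add_suffix("_x2")
res_data = pd.concat([data.drop(columns=d_cols), doubled], axis=1)

print(d_cols)
print(res_data)
```

This version keeps working if the file gains or loses a column ending in "d", since nothing depends on the number of matches.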

    5, Column arithmetic, multiplying columns: data["PassengerId"] * data["Survived"]

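    Different columns can be combined with arithmetic: addition, subtraction, multiplication, and division.

    Code:

        import pandas as pd

        if __name__ == '__main__':
            # Read the CSV file
            data = pd.read_csv("titanic_train.csv")
            # Take two columns; .copy() avoids SettingWithCopyWarning on the assignment below
            df_new = data[["PassengerId", "Survived"]].copy()
            # Multiply the two columns element by element
            df_tow = data["PassengerId"] * data["Survived"]
            df_new["tow"] = df_tow
            print(df_new)
        ===================================
           PassengerId  Survived  tow
        0            1         0    0
        1            2         1    2
        2            3         1    3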
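The other three operators work the same way, element by element along matching row labels. A minimal sketch on made-up values: the input column names come from the Titanic column list, while `relatives`, `diff`, `product`, and `fare_per_person` are hypothetical names chosen for illustration.

```python
import pandas as pd

# Hypothetical sample rows (values are made up)
data = pd.DataFrame({
    "SibSp": [1, 0, 3],
    "Parch": [0, 2, 1],
    "Fare": [7.25, 30.0, 21.0],
})

# Work on an independent copy so new columns can be added safely
df_new = data[["SibSp", "Parch", "Fare"]].copy()

df_new["relatives"] = df_new["SibSp"] + df_new["Parch"]   # addition
df_new["diff"] = df_new["SibSp"] - df_new["Parch"]        # subtraction
df_new["product"] = df_new["SibSp"] * df_new["Parch"]     # multiplication
# Division: fare split across the passenger plus relatives
df_new["fare_per_person"] = df_new["Fare"] / (df_new["relatives"] + 1)

print(df_new)
```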

    6, Maximum: data["Age"].max()

    Code: the age of the oldest passenger

        import pandas as pd

        if __name__ == '__main__':
            # Read the CSV file
            data = pd.read_csv("titanic_train.csv")
            # Largest value in the Age column
            res = data["Age"].max()
            print(res)
        =======================
        80.0
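`max()` returns only the value. To find out which passenger that value belongs to, one option is `idxmax()`, which returns the index label of the largest non-null entry. A minimal sketch with hypothetical passengers:

```python
import pandas as pd

# Hypothetical sample data (names and ages are made up)
data = pd.DataFrame({
    "Name": ["Passenger A", "Passenger B", "Passenger C", "Passenger D"],
    "Age": [22.0, None, 80.0, 35.0],
})

# Index label of the row holding the largest non-null Age
oldest_label = data["Age"].idxmax()

# Fetch the whole row by that label
oldest_row = data.loc[oldest_label]

print(oldest_label)
print(oldest_row["Name"], oldest_row["Age"])
```

If several rows share the maximum, `idxmax()` returns the first one.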

    7, Minimum: data["Age"].min()

    8, Mean: data["Age"].mean()

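    Note: this mean is not the sum divided by the total number of rows. Missing values (NaN) are excluded, so it is the sum of the non-null values divided by the number of non-null values.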
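The difference is easy to check on a tiny Series with one missing value. This is a sketch on made-up numbers; `skipna` is the standard pandas parameter that controls the behavior.

```python
import pandas as pd

# Four rows, one of them missing
ages = pd.Series([20.0, 30.0, None, 40.0], name="Age")

mean_default = ages.mean()                 # NaN skipped: 90 / 3
mean_over_rows = ages.sum() / len(ages)    # all rows counted: 90 / 4
mean_manual = ages.sum() / ages.count()    # count() counts non-null values only
mean_strict = ages.mean(skipna=False)      # NaN, because a missing value is present

print(mean_default, mean_over_rows, mean_manual, mean_strict)
```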

    9, Sum: data["Age"].sum()
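The four statistics can also be computed in a single call with `Series.agg`, which takes a list of function names. A minimal sketch on hypothetical ages; like `mean()`, each of these skips missing values by default.

```python
import pandas as pd

# Hypothetical ages, one of them missing
data = pd.DataFrame({"Age": [22.0, None, 80.0, 0.5]})

# One Series back, indexed by the statistic name
summary = data["Age"].agg(["min", "max", "mean", "sum", "count"])

print(summary)
```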
