06, DataFrame index operations: extracting a sub-DataFrame (m rows, n columns), all column names, index operations, custom index

    Tech  2024-08-21  70

    1, All column names: data.columns

    Goal: get all column names.
    Result: data.columns returns an Index object.
    Get a single column name: res[n]
    Code:

        import pandas as pd

        if __name__ == '__main__':
            # Show all columns when printing
            pd.set_option('display.max_columns', None)
            # Read the CSV file
            data = pd.read_csv("titanic_train.csv")
            # Get the column names (an Index object)
            res = data.columns
            # res.tolist() would convert the Index to a plain Python list
            print(res)
            print(type(res))
            # Take one column name by position
            res02 = res[2]
            print(res02)
            print(type(res02))

    =======================================
        Index(['PassengerId', 'Survived', 'Pclass', 'Name', 'Sex', 'Age', 'SibSp',
               'Parch', 'Ticket', 'Fare', 'Cabin', 'Embarked'],
              dtype='object')
        <class 'pandas.core.indexes.base.Index'>
        Pclass
        <class 'str'>
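A few more ways to work with the column Index can be sketched as follows. This is a minimal example, assuming a small made-up DataFrame (`df`) in place of titanic_train.csv; only the column names are borrowed from the original data.

```python
import pandas as pd

# Hypothetical stand-in for the Titanic data (values are made up)
df = pd.DataFrame({
    "PassengerId": [1, 2, 3],
    "Survived": [0, 1, 1],
    "Pclass": [3, 1, 3],
})

cols = df.columns              # an Index object, not a list
names = cols.tolist()          # tolist is a method, so the parentheses matter
third = cols[2]                # positional access returns a plain str
pos = cols.get_loc("Pclass")   # reverse lookup: label -> position
has_age = "Age" in cols        # membership test works directly on the Index

print(names)
print(third, pos, has_age)
```

Writing `res.tolist` without parentheses only references the method and does nothing, which is why the call form is used here.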

    2, Extract a sub-DataFrame: data[['PassengerId','Sex','Age','Survived']].loc[3:6]

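    Approach: select the columns first, then the rows.
    Rows: the 4th through 7th (index labels 3 to 6; a .loc slice includes both endpoints).
    Columns: ['PassengerId', 'Sex', 'Age', 'Survived']
    Code:

        import pandas as pd

        if __name__ == '__main__':
            # Show all columns when printing
            pd.set_option('display.max_columns', None)
            # Read the CSV file
            data = pd.read_csv("titanic_train.csv")
            # Columns first, then rows by label
            res = data[['PassengerId', 'Sex', 'Age', 'Survived']].loc[3:6]
            print(res)

    ===================================================================
           PassengerId     Sex   Age  Survived
        3            4  female  35.0         1
        4            5    male  35.0         0
        5            6    male   NaN         0
        6            7    male  54.0         0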
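The difference between label-based and position-based slicing is easy to get wrong, so here is a short sketch comparing them. It assumes a made-up ten-row DataFrame (`df`) with a default 0-based index; the values are illustrative only and are not read from titanic_train.csv.

```python
import pandas as pd

# Hypothetical data with a default RangeIndex 0..9
df = pd.DataFrame({
    "PassengerId": range(1, 11),
    "Sex": ["male", "female"] * 5,
    "Age": [22.0, 38.0, 26.0, 35.0, 35.0, None, 54.0, 2.0, 27.0, 14.0],
    "Survived": [0, 1, 1, 1, 0, 0, 0, 0, 1, 1],
})
cols = ["PassengerId", "Sex", "Age", "Survived"]

by_label = df[cols].loc[3:6]      # columns first, then labels 3..6 inclusive
one_step = df.loc[3:6, cols]      # same selection in a single .loc call
by_position = df[cols].iloc[3:7]  # positions 3..6; end position 7 is excluded

print(by_label)
```

All three expressions select the same four rows here because the labels and positions coincide; once the index is changed they no longer do.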

    3, Index, view all index labels: res.index

    Code:

        import pandas as pd

        if __name__ == '__main__':
            # Show all columns when printing
            pd.set_option('display.max_columns', None)
            # Read the CSV file
            data = pd.read_csv("titanic_train.csv")
            res = data[['PassengerId', 'Sex', 'Age', 'Survived']].loc[3:6]
            # Get the row index
            index = res.index
            print(index)
            print(type(index))

    ===========================================
        RangeIndex(start=3, stop=7, step=1)
        <class 'pandas.core.indexes.range.RangeIndex'>
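One consequence worth illustrating is that a sliced frame keeps its original labels, so label 0 no longer exists in it. The sketch below assumes a made-up one-column DataFrame (`df`) instead of the CSV file.

```python
import pandas as pd

# Hypothetical data with a default RangeIndex 0..9
df = pd.DataFrame({"PassengerId": range(1, 11)})
res = df.loc[3:6]

idx = res.index
labels = idx.tolist()            # plain Python list of the row labels
has_zero = 0 in idx              # label 0 was sliced away

first_by_position = res.iloc[0]["PassengerId"]  # first row by position
first_by_label = res.loc[3]["PassengerId"]      # same row by its label

print(idx)
print(labels, len(idx), has_zero)
```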

    4, Index, reset the index: res.reset_index(drop=True, inplace=True)

    Code:

        import pandas as pd

        if __name__ == '__main__':
            # Show all columns when printing
            pd.set_option('display.max_columns', None)
            # Read the CSV file
            data = pd.read_csv("titanic_train.csv")
            res = data[['PassengerId', 'Sex', 'Age', 'Survived']].loc[3:6]
            print(res)
            # Renumber the rows from 0 and discard the old labels
            res.reset_index(drop=True, inplace=True)
            print(res)

    ======================================================
           PassengerId     Sex   Age  Survived
        3            4  female  35.0         1
        4            5    male  35.0         0
        5            6    male   NaN         0
        6            7    male  54.0         0
    ======================================================
           PassengerId     Sex   Age  Survived
        0            4  female  35.0         1
        1            5    male  35.0         0
        2            6    male   NaN         0
        3            7    male  54.0         0
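The effect of the `drop` argument can be seen by comparing both settings. This is a small sketch on a made-up DataFrame (`df`); it uses the non-inplace form, which returns a new frame and leaves `res` untouched.

```python
import pandas as pd

# Hypothetical data with a default RangeIndex 0..9
df = pd.DataFrame({"PassengerId": range(1, 11)})
res = df.loc[3:6]

dropped = res.reset_index(drop=True)  # old labels are discarded
kept = res.reset_index()              # old labels are saved in a new "index" column

print(dropped)
print(kept)
```

Without `drop=True` the old labels 3..6 survive as an ordinary column, which is useful when they still carry meaning.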

    5, Index, custom labels: res.index = pd.Series(["a","b","c","d"])

    Code:

        import pandas as pd

        if __name__ == '__main__':
            # Show all columns when printing
            pd.set_option('display.max_columns', None)
            # Read the CSV file
            data = pd.read_csv("titanic_train.csv")
            res = data[['PassengerId', 'Sex', 'Age', 'Survived']].loc[3:6]
            print(res)
            # Assign custom labels; the count must match the number of rows
            res.index = pd.Series(["a", "b", "c", "d"])
            print(res)

    ==========================================================
           PassengerId     Sex   Age  Survived
        3            4  female  35.0         1
        4            5    male  35.0         0
        5            6    male   NaN         0
        6            7    male  54.0         0
    ==========================================================
           PassengerId     Sex   Age  Survived
        a            4  female  35.0         1
        b            5    male  35.0         0
        c            6    male   NaN         0
        d            7    male  54.0         0
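Two related points can be sketched briefly: the new labels must match the row count, and an existing column can serve as the index instead. The example assumes a small hand-built DataFrame (`df`) rather than the CSV file; the names `labeled` and `by_column` are illustrative.

```python
import pandas as pd

# Hypothetical four-row frame, built by hand
df = pd.DataFrame({
    "PassengerId": [4, 5, 6, 7],
    "Sex": ["female", "male", "male", "male"],
})

labeled = df.copy()
labeled.index = ["a", "b", "c", "d"]   # a plain list works as well as a Series

try:
    labeled.index = ["x", "y"]         # wrong length: 2 labels for 4 rows
    mismatch_error = None
except ValueError as exc:
    mismatch_error = exc               # pandas rejects the assignment

by_column = df.set_index("PassengerId")  # use an existing column as the index

print(labeled)
print(by_column)
```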

    6, Selecting data with a custom index:

    Notes:
    1, Usage: use it just like a normal index.
    2, Can a range be selected? Yes.
    Code:

        import pandas as pd

        if __name__ == '__main__':
            # Show all columns when printing
            pd.set_option('display.max_columns', None)
            # Read the CSV file
            data = pd.read_csv("titanic_train.csv")
            res = data[['PassengerId', 'Sex', 'Age', 'Survived']].loc[3:6]
            res.index = pd.Series(["a", "b", "c", "d"])
            print(res)
            # Single label: returns one row as a Series
            print(res.loc['a'])
            # Label range: both endpoints are included
            print(res.loc['b':'d'])

    ===========================================================
           PassengerId     Sex   Age  Survived
        a            4  female  35.0         1
        b            5    male  35.0         0
        c            6    male   NaN         0
        d            7    male  54.0         0
    ================================
        PassengerId         4
        Sex            female
        Age                35
        Survived            1
        Name: a, dtype: object
    ================================
           PassengerId     Sex   Age  Survived
        b            5    male  35.0         0
        c            6    male   NaN         0
        d            7    male  54.0         0
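Beyond a single label and a label range, a custom index supports a few other selection forms. The sketch below assumes a small hand-built DataFrame (`res`) with string labels; it is an illustration, not the frame loaded from titanic_train.csv.

```python
import pandas as pd

# Hypothetical frame with custom string labels
res = pd.DataFrame(
    {"PassengerId": [4, 5, 6, 7], "Survived": [1, 0, 0, 0]},
    index=["a", "b", "c", "d"],
)

row = res.loc["a"]                   # single label -> Series
block = res.loc["b":"d"]             # label slice, both ends included
picked = res.loc[["a", "d"]]         # list of labels -> DataFrame
cell = res.loc["c", "PassengerId"]   # row label plus column label -> scalar
first_two = res.iloc[0:2]            # positions still work; end is excluded

print(block)
print(picked)
```

Position-based access through `.iloc` is unaffected by the custom labels, so it remains available when the row order matters more than the label.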