[pandas Summary 2] Common Series Attributes and Functions

Tech 2022-07-10 138

1. Common Series attributes

Attribute     Description
values        Get the underlying array of values
index         Get the index
name          Name of the values
index.name    Name of the index

2. Common Series functions

A Series supports almost all of the indexing operations and functions of an ndarray or a dict, and so combines the advantages of both.
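The claim above can be sketched with a few lines. This is only an illustration: the sample data and variable names are made up for the example, and it shows a handful of dict-like and ndarray-like operations rather than the full set.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, chosen only for illustration.
scores = pd.Series([90, 75, 82], index=['Tom', 'Kim', 'Andy'])

# dict-like behaviour: membership tests and lookups by label
has_tom = 'Tom' in scores          # tests the index labels, like dict keys
kim_score = scores['Kim']          # lookup by label
missing = scores.get('Bob', -1)    # .get() with a default, as with a dict

# ndarray-like behaviour: vectorised arithmetic, boolean masks, ufuncs
doubled = scores * 2
high = scores[scores > 80]         # boolean-mask selection keeps the labels
roots = np.sqrt(scores)            # a NumPy ufunc returns a Series
```

Note that `in` tests the index labels, not the values, which matches how `in` works on a dict.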

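Function                                       Description
Series([x, y, ...])                            Create a Series from a list
Series({'a': x, 'b': y, ...}, index=param1)    Create a Series from a dict, optionally with an explicit index
Series.copy()                                  Copy a Series
Series.reindex([x, y, ...], fill_value=NaN)    Return a new object conformed to the new index, filling missing values with fill_value
Series.reindex([x, y, ...], method=None)       Return a new object conformed to the new index, filling missing values according to method
DataFrame.reindex(columns=[x, y, ...])         Reindex the columns (applies to a DataFrame; a Series has no columns)
Series.drop(index)                             Drop the specified entries
Series.map(f)                                  Apply a function element-wise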
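Reindexing by columns is listed in the table but applies to a DataFrame rather than a Series. A minimal sketch, using made-up data and a hypothetical 'City' column that does not exist in the original frame:

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({'Name': ['Tom', 'Kim'], 'Age': [18, 16]},
                  index=['No.1', 'No.2'])

# Reorder the columns and request one that does not exist yet.
# The new 'City' column is filled with NaN; df itself is not modified.
rdf = df.reindex(columns=['Age', 'Name', 'City'])
```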

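Sorting function                                         Description
Series.sort_index(ascending=True)                        Return a new object sorted by index
Series.sort_values(ascending=True)                       Return a new object sorted by value, with NaN values at the end (this method was called Series.order() in old pandas releases, which has since been removed)
Series.rank(method='average', ascending=True, axis=0)    Rank the values; by default tied values are assigned the average of their ranks
Series.idxmax() / Series.argmax()                        Return the index label / integer position of the maximum value
Series.idxmin() / Series.argmin()                        Return the index label / integer position of the minimum value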

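Options for the method parameter of reindex:
    ffill (alias pad)         forward fill: carry the previous value forward
    bfill (alias backfill)    backward fill: carry the next value backward

Options for the method parameter of rank:
    'average'    assign each value in a group of ties the average rank of the group
    'min'        use the lowest rank in the group
    'max'        use the highest rank in the group
    'first'      rank tied values in the order they appear in the original data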
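The aliases for the reindex fill methods can be checked with a small sketch. The data and the integer index below are assumptions made for the example; the fill methods require a monotonic index, which this one satisfies.

```python
import pandas as pd

# Illustrative data on a sparse integer index.
s = pd.Series(['a', 'b', 'c'], index=[0, 2, 4])
new_index = list(range(6))

forward = s.reindex(new_index, method='ffill')    # gaps take the previous value
backward = s.reindex(new_index, method='bfill')   # gaps take the next value

# 'pad' and 'backfill' are alternative spellings of the same two methods.
forward_alias = s.reindex(new_index, method='pad')
backward_alias = s.reindex(new_index, method='backfill')
```

Label 5 has no following value in the original index, so backward fill leaves it as NaN.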

3. Series attribute examples

# -*- coding: utf-8 -*-
"""
@author: 蔚蓝的天空Tom
Aim: examples of common pandas.Series attributes

Attribute     Description
values        the underlying array of values
index         the index
name          name of the values
index.name    name of the index
"""
import pandas as pd

if __name__ == '__main__':
    s = pd.Series(['Tom', 'Kim', 'Andy'])
    # 0     Tom
    # 1     Kim
    # 2    Andy
    # dtype: object

    # Array of values
    s.values        # ['Tom' 'Kim' 'Andy']

    # Index
    s.index         # RangeIndex(start=0, stop=3, step=1)

    # Name of the values
    s.name          # None

    # Name of the index
    s.index.name    # None

    # Set the Series name and the index name
    s.name = 'Name'
    s.index.name = 'ID'
    # ID
    # 0     Tom
    # 1     Kim
    # 2    Andy
    # Name: Name, dtype: object

    # Read back the Series name and the index name
    s.name          # Name
    s.index.name    # ID

4. Series function examples

4.1 Creating a Series object

# -*- coding: utf-8 -*-
"""
@author: 蔚蓝的天空Tom
Aim: examples of common Series functions: creating a Series object
(1) Create a Series
    Series([x, y, ...])
    Series({'a': x, 'b': y, ...}, index=param1)
"""
import pandas as pd

if __name__ == '__main__':
    # Create a Series from a list, using the default integer index
    s = pd.Series(['Tom', 'Kim', 'Andy'])
    # 0     Tom
    # 1     Kim
    # 2    Andy
    # dtype: object

    # Create a Series from a list, with an explicit index
    s = pd.Series(['Tom', 'Kim', 'Andy'], index=['No.1', 'No.2', 'No.3'])
    # No.1     Tom
    # No.2     Kim
    # No.3    Andy
    # dtype: object

    # Create a Series from a dict: the keys become the index.
    # Recent pandas versions keep the dict's insertion order;
    # old releases sorted the keys instead.
    data = {'No.2': 'Tom', 'No.1': 'Kim', 'No.3': 'Andy'}
    s = pd.Series(data)
    s.index.name = 'ID'
    s.name = 'StudentsInfo'
    # ID
    # No.2     Tom
    # No.1     Kim
    # No.3    Andy
    # Name: StudentsInfo, dtype: object

    # Create a Series from a dict, with index= fixing the order
    data = {'No.1': 'Tom', 'No.2': 'Kim', 'No.3': 'Andy'}
    ind = ['No.3', 'No.2', 'No.1']
    s = pd.Series(data, index=ind)
    # No.3    Andy
    # No.2     Kim
    # No.1     Tom
    # dtype: object

    # index= may select only some of the keys
    ind = ['No.3', 'No.2']
    s = pd.Series(data, index=ind)
    # No.3    Andy
    # No.2     Kim
    # dtype: object

    # A label that is not a key of the dict gets the value NaN
    ind = ['No.2', 'No.1', 'No.99']
    s = pd.Series(data, index=ind)
    # No.2     Kim
    # No.1     Tom
    # No.99    NaN
    # dtype: object

    # pd.isnull(series) marks, element by element, which values are NaN
    ret = pd.isnull(s)
    # No.2     False
    # No.1     False
    # No.99     True
    # dtype: bool

    # pd.notnull(series) marks, element by element, which values are not NaN
    ret = pd.notnull(s)
    # No.2      True
    # No.1      True
    # No.99    False
    # dtype: bool
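pd.isnull() and pd.notnull() return one boolean per element. To answer the yes/no question of whether a Series contains any missing value, the mask can be reduced, as in this small sketch (the variable names are chosen for the example):

```python
import pandas as pd

data = {'No.1': 'Tom', 'No.2': 'Kim', 'No.3': 'Andy'}
s = pd.Series(data, index=['No.2', 'No.1', 'No.99'])

mask = s.isna()                # element-wise mask, same result as pd.isnull(s)
has_missing = mask.any()       # True if at least one value is missing
n_missing = int(mask.sum())    # number of missing values
present = s[s.notna()]         # keep only the entries that are not missing
```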

4.2 Copying a Series: deep copy and shallow copy

# -*- coding: utf-8 -*-
"""
@author: 蔚蓝的天空Tom
Aim: examples of common Series functions: copying a Series, deep and shallow
(2) Copy a Series
    Series.copy()
"""
import pandas as pd

if __name__ == '__main__':
    s = pd.Series(['Tom', 'Kim', 'Andy'], index=['No.1', 'No.2', 'No.3'])
    # No.1     Tom
    # No.2     Kim
    # No.3    Andy
    # dtype: object

    # Deep copy: the copy has its own data, so s is not affected
    cpys = s.copy(deep=True)
    cpys['No.1'] = 'xxx'
    # print(cpys)
    # No.1     xxx
    # No.2     Kim
    # No.3    Andy
    # dtype: object
    # print(s)
    # No.1     Tom
    # No.2     Kim
    # No.3    Andy
    # dtype: object

    # Shallow copy: the copy shares its data with s
    cpys = s.copy(deep=False)
    cpys['No.1'] = 'xxx'
    # print(cpys)
    # No.1     xxx
    # No.2     Kim
    # No.3    Andy
    # dtype: object
    # print(s)
    # When Copy-on-Write is not enabled, the change also shows up in s:
    # No.1     xxx
    # No.2     Kim
    # No.3    Andy
    # dtype: object

4.3 The reindex function

# -*- coding: utf-8 -*-
"""
@author: 蔚蓝的天空Tom
Aim: examples of common Series functions: series.reindex() conforms the data
     to a new index; it does not modify the source object and returns a new one
(3) Return a new object conformed to the new index, filling missing values with fill_value
    Series.reindex([x, y, ...], fill_value=NaN)
(4) Return a new object conformed to the new index, filling according to method
    Series.reindex([x, y, ...], method=None)
(5) Reindex the columns (DataFrame only, not shown here)
    DataFrame.reindex(columns=[x, y, ...])
"""
import pandas as pd

if __name__ == '__main__':
    # Series.reindex([x, y, ...]): labels missing from the source get NaN by default
    s = pd.Series(['Tom', 'Kim', 'Andy'], index=['No.1', 'No.2', 'No.3'])
    rs = s.reindex(['No.0', 'No.1', 'No.2', 'No.3', 'No.4'])
    # No.0     NaN
    # No.1     Tom
    # No.2     Kim
    # No.3    Andy
    # No.4     NaN
    # dtype: object

    # (3) fill_value: labels missing from the source get the given value
    rs = s.reindex(['No.0', 'No.1', 'No.2', 'No.3', 'No.4'], fill_value='XXX')
    # No.0     XXX
    # No.1     Tom
    # No.2     Kim
    # No.3    Andy
    # No.4     XXX
    # dtype: object

    # (4) method: fill from neighbouring labels of the source index
    # ffill or pad:      forward fill (carry the previous value forward)
    # bfill or backfill: backward fill (carry the next value backward)
    rs = s.reindex(['No.0', 'No.1', 'No.4', 'No.5'], method='ffill')  # method='pad' has the same effect
    # No.0     NaN    # no earlier label in the source to fill from
    # No.1     Tom
    # No.4    Andy    # forward fill: takes the value of No.3 (Andy)
    # No.5    Andy    # forward fill: No.3 is still the closest earlier source label
    # dtype: object

    rs = s.reindex(['No.0', 'No.1', 'No.4', 'No.5'], method='bfill')
    # No.0     Tom    # backward fill: takes the value of No.1 (Tom)
    # No.1     Tom
    # No.4     NaN    # no later label in the source to fill from
    # No.5     NaN
    # dtype: object

4.4 The drop() method

# -*- coding: utf-8 -*-
"""
@author: 蔚蓝的天空Tom
Aim: examples of common Series functions: drop() removes the specified entries;
     it does not modify the source object and returns a new one
(6) Drop the specified entries
    Series.drop(index)
"""
import pandas as pd

if __name__ == '__main__':
    # (6) Drop the specified entries: Series.drop(index)
    s = pd.Series(['Tom', 'Kim', 'Andy'], index=['No.1', 'No.2', 'No.3'])

    # Drop one element, identified by its index label
    ds = s.drop('No.1')
    # No.2     Kim
    # No.3    Andy
    # dtype: object

    data = {'Name': {'No.1': 'Tom', 'No.2': 'Kim', 'No.3': 'Andy'},
            'Age': {'No.1': 18, 'No.2': 16, 'No.3': 19}}
    df = pd.DataFrame(data)
    # Recent pandas versions keep the dict order of the columns;
    # old releases sorted them (Age before Name).
    #       Name  Age
    # No.1   Tom   18
    # No.2   Kim   16
    # No.3  Andy   19

    # Drop the specified row
    ds = df.drop('No.1')
    #       Name  Age
    # No.2   Kim   16
    # No.3  Andy   19

    # Drop the specified column; several columns can be dropped at once
    # by listing them, e.g. ['Age', 'Name']
    ds = df.drop(['Age'], axis=1)
    #       Name
    # No.1   Tom
    # No.2   Kim
    # No.3  Andy
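As a small extension of the example above, drop() also accepts a list of labels, and its errors parameter controls what happens when a label is absent. A sketch, where 'No.99' is a deliberately non-existent label:

```python
import pandas as pd

s = pd.Series(['Tom', 'Kim', 'Andy'], index=['No.1', 'No.2', 'No.3'])

# Drop several labels at once by passing a list.
rest = s.drop(['No.1', 'No.3'])

# A label that is not in the index raises KeyError by default;
# errors='ignore' skips it instead.
safe = s.drop(['No.2', 'No.99'], errors='ignore')
```

In both calls the source Series s is left unchanged.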

4.5 series.map(func): applying a function element-wise

# -*- coding: utf-8 -*-
"""
@author: 蔚蓝的天空Tom
Aim: examples of common Series functions: series.map(func) applies a function
     to each element; it does not modify the source object and returns a new one
(7) Apply an element-wise function
    Series.map(f)
"""
import math

import numpy as np  # needed for np.exp below
import pandas as pd

if __name__ == '__main__':
    # (7) Apply an element-wise function: Series.map(f)
    func = lambda x: x * 2
    s = pd.Series([1, 3, 5], index=['No.1', 'No.2', 'No.3'])
    ms = s.map(func)
    # No.1     2
    # No.2     6
    # No.3    10
    # dtype: int64

    # A NumPy function
    ms = s.map(np.exp)
    # No.1      2.718282
    # No.2     20.085537
    # No.3    148.413159
    # dtype: float64

    # A function from the standard math module gives the same result
    ms = s.map(math.exp)
    # No.1      2.718282
    # No.2     20.085537
    # No.3    148.413159
    # dtype: float64
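Besides a function, Series.map() also accepts a dict, which it uses as a lookup table. A minimal sketch, with a hypothetical ages table that deliberately leaves out one name:

```python
import pandas as pd

s = pd.Series(['Tom', 'Kim', 'Andy'], index=['No.1', 'No.2', 'No.3'])

# Hypothetical lookup table; 'Andy' is deliberately left out.
ages = {'Tom': 18, 'Kim': 16}

# Each value of s is replaced by the matching dict entry.
# Values with no matching key become NaN.
mapped = s.map(ages)
```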

4.6 Series sorting functions

# -*- coding: utf-8 -*-
"""
@author: 蔚蓝的天空Tom
Aim: examples of common Series functions: sorting a Series
Series.sort_index(ascending=True)     return a new object sorted by index
Series.sort_values(ascending=True)    return a new object sorted by value, NaN values at the end
                                      (called Series.order() in old pandas releases)
"""
import pandas as pd

if __name__ == '__main__':
    # Sort by index, ascending: Series.sort_index(ascending=True), True is the default
    s = pd.Series([6, 2, 8], index=['No.1', 'No.2', 'No.3'])
    ss = s.sort_index(ascending=True)
    # No.1    6
    # No.2    2
    # No.3    8
    # dtype: int64

    # Sort by index, descending: Series.sort_index(ascending=False)
    ss = s.sort_index(ascending=False)
    # No.3    8
    # No.2    2
    # No.1    6
    # dtype: int64

    # Sort by value, ascending: Series.sort_values(ascending=True), True is the default
    so = s.sort_values(ascending=True)
    # No.2    2
    # No.1    6
    # No.3    8
    # dtype: int64

    # Sort by value, descending: Series.sort_values(ascending=False)
    so = s.sort_values(ascending=False)
    # No.3    8
    # No.1    6
    # No.2    2
    # dtype: int64
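The examples above contain no missing values, so the statement that NaN values end up last is not exercised. A short sketch with one NaN added to made-up data, also showing the na_position parameter of sort_values():

```python
import numpy as np
import pandas as pd

s = pd.Series([6, np.nan, 2, 8], index=['No.1', 'No.2', 'No.3', 'No.4'])

asc = s.sort_values()                         # NaN is placed last by default
desc = s.sort_values(ascending=False)         # NaN is still last
first = s.sort_values(na_position='first')    # NaN is moved to the front
```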

4.7 The rank() method

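# -*- coding: utf-8 -*-
"""
@author: 蔚蓝的天空Tom
Aim: examples of the Series ranking method series.rank()

Note the difference between ranking and sorting. Ranking takes the result of a
sort (ascending or descending) and replaces each value with its rank (1 to n),
so every value is mapped to a rank.
The open question is how to treat tied values. The method parameter offers four
choices for this: average, min, max, first.

Note: ascending order is used here, and the ranks follow that order. Ranking in
descending order works the same way and is not shown.
"""
import pandas as pd

if __name__ == '__main__':
    s = pd.Series([6, 9, 6, 2])
    s.index.name = 'ID'
    # ID
    # 0    6
    # 1    9
    # 2    6
    # 3    2

    # Average rank: method defaults to 'average'. Tied values each get the
    # sum of the ranks they occupy divided by m, the number of tied values.
    sr = s.rank()
    # ID
    # 0    2.5    # two 6s occupy ranks 2 and 3, average 2.5
    # 1    4.0
    # 2    2.5    # two 6s occupy ranks 2 and 3, average 2.5
    # 3    1.0

    # Average rank, passing method='average' explicitly
    sr = s.rank(method='average')
    # ID
    # 0    2.5
    # 1    4.0
    # 2    2.5
    # 3    1.0
    # dtype: float64

    # Minimum rank
    sr = s.rank(method='min')
    # ID
    # 0    2.0    # two 6s occupy ranks 2 and 3, both get the lowest, 2
    # 1    4.0
    # 2    2.0    # two 6s occupy ranks 2 and 3, both get the lowest, 2
    # 3    1.0
    # dtype: float64

    # Maximum rank
    sr = s.rank(method='max')
    # ID
    # 0    3.0    # two 6s occupy ranks 2 and 3, both get the highest, 3
    # 1    4.0
    # 2    3.0    # two 6s occupy ranks 2 and 3, both get the highest, 3
    # 3    1.0
    # dtype: float64

    # Rank by order of appearance
    sr = s.rank(method='first')
    # ID
    # 0    2.0    # two 6s occupy ranks 2 and 3, the first 6 gets 2
    # 1    4.0
    # 2    3.0    # two 6s occupy ranks 2 and 3, the second 6 gets 3
    # 3    1.0
    # dtype: float64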
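The example above leaves out descending ranks. For completeness, here is a minimal sketch on the same data with ascending=False, where the largest value gets rank 1 and ties are handled by method exactly as before:

```python
import pandas as pd

s = pd.Series([6, 9, 6, 2])

# Descending rank: the largest value (9) gets rank 1.
desc_avg = s.rank(ascending=False)                  # ties share the average rank
desc_min = s.rank(method='min', ascending=False)    # ties share the lowest rank
```

Here the two 6s occupy ranks 2 and 3, so they get 2.5 with 'average' and 2.0 with 'min'.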

4.8 Finding the index of the maximum/minimum value: argmin(), argmax()

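# -*- coding: utf-8 -*-
"""
@author: 蔚蓝的天空Tom
Aim: examples of locating the largest (smallest) value in a Series
Series.idxmax() / Series.idxmin()    index label of the maximum / minimum value
Series.argmax() / Series.argmin()    integer position of the maximum / minimum value
Note: in old pandas releases argmax()/argmin() returned the label; current
releases return the position, so idxmax()/idxmin() are used for labels here.
"""
import pandas as pd

if __name__ == '__main__':
    s = pd.Series([6, 8, 9, 2], index=['No.1', 'No.2', 'No.3', 'No.4'])
    # No.1    6
    # No.2    8
    # No.3    9
    # No.4    2
    # dtype: int64

    ind = s.idxmax()       # No.3
    ind = s.idxmin()       # No.4
    pos = s.argmax()       # 2
    pos = s.argmin()       # 3
    v = s[s.idxmin()]      # 2
    v = s.min()            # 2

    # Sorting does not change the labels returned by idxmin() and idxmax()
    # (the positions returned by argmin() and argmax() do change)
    ss = s.sort_values(ascending=False)
    # No.3    9
    # No.2    8
    # No.1    6
    # No.4    2
    # dtype: int64
    ind = ss.idxmax()      # No.3
    v = ss[ss.idxmax()]    # 9
    v = ss.max()           # 9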

(end)
