Pandas库学习
学习笔记手札及单元小结Pandas库的介绍Pandas库的引用Pandas库小测
Pandas库的Series类型从标量值创建从字典类型创建从ndarray类型创建Series类型的基本操作Series类型对齐操作Series类型的那么属性Series类型的修改
Pandas库的DataFrame类型DataFrame类型从二维ndarray对象创建从列表类型的字典创建
Pandas库的数据类型运算数据类型的算术运算数据类型的比较运算
学习笔记手札及单元小结
Pandas库的介绍
Pandas库网址:https://pandas.pydata.org/
Pandas库的引用
import pandas
as pd
Pandas基于Numpy实现,常与Numpy和Matplotlib一同使用
Pandas库小测
请在Anaconda的IPython平台运行代码
import pandas
as pd
d
= pd
.Series
(range(20))
d
Out
[3]:
0 0
1 1
2 2
3 3
4 4
5 5
6 6
7 7
8 8
9 9
10 10
11 11
12 12
13 13
14 14
15 15
16 16
17 17
18 18
19 19
dtype
: int64
d
.cumsum
()
Out
[4]:
0 0
1 1
2 3
3 6
4 10
5 15
6 21
7 28
8 36
9 45
10 55
11 66
12 78
13 91
14 105
15 120
16 136
17 153
18 171
19 190
dtype
: int64
Pandas库的Series类型
Series类型由一组数据及与之相关的数据索引组成
import pandas
as pd
a
= pd
.Series
([9,8,7,6])
a
Out
[3]:
0 9
1 8
2 7
3 6
dtype
: int64
import pandas
as pd
b
= pd
.Series
([9,8,7,6],index
=['a','b','c','d'])
b
Out
[6]:
a
9
b
8
c
7
d
6
dtype
: int64
从标量值创建
import pandas
as pd
s
= pd
.Series
(25,index
=['a','b','c'])
s
Out
[9]:
a
25
b
25
c
25
dtype
: int64
从字典类型创建
import pandas
as pd
d
= pd
.Series
({'a':9,'b':8,'c':7})
d
Out
[12]:
a
9
b
8
c
7
dtype
: int64
e
= pd
.Series
({'a':9,'b':8,'c':7},index
=['c','a','b','d'])
e
Out
[14]:
c
7.0
a
9.0
b
8.0
d NaN
dtype
: float64
从ndarray类型创建
import pandas
as pd
import numpy
as np
n
= pd
.Series
(np
.arange
(5))
n
Out
[4]:
0 0
1 1
2 2
3 3
4 4
dtype
: int32
m
= pd
.Series
(np
.arange
(5),index
=np
.arange
(9,4,-1))
m
Out
[6]:
9 0
8 1
7 2
6 3
5 4
dtype
: int32
Series类型的基本操作
import pandas
as pd
b
= pd
.Series
([9,8,7,6],['a','b','c','d'])
b
Out
[3]:
a
9
b
8
c
7
d
6
dtype
: int64
b
.index
Out
[4]: Index
(['a', 'b', 'c', 'd'], dtype
='object')
b
.values
Out
[5]: array
([9, 8, 7, 6], dtype
=int64
)
b
['b']
Out
[6]: 8
b
[1]
Out
[7]: 8
b
[['c','d',0]]
---------------------------------------------------------------------------
KeyError Traceback
(most recent call last
)
b
[['c','d','a']]
Out
[10]:
c
7
d
6
a
9
dtype
: int64
import pandas
as pd
b
= pd
.Series
([9,8,7,6],['a','b','c','d'])
b
Out
[3]:
a
9
b
8
c
7
d
6
dtype
: int64
b
[3]
Out
[4]: 6
b
[:3]
Out
[5]:
a
9
b
8
c
7
dtype
: int64
b
[b
> b
.median
()]
Out
[6]:
a
9
b
8
dtype
: int64
np
.exp
(b
)
---------------------------------------------------------------------------
NameError Traceback
(most recent call last
)
<ipython
-input-7-a3137cbb6ea4
> in <module
>
----> 1 np
.exp
(b
)
import pandas
as pd
b
= pd
.Series
([9,8,7,6],['a','b','c','d'])
b
['b']
Out
[3]: 8
'c' in b
Out
[4]: True
0 in b
Out
[5]: False
b
.get
('f',100)
Out
[6]: 100
Series类型对齐操作
Series + Series
import pandas
as pd
a
= pd
.Series
([1,2,3],['c','d','e'])
b
= pd
.Series
([9,8,7,6],['a','b','c','d'])
a
+ b
Out
[10]:
a NaN
b NaN
c
8.0
d
8.0
e NaN
dtype
: float64
Series类型在运算中会自动对齐不同索引的数据
Series类型的那么属性
import pandas
as pd
b
= pd
.Series
([9,8,7,6],['a','b','c','d'])
b
.name
b
.name
= 'Series对象'
b
.index
.name
= '索引列'
b
Out
[16]:
索引列
a
9
b
8
c
7
d
6
Name
: Series对象
, dtype
: int64
Series类型的修改
Series对象可以随时修改并即刻生效
import pandas
as pd
b
= pd
.Series
([9,8,7,6],['a','b','c','d'])
b
.name
= "Series"
b
Out
[20]:
a
9
b
8
c
7
d
6
Name
: Series
, dtype
: int64
b
.name
= "New Series"
b
['b','c'] = 20
b
Out
[23]:
a
9
b
20
c
20
d
6
Name
: New Series
, dtype
: int64
Pandas库的DataFrame类型
DataFrame类型
从二维ndarray对象创建
import pandas
as pd
import numpy
as np
d
= pd
.DataFrame
(np
.arange
(10).reshape
(2,5))
d
Out
[4]:
0 1 2 3 4
0 0 1 2 3 4
1 5 6 7 8 9
从列表类型的字典创建
import pandas
as pd
dl
= {'one':[1,2,3,4],'two':[9,8,7,6]}
d
= pd
.DataFrame
(dl
,index
= ['a','b','c','d'])
d
Out
[7]:
one two
a
1 9
b
2 8
c
3 7
d
4 6
Pandas库的数据类型运算
import pandas
as pd
import numpy
as np
a
= pd
.DataFrame
(np
.arange
(12).reshape
(3,4))
a
Out
[4]:
0 1 2 3
0 0 1 2 3
1 4 5 6 7
2 8 9 10 11
b
= pd
.DataFrame
(np
.arange
(20).reshape
(4,5))
b
Out
[6]:
0 1 2 3 4
0 0 1 2 3 4
1 5 6 7 8 9
2 10 11 12 13 14
3 15 16 17 18 19
a
+ b
Out
[7]:
0 1 2 3 4
0 0.0 2.0 4.0 6.0 NaN
1 9.0 11.0 13.0 15.0 NaN
2 18.0 20.0 22.0 24.0 NaN
3 NaN NaN NaN NaN NaN
a
* b
Out
[8]:
0 1 2 3 4
0 0.0 1.0 4.0 9.0 NaN
1 20.0 30.0 42.0 56.0 NaN
2 80.0 99.0 120.0 143.0 NaN
3 NaN NaN NaN NaN NaN
数据类型的算术运算
import pandas
as pd
import numpy
as np
a
= pd
.DataFrame
(np
.arange
(12).reshape
(3,4))
a
Out
[12]:
0 1 2 3
0 0 1 2 3
1 4 5 6 7
2 8 9 10 11
b
= pd
.DataFrame
(np
.arange
(20),.reshape
(4,5))
File
"<ipython-input-13-58d77ec75842>", line
1
b
= pd
.DataFrame
(np
.arange
(20),.reshape
(4,5))
^
SyntaxError
: invalid syntax
b
= pd
.DataFrame
(np
.arange
(20).reshape
(4,5))
b
Out
[15]:
0 1 2 3 4
0 0 1 2 3 4
1 5 6 7 8 9
2 10 11 12 13 14
3 15 16 17 18 19
b
.add
(a
,fill_value
= 100)
Out
[16]:
0 1 2 3 4
0 0.0 2.0 4.0 6.0 104.0
1 9.0 11.0 13.0 15.0 109.0
2 18.0 20.0 22.0 24.0 114.0
3 115.0 116.0 117.0 118.0 119.0
a
.mul
(b
,fill_value
= 0)
Out
[17]:
0 1 2 3 4
0 0.0 1.0 4.0 9.0 0.0
1 20.0 30.0 42.0 56.0 0.0
2 80.0 99.0 120.0 143.0 0.0
不同维度间为广播运算,一维Series默认在轴1参与运算
import pandas
as pd
import numpy
as np
b
= pd
.DataFrame
(np
.arange
(20).reshape
(4,5))
b
Out
[21]:
0 1 2 3 4
0 0 1 2 3 4
1 5 6 7 8 9
2 10 11 12 13 14
3 15 16 17 18 19
c
= pd
.Series
(np
.arange
(4))
c
Out
[23]:
0 0
1 1
2 2
3 3
dtype
: int32
c
-10
Out
[24]:
0 -10
1 -9
2 -8
3 -7
dtype
: int32
b
- c
Out
[25]:
0 1 2 3 4
0 0.0 0.0 0.0 0.0 NaN
1 5.0 5.0 5.0 5.0 NaN
2 10.0 10.0 10.0 10.0 NaN
3 15.0 15.0 15.0 15.0 NaN
使用运算方法可以令一维Series参与轴0运算
import pandas
as pd
import numpy
as np
b
= pd
.DataFrame
(np
.arange
(20).reshape
(4,5))
b
Out
[29]:
0 1 2 3 4
0 0 1 2 3 4
1 5 6 7 8 9
2 10 11 12 13 14
3 15 16 17 18 19
c
= pd
.Series
(np
.arange
(4))
c
Out
[31]:
0 0
1 1
2 2
3 3
dtype
: int32
b
.sub
(c
,axis
=0)
Out
[32]:
0 1 2 3 4
0 0 1 2 3 4
1 4 5 6 7 8
2 8 9 10 11 12
3 12 13 14 15 16
数据类型的比较运算
同维度运算,尺寸一致
import pandas
as pd
import numpy
as np
a
= pd
.DataFrame
(np
.arange
(12).reshape
(3,4))
a
Out
[36]:
0 1 2 3
0 0 1 2 3
1 4 5 6 7
2 8 9 10 11
d
= pd
.DataFrame
(np
.arange
(12,0,-1).reshape
(3,4))
d
Out
[38]:
0 1 2 3
0 12 11 10 9
1 8 7 6 5
2 4 3 2 1
a
> d
Out
[39]:
0 1 2 3
0 False False False False
1 False False False True
2 True True True True
a
== d
Out
[40]:
0 1 2 3
0 False False False False
1 False False True False
2 False False False False
不同维度,广播运算,默认在1轴
import pandas
as np
import numpy
as np
a
= pd
.DataFrame
(np
.arange
(12).reshape
(3,4))
a
Out
[44]:
0 1 2 3
0 0 1 2 3
1 4 5 6 7
2 8 9 10 11
c
= pd
.Series
(np
.arange
(4))
c
Out
[46]:
0 0
1 1
2 2
3 3
dtype
: int32
a
> c
Out
[47]:
0 1 2 3
0 False False False False
1 True True True True
2 True True True True
c
>0
Out
[48]:
0 False
1 True
2 True
3 True
dtype
: bool