Revisiting pandas

Posted by SCSDV_d on May 9 at 21:53

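I spent the last few days revisiting pandas and working through it with JoinQuant data, which also made JoinQuant's own usage easier to understand.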

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

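I never studied pandas systematically. I happened to come across an article on it, so I am going through it once more with JoinQuant data.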

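1. Series

1.1 The basic syntax for creating a Series:

my_series = pd.Series(data, index)

The data argument above can be many kinds of object, such as a dict, a list, or a NumPy array. The index argument supplies the labels for data, much like the keys of a dict.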

1.1.1 Create a Series whose data are numbers and whose index is made of strings

guojia = ['usa','ru','cn','jp']
qty = [100,300,500,200]

ser = pd.Series(qty,guojia)

ser

usa    100
ru     300
cn     500
jp     200
dtype: int64

ser2 = pd.Series(guojia,qty)

ser2

100    usa
300     ru
500     cn
200     jp
dtype: object

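In the calls above, the first argument becomes the Series values and the second becomes the index.

Note: the index argument is optional. If you leave it out, pandas uses a default index, as with an array: [0, ..., len(data) - 1].

If you create a Series from a Python dict, pandas uses the dict's keys as the Series index and the corresponding dict values as the data.

Unlike a typical NumPy array, which holds a single element type, a pandas Series can hold objects of different types.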
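The three points above (default index, building from a dict, mixed types) can be checked with a small sketch. The variable names here are my own and are not part of the original notebook.

```python
import pandas as pd

# No index given: pandas falls back to the default 0..len(data)-1 index.
ser_default = pd.Series([100, 300, 500, 200])
print(list(ser_default.index))   # [0, 1, 2, 3]

# From a dict: keys become the index, values become the data.
ser_from_dict = pd.Series({'usa': 100, 'ru': 300, 'cn': 500, 'jp': 200})
print(ser_from_dict['cn'])       # 500

# Mixed types are allowed; the Series then uses the generic object dtype.
ser_mixed = pd.Series([1, 'two', 3.0])
print(ser_mixed.dtype)           # object
```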

1.1.2 Getting data out of a Series

Accessing data in a Series works much like accessing a Python dict:

ser

usa    100
ru     300
cn     500
jp     200
dtype: int64

ser['ru']

ser['jp']
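The two lookups above print no output in the post, so here is a small self-contained sketch of the dict-like access, rebuilt from the same values. The 'uk' label is a made-up missing key used only to show the default.

```python
import pandas as pd

ser = pd.Series([100, 300, 500, 200], index=['usa', 'ru', 'cn', 'jp'])

print(ser['ru'])            # 300
print('jp' in ser)          # True: membership tests look at the index labels
print(ser.get('uk', 0))     # 0: like dict.get, returns the default for a missing label
print(ser[['usa', 'cn']])   # a list of labels returns a smaller Series
```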

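1.1.3 Arithmetic on Series

Arithmetic on Series is aligned on the index. You can apply the operators + - * / to two Series, and pandas matches values by index label before computing. A label that exists in only one of the two Series yields NaN, and since NaN is a floating-point value the result is then stored as float64.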

ser = pd.Series(qty,guojia)

guojia2 = ['au','cn','jp','usa']
qty2 = [200,300,400,598]

ser3 = pd.Series(qty2,guojia2)

ser3

au     200
cn     300
jp     400
usa    598
dtype: int64

ser - ser3

au       NaN
cn     200.0
jp    -200.0
ru       NaN
usa   -498.0
dtype: float64

ser + ser3

au       NaN
cn     800.0
jp     600.0
ru       NaN
usa    698.0
dtype: float64

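Because Series arithmetic is index-based, we do not have to align the data ourselves: values with the same label are combined automatically. If one Series lacks a label, the result for that label is NaN.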
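If the NaN for unmatched labels is not wanted, one option (not used in the original post) is the .add() method with fill_value, which treats a label missing on one side as the given value. A minimal sketch with the same two Series:

```python
import pandas as pd

ser = pd.Series([100, 300, 500, 200], index=['usa', 'ru', 'cn', 'jp'])
ser3 = pd.Series([200, 300, 400, 598], index=['au', 'cn', 'jp', 'usa'])

# The + operator leaves NaN wherever a label exists on only one side.
plain_sum = ser + ser3

# .add() with fill_value=0 treats a label missing on one side as 0.
filled_sum = ser.add(ser3, fill_value=0)
print(filled_sum)
```

Whether 0 is a sensible stand-in depends on the data; for quantities like these it reads as "no entry means nothing to add".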

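2. DataFrames

A pandas DataFrame is a 2-dimensional data structure that stores data as a table of rows and columns. It makes common operations convenient: selecting or replacing rows and columns, reshaping the table, changing the index, filtering on several conditions, and so on.

2.1 Getting a DataFrame object. Here JoinQuant's get_price returns one: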

df = get_price('RB9999.XSGE',start_date='2018-12-1', end_date='2019-3-1',fields=None)

df.head()

	open	close	high	low	volume	money
2018-12-03	3250.0	3333.0	3495.0	3208.0	5092070.0	1.696123e+11
2018-12-04	3306.0	3374.0	3388.0	3283.0	4807696.0	1.600262e+11
2018-12-05	3389.0	3463.0	3466.0	3362.0	5400890.0	1.840964e+11
2018-12-06	3470.0	3375.0	3473.0	3371.0	4500922.0	1.535990e+11
2018-12-07	3360.0	3397.0	3426.0	3334.0	5127774.0	1.729243e+11

df['open'].head()

2018-12-03    3250.0
2018-12-04    3306.0
2018-12-05    3389.0
2018-12-06    3470.0
2018-12-07    3360.0
Name: open, dtype: float64

df['volume'].tail()

2019-02-25    3659076.0
2019-02-26    3998202.0
2019-02-27    2948082.0
2019-02-28    3461158.0
2019-03-01    2986416.0
Name: volume, dtype: float64

Each column of the table above is essentially a Series, and all of them share the same index. So a DataFrame can be thought of as a collection of Series that use a common index.
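get_price only exists inside the JoinQuant environment, so as a sketch of the "collection of Series" idea here is a tiny DataFrame built by hand from the first three rows shown above. The names dates, open_, close and sample are mine.

```python
import pandas as pd

dates = pd.to_datetime(['2018-12-03', '2018-12-04', '2018-12-05'])
open_ = pd.Series([3250.0, 3306.0, 3389.0], index=dates)
close = pd.Series([3333.0, 3374.0, 3463.0], index=dates)

# A dict of Series sharing one index becomes a DataFrame with one column per key.
sample = pd.DataFrame({'open': open_, 'close': close})

print(sample)
print(sample['open'].index.equals(sample['close'].index))  # True
```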

2.2 Selecting columns from a DataFrame

To get one column, use square brackets [], just as with a Series. For example, to get the open column of the table above:

df['open'].head()

2018-12-03    3250.0
2018-12-04    3306.0
2018-12-05    3389.0
2018-12-06    3470.0
2018-12-07    3360.0
Name: open, dtype: float64

2.2.1 Because we select only one column, the result is a Series. You can confirm the return type with type()

type(df['open'])

pandas.core.series.Series

2.2.2 If you select several columns, the result is a DataFrame:

df2 =df[['open','volume']].head()

df2

	open	volume
2018-12-03	3250.0	5092070.0
2018-12-04	3306.0	4807696.0
2018-12-05	3389.0	5400890.0
2018-12-06	3470.0	4500922.0
2018-12-07	3360.0	5127774.0

type(df2)

pandas.core.frame.DataFrame

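2.3 Adding columns to a DataFrame

When you create a column, you first need to define its data and its index.

There are two ways to add a column: define a pd.Series from scratch and put it into the table, or derive the new column from existing ones. Both are shown below:

2.3.1 Define a Series and assign it to the 'o-c' (open minus close) column. The Series is empty, so every value in the new column is NaN

df['o-c'] = pd.Series(dtype='float64')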

df.head()

	open	close	high	low	volume	money	o-c
2018-12-03	3250.0	3333.0	3495.0	3208.0	5092070.0	1.696123e+11	NaN
2018-12-04	3306.0	3374.0	3388.0	3283.0	4807696.0	1.600262e+11	NaN
2018-12-05	3389.0	3463.0	3466.0	3362.0	5400890.0	1.840964e+11	NaN
2018-12-06	3470.0	3375.0	3473.0	3371.0	4500922.0	1.535990e+11	NaN
2018-12-07	3360.0	3397.0	3426.0	3334.0	5127774.0	1.729243e+11	NaN

2.3.3 Create a new column from existing columns

Compute it directly

df['o-c2'] = df['open'] - df['close']

df.head()

	open	close	high	low	volume	money	o-c	o-c2
2018-12-03	3250.0	3333.0	3495.0	3208.0	5092070.0	1.696123e+11	NaN	-83.0
2018-12-04	3306.0	3374.0	3388.0	3283.0	4807696.0	1.600262e+11	NaN	-68.0
2018-12-05	3389.0	3463.0	3466.0	3362.0	5400890.0	1.840964e+11	NaN	-74.0
2018-12-06	3470.0	3375.0	3473.0	3371.0	4500922.0	1.535990e+11	NaN	95.0
2018-12-07	3360.0	3397.0	3426.0	3334.0	5127774.0	1.729243e+11	NaN	-37.0

def cmp(df1, df2):
    # Element-wise difference of the two columns' underlying NumPy arrays.
    diff = df1.values - df2.values
    #     if diff > 0:
    #         cmp1 = True
    #     else:
    #         cmp1 = False
    return diff

df['cmp'] = cmp(df['open'],df['close'])

df.head()

	open	close	high	low	volume	money	cmp
2018-12-03	3250.0	3333.0	3495.0	3208.0	5092070.0	1.696123e+11	-83.0
2018-12-04	3306.0	3374.0	3388.0	3283.0	4807696.0	1.600262e+11	-68.0
2018-12-05	3389.0	3463.0	3466.0	3362.0	5400890.0	1.840964e+11	-74.0
2018-12-06	3470.0	3375.0	3473.0	3371.0	4500922.0	1.535990e+11	95.0
2018-12-07	3360.0	3397.0	3426.0	3334.0	5127774.0	1.729243e+11	-37.0
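The commented-out `if` in cmp cannot work as written, because an array with several values has no single True/False value. A sketch of the vectorized alternative, on a hand-built copy of the first five rows (the column names open_gt_close and direction are my own):

```python
import numpy as np
import pandas as pd

sample = pd.DataFrame(
    {'open': [3250.0, 3306.0, 3389.0, 3470.0, 3360.0],
     'close': [3333.0, 3374.0, 3463.0, 3375.0, 3397.0]},
    index=pd.to_datetime(['2018-12-03', '2018-12-04', '2018-12-05',
                          '2018-12-06', '2018-12-07']))

# Element-wise comparison returns one True/False per row, no `if` needed.
sample['open_gt_close'] = sample['open'] > sample['close']

# np.where picks a value per row based on that boolean column.
sample['direction'] = np.where(sample['open_gt_close'], 'down', 'up')
print(sample)
```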

2.4 Deleting rows/columns from a DataFrame

2.4.1 To delete a row or a column, use .drop(). The axis argument sets the direction: axis=0 (the default) means rows and axis=1 means columns.

df.head()

	open	close	high	low	volume	money	o-c	o-c2
2018-12-03	3250.0	3333.0	3495.0	3208.0	5092070.0	1.696123e+11	NaN	-83.0
2018-12-04	3306.0	3374.0	3388.0	3283.0	4807696.0	1.600262e+11	NaN	-68.0
2018-12-05	3389.0	3463.0	3466.0	3362.0	5400890.0	1.840964e+11	NaN	-74.0
2018-12-06	3470.0	3375.0	3473.0	3371.0	4500922.0	1.535990e+11	NaN	95.0
2018-12-07	3360.0	3397.0	3426.0	3334.0	5127774.0	1.729243e+11	NaN	-37.0

To delete the 'o-c' column, pass axis=1, which means columns

df.head().drop('o-c',axis=1)

	open	close	high	low	volume	money	o-c2
2018-12-03	3250.0	3333.0	3495.0	3208.0	5092070.0	1.696123e+11	-83.0
2018-12-04	3306.0	3374.0	3388.0	3283.0	4807696.0	1.600262e+11	-68.0
2018-12-05	3389.0	3463.0	3466.0	3362.0	5400890.0	1.840964e+11	-74.0
2018-12-06	3470.0	3375.0	3473.0	3371.0	4500922.0	1.535990e+11	95.0
2018-12-07	3360.0	3397.0	3426.0	3334.0	5127774.0	1.729243e+11	-37.0

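Keep in mind that .drop() does not remove the row or column from the original DataFrame unless you explicitly ask for it; it returns a new DataFrame. This protects you from losing data by accident.

You can check that the data is intact by displaying df. If you really want to remove a row or column permanently, add the inplace=True argument.

Display df again; the 'o-c' column is still there: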

df.head()

	open	close	high	low	volume	money	o-c	o-c2
2018-12-03	3250.0	3333.0	3495.0	3208.0	5092070.0	1.696123e+11	NaN	-83.0
2018-12-04	3306.0	3374.0	3388.0	3283.0	4807696.0	1.600262e+11	NaN	-68.0
2018-12-05	3389.0	3463.0	3466.0	3362.0	5400890.0	1.840964e+11	NaN	-74.0
2018-12-06	3470.0	3375.0	3473.0	3371.0	4500922.0	1.535990e+11	NaN	95.0
2018-12-07	3360.0	3397.0	3426.0	3334.0	5127774.0	1.729243e+11	NaN	-37.0

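Permanent deletion with inplace=True ... (this produced a warning, which I'll look into later; it seems to come from calling drop on df.head(), a separate object, rather than on df itself)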

df.head().drop('o-c',axis=1,inplace=True )

/opt/conda/lib/python3.6/site-packages/pandas/core/frame.py:3697: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  errors=errors)
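As far as I can tell, the warning goes away when drop is called on the DataFrame itself instead of on the result of .head(). A minimal sketch on a hand-built two-row table (sample and without_col are my own names):

```python
import numpy as np
import pandas as pd

sample = pd.DataFrame({'open': [3250.0, 3306.0],
                       'close': [3333.0, 3374.0],
                       'o-c': [np.nan, np.nan]})

# Option 1: keep the returned copy; sample still has the 'o-c' column here.
without_col = sample.drop('o-c', axis=1)
cols_before = list(sample.columns)

# Option 2: modify the object itself; call drop on the DataFrame, not on .head().
sample.drop('o-c', axis=1, inplace=True)

print(list(without_col.columns))  # ['open', 'close']
print(list(sample.columns))       # ['open', 'close']
```

Reassigning the returned frame (option 1) is usually the easier of the two to follow, since it is always clear which object was changed.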

2.5 Getting one or more rows from a DataFrame

To get a row, use .loc[] to refer to it by index label, or .iloc[] to refer to it by its integer position in the table.

df.head()

	open	close	high	low	volume	money	o-c	o-c2
2018-12-03	3250.0	3333.0	3495.0	3208.0	5092070.0	1.696123e+11	NaN	-83.0
2018-12-04	3306.0	3374.0	3388.0	3283.0	4807696.0	1.600262e+11	NaN	-68.0
2018-12-05	3389.0	3463.0	3466.0	3362.0	5400890.0	1.840964e+11	NaN	-74.0
2018-12-06	3470.0	3375.0	3473.0	3371.0	4500922.0	1.535990e+11	NaN	95.0
2018-12-07	3360.0	3397.0	3426.0	3334.0	5127774.0	1.729243e+11	NaN	-37.0

df.head().loc['2018-12-05']

open      3.389000e+03
close     3.463000e+03
high      3.466000e+03
low       3.362000e+03
volume    5.400890e+06
money     1.840964e+11
o-c                NaN
o-c2     -7.400000e+01
Name: 2018-12-05 00:00:00, dtype: float64

type(df.head().loc['2018-12-07'])

pandas.core.series.Series

Try entering df.head(10).iloc[[1,9]]

.iloc[] selects rows by integer position, here the 2nd and the 10th row

df.head(10).iloc[[1,9]]

	open	close	high	low	volume	money	o-c	o-c2
2018-12-04	3306.0	3374.0	3388.0	3283.0	4807696.0	1.600262e+11	NaN	-68.0
2018-12-14	3416.0	3445.0	3450.0	3405.0	4124206.0	1.413454e+11	NaN	-29.0

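You can also use .loc[] to specify both rows and columns and get a sub-table, much like indexing in NumPy. For example, to extract one or more columns such as 'money' and 'open' from the '2018-12-07' row: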

df.head().loc['2018-12-07',['money']]

money    1.729243e+11
Name: 2018-12-07 00:00:00, dtype: float64

df.head().loc['2018-12-07',['money','high','open']]

money    1.729243e+11
high     3.426000e+03
open     3.360000e+03
Name: 2018-12-07 00:00:00, dtype: float64

type(df.head().loc['2018-12-05',['open','high','volume']])

pandas.core.series.Series

Passing index labels directly to .iloc raises an error

The reason: .iloc[] refers to rows by their integer position in the table, not by label.

df.head(10).iloc[['2018-12-04','2018-12-14']]

ValueError                                Traceback (most recent call last)
<ipython-input-73-74b1efb5eb04> in <module>
----> 1 df.head(10).iloc[['2018-12-04','2018-12-14']]

/opt/conda/lib/python3.6/site-packages/pandas/core/indexing.py in __getitem__(self, key)
   1476
   1477             maybe_callable = com._apply_if_callable(key, self.obj)
-> 1478             return self._getitem_axis(maybe_callable, axis=axis)
   1479
   1480     def _is_scalar_access(self, key):

/opt/conda/lib/python3.6/site-packages/pandas/core/indexing.py in _getitem_axis(self, key, axis)
   2089         # a list of integers
   2090         elif is_list_like_indexer(key):
-> 2091             return self._get_list_axis(key, axis=axis)
   2092
   2093         # a single integer

/opt/conda/lib/python3.6/site-packages/pandas/core/indexing.py in _get_list_axis(self, key, axis)
   2068             axis = self.axis or 0
   2069         try:
-> 2070             return self.obj._take(key, axis=axis)
   2071         except IndexError:
   2072             # re-raise with different error message

/opt/conda/lib/python3.6/site-packages/pandas/core/generic.py in _take(self, indices, axis, is_copy)
   2787         new_data = self._data.take(indices,
   2788                                    axis=self._get_block_manager_axis(axis),
-> 2789                                    verify=True)
   2790         result = self._constructor(new_data).__finalize__(self)
   2791

/opt/conda/lib/python3.6/site-packages/pandas/core/internals.py in take(self, indexer, axis, verify, convert)
   4524                              dtype='int64')
   4525                    if isinstance(indexer, slice)
-> 4526                    else np.asanyarray(indexer, dtype='int64'))
   4527
   4528         n = self.shape[axis]

/opt/conda/lib/python3.6/site-packages/numpy/core/numeric.py in asanyarray(a, dtype, order)
    542
    543     """
--> 544     return array(a, dtype, copy=False, order=order, subok=True)
    545
    546

ValueError: invalid literal for int() with base 10: '2018-12-04'
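When only the label is known but a position is needed for .iloc, the index can translate one into the other with Index.get_loc. A small sketch on a hand-built three-row table. I pass pd.Timestamp objects here to keep the lookup unambiguous, although the post shows that date strings also work with .loc.

```python
import pandas as pd

sample = pd.DataFrame(
    {'open': [3250.0, 3306.0, 3389.0]},
    index=pd.to_datetime(['2018-12-03', '2018-12-04', '2018-12-05']))

label = pd.Timestamp('2018-12-04')

by_label = sample.loc[label]        # label-based lookup
by_position = sample.iloc[1]        # position-based lookup of the same row

# Index.get_loc translates a label into its integer position.
pos = sample.index.get_loc(label)
print(pos)                          # 1
print(sample.iloc[pos]['open'])     # 3306.0
```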

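2.5 Conditional filtering

2.5.1 Besides selecting columns, square brackets [] also accept a condition and return the rows that satisfy it. For example, to pick the rows of the table below where 'o-c2' > 0: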

df.head()

	open	close	high	low	volume	money	o-c	o-c2
2018-12-03	3250.0	3333.0	3495.0	3208.0	5092070.0	1.696123e+11	NaN	-83.0
2018-12-04	3306.0	3374.0	3388.0	3283.0	4807696.0	1.600262e+11	NaN	-68.0
2018-12-05	3389.0	3463.0	3466.0	3362.0	5400890.0	1.840964e+11	NaN	-74.0
2018-12-06	3470.0	3375.0	3473.0	3371.0	4500922.0	1.535990e+11	NaN	95.0
2018-12-07	3360.0	3397.0	3426.0	3334.0	5127774.0	1.729243e+11	NaN	-37.0

df['o-c2']>0

2018-12-03    False
2018-12-04    False
2018-12-05    False
2018-12-06     True
2018-12-07    False
2018-12-10     True
2018-12-11    False
2018-12-12     True
2018-12-13    False
2018-12-14    False
2018-12-17     True
2018-12-18    False
2018-12-19     True
2018-12-20    False
2018-12-21    False
2018-12-24     True
2018-12-25     True
2018-12-26    False
2018-12-27     True
2018-12-28    False
2019-01-02     True
2019-01-03    False
2019-01-04    False
2019-01-07    False
2019-01-08     True
2019-01-09    False
2019-01-10    False
2019-01-11    False
2019-01-14    False
2019-01-15     True
2019-01-16    False
2019-01-17    False
2019-01-18    False
2019-01-21    False
2019-01-22     True
2019-01-23    False
2019-01-24    False
2019-01-25    False
2019-01-28     True
2019-01-29     True
2019-01-30     True
2019-01-31    False
2019-02-01    False
2019-02-11     True
2019-02-12     True
2019-02-13     True
2019-02-14     True
2019-02-15     True
2019-02-18    False
2019-02-19     True
2019-02-20     True
2019-02-21    False
2019-02-22    False
2019-02-25     True
2019-02-26    False
2019-02-27     True
2019-02-28    False
2019-03-01    False
Name: o-c2, dtype: bool

df[df['o-c2']>0][['open','close','money']]

	open	close	money
2018-12-06	3470.0	3375.0	1.535990e+11
2018-12-10	3388.0	3312.0	1.584575e+11
2018-12-12	3340.0	3317.0	1.404878e+11
2018-12-17	3448.0	3435.0	1.345986e+11
2018-12-19	3432.0	3425.0	1.328182e+11
2018-12-24	3515.0	3451.0	1.227598e+11
2018-12-25	3442.0	3398.0	1.473822e+11
2018-12-27	3418.0	3396.0	1.428099e+11
2019-01-02	3398.0	3382.0	7.286511e+10
2019-01-08	3521.0	3505.0	9.602977e+10
2019-01-15	3576.0	3519.0	1.105785e+11
2019-01-22	3650.0	3633.0	1.455658e+11
2019-01-28	3712.0	3681.0	1.238699e+11
2019-01-29	3680.0	3675.0	9.212779e+10
2019-01-30	3683.0	3677.0	1.671236e+11
2019-02-11	3850.0	3825.0	1.342309e+11
2019-02-12	3818.0	3785.0	1.180507e+11
2019-02-13	3780.0	3702.0	1.425991e+11
2019-02-14	3711.0	3684.0	1.205294e+11
2019-02-15	3680.0	3599.0	1.423635e+11
2019-02-19	3670.0	3655.0	1.013994e+11
2019-02-20	3650.0	3641.0	1.418150e+11
2019-02-25	3736.0	3682.0	1.363059e+11
2019-02-27	3737.0	3715.0	1.098979e+11

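You can combine several conditions with the logical operators & (and) and | (or) to apply multiple filters to the DataFrame at once. For example, the following selects the rows that satisfy both 'o-c2' > 0 and 'close' > 3400, such as the '2018-12-17' row: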

df.head(20)

	open	close	high	low	volume	money	o-c	o-c2
2018-12-03	3250.0	3333.0	3495.0	3208.0	5092070.0	1.696123e+11	NaN	-83.0
2018-12-04	3306.0	3374.0	3388.0	3283.0	4807696.0	1.600262e+11	NaN	-68.0
2018-12-05	3389.0	3463.0	3466.0	3362.0	5400890.0	1.840964e+11	NaN	-74.0
2018-12-06	3470.0	3375.0	3473.0	3371.0	4500922.0	1.535990e+11	NaN	95.0
2018-12-07	3360.0	3397.0	3426.0	3334.0	5127774.0	1.729243e+11	NaN	-37.0
2018-12-10	3388.0	3312.0	3398.0	3303.0	4741366.0	1.584575e+11	NaN	76.0
2018-12-11	3306.0	3327.0	3341.0	3284.0	3595984.0	1.189109e+11	NaN	-21.0
2018-12-12	3340.0	3317.0	3379.0	3317.0	4194776.0	1.404878e+11	NaN	23.0
2018-12-13	3325.0	3419.0	3431.0	3321.0	5327258.0	1.805737e+11	NaN	-94.0
2018-12-14	3416.0	3445.0	3450.0	3405.0	4124206.0	1.413454e+11	NaN	-29.0
2018-12-17	3448.0	3435.0	3470.0	3414.0	3915682.0	1.345986e+11	NaN	13.0
2018-12-18	3430.0	3435.0	3443.0	3391.0	3671214.0	1.255123e+11	NaN	-5.0
2018-12-19	3432.0	3425.0	3450.0	3411.0	3871012.0	1.328182e+11	NaN	7.0
2018-12-20	3433.0	3481.0	3492.0	3427.0	4306708.0	1.488089e+11	NaN	-48.0
2018-12-21	3500.0	3508.0	3535.0	3471.0	4313258.0	1.511073e+11	NaN	-8.0
2018-12-24	3515.0	3451.0	3516.0	3435.0	3542690.0	1.227598e+11	NaN	64.0
2018-12-25	3442.0	3398.0	3448.0	3357.0	4333400.0	1.473822e+11	NaN	44.0
2018-12-26	3393.0	3409.0	3430.0	3387.0	3285056.0	1.119878e+11	NaN	-16.0
2018-12-27	3418.0	3396.0	3471.0	3395.0	4172568.0	1.428099e+11	NaN	22.0
2018-12-28	3387.0	3404.0	3420.0	3382.0	2915936.0	9.916103e+10	NaN	-17.0

df[ (df['o-c2']>0) & (df['close'] > 3400) ]

The pattern is df[ (cond1) & (cond2) ], where each parenthesized condition has the form df['column name'] > 0.

Check the type of such a condition

df[ (df['o-c2']>0) & (df['close'] > 3400)  ]

	open	close	high	low	volume	money	o-c	o-c2
2018-12-17	3448.0	3435.0	3470.0	3414.0	3915682.0	1.345986e+11	NaN	13.0
2018-12-19	3432.0	3425.0	3450.0	3411.0	3871012.0	1.328182e+11	NaN	7.0
2018-12-24	3515.0	3451.0	3516.0	3435.0	3542690.0	1.227598e+11	NaN	64.0
2019-01-08	3521.0	3505.0	3523.0	3483.0	2740906.0	9.602977e+10	NaN	16.0
2019-01-15	3576.0	3519.0	3576.0	3508.0	3128980.0	1.105785e+11	NaN	57.0
2019-01-22	3650.0	3633.0	3700.0	3630.0	3970252.0	1.455658e+11	NaN	17.0
2019-01-28	3712.0	3681.0	3757.0	3680.0	3337710.0	1.238699e+11	NaN	31.0
2019-01-29	3680.0	3675.0	3694.0	3653.0	2508228.0	9.212779e+10	NaN	5.0
2019-01-30	3683.0	3677.0	3767.0	3667.0	4490378.0	1.671236e+11	NaN	6.0
2019-02-11	3850.0	3825.0	3908.0	3816.0	3480360.0	1.342309e+11	NaN	25.0
2019-02-12	3818.0	3785.0	3830.0	3771.0	3109428.0	1.180507e+11	NaN	33.0
2019-02-13	3780.0	3702.0	3796.0	3692.0	3814494.0	1.425991e+11	NaN	78.0
2019-02-14	3711.0	3684.0	3717.0	3668.0	3266348.0	1.205294e+11	NaN	27.0
2019-02-15	3680.0	3599.0	3696.0	3593.0	3916316.0	1.423635e+11	NaN	81.0
2019-02-19	3670.0	3655.0	3698.0	3647.0	2760590.0	1.013994e+11	NaN	15.0
2019-02-20	3650.0	3641.0	3669.0	3578.0	3918544.0	1.418150e+11	NaN	9.0
2019-02-25	3736.0	3682.0	3766.0	3681.0	3659076.0	1.363059e+11	NaN	54.0
2019-02-27	3737.0	3715.0	3750.0	3697.0	2948082.0	1.098979e+11	NaN	22.0

xx = (df['o-c2']>0)

type(xx)

pandas.core.series.Series

yy = df['o-c2'] >0

type(yy)

pandas.core.series.Series
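Since each condition is just a boolean Series, the conditions can be stored in variables and combined with & (and), | (or) and ~ (not). The parentheses in the one-line form matter because & binds more tightly than > in Python. A sketch on four rows taken from the tables above (the variable names are mine):

```python
import pandas as pd

sample = pd.DataFrame(
    {'close': [3333.0, 3375.0, 3435.0, 3451.0],
     'o-c2': [-83.0, 95.0, 13.0, 64.0]},
    index=pd.to_datetime(['2018-12-03', '2018-12-06',
                          '2018-12-17', '2018-12-24']))

positive_gap = sample['o-c2'] > 0     # boolean Series, one value per row
high_close = sample['close'] > 3400

both = sample[positive_gap & high_close]        # AND
either = sample[positive_gap | high_close]      # OR
neither = sample[~(positive_gap | high_close)]  # NOT

print(positive_gap.dtype)   # bool
print(len(both), len(either), len(neither))
```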

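2.6 Resetting a DataFrame's index

If the current index of a DataFrame is not what you want, .reset_index() resets it. The method saves the old index in a column named index and replaces the table's index with the default integers starting at zero, i.e. [0, ..., len(data) - 1]. For example: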

df.head()

	open	close	high	low	volume	money	o-c	o-c2
2018-12-03	3250.0	3333.0	3495.0	3208.0	5092070.0	1.696123e+11	NaN	-83.0
2018-12-04	3306.0	3374.0	3388.0	3283.0	4807696.0	1.600262e+11	NaN	-68.0
2018-12-05	3389.0	3463.0	3466.0	3362.0	5400890.0	1.840964e+11	NaN	-74.0
2018-12-06	3470.0	3375.0	3473.0	3371.0	4500922.0	1.535990e+11	NaN	95.0
2018-12-07	3360.0	3397.0	3426.0	3334.0	5127774.0	1.729243e+11	NaN	-37.0

df.reset_index().head()

	index	open	close	high	low	volume	money	o-c	o-c2
0	2018-12-03	3250.0	3333.0	3495.0	3208.0	5092070.0	1.696123e+11	NaN	-83.0
1	2018-12-04	3306.0	3374.0	3388.0	3283.0	4807696.0	1.600262e+11	NaN	-68.0
2	2018-12-05	3389.0	3463.0	3466.0	3362.0	5400890.0	1.840964e+11	NaN	-74.0
3	2018-12-06	3470.0	3375.0	3473.0	3371.0	4500922.0	1.535990e+11	NaN	95.0
4	2018-12-07	3360.0	3397.0	3426.0	3334.0	5127774.0	1.729243e+11	NaN	-37.0

As with drop, .reset_index() does not permanently change the table's index unless you explicitly pass inplace, as in .reset_index(inplace=True)

df.head()

	open	close	high	low	volume	money	o-c	o-c2
2018-12-03	3250.0	3333.0	3495.0	3208.0	5092070.0	1.696123e+11	NaN	-83.0
2018-12-04	3306.0	3374.0	3388.0	3283.0	4807696.0	1.600262e+11	NaN	-68.0
2018-12-05	3389.0	3463.0	3466.0	3362.0	5400890.0	1.840964e+11	NaN	-74.0
2018-12-06	3470.0	3375.0	3473.0	3371.0	4500922.0	1.535990e+11	NaN	95.0
2018-12-07	3360.0	3397.0	3426.0	3334.0	5127774.0	1.729243e+11	NaN	-37.0

df.reset_index(inplace=True)

df.head()

	index	open	close	high	low	volume	money	o-c	o-c2
0	2018-12-03	3250.0	3333.0	3495.0	3208.0	5092070.0	1.696123e+11	NaN	-83.0
1	2018-12-04	3306.0	3374.0	3388.0	3283.0	4807696.0	1.600262e+11	NaN	-68.0
2	2018-12-05	3389.0	3463.0	3466.0	3362.0	5400890.0	1.840964e+11	NaN	-74.0
3	2018-12-06	3470.0	3375.0	3473.0	3371.0	4500922.0	1.535990e+11	NaN	95.0
4	2018-12-07	3360.0	3397.0	3426.0	3334.0	5127774.0	1.729243e+11	NaN	-37.0
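If the old index is not worth keeping, reset_index also takes drop=True, which discards it instead of saving it in a column. This option is not used in the post; a small sketch on a hand-built two-row table:

```python
import pandas as pd

sample = pd.DataFrame(
    {'open': [3250.0, 3306.0]},
    index=pd.to_datetime(['2018-12-03', '2018-12-04']))

kept = sample.reset_index()              # old index saved in a column named 'index'
dropped = sample.reset_index(drop=True)  # old index discarded

print(list(kept.columns))     # ['index', 'open']
print(list(dropped.columns))  # ['open']
print(list(dropped.index))    # [0, 1]
```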

2.7 Setting a DataFrame's index

Similarly, .set_index() turns one of the DataFrame's columns into the index. For example, we add a new column named 'code' to the table and use it as the index:

df.head()

	index	open	close	high	low	volume	money	o-c	o-c2
0	2018-12-03	3250.0	3333.0	3495.0	3208.0	5092070.0	1.696123e+11	NaN	-83.0
1	2018-12-04	3306.0	3374.0	3388.0	3283.0	4807696.0	1.600262e+11	NaN	-68.0
2	2018-12-05	3389.0	3463.0	3466.0	3362.0	5400890.0	1.840964e+11	NaN	-74.0
3	2018-12-06	3470.0	3375.0	3473.0	3371.0	4500922.0	1.535990e+11	NaN	95.0
4	2018-12-07	3360.0	3397.0	3426.0	3334.0	5127774.0	1.729243e+11	NaN	-37.0

df['code'] ='rb9999'

df.head()

	index	open	close	high	low	volume	money	o-c	o-c2	code
0	2018-12-03	3250.0	3333.0	3495.0	3208.0	5092070.0	1.696123e+11	NaN	-83.0	rb9999
1	2018-12-04	3306.0	3374.0	3388.0	3283.0	4807696.0	1.600262e+11	NaN	-68.0	rb9999
2	2018-12-05	3389.0	3463.0	3466.0	3362.0	5400890.0	1.840964e+11	NaN	-74.0	rb9999
3	2018-12-06	3470.0	3375.0	3473.0	3371.0	4500922.0	1.535990e+11	NaN	95.0	rb9999
4	2018-12-07	3360.0	3397.0	3426.0	3334.0	5127774.0	1.729243e+11	NaN	-37.0	rb9999

df.set_index('code').head()

	index	open	close	high	low	volume	money	o-c	o-c2
code
rb9999	2018-12-03	3250.0	3333.0	3495.0	3208.0	5092070.0	1.696123e+11	NaN	-83.0
rb9999	2018-12-04	3306.0	3374.0	3388.0	3283.0	4807696.0	1.600262e+11	NaN	-68.0
rb9999	2018-12-05	3389.0	3463.0	3466.0	3362.0	5400890.0	1.840964e+11	NaN	-74.0
rb9999	2018-12-06	3470.0	3375.0	3473.0	3371.0	4500922.0	1.535990e+11	NaN	95.0
rb9999	2018-12-07	3360.0	3397.0	3426.0	3334.0	5127774.0	1.729243e+11	NaN	-37.0

df.head()

	index	open	close	high	low	volume	money	o-c	o-c2	code
0	2018-12-03	3250.0	3333.0	3495.0	3208.0	5092070.0	1.696123e+11	NaN	-83.0	rb9999
1	2018-12-04	3306.0	3374.0	3388.0	3283.0	4807696.0	1.600262e+11	NaN	-68.0	rb9999
2	2018-12-05	3389.0	3463.0	3466.0	3362.0	5400890.0	1.840964e+11	NaN	-74.0	rb9999
3	2018-12-06	3470.0	3375.0	3473.0	3371.0	4500922.0	1.535990e+11	NaN	95.0	rb9999
4	2018-12-07	3360.0	3397.0	3426.0	3334.0	5127774.0	1.729243e+11	NaN	-37.0	rb9999

Note that, unlike .reset_index(), which keeps the old index as a column before switching to the default index, .set_index() completely replaces the previous index.
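Because the earlier reset_index(inplace=True) moved the dates into the 'index' column, the same method can put them back as the index. A sketch on a hand-built table shaped like the one above (by_code and by_date are my own names):

```python
import pandas as pd

sample = pd.DataFrame({
    'index': pd.to_datetime(['2018-12-03', '2018-12-04']),
    'open': [3250.0, 3306.0],
    'code': ['rb9999', 'rb9999'],
})

by_code = sample.set_index('code')    # returns a new frame; sample is unchanged
by_date = sample.set_index('index')   # the date column becomes the index again

print(list(sample.columns))    # ['index', 'open', 'code']
print(by_date.index.name)      # index
print(list(by_date.columns))   # ['open', 'code']
```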

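2.8 MultiIndex and naming the index levels

A MultiIndex is essentially an array of tuples, each of which is unique. You can create one from a list of arrays (MultiIndex.from_arrays), from an array of tuples (MultiIndex.from_tuples), or from a set of iterables whose elements are paired in every combination, such as two lists (MultiIndex.from_product).

The article goes on to build a MultiIndex from tuples. I probably won't need MultiIndex for now, so I'll study it later.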

df.head()

	index	open	close	high	low	volume	money	o-c	o-c2	code
0	2018-12-03	3250.0	3333.0	3495.0	3208.0	5092070.0	1.696123e+11	NaN	-83.0	rb9999
1	2018-12-04	3306.0	3374.0	3388.0	3283.0	4807696.0	1.600262e+11	NaN	-68.0	rb9999
2	2018-12-05	3389.0	3463.0	3466.0	3362.0	5400890.0	1.840964e+11	NaN	-74.0	rb9999
3	2018-12-06	3470.0	3375.0	3473.0	3371.0	4500922.0	1.535990e+11	NaN	95.0	rb9999
4	2018-12-07	3360.0	3397.0	3426.0	3334.0	5127774.0	1.729243e+11	NaN	-37.0	rb9999

Apparently .index.names can be used to give the levels names (not sure yet).
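For later reference, a minimal sketch of what building a MultiIndex from tuples and naming its levels might look like. The second contract code 'hc9999' and its two close values are made-up placeholders for illustration only; the 'rb9999' values come from the table above.

```python
import pandas as pd

# Hypothetical two-level index: (contract code, date string).
tuples = [('rb9999', '2018-12-03'), ('rb9999', '2018-12-04'),
          ('hc9999', '2018-12-03'), ('hc9999', '2018-12-04')]
multi = pd.MultiIndex.from_tuples(tuples)

# The hc9999 numbers are placeholders, not real market data.
sample = pd.DataFrame({'close': [3333.0, 3374.0, 3400.0, 3410.0]}, index=multi)
sample.index.names = ['code', 'date']   # name the two levels

print(sample)
print(sample.loc['rb9999'])             # all rows under one outer label
```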

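2.9 Cleaning data

Dropping or filling missing values

When you read large amounts of data with pandas, the raw data is often incomplete. Wherever a DataFrame lacks data, pandas puts a missing-value marker such as NaN or None. You can discard these with .dropna() or fill them in with .fillna().

For example: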

df.head()

	index	open	close	high	low	volume	money	o-c	o-c2	code
0	2018-12-03	3250.0	3333.0	3495.0	3208.0	5092070.0	1.696123e+11	NaN	-83.0	rb9999
1	2018-12-04	3306.0	3374.0	3388.0	3283.0	4807696.0	1.600262e+11	NaN	-68.0	rb9999
2	2018-12-05	3389.0	3463.0	3466.0	3362.0	5400890.0	1.840964e+11	NaN	-74.0	rb9999
3	2018-12-06	3470.0	3375.0	3473.0	3371.0	4500922.0	1.535990e+11	NaN	95.0	rb9999
4	2018-12-07	3360.0	3397.0	3426.0	3334.0	5127774.0	1.729243e+11	NaN	-37.0	rb9999

df.dropna()

	index	open	close	high	low	volume	money	o-c	o-c2	code

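Note that if you do not specify axis, rows are dropped by default. Every row here has NaN in the 'o-c' column, so the result is empty.

axis=1 drops the columns that contain NaN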

df.dropna(axis =  1).head()

	index	open	close	high	low	volume	money	o-c2	code
0	2018-12-03	3250.0	3333.0	3495.0	3208.0	5092070.0	1.696123e+11	-83.0	rb9999
1	2018-12-04	3306.0	3374.0	3388.0	3283.0	4807696.0	1.600262e+11	-68.0	rb9999
2	2018-12-05	3389.0	3463.0	3466.0	3362.0	5400890.0	1.840964e+11	-74.0	rb9999
3	2018-12-06	3470.0	3375.0	3473.0	3371.0	4500922.0	1.535990e+11	95.0	rb9999
4	2018-12-07	3360.0	3397.0	3426.0	3334.0	5127774.0	1.729243e+11	-37.0	rb9999

df

	index	open	close	high	low	volume	money	o-c	o-c2	code
0	2018-12-03	3250.0	3333.0	3495.0	3208.0	5092070.0	1.696123e+11	NaN	-83.0	rb9999
1	2018-12-04	3306.0	3374.0	3388.0	3283.0	4807696.0	1.600262e+11	NaN	-68.0	rb9999
2	2018-12-05	3389.0	3463.0	3466.0	3362.0	5400890.0	1.840964e+11	NaN	-74.0	rb9999
3	2018-12-06	3470.0	3375.0	3473.0	3371.0	4500922.0	1.535990e+11	NaN	95.0	rb9999
4	2018-12-07	3360.0	3397.0	3426.0	3334.0	5127774.0	1.729243e+11	NaN	-37.0	rb9999
5	2018-12-10	3388.0	3312.0	3398.0	3303.0	4741366.0	1.584575e+11	NaN	76.0	rb9999
6	2018-12-11	3306.0	3327.0	3341.0	3284.0	3595984.0	1.189109e+11	NaN	-21.0	rb9999
7	2018-12-12	3340.0	3317.0	3379.0	3317.0	4194776.0	1.404878e+11	NaN	23.0	rb9999
8	2018-12-13	3325.0	3419.0	3431.0	3321.0	5327258.0	1.805737e+11	NaN	-94.0	rb9999
9	2018-12-14	3416.0	3445.0	3450.0	3405.0	4124206.0	1.413454e+11	NaN	-29.0	rb9999
10	2018-12-17	3448.0	3435.0	3470.0	3414.0	3915682.0	1.345986e+11	NaN	13.0	rb9999
11	2018-12-18	3430.0	3435.0	3443.0	3391.0	3671214.0	1.255123e+11	NaN	-5.0	rb9999
12	2018-12-19	3432.0	3425.0	3450.0	3411.0	3871012.0	1.328182e+11	NaN	7.0	rb9999
13	2018-12-20	3433.0	3481.0	3492.0	3427.0	4306708.0	1.488089e+11	NaN	-48.0	rb9999
14	2018-12-21	3500.0	3508.0	3535.0	3471.0	4313258.0	1.511073e+11	NaN	-8.0	rb9999
15	2018-12-24	3515.0	3451.0	3516.0	3435.0	3542690.0	1.227598e+11	NaN	64.0	rb9999
16	2018-12-25	3442.0	3398.0	3448.0	3357.0	4333400.0	1.473822e+11	NaN	44.0	rb9999
17	2018-12-26	3393.0	3409.0	3430.0	3387.0	3285056.0	1.119878e+11	NaN	-16.0	rb9999
18	2018-12-27	3418.0	3396.0	3471.0	3395.0	4172568.0	1.428099e+11	NaN	22.0	rb9999
19	2018-12-28	3387.0	3404.0	3420.0	3382.0	2915936.0	9.916103e+10	NaN	-17.0	rb9999
20	2019-01-02	3398.0	3382.0	3439.0	3371.0	2149936.0	7.286511e+10	NaN	16.0	rb9999
21	2019-01-03	3378.0	3455.0	3456.0	3366.0	3719028.0	1.271959e+11	NaN	-77.0	rb9999
22	2019-01-04	3460.0	3486.0	3493.0	3430.0	3323322.0	1.149595e+11	NaN	-26.0	rb9999
23	2019-01-07	3490.0	3520.0	3526.0	3466.0	3053042.0	1.068218e+11	NaN	-30.0	rb9999
24	2019-01-08	3521.0	3505.0	3523.0	3483.0	2740906.0	9.602977e+10	NaN	16.0	rb9999
25	2019-01-09	3505.0	3507.0	3550.0	3494.0	3378598.0	1.190845e+11	NaN	-2.0	rb9999
26	2019-01-10	3512.0	3514.0	3543.0	3502.0	3134668.0	1.104706e+11	NaN	-2.0	rb9999
27	2019-01-11	3517.0	3539.0	3540.0	3492.0	3440962.0	1.210772e+11	NaN	-22.0	rb9999
28	2019-01-14	3535.0	3575.0	3576.0	3520.0	2624250.0	9.324587e+10	NaN	-40.0	rb9999
29	2019-01-15	3576.0	3519.0	3576.0	3508.0	3128980.0	1.105785e+11	NaN	57.0	rb9999
30	2019-01-16	3519.0	3534.0	3545.0	3510.0	2632026.0	9.282553e+10	NaN	-15.0	rb9999
31	2019-01-17	3529.0	3551.0	3565.0	3525.0	2258818.0	8.021438e+10	NaN	-22.0	rb9999
32	2019-01-18	3552.0	3633.0	3636.0	3540.0	3599670.0	1.294739e+11	NaN	-81.0	rb9999
33	2019-01-21	3636.0	3645.0	3675.0	3619.0	3526340.0	1.285795e+11	NaN	-9.0	rb9999
34	2019-01-22	3650.0	3633.0	3700.0	3630.0	3970252.0	1.455658e+11	NaN	17.0	rb9999
35	2019-01-23	3639.0	3644.0	3658.0	3625.0	2631994.0	9.588602e+10	NaN	-5.0	rb9999
36	2019-01-24	3647.0	3680.0	3684.0	3639.0	3200450.0	1.171679e+11	NaN	-33.0	rb9999
37	2019-01-25	3684.0	3710.0	3740.0	3675.0	3099552.0	1.150852e+11	NaN	-26.0	rb9999
38	2019-01-28	3712.0	3681.0	3757.0	3680.0	3337710.0	1.238699e+11	NaN	31.0	rb9999
39	2019-01-29	3680.0	3675.0	3694.0	3653.0	2508228.0	9.212779e+10	NaN	5.0	rb9999
40	2019-01-30	3683.0	3677.0	3767.0	3667.0	4490378.0	1.671236e+11	NaN	6.0	rb9999
41	2019-01-31	3682.0	3707.0	3718.0	3673.0	2724362.0	1.006953e+11	NaN	-25.0	rb9999
42	2019-02-01	3696.0	3754.0	3770.0	3696.0	2506846.0	9.364989e+10	NaN	-58.0	rb9999
43	2019-02-11	3850.0	3825.0	3908.0	3816.0	3480360.0	1.342309e+11	NaN	25.0	rb9999
44	2019-02-12	3818.0	3785.0	3830.0	3771.0	3109428.0	1.180507e+11	NaN	33.0	rb9999
45	2019-02-13	3780.0	3702.0	3796.0	3692.0	3814494.0	1.425991e+11	NaN	78.0	rb9999
46	2019-02-14	3711.0	3684.0	3717.0	3668.0	3266348.0	1.205294e+11	NaN	27.0	rb9999
47	2019-02-15	3680.0	3599.0	3696.0	3593.0	3916316.0	1.423635e+11	NaN	81.0	rb9999
48	2019-02-18	3615.0	3659.0	3675.0	3611.0	3532344.0	1.287828e+11	NaN	-44.0	rb9999
49	2019-02-19	3670.0	3655.0	3698.0	3647.0	2760590.0	1.013994e+11	NaN	15.0	rb9999
50	2019-02-20	3650.0	3641.0	3669.0	3578.0	3918544.0	1.418150e+11	NaN	9.0	rb9999
51	2019-02-21	3640.0	3677.0	3695.0	3635.0	3928972.0	1.442938e+11	NaN	-37.0	rb9999
52	2019-02-22	3669.0	3731.0	3735.0	3663.0	3278144.0	1.215958e+11	NaN	-62.0	rb9999
53	2019-02-25	3736.0	3682.0	3766.0	3681.0	3659076.0	1.363059e+11	NaN	54.0	rb9999
54	2019-02-26	3677.0	3736.0	3745.0	3672.0	3998202.0	1.482914e+11	NaN	-59.0	rb9999
55	2019-02-27	3737.0	3715.0	3750.0	3697.0	2948082.0	1.098979e+11	NaN	22.0	rb9999
56	2019-02-28	3728.0	3750.0	3755.0	3705.0	3461158.0	1.292056e+11	NaN	-22.0	rb9999
57	2019-03-01	3756.0	3815.0	3816.0	3751.0	2986416.0	1.127984e+11	NaN	-59.0	rb9999

Similarly, if you call .fillna(), pandas fills every missing value in the DataFrame with the value you specify. For example, df.fillna(20) replaces every NaN in the table with 20.

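Likewise, .dropna() and .fillna() do not permanently change your data unless you pass inplace=True; by default they return a new object and leave the original as it was.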
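A minimal sketch of the behaviour described above. The frame `demo` and its values are hypothetical, made up for illustration rather than taken from the 聚宽 data. It shows that .fillna() and .dropna() return new objects, and that the original only changes once the result is assigned back (or inplace=True is passed).

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; not taken from the original post.
demo = pd.DataFrame({
    "open": [3250.0, np.nan, 3389.0],
    "close": [3333.0, 3374.0, np.nan],
    "volume": [1.0, 2.0, 3.0],
})

filled = demo.fillna(20)            # new DataFrame with every NaN replaced by 20
dropped = demo.dropna()             # new DataFrame without rows that contain NaN
dropped_cols = demo.dropna(axis=1)  # axis=1 drops columns that contain NaN instead

# The original is untouched: it still holds its two NaN values.
nan_count_before = int(demo.isna().sum().sum())

# Assigning the result back makes the change stick.
demo = demo.fillna(20)
nan_count_after = int(demo.isna().sum().sum())

print(nan_count_before, nan_count_after)
print(dropped.shape, dropped_cols.shape)
```

Assigning the result back is often easier to follow than inplace=True, because each step produces a named object that can be inspected.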

df['cross'] = np.nan

df

	open	close	high	low	volume	money	cmp	cross
2018-12-03	3250.0	3333.0	3495.0	3208.0	5092070.0	1.696123e+11	-83.0	NaN
2018-12-04	3306.0	3374.0	3388.0	3283.0	4807696.0	1.600262e+11	-68.0	NaN
2018-12-05	3389.0	3463.0	3466.0	3362.0	5400890.0	1.840964e+11	-74.0	NaN
2018-12-06	3470.0	3375.0	3473.0	3371.0	4500922.0	1.535990e+11	95.0	NaN
2018-12-07	3360.0	3397.0	3426.0	3334.0	5127774.0	1.729243e+11	-37.0	NaN
2018-12-10	3388.0	3312.0	3398.0	3303.0	4741366.0	1.584575e+11	76.0	NaN
2018-12-11	3306.0	3327.0	3341.0	3284.0	3595984.0	1.189109e+11	-21.0	NaN
2018-12-12	3340.0	3317.0	3379.0	3317.0	4194776.0	1.404878e+11	23.0	NaN
2018-12-13	3325.0	3419.0	3431.0	3321.0	5327258.0	1.805737e+11	-94.0	NaN
2018-12-14	3416.0	3445.0	3450.0	3405.0	4124206.0	1.413454e+11	-29.0	NaN
2018-12-17	3448.0	3435.0	3470.0	3414.0	3915682.0	1.345986e+11	13.0	NaN
2018-12-18	3430.0	3435.0	3443.0	3391.0	3671214.0	1.255123e+11	-5.0	NaN
2018-12-19	3432.0	3425.0	3450.0	3411.0	3871012.0	1.328182e+11	7.0	NaN
2018-12-20	3433.0	3481.0	3492.0	3427.0	4306708.0	1.488089e+11	-48.0	NaN
2018-12-21	3500.0	3508.0	3535.0	3471.0	4313258.0	1.511073e+11	-8.0	NaN
2018-12-24	3515.0	3451.0	3516.0	3435.0	3542690.0	1.227598e+11	64.0	NaN
2018-12-25	3442.0	3398.0	3448.0	3357.0	4333400.0	1.473822e+11	44.0	NaN
2018-12-26	3393.0	3409.0	3430.0	3387.0	3285056.0	1.119878e+11	-16.0	NaN
2018-12-27	3418.0	3396.0	3471.0	3395.0	4172568.0	1.428099e+11	22.0	NaN
2018-12-28	3387.0	3404.0	3420.0	3382.0	2915936.0	9.916103e+10	-17.0	NaN
2019-01-02	3398.0	3382.0	3439.0	3371.0	2149936.0	7.286511e+10	16.0	NaN
2019-01-03	3378.0	3455.0	3456.0	3366.0	3719028.0	1.271959e+11	-77.0	NaN
2019-01-04	3460.0	3486.0	3493.0	3430.0	3323322.0	1.149595e+11	-26.0	NaN
2019-01-07	3490.0	3520.0	3526.0	3466.0	3053042.0	1.068218e+11	-30.0	NaN
2019-01-08	3521.0	3505.0	3523.0	3483.0	2740906.0	9.602977e+10	16.0	NaN
2019-01-09	3505.0	3507.0	3550.0	3494.0	3378598.0	1.190845e+11	-2.0	NaN
2019-01-10	3512.0	3514.0	3543.0	3502.0	3134668.0	1.104706e+11	-2.0	NaN
2019-01-11	3517.0	3539.0	3540.0	3492.0	3440962.0	1.210772e+11	-22.0	NaN
2019-01-14	3535.0	3575.0	3576.0	3520.0	2624250.0	9.324587e+10	-40.0	NaN
2019-01-15	3576.0	3519.0	3576.0	3508.0	3128980.0	1.105785e+11	57.0	NaN
2019-01-16	3519.0	3534.0	3545.0	3510.0	2632026.0	9.282553e+10	-15.0	NaN
2019-01-17	3529.0	3551.0	3565.0	3525.0	2258818.0	8.021438e+10	-22.0	NaN
2019-01-18	3552.0	3633.0	3636.0	3540.0	3599670.0	1.294739e+11	-81.0	NaN
2019-01-21	3636.0	3645.0	3675.0	3619.0	3526340.0	1.285795e+11	-9.0	NaN
2019-01-22	3650.0	3633.0	3700.0	3630.0	3970252.0	1.455658e+11	17.0	NaN
2019-01-23	3639.0	3644.0	3658.0	3625.0	2631994.0	9.588602e+10	-5.0	NaN
2019-01-24	3647.0	3680.0	3684.0	3639.0	3200450.0	1.171679e+11	-33.0	NaN
2019-01-25	3684.0	3710.0	3740.0	3675.0	3099552.0	1.150852e+11	-26.0	NaN
2019-01-28	3712.0	3681.0	3757.0	3680.0	3337710.0	1.238699e+11	31.0	NaN
2019-01-29	3680.0	3675.0	3694.0	3653.0	2508228.0	9.212779e+10	5.0	NaN
2019-01-30	3683.0	3677.0	3767.0	3667.0	4490378.0	1.671236e+11	6.0	NaN
2019-01-31	3682.0	3707.0	3718.0	3673.0	2724362.0	1.006953e+11	-25.0	NaN
2019-02-01	3696.0	3754.0	3770.0	3696.0	2506846.0	9.364989e+10	-58.0	NaN
2019-02-11	3850.0	3825.0	3908.0	3816.0	3480360.0	1.342309e+11	25.0	NaN
2019-02-12	3818.0	3785.0	3830.0	3771.0	3109428.0	1.180507e+11	33.0	NaN
2019-02-13	3780.0	3702.0	3796.0	3692.0	3814494.0	1.425991e+11	78.0	NaN
2019-02-14	3711.0	3684.0	3717.0	3668.0	3266348.0	1.205294e+11	27.0	NaN
2019-02-15	3680.0	3599.0	3696.0	3593.0	3916316.0	1.423635e+11	81.0	NaN
2019-02-18	3615.0	3659.0	3675.0	3611.0	3532344.0	1.287828e+11	-44.0	NaN
2019-02-19	3670.0	3655.0	3698.0	3647.0	2760590.0	1.013994e+11	15.0	NaN
2019-02-20	3650.0	3641.0	3669.0	3578.0	3918544.0	1.418150e+11	9.0	NaN
2019-02-21	3640.0	3677.0	3695.0	3635.0	3928972.0	1.442938e+11	-37.0	NaN
2019-02-22	3669.0	3731.0	3735.0	3663.0	3278144.0	1.215958e+11	-62.0	NaN
2019-02-25	3736.0	3682.0	3766.0	3681.0	3659076.0	1.363059e+11	54.0	NaN
2019-02-26	3677.0	3736.0	3745.0	3672.0	3998202.0	1.482914e+11	-59.0	NaN
2019-02-27	3737.0	3715.0	3750.0	3697.0	2948082.0	1.098979e+11	22.0	NaN
2019-02-28	3728.0	3750.0	3755.0	3705.0	3461158.0	1.292056e+11	-22.0	NaN
2019-03-01	3756.0	3815.0	3816.0	3751.0	2986416.0	1.127984e+11	-59.0	NaN

def cross(cmp):
    if cmp > 0:
        crs = True
    else:
        crs = False
    return crs

cross(df['cmp'])

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-26-7833c6976bc2> in <module>
----> 1 cross(df['cmp'])

<ipython-input-25-70d306d78f78> in cross(cmp)
      1 def cross(cmp):
----> 2     if cmp > 0:
      3         crs = True
      4     else:
      5         crs = False

/opt/conda/lib/python3.6/site-packages/pandas/core/generic.py in __nonzero__(self)
   1574         raise ValueError("The truth value of a {0} is ambiguous. "
   1575                          "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
-> 1576                          .format(self.__class__.__name__))
   1577
   1578     __bool__ = __nonzero__

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
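The ValueError is raised because `if cmp > 0` receives the whole df['cmp'] Series, and a Series of many booleans has no single truth value. Below is a minimal sketch of two ways around it, assuming the intent is to flag each row whose cmp is positive. The frame `sample` is a hypothetical stand-in that holds only the first five cmp values from the table above.

```python
import pandas as pd

# Stand-in for df: the first five cmp values from the table above.
sample = pd.DataFrame(
    {"cmp": [-83.0, -68.0, -74.0, 95.0, -37.0]},
    index=pd.to_datetime(["2018-12-03", "2018-12-04", "2018-12-05",
                          "2018-12-06", "2018-12-07"]),
)


def cross(cmp):
    """Return True when a single cmp value is positive."""
    return cmp > 0


# Option 1: vectorized comparison, evaluated element by element.
sample["cross"] = sample["cmp"] > 0

# Option 2: apply the scalar function to each value in turn.
sample["cross_apply"] = sample["cmp"].apply(cross)

print(sample)
```

The vectorized comparison is usually the better choice on large frames, since it avoids a Python-level function call per row.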

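# Build a 1000-day date index starting 2001-01-01 (only the index of ts is used below)
ts = pd.Series(np.random.randn(1000), index=pd.date_range("2001-1-1", periods=1000))

# Four columns of random normal draws sharing that date index
df4 = pd.DataFrame(np.random.randn(1000, 4), index=ts.index, columns=['a', 'b', 'c', 'd'])

# Cumulative sums turn the draws into four random walks
df5 = df4.cumsum()

df5.plot()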

<matplotlib.axes._subplots.AxesSubplot at 0x7f774e988400>

df5.dtypes

a    float64
b    float64
c    float64
d    float64
dtype: object

df.dtypes

index     datetime64[ns]
open             float64
close            float64
high             float64
low              float64
volume           float64
money            float64
o-c              float64
o-c2             float64
code              object
dtype: object
