How to fill in the missing records of a Pandas dataframe in a Pythonic way?
Question
I have a Pandas dataframe 'df' like this:
          X   Y
IX1 IX2
A   A1   20  30
    A2   20  30
    A5   20  30
B   B2   20  30
    B4   20  30
Some rows are missing, and I want to fill in the gaps in the middle like this:
           X    Y
IX1 IX2
A   A1    20   30
    A2    20   30
    A3   NaN  NaN
    A4   NaN  NaN
    A5    20   30
B   B2    20   30
    B3   NaN  NaN
    B4    20   30
Is there a Pythonic way to do this?
Recommended answer
You need to construct the full index, and then use the reindex method of the dataframe. Like so:
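import pandas
from io import StringIO  # Python 3; the old top-level StringIO module no longer exists

datastring = StringIO("""\
C1,C2,C3,C4
A,A1,20,30
A,A2,20,30
A,A5,20,30
B,B2,20,30
B,B4,20,30""")
dataframe = pandas.read_csv(datastring, index_col=['C1', 'C2'])
# Every (C1, C2) pair that should exist, including the missing ones
full_index = [('A', 'A1'), ('A', 'A2'), ('A', 'A3'),
              ('A', 'A4'), ('A', 'A5'), ('B', 'B1'),
              ('B', 'B2'), ('B', 'B3'), ('B', 'B4')]
# Rows absent from the original dataframe come back filled with NaN
new_df = dataframe.reindex(full_index)
new_df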
        C3   C4
A A1    20   30
  A2    20   30
  A3   NaN  NaN
  A4   NaN  NaN
  A5    20   30
B B1   NaN  NaN
  B2    20   30
  B3   NaN  NaN
  B4    20   30
You can then use the fillna method to set the NaNs to whatever you want.
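To make the fillna step concrete, here is a minimal sketch that continues from the same sample data. The fill values (0 and -1) and the variable names are arbitrary choices for illustration, not something the question asks for.

```python
from io import StringIO

import pandas as pd

csv_text = """\
C1,C2,C3,C4
A,A1,20,30
A,A2,20,30
A,A5,20,30
B,B2,20,30
B,B4,20,30"""
df = pd.read_csv(StringIO(csv_text), index_col=["C1", "C2"])

full_index = pd.MultiIndex.from_tuples(
    [("A", "A1"), ("A", "A2"), ("A", "A3"), ("A", "A4"), ("A", "A5"),
     ("B", "B1"), ("B", "B2"), ("B", "B3"), ("B", "B4")],
    names=["C1", "C2"],
)
new_df = df.reindex(full_index)

# Option 1: replace every NaN with one constant (0 is an arbitrary choice).
filled_zero = new_df.fillna(0)

# Option 2: a different fill value per column, passed as a dict.
filled_per_col = new_df.fillna({"C3": 0, "C4": -1})

# Option 3: fill while reindexing, so no NaN is introduced at all.
filled_direct = df.reindex(full_index, fill_value=0)

print(filled_per_col)
```

If a constant is all you need, the fill_value argument of reindex saves the separate fillna call.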
Just had to revisit this myself...
In the current version of pandas, there is a function to build a MultiIndex from the Cartesian product of iterables (MultiIndex.from_product). So the above solution could become:
import pandas
from io import StringIO

datastring = StringIO("""\
C1,C2,C3,C4
A,1,20,30
A,2,20,30
A,5,20,30
B,2,20,30
B,4,20,30""")
dataframe = pandas.read_csv(datastring, index_col=['C1', 'C2'])
# Cartesian product of both levels; range(1, 6) yields 1 through 5
full_index = pandas.MultiIndex.from_product([('A', 'B'), range(1, 6)], names=['C1', 'C2'])
new_df = dataframe.reindex(full_index)
new_df
        C3   C4
C1 C2
A  1    20   30
   2    20   30
   3   NaN  NaN
   4   NaN  NaN
   5    20   30
B  1   NaN  NaN
   2    20   30
   3   NaN  NaN
   4    20   30
   5   NaN  NaN
I think this is quite elegant.
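One caveat: the Cartesian product also creates rows such as (B, 1) and (B, 5), which the desired output in the question does not contain. If you only want to fill the gaps between each group's smallest and largest key, a rough sketch could look like the following. It assumes the second index level holds integers, as in the example just above; the names bounds, pairs and gap_index are made up for illustration.

```python
from io import StringIO

import pandas as pd

csv_text = """\
C1,C2,C3,C4
A,1,20,30
A,2,20,30
A,5,20,30
B,2,20,30
B,4,20,30"""
df = pd.read_csv(StringIO(csv_text), index_col=["C1", "C2"])

# Smallest and largest C2 value that actually occurs within each C1 group.
bounds = df.index.to_frame(index=False).groupby("C1")["C2"].agg(["min", "max"])

# Build only the (C1, C2) pairs lying between each group's own bounds.
pairs = [
    (key, value)
    for key, lo, hi in bounds.itertuples(name=None)
    for value in range(lo, hi + 1)
]
gap_index = pd.MultiIndex.from_tuples(pairs, names=["C1", "C2"])

# Reindex as before: only the interior gaps show up as NaN rows.
gap_df = df.reindex(gap_index)
print(gap_df)
```

With the sample data this yields eight rows: A 1 through 5 and B 2 through 4, with NaN only at (A, 3), (A, 4) and (B, 3). For string keys like A1 or B2 you would first have to split off the numeric part, which this sketch does not attempt.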