How to fill in the missing records of a Pandas DataFrame in a Pythonic way?

Question

I have a Pandas DataFrame 'df' like this:

         X   Y  
IX1 IX2
A   A1  20  30
    A2  20  30
    A5  20  30
B   B2  20  30
    B4  20  30

Some rows are missing, and I want to fill in the gaps in the middle like this:

         X   Y  
IX1 IX2
A   A1  20  30
    A2  20  30
    A3  NaN NaN
    A4  NaN NaN
    A5  20  30
B   B2  20  30
    B3  NaN NaN
    B4  20  30

Is there a Pythonic way to do this?

Recommended answer

You need to construct the full index and then use the DataFrame's reindex method. Any index label that is not present in the original frame becomes a row of NaN. Like so:

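import pandas
from io import StringIO  # Python 3 location; the standalone StringIO module was Python 2 only

# Sample data with gaps: A3, A4, B1 and B3 are absent.
datastring = StringIO("""\
C1,C2,C3,C4
A,A1,20,30
A,A2,20,30
A,A5,20,30
B,B2,20,30
B,B4,20,30""")

# Use the first two columns as a two-level index.
dataframe = pandas.read_csv(datastring, index_col=['C1', 'C2'])
# Every (C1, C2) pair that should exist in the result.
full_index = [('A', 'A1'), ('A', 'A2'), ('A', 'A3'), 
              ('A', 'A4'), ('A', 'A5'), ('B', 'B1'), 
              ('B', 'B2'), ('B', 'B3'), ('B', 'B4')]
# Labels missing from the original frame come back as rows of NaN.
new_df = dataframe.reindex(full_index)
new_df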
         C3    C4
C1 C2            
A  A1  20.0  30.0
   A2  20.0  30.0
   A3   NaN   NaN
   A4   NaN   NaN
   A5  20.0  30.0
B  B1   NaN   NaN
   B2  20.0  30.0
   B3   NaN   NaN
   B4  20.0  30.0

You can then use the fillna method to replace the NaNs with whatever value you want.
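As a minimal sketch of that last step, the snippet below builds a small frame directly (rather than from CSV), reindexes it, and fills the new rows with 0. The fill value 0 and the variable names are assumptions chosen for illustration. It also shows the fill_value argument of reindex, which supplies the replacement during reindexing so that no NaN is introduced at all.

```python
import pandas as pd

# Small frame with a two-level index, mirroring the "A" group above.
index = pd.MultiIndex.from_tuples(
    [("A", "A1"), ("A", "A2"), ("A", "A5")], names=["C1", "C2"]
)
df = pd.DataFrame({"C3": [20, 20, 20], "C4": [30, 30, 30]}, index=index)

# The complete set of labels we want: A1 through A5.
full_index = pd.MultiIndex.from_tuples(
    [("A", f"A{i}") for i in range(1, 6)], names=["C1", "C2"]
)

# Option 1: reindex first (missing rows become NaN), then fill them.
filled_after = df.reindex(full_index).fillna(0)

# Option 2: supply the replacement while reindexing, skipping the NaN step.
filled_during = df.reindex(full_index, fill_value=0)

print(filled_after)
print(filled_during)
```

Both options give the same values here. Option 1 is the more flexible one, since fillna also accepts a per-column dict of fill values.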

Just had to revisit this myself... In current versions of pandas, there is a function, MultiIndex.from_product, that builds a MultiIndex from the Cartesian product of iterables. So the above solution could become:

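import pandas
from io import StringIO

# Same data as before, but with integer labels in the second index level.
datastring = StringIO("""\
C1,C2,C3,C4
A,1,20,30
A,2,20,30
A,5,20,30
B,2,20,30
B,4,20,30""")

dataframe = pandas.read_csv(datastring, index_col=['C1', 'C2'])
# Cartesian product of the groups and the labels 1..5 (range(6) would also add a 0 row).
full_index = pandas.MultiIndex.from_product([('A', 'B'), range(1, 6)], names=['C1', 'C2'])
new_df = dataframe.reindex(full_index)
new_df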
         C3    C4
C1 C2            
A  1   20.0  30.0
   2   20.0  30.0
   3    NaN   NaN
   4    NaN   NaN
   5   20.0  30.0
B  1    NaN   NaN
   2   20.0  30.0
   3    NaN   NaN
   4   20.0  30.0
   5    NaN   NaN

I think that is quite elegant.
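Note that the Cartesian product also adds rows outside each group's existing range (B1 and B5 above), whereas the question only asks for the gaps in the middle. A rough sketch of one way to do that is to build the index per group, from the smallest to the largest label actually present. The helper full_tuples below is hypothetical, not a pandas function, and it assumes every second-level label is the group name followed by an integer, as in A1 or B4.

```python
import pandas as pd

# The frame from the question.
index = pd.MultiIndex.from_tuples(
    [("A", "A1"), ("A", "A2"), ("A", "A5"), ("B", "B2"), ("B", "B4")],
    names=["IX1", "IX2"],
)
df = pd.DataFrame({"X": [20] * 5, "Y": [30] * 5}, index=index)


def full_tuples(idx):
    """Yield every (group, label) pair between each group's min and max suffix.

    Hypothetical helper: assumes labels look like "<group><integer>".
    """
    numbers = {}
    for group, label in idx:
        # Strip the group prefix and keep the numeric suffix, e.g. "A5" -> 5.
        numbers.setdefault(group, []).append(int(label[len(group):]))
    for group, nums in numbers.items():
        for n in range(min(nums), max(nums) + 1):
            yield (group, f"{group}{n}")


full_index = pd.MultiIndex.from_tuples(
    list(full_tuples(df.index)), names=df.index.names
)
new_df = df.reindex(full_index)
print(new_df)
```

With the question's data this adds only A3, A4 and B3, leaving B1 out because no label below B2 exists in group B.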
