Edit dataframe entries using a groupby object (pandas)

Question

Consider the following dataframe:

     index      count     signal
       1          1         1
       2          1        NAN
       3          1        NAN
       4          1        -1
       5          1        NAN
       6          2        NAN
       7          2        -1
       8          2        NAN
       9          3        NAN
       10         3        NAN
       11         3        NAN
       12         4        1
       13         4        NAN
       14         4        NAN

I need to forward-fill ('ffill') the NANs in 'signal', and rows with different 'count' values should not affect each other, so that I get the following dataframe:

     index      count     signal
       1          1         1
       2          1         1
       3          1         1
       4          1        -1
       5          1        -1
       6          2        NAN
       7          2        -1
       8          2        -1
       9          3        NAN
       10         3        NAN
       11         3        NAN
       12         4        1
       13         4        1
       14         4        1

Right now I iterate through each data frame in the groupby object, fill the NAN values, and then copy the result to a new data frame:

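import numpy as np
import pandas as pd

new_table = np.array([])  # empty placeholder until the first group is processed
for key, group in df.groupby('count'):
    group = group.copy()  # copy first so the assignment does not touch a view of df
    group['signal'] = group['signal'].ffill()  # forward-fill within this group only
    if new_table.shape[0] == 0:
        new_table = group
    else:
        new_table = pd.concat([new_table, group])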

This kind of works, but it is really slow because the data frame is large. I am wondering if there is any other way to do it, with or without groupby methods. Thanks!

Edit

Thanks to Alexander and jwilner for providing alternative methods. However, both methods are very slow on my big dataframe, which has 800,000 rows of data.

Recommended answer

Use
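One way to get a per-group forward fill without an explicit Python loop is pandas' grouped `ffill`. This is a minimal sketch, not necessarily the code this answer originally showed; the sample frame is rebuilt by hand from the question's table, with NAN written as `np.nan`.

```python
import numpy as np
import pandas as pd

# Sample data mirroring the question's table (NAN there is NaN here).
df = pd.DataFrame({
    "count":  [1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4],
    "signal": [1, np.nan, np.nan, -1, np.nan, np.nan, -1, np.nan,
               np.nan, np.nan, np.nan, 1, np.nan, np.nan],
}, index=range(1, 15))

# Forward-fill 'signal' within each 'count' group; a fill never crosses
# from one group into the next, so leading NaNs in a group stay NaN.
df["signal"] = df.groupby("count")["signal"].ffill()
print(df)
```

The result keeps the original row order and index, so it can be assigned straight back to the column.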

However, be aware that groupby groups rows by value, not by position. If the count column doesn't always stay the same or increase, but can instead repeat earlier values, this might be a problem. That is, given a count series like [1, 1, 2, 2, 1], groupby will group like so: [1, 1, 1], [2, 2], which could have undesirable effects on your forward fill. If that is not what you want, you'd have to create a new series to use with groupby that stays the same or increases with each change in the count series, probably using pd.Series.diff and pd.Series.cumsum.
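The diff/cumsum idea can be sketched as follows. This is an illustration under assumptions: the five-row frame and the names `run_id`, `by_value`, and `by_run` are made up for the example, and the count values follow the [1, 1, 2, 2, 1] series mentioned above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "count":  [1, 1, 2, 2, 1],
    "signal": [1, np.nan, np.nan, -1, np.nan],
})

# Grouping on 'count' directly: the last row (count == 1) joins the first
# two rows, so it is filled from the earlier 1.
by_value = df.groupby("count")["signal"].ffill()

# Label each consecutive run instead: diff() is non-zero wherever 'count'
# changes, and the cumulative sum of those change flags is an increasing id.
run_id = df["count"].diff().ne(0).cumsum()
by_run = df.groupby(run_id)["signal"].ffill()

print(by_value.tolist())
print(by_run.tolist())
```

With `run_id` as the key, the trailing row forms its own group and stays NaN, because nothing earlier in its run can fill it.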
