Expanding pandas Data Frame rows based on number and group ID (Python 3)

Question

I have been struggling with finding a way to expand/clone observation rows based on a pre-determined number and a grouping variable (id). For context, here is an example data frame using pandas and numpy (python3).

import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 15], [2, 20]], columns=['id', 'num'])

df
Out[54]:
  id  num
0   1   15
1   2   20 

I want to expand/clone the rows by the number given in the "num" variable based on their ID group. In this case, I would want 15 rows for id = 1 and 20 rows for id = 2. This is probably an easy question, but I am struggling to make this work. I've been messing around with reindex and np.repeat, but the conceptual pieces are not fitting together for me.

In R, I used the expandRows function found in the splitstackshape package, which would look something like this:

library(splitstackshape)

df <- data.frame(id = c(1, 2), num = c(15, 20))


df
  id num
1  1  15
2  2  20


df2 <- expandRows(df, "num", drop = FALSE)
df2
     id num
1     1  15
1.1   1  15
1.2   1  15
1.3   1  15
1.4   1  15
1.5   1  15
1.6   1  15
1.7   1  15
1.8   1  15
1.9   1  15
1.10  1  15
1.11  1  15
1.12  1  15
1.13  1  15
1.14  1  15
2     2  20
2.1   2  20
2.2   2  20
2.3   2  20
2.4   2  20
2.5   2  20
2.6   2  20
2.7   2  20
2.8   2  20
2.9   2  20
2.10  2  20
2.11  2  20
2.12  2  20
2.13  2  20
2.14  2  20
2.15  2  20
2.16  2  20
2.17  2  20
2.18  2  20
2.19  2  20

Again, sorry if this is a stupid question and thanks in advance for any help.

Recommended answer

I can't replicate your index, but I can replicate your values, using np.repeat, quite easily in fact.

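# Repeat each row of the underlying NumPy array by its 'num' value.
# Indexing by df['num'] instead of v[:, -1] avoids relying on column order.
v = df.values
df = pd.DataFrame(v.repeat(df['num'].to_numpy(), axis=0), columns=df.columns)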
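`df.values` turns the whole frame into a single NumPy array, so a frame with mixed column types (for example an extra string column) would be upcast to one common dtype such as object. A common pandas idiom that avoids this, offered here as an alternative sketch rather than the answerer's method, repeats the index labels with `Index.repeat` and selects the rows with `.loc`, so each column keeps its own dtype:

```python
import pandas as pd

df = pd.DataFrame([[1, 15], [2, 20]], columns=['id', 'num'])

# Each index label appears 'num' times; .loc then pulls the matching rows,
# so every column keeps its original dtype.
expanded = df.loc[df.index.repeat(df['num'])].reset_index(drop=True)
```

`reset_index(drop=True)` replaces the repeated labels (0, 0, ..., 1, 1, ...) with a fresh 0 to 34 range; leave it out to keep the original labels.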


If you want the exact index (although I can't see why you'd need to), you'd need a groupby operation:

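def f(x):
    # Label the rows of one id group as "<id>.0", "<id>.1", ...
    return x.astype(str) + '.' + np.arange(len(x)).astype(str)

# .values concatenates the group results in id order; this matches df's row
# order because the expanded rows are already sorted by id.
idx = df.groupby('id').id.apply(f).values

Assign idx to df's index. Note that the first copy in each group is labelled 1.0 rather than plain 1 as in the R output:

df.index = idx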
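To match the expandRows labels more closely (a bare 1 for the first copy, then 1.1 through 1.14), `GroupBy.cumcount` can supply each row's position inside its id group. This is a sketch under one assumption: like the answer, it builds labels from `id`, whereas R's expandRows derives them from the original row names, which merely coincide with `id` in this example.

```python
import pandas as pd

df = pd.DataFrame([[1, 15], [2, 20]], columns=['id', 'num'])

# Expand as in the answer: repeat each row of the NumPy array by its 'num' value.
v = df.values
out = pd.DataFrame(v.repeat(df['num'].to_numpy(), axis=0), columns=df.columns)

# Position of each row within its id group: 0, 1, 2, ...
pos = out.groupby('id').cumcount()

# expandRows style: first copy keeps the bare label, later copies get ".1", ".2", ...
base = out['id'].astype(str)
out.index = base.where(pos == 0, base + '.' + pos.astype(str)).to_numpy()
```

Because `cumcount` is aligned to the rows rather than to the group order, this does not depend on the rows being sorted by id.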
