Duplicating rows in a DataFrame based on column value


Question

Below is a set of sample data I am working with:

import numpy as np
import pandas as pd

sample_dat = pd.DataFrame(
    np.array([[1,0,1,1,1,5],
              [0,0,0,0,1,3],
              [1,0,0,0,1,1],
              [1,0,0,1,1,1],
              [1,0,0,0,1,1],
              [1,1,0,0,1,1]]),
    columns=['var1','var2','var3','var4','var5','cnt']
)

I need to change the data so that each row is duplicated according to the value in the last column. Specifically, each row should appear as many times as the value in its cnt column.

My search turned up a lot of material about melts, splits, and similar operations, but I think what I am looking for is much more basic. Please also note that I will likely have some kind of ID in the first column, which will be either an integer or a string.

For example, the first record (cnt = 5) would be duplicated four more times, and the second record (cnt = 3) two more times.

Written out by hand, the resulting DataFrame would look like this:

sample_dat2 = pd.DataFrame(
    np.array([[1,0,1,1,1,5],
              [1,0,1,1,1,5],
              [1,0,1,1,1,5],
              [1,0,1,1,1,5],
              [1,0,1,1,1,5],
              [0,0,0,0,1,3],
              [0,0,0,0,1,3],
              [0,0,0,0,1,3],
              [1,0,0,0,1,1],
              [1,0,0,1,1,1],
              [1,0,0,0,1,1],
              [1,1,0,0,1,1]]),
    columns=['var1','var2','var3','var4','var5','cnt']
)

Recommended answer

You can use numpy.repeat and pass it the cnt column, selected by indexing (arr[:,5]), as the per-row repeat counts. With axis=0, each row of the array is repeated as many times as its own cnt value.

import numpy as np
import pandas as pd

arr = np.array(
    [[1,0,1,1,1,5],
     [0,0,0,0,1,3],
     [1,0,0,0,1,1],
     [1,0,0,1,1,1],
     [1,0,0,0,1,1],
     [1,1,0,0,1,1]]
    )

df = pd.DataFrame(
    np.repeat(arr, arr[:,5], axis=0),
    columns=['var1','var2','var3','var4','var5','cnt']
    )

print(df)
#     var1  var2  var3  var4  var5  cnt
# 0      1     0     1     1     1    5
# 1      1     0     1     1     1    5
# 2      1     0     1     1     1    5
# 3      1     0     1     1     1    5
# 4      1     0     1     1     1    5
# 5      0     0     0     0     1    3
# 6      0     0     0     0     1    3
# 7      0     0     0     0     1    3
# 8      1     0     0     0     1    1
# 9      1     0     0     1     1    1
# 10     1     0     0     0     1    1
# 11     1     1     0     0     1    1
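
The question mentions that the first column may hold an integer or string ID. A 2-D NumPy array has a single dtype, so mixing strings and integers in arr would coerce the numeric columns. As a possible alternative (not part of the answer above), the sketch below repeats the index labels of an existing DataFrame instead, which keeps each column's dtype. The column names id, var1, and cnt and the sample values are made up for illustration, and the approach assumes the index labels are unique.

```python
import pandas as pd

# Hypothetical data: a string id column plus the cnt column from the question.
sample_dat = pd.DataFrame({
    "id":   ["a", "b", "c"],
    "var1": [1, 0, 1],
    "cnt":  [3, 1, 2],
})

# Repeat each index label cnt times, then select the rows for those labels.
# Assumes the index labels are unique (true for the default RangeIndex).
expanded = sample_dat.loc[sample_dat.index.repeat(sample_dat["cnt"])]

# Replace the repeated labels with a fresh 0..n-1 index.
expanded = expanded.reset_index(drop=True)

print(expanded)
```

Because the rows are selected from the DataFrame itself, the id column stays a string column and the numeric columns stay integers.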
