Duplicating rows in a DataFrame based on column value
Question
Below is a set of sample data I am working with:
import numpy as np
import pandas as pd

sample_dat = pd.DataFrame(
    np.array([[1,0,1,1,1,5],
              [0,0,0,0,1,3],
              [1,0,0,0,1,1],
              [1,0,0,1,1,1],
              [1,0,0,0,1,1],
              [1,1,0,0,1,1]]),
    columns=['var1','var2','var3','var4','var5','cnt']
)
I need to change the data so that the rows are duplicated according to the value in the last column. Specifically, I want each row to be repeated based on the value in the cnt column.
My search turned up a lot about melts, splits, and similar operations, but I think what I am looking for is much more basic. Please also note that I will likely have some kind of id in the first column, which will be either an integer or a string.
For example, the first record will be duplicated 4 more times. The second record will be duplicated twice more.
Below is an example of what the DataFrame would look like if I built it by hand:
sample_dat2 = pd.DataFrame(
    np.array([[1,0,1,1,1,5],
              [1,0,1,1,1,5],
              [1,0,1,1,1,5],
              [1,0,1,1,1,5],
              [1,0,1,1,1,5],
              [0,0,0,0,1,3],
              [0,0,0,0,1,3],
              [0,0,0,0,1,3],
              [1,0,0,0,1,1],
              [1,0,0,1,1,1],
              [1,0,0,0,1,1],
              [1,1,0,0,1,1]]),
    columns=['var1','var2','var3','var4','var5','cnt']
)
Recommended answer
You can use numpy.repeat, passing the cnt column (selected by indexing the array) as the repeat counts. With axis=0, each row is repeated as many times as its cnt value.
import numpy as np
import pandas as pd

arr = np.array(
    [[1,0,1,1,1,5],
     [0,0,0,0,1,3],
     [1,0,0,0,1,1],
     [1,0,0,1,1,1],
     [1,0,0,0,1,1],
     [1,1,0,0,1,1]]
)

# Column 5 (cnt) holds the number of times each row should appear.
df = pd.DataFrame(
    np.repeat(arr, arr[:,5], axis=0),
    columns=['var1','var2','var3','var4','var5','cnt']
)
print(df)
#     var1  var2  var3  var4  var5  cnt
# 0      1     0     1     1     1    5
# 1      1     0     1     1     1    5
# 2      1     0     1     1     1    5
# 3      1     0     1     1     1    5
# 4      1     0     1     1     1    5
# 5      0     0     0     0     1    3
# 6      0     0     0     0     1    3
# 7      0     0     0     0     1    3
# 8      1     0     0     0     1    1
# 9      1     0     0     1     1    1
# 10     1     0     0     0     1    1
# 11     1     1     0     0     1    1
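
The question mentions that the first column may hold an integer or string id. Building the frame from a single NumPy array forces every column to one common dtype, so a pandas-only variant may be more convenient in that case. The following is a minimal sketch rather than part of the original answer: the small frame and its id column are hypothetical, and it assumes the index is unique (as it is with the default RangeIndex).

```python
import pandas as pd

# Hypothetical data: a string id column next to numeric columns.
df = pd.DataFrame({
    "id": ["a", "b", "c"],
    "var1": [1, 0, 1],
    "cnt": [5, 3, 1],
})

# Repeat each index label cnt times, select those labels with .loc,
# then renumber the rows so the index is 0..n-1 again.
expanded = df.loc[df.index.repeat(df["cnt"])].reset_index(drop=True)

print(expanded)
```

Because rows are selected by label, each column keeps its own dtype: var1 and cnt stay integers, and id stays a string column.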