How to randomize (or permute) a dataframe rowwise and columnwise?


Question

I have a dataframe (df1) like this.

     f1   f2   f3   f4   f5
d1   1    0    1    1    1  
d2   1    0    0    1    0
d3   0    0    0    1    1
d4   0    1    0    0    1

The d1...d4 column holds the row names, and the f1...f5 row holds the column names.

When I run sample(df1), I get a new dataframe with the same count of 1s as df1. So the count of 1s is conserved for the whole dataframe, but not for each row or each column.

Is it possible to do the randomization row-wise or column-wise?

I want to randomize df1 column-wise for each column, i.e. the number of 1s in each column remains the same, and each column needs to change in at least one entry. For example, I may get a randomized df2 like this (note that the count of 1s in each column remains the same, but the count of 1s in each row is different):

     f1   f2   f3   f4   f5
d1   1    0    0    0    1  
d2   0    1    0    1    1
d3   1    0    0    1    1
d4   0    0    1    1    0

Likewise, I also want to randomize df1 row-wise for each row, i.e. the number of 1s in each row remains the same, and each row needs to change (but the number of changed entries may differ between rows). For example, a randomized df3 could look something like this:

     f1   f2   f3   f4   f5
d1   0    1    1    1    1  <- two entries are different
d2   0    0    1    0    1  <- four entries are different
d3   1    0    0    0    1  <- two entries are different
d4   0    0    1    0    1  <- two entries are different

PS. Many thanks to Gavin Simpson, Joris Meys and Chase for their answers to my previous question on randomizing two columns.

Recommended answer

Given the R data.frame:

> df1
  a b c
1 1 1 0
2 1 0 0
3 0 1 0
4 0 0 0

Shuffle row-wise:

> df2 <- df1[sample(nrow(df1)),]
> df2
  a b c
3 0 1 0
4 0 0 0
2 1 0 0
1 1 1 0

By default, sample() returns a random permutation of its first argument; when that argument is a single positive integer n, as in sample(nrow(df1)), it permutes 1:n. The default size is the length of the input, and replace=FALSE (the default) means sampling is done without replacement, so every row index appears exactly once. Indexing df1 with that permutation accomplishes a row-wise shuffle.
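As a small illustration of that behavior, the sketch below compares sample() with and without replacement. The seed and the variable names (n, perm, boot) are arbitrary choices for this example and are not part of the original answer.

```r
set.seed(42)  # arbitrary seed, only so that repeated runs give the same draw

n <- 4

# Default call: size = n and replace = FALSE, so the result is a
# permutation of 1:n in which every index appears exactly once.
perm <- sample(n)

# With replace = TRUE, indices may repeat and others may be missing,
# so using this as a row index would NOT be a shuffle.
boot <- sample(n, replace = TRUE)

print(perm)
print(boot)
```

Only the first form is suitable as a row or column index when every row or column should be kept exactly once.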

Shuffle column-wise:

> df3 <- df1[,sample(ncol(df1))]
> df3
  c a b
1 0 1 1
2 0 1 0
3 0 0 1
4 0 0 0
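
Note that the two shuffles above move whole rows or whole columns, so each row and column keeps its own contents. If the goal is instead to permute the entries inside each column (or each row) independently, as the question describes, one possible sketch is shown below. The helper names permute, shuffle_within_columns and shuffle_within_rows are hypothetical and do not come from the original answer.

```r
df1 <- data.frame(a = c(1, 1, 0, 0),
                  b = c(1, 0, 1, 0),
                  c = c(0, 0, 0, 0))

# Permute a vector by index, which avoids the special case where
# sample(x) treats a single number x as the range 1:x.
permute <- function(x) x[sample.int(length(x))]

# Permute the entries of every column independently.
# Column sums are preserved; row sums generally are not.
shuffle_within_columns <- function(df) {
  df[] <- lapply(df, permute)
  df
}

# Permute the entries of every row independently.
# Row sums are preserved; column sums generally are not.
# Assumes all columns share one type and there are at least two columns.
shuffle_within_rows <- function(df) {
  m <- t(apply(as.matrix(df), 1, permute))
  dimnames(m) <- dimnames(df)
  as.data.frame(m)
}

df_cols <- shuffle_within_columns(df1)
df_rows <- shuffle_within_rows(df1)
```

A random permutation can occasionally leave a column or row unchanged, and a constant column such as c can never change, so the requirement that every column or row differs from the original would need an extra check, for example redrawing until the result differs.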
