Create a new column by randomly sampling data from other columns


Problem description

I'd like to create a new column by randomly sampling data from the remaining columns.

Consider a dataframe with "N" columns as follows:

|---------------------|------------------|---------------------|
|      Column 1       |     Column 2     |      Column N       |
|---------------------|------------------|---------------------|
|          0.37       |         0.8      |          0.0        |
|---------------------|------------------|---------------------|
|          0.0        |         0.0      |          0.8        |
|---------------------|------------------|---------------------|

The resulting dataframe should look like this:

|---------------------|------------------|---------------------|---------------|
|      Column 1       |     Column 2     |      Column N       |     Sampled   |
|---------------------|------------------|---------------------|---------------|
|          0.37       |         0.8      |          0.0        |       0.8     |
|---------------------|------------------|---------------------|---------------|
|          0.0        |         0.0      |          B          |        B      |
|---------------------|------------------|---------------------|---------------|
|          A          |         5        |          0.8        |        A      |
|---------------------|------------------|---------------------|---------------|

Each entry in the "Sampled" column is created by randomly choosing one of that row's entries from the "N" columns. For example, "0.8" was chosen from Column 2, "B" from Column N, and so on.

df.sample(axis=1) simply chooses one whole column and returns it. This is NOT what I want: the column should be chosen independently for each row.
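To make the problem concrete, here is a small sketch of what df.sample(axis=1) does. The frame below is an assumption reconstructed from the example tables (the column names Column1, Column2, ColumnN are illustrative), and random_state is set only so the run is repeatable.

```python
import pandas as pd

# Example frame reconstructed from the tables above (names are illustrative).
df = pd.DataFrame({
    "Column1": [0.37, 0.0, "A"],
    "Column2": [0.8, 0.0, 5],
    "ColumnN": [0.0, "B", 0.8],
})

# Sampling along axis=1 draws whole columns, not per-row entries.
picked = df.sample(n=1, axis=1, random_state=0)

# The result is a one-column DataFrame: every row comes from the same column.
print(picked.shape)
print(picked.columns.tolist())
```

Every row of the result comes from the same source column, which is why this call cannot produce the "Sampled" column on its own.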

What would be the fastest way to achieve this? The method needs to be efficient, as the original dataframe is large, with many rows and columns.

Recommended answer

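Using pandas lookup + sample (note: DataFrame.lookup was deprecated in pandas 1.2 and removed in pandas 2.0, so this snippet requires an older pandas version):

# For each row, draw one column label at random (with replacement)
s = df.columns.to_series().sample(len(df), replace=True)
# Fetch the value at (row label, sampled column label) for every row
df['New'] = df.lookup(df.index, s)
df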
Out[177]: 
  Column1  Column2 ColumnN  New
0    0.37      0.8     0.0  0.8
1     0.0      0.0       B    B
2       A      5.0     0.8    A
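On pandas versions that no longer provide DataFrame.lookup, the same idea can be sketched with NumPy integer indexing. This is an alternative technique, not the answer's original code: it draws one random column position per row and reads all values in a single vectorized step. The example frame, the column names, the "Sampled" label, and the fixed seed are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Example frame reconstructed from the question (names are illustrative).
df = pd.DataFrame({
    "Column1": [0.37, 0.0, "A"],
    "Column2": [0.8, 0.0, 5],
    "ColumnN": [0.0, "B", 0.8],
})

rng = np.random.default_rng(0)  # seeded only so the run is repeatable

# Snapshot the source columns as a 2-D array before adding the new column.
values = df.to_numpy()

# One random column position per row, drawn independently.
col_idx = rng.integers(0, values.shape[1], size=len(df))

# Pair row i with column col_idx[i] and pull all values at once.
df["Sampled"] = values[np.arange(len(df)), col_idx]

print(df)
```

Because the indexing is done on the underlying array rather than row by row in Python, this should scale reasonably to frames with many rows and columns. With mixed value types, as in this example, the array has object dtype, which costs some speed compared with purely numeric data.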
