Randomly selecting rows from dataframe column

Views: 141 Published: 2020/10/17 2:49:09 Tags: python pandas dataframe

Problem description

For a given dataframe column, I would like to randomly select roughly 60% of the values and put them in a new column, put the remaining 40% in another column, multiply the 40% column by (-1), and create a new column that merges the two back together, like so:

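import numpy as np
import pandas as pd

# Starting point: a single column
dict0 = {'x1': [1, 2, 3, 4, 5, 6]}
data = pd.DataFrame(dict0)

# Step 1: roughly 60% of the values go to x2, the remaining 40% to x3
dict1 = {'x1': [1, 2, 3, 4, 5, 6], 'x2': [1, np.nan, 3, np.nan, 5, 6], 'x3': [np.nan, 2, np.nan, 4, np.nan, np.nan]}
data = pd.DataFrame(dict1)

# Step 2: multiply the 40% column (x3) by -1
dict2 = {'x1': [1, 2, 3, 4, 5, 6], 'x2': [1, np.nan, 3, np.nan, 5, 6], 'x3': [np.nan, -2, np.nan, -4, np.nan, np.nan]}
data = pd.DataFrame(dict2)

# Step 3: merge x2 and x3 back together into x4
dict3 = {'x1': [1, 2, 3, 4, 5, 6], 'x2': [1, np.nan, 3, np.nan, 5, 6], 'x3': [np.nan, -2, np.nan, -4, np.nan, np.nan], 'x4': [1, -2, 3, -4, 5, 6]}
data = pd.DataFrame(dict3)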

Recommended answer

While the first answer proposes an elegant solution, it stretches the stated requirement to select roughly 60% of the rows. The problem is that it doesn't guarantee a 60/40 distribution. Because it draws the values using probabilities, the selected samples could by chance all be 1 or all be -1, in effect selecting all rows or none, not roughly 60%.

The chance of this happening decreases with larger dataframes, but it is never zero, and it shows up quickly when you try it with the provided example data.
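To put a number on that chance, here is a rough sketch. The first answer is not reproduced on this page, so this assumes it draws +1 or -1 independently for each row with probabilities 0.6 and 0.4; the function name `prob_all_same_sign` is made up for illustration.

```python
def prob_all_same_sign(n_rows, p_positive=0.6):
    """Probability that n_rows independent draws are all +1 or all -1.

    Assumes each row gets +1 with probability p_positive and -1 otherwise.
    """
    return p_positive ** n_rows + (1 - p_positive) ** n_rows


# Compare the 6-row example with larger dataframes
for n in (6, 60, 600):
    print(n, prob_all_same_sign(n))
```

Under that assumption, the 6-row example ends up with every row sharing the same sign about 5% of the time, while for hundreds of rows the probability is vanishingly small but still positive.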

If this is relevant to you, take a look at this code, which does guarantee a 60/40 ratio of rows.

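import numpy as np
# Pick exactly 40% of the row positions without replacement (assumes the default 0..n-1 index), then negate those rows
indices = np.random.choice(len(data), size=int(0.4 * len(data)), replace=False)
data['x4'] = np.where(data.index.isin(indices), -1 * data['x1'], data['x1'])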
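For completeness, the same idea can be extended to also produce the intermediate columns from the question. This is only a sketch: the seed, the `flip` mask, and the use of a positional boolean mask (so it does not depend on the index being 0..n-1) are choices made here, not part of the original answer.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({'x1': [1, 2, 3, 4, 5, 6]})

rng = np.random.default_rng(0)          # arbitrary seed, only for reproducibility
n_flip = int(0.4 * len(data))           # number of rows to negate (2 of 6 here)
flip_positions = rng.choice(len(data), size=n_flip, replace=False)

# Boolean mask by position: True for the rows that get multiplied by -1
flip = np.zeros(len(data), dtype=bool)
flip[flip_positions] = True

data['x2'] = data['x1'].where(~flip)        # the ~60% kept as-is, NaN elsewhere
data['x3'] = -data['x1'].where(flip)        # the ~40% negated, NaN elsewhere
data['x4'] = data['x2'].fillna(data['x3'])  # merge the two columns back together

print(data)
```

Note that with only six rows, `int(0.4 * 6)` truncates to 2, so the split is 4/2 rather than an exact 60/40; the count is fixed, though, and never left to chance.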

Update: One answer to your follow-up question proposes df.sample. Indeed, it lets you express the above much more elegantly:

indices = data.sample(frac=0.4).index
data['x4'] = np.where(data.index.isin(indices), -data['x1'], data['x1'])
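If you need the same rows to be picked on every run, `sample` accepts a `random_state` argument. A small self-contained sketch, where the seed value 0 is arbitrary:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({'x1': [1, 2, 3, 4, 5, 6]})

# Sample 40% of the rows; random_state makes the selection repeatable
indices = data.sample(frac=0.4, random_state=0).index
data['x4'] = np.where(data.index.isin(indices), -data['x1'], data['x1'])

print(data)
```

Because `sample` returns index labels rather than positions, the `isin` check also works when the dataframe's index is not the default 0..n-1 range, as long as the labels are unique.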
