How to replace the first n elements in each row of a dataframe that are larger than a certain threshold


Problem description

I have a huge dataframe that contains only numbers (the one shown below is just for demonstration purposes). My goal is to replace with 0 the first n numbers in each row that are larger than a certain value val.

To give an example:

My dataframe could look like this:

   c1  c2  c3  c4
0  38  10   1   8
1  44  12  17  46
2  13   6   2   7
3   9  16  13  26

If I now choose n = 2 (number of replacements) and val = 10, my desired output would look like this:

   c1  c2  c3  c4
0   0  10   1   8
1   0   0  17  46
2   0   6   2   7
3   9   0   0  26

In the first row, only one value is larger than val, so only one gets replaced. In the second row, all values are larger than val, but only the first two are replaced. The same applies to rows 3 and 4. Note that it is not necessarily the first two columns that are affected, but the first two qualifying values in each row, which can be in any column.
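To pin the rule down independently of pandas, here is a minimal sketch on plain Python lists; the function name replace_first_n_above is hypothetical. Values must be strictly greater than val (so the 10 in the first row stays), and scanning stops once n replacements have been made.

```python
def replace_first_n_above(row, n, val):
    """Return a copy of row where the first n values > val are set to 0."""
    result = list(row)
    replaced = 0
    for i, v in enumerate(result):
        if replaced == n:
            break  # quota reached; leave the rest of the row untouched
        if v > val:  # strictly greater, matching "larger than val"
            result[i] = 0
            replaced += 1
    return result


# The four rows of the demo frame, with n = 2 and val = 10
rows = [[38, 10, 1, 8], [44, 12, 17, 46], [13, 6, 2, 7], [9, 16, 13, 26]]
for r in rows:
    print(replace_first_n_above(r, n=2, val=10))
```

Each printed list matches the corresponding row of the desired output.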

A straightforward and very ugly implementation could look like this:

import numpy as np
import pandas as pd

np.random.seed(1)

col1 = [np.random.randint(1, 50) for _ in range(4)]
col2 = [np.random.randint(1, 50) for _ in range(4)]
col3 = [np.random.randint(1, 50) for _ in range(4)]
col4 = [np.random.randint(1, 50) for _ in range(4)]

df = pd.DataFrame({'c1': col1, 'c2': col2, 'c3': col3, 'c4': col4})

val = 10
n = 2

for ind, row in df.iterrows():
    # number of replacements
    re = 0

    for indi, vali in enumerate(row):
        if vali > val:
            df.iloc[ind, indi] = 0
            re += 1
            if re == n:
                break

That works, but I am sure there are much more efficient ways of doing this. Any ideas?

Solution

You could write your own (somewhat unusual) function and use apply with axis=1:

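def f(x, n, m):
    # x is one row of df: a Series indexed by the column names
    y = x.copy()
    # labels of the first n entries greater than m are set to 0
    y[y[y > m].iloc[:n].index] = 0
    return y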

In [380]: df
Out[380]:
   c1  c2  c3  c4
0  38  10   1   8
1  44  12  17  46
2  13   6   2   7
3   9  16  13  26

In [381]: df.apply(f, axis=1, n=2, m=10)
Out[381]:
   c1  c2  c3  c4
0   0  10   1   8
1   0   0  17  46
2   0   6   2   7
3   9   0   0  26
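Because DataFrame.apply returns a new DataFrame, the call above leaves df itself unchanged. A minimal sketch of keeping the result by rebinding df, reusing f from the answer with the demo values typed in directly (the names original and out are only for illustration):

```python
import pandas as pd


def f(x, n, m):
    y = x.copy()                      # work on a copy of the row
    y[y[y > m].iloc[:n].index] = 0    # zero the first n entries > m
    return y


df = pd.DataFrame({'c1': [38, 44, 13, 9],
                   'c2': [10, 12, 6, 16],
                   'c3': [1, 17, 2, 13],
                   'c4': [8, 46, 7, 26]})
original = df.copy()

out = df.apply(f, axis=1, n=2, m=10)
print(df.equals(original))  # prints True: apply did not modify df
df = out                    # keep the result by rebinding df
print(df)
```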

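Note: y = x.copy() makes a copy of the row, so f does not modify the Series it receives. You might be tempted to omit that line to change the values in place, but the pandas documentation warns that functions passed to apply that mutate the passed object can produce unexpected behavior or errors and are not supported; df.apply returns a new DataFrame either way, so assign the result back to df if you want to keep it. The extra indexing step y[...] = 0 is needed because boolean selection such as y[y > m] returns a copy, not the original object, so setting values on that selection would not change y.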
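For large frames, a common vectorized alternative (not part of the original answer, so treat it as a sketch) avoids calling a Python function per row: build a boolean mask of values above val, take its cumulative sum along each row so the qualifying values are numbered 1, 2, 3, ..., and zero only the positions where the mask is true and that running count is at most n. It assumes an all-numeric frame, as in the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'c1': [38, 44, 13, 9],
                   'c2': [10, 12, 6, 16],
                   'c3': [1, 17, 2, 13],
                   'c4': [8, 46, 7, 26]})
val = 10
n = 2

arr = df.to_numpy()
above = arr > val                    # True where a value exceeds val
rank = np.cumsum(above, axis=1)      # running count of such values in each row
to_zero = above & (rank <= n)        # keep only the first n of them per row

result = pd.DataFrame(np.where(to_zero, 0, arr),
                      index=df.index, columns=df.columns)
print(result)
```

Every step operates on the whole array at once, which is why this tends to scale much better than iterrows or apply(axis=1); the cost is temporary boolean and integer arrays of the same shape as df.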
