How to create a new column in a dataframe that is a function of other columns and conditionals, without iterating over the rows with a for loop?

Problem description

I have a relatively large data frame (8737 rows and 16 columns of all variable types: strings, integers, booleans, etc.) and I want to create a new column based on an equation and some conditionals. Basically, I want to iterate over one particular column, take its values and, after multiplications, sums, etc., create a new value, which I then check against some conditions (>= or < a set value). If it satisfies the conditions, I keep the output of the calculation; otherwise I assign a fixed value.

I am doing that by looping over the entire dataset with a for loop, which takes a huge amount of time. I am quite new to Python and couldn't find a solution to a similar problem online, other than for altering existing columns without a for loop.

Let's say, for the sake of simplicity, I have this data frame called df_test:

          A         B         C          D    S
0  0.001568  0.321316 -0.269841   3.232037  5.0
1  1.926186 -1.111863 -0.387165   5.541699  NaN
2  2.110923 -0.403940 -0.029895  -9.688968  NaN
3  0.609391  1.697205 -1.827488  -1.273713  NaN
4 -0.577739  0.394475 -1.524400  16.505185  NaN
5  0.456884 -1.238733  0.453586  -4.868735  NaN

where S is the column I need to calculate, starting from a set value. Each subsequent value of S should be the previous value of S plus some calculation, like so:

df_test.loc[1, 'S'] = df_test.loc[0, 'S'] + df_test.loc[1, 'D'] * abs(df_test.loc[1, 'C']) * 0.5
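As a quick check of this first step on the sample data (C[1] and D[1] copied from the table above, start value S[0] = 5.0), the formula gives a value that is already inside the 5 to 10 range:

```python
# Values copied from row 0 (S) and row 1 (C, D) of df_test above.
s0 = 5.0
c1, d1 = -0.387165, 5.541699

# S[1] = S[0] + D[1] * |C[1]| * 0.5
s1 = s0 + d1 * abs(c1) * 0.5
print(round(s1, 4))  # 6.0728
```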

This value should then be checked against a conditional: if it is greater than or equal to, for example, 10, assign 10 to it (instead of the calculated value), and if it is less than or equal to 5, assign 5 to it.
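Written as a formula, the rule described above is a clipped recurrence. This is a restatement, with S_0 the start value (5.0 in the example) and n the number of rows:

```latex
S_i = \min\!\Bigl(10,\; \max\bigl(5,\; S_{i-1} + 0.5\, D_i \,\lvert C_i \rvert\bigr)\Bigr), \qquad i = 1, \dots, n-1
```

Because each S_i is built from the already clipped S_{i-1}, the clipping cannot in general be postponed until after a plain cumulative sum.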

I use a for loop over the data set and, for every element, run the equation I need. Basically it works like this:

for i in range(1, df_test.shape[0]):
    # S[i] = S[i-1] + D[i] * |C[i]| * 0.5
    s = df_test.loc[i - 1, 'S'] + df_test.loc[i, 'D'] * abs(df_test.loc[i, 'C']) * 0.5
    # Keep S within [5, 10]
    if s < 5:
        s = 5
    elif s > 10:
        s = 10
    df_test.loc[i, 'S'] = s

This code for 8737 rows takes around 20 mins to complete.
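Much of that time is likely spent on per-element DataFrame indexing rather than on the arithmetic itself. A minimal sketch, assuming the columns are named C, D and S as above and the bounds are 5 and 10: compute all increments at once, run the clipped recurrence over plain NumPy arrays, and write S back in a single assignment. The helper `clipped_running_sum` is hypothetical, not part of pandas. It still loops, because each step depends on the previous clipped value, but avoiding `.loc`/`[]` lookups on every row is usually much faster.

```python
import numpy as np
import pandas as pd


def clipped_running_sum(c, d, s0, lower=5.0, upper=10.0, factor=0.5):
    """Hypothetical helper: S[0] = s0, S[i] = clip(S[i-1] + d[i] * |c[i]| * factor, lower, upper)."""
    c = np.asarray(c, dtype=float)
    d = np.asarray(d, dtype=float)
    out = np.empty(len(d), dtype=float)
    out[0] = s0
    # The per-row increments do not depend on S, so they can be computed for all rows at once.
    inc = d * np.abs(c) * factor
    for i in range(1, len(d)):
        # The clipping depends on the previous clipped value, so this step stays sequential.
        out[i] = min(upper, max(lower, out[i - 1] + inc[i]))
    return out


# Sample data copied from the question.
df_test = pd.DataFrame({
    "A": [0.001568, 1.926186, 2.110923, 0.609391, -0.577739, 0.456884],
    "B": [0.321316, -1.111863, -0.403940, 1.697205, 0.394475, -1.238733],
    "C": [-0.269841, -0.387165, -0.029895, -1.827488, -1.524400, 0.453586],
    "D": [3.232037, 5.541699, -9.688968, -1.273713, 16.505185, -4.868735],
    "S": [5.0, np.nan, np.nan, np.nan, np.nan, np.nan],
})

df_test["S"] = clipped_running_sum(df_test["C"], df_test["D"], s0=df_test["S"].iloc[0])
print(df_test["S"])
```

On the six sample rows this keeps S[0] = 5.0, raises row 3 to the lower bound 5 and caps row 4 at the upper bound 10.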

If you need any clarifications, please ask me. Thank you in advance.

Recommended answer

You can do that really easily in two steps:

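df.loc[1:, 'S'] = df.loc[1:, 'D'] * 0.5 * df.loc[1:, 'C'].abs()  # Per-row increment; row 0 keeps the start value
df['S'] = df['S'].cumsum()  # Running total: S[i] = S[0] + sum of the increments up to row i

# Then apply your `if` conditions (same effect as df['S'] = df['S'].clip(5, 10)).
# Note: this clips the final running total once, not S at every step as the loop in the question does.
df.loc[df['S'] < 5, 'S'] = 5
df.loc[df['S'] > 10, 'S'] = 10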

==> No for loop.
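One caveat worth checking: the two-step answer clips only once, after the running sum, while the loop in the question clips S at every step and feeds the clipped value into the next row. The two are only guaranteed to agree when the unclipped running total never leaves [5, 10]. The sketch below uses the sample data from the question (columns A and B omitted) and the bounds 5 and 10; it compares both behaviours and shows `itertools.accumulate` (Python 3.8+ for `initial=`) as one way to write the per-step clipped recurrence without an explicit `for` statement. Note that `accumulate` still evaluates row by row in Python, so it is not vectorized in the NumPy sense.

```python
from itertools import accumulate

import numpy as np
import pandas as pd

LOWER, UPPER = 5.0, 10.0  # bounds from the question

# Sample data copied from the question (A and B are not needed here).
df = pd.DataFrame({
    "C": [-0.269841, -0.387165, -0.029895, -1.827488, -1.524400, 0.453586],
    "D": [3.232037, 5.541699, -9.688968, -1.273713, 16.505185, -4.868735],
    "S": [5.0, np.nan, np.nan, np.nan, np.nan, np.nan],
})

# Per-row increments D * |C| * 0.5 for rows 1..n-1 (row 0 only holds the start value).
inc = (df["D"] * df["C"].abs() * 0.5).iloc[1:].to_numpy()
s0 = df["S"].iloc[0]

# 1) The answer's approach: running sum first, clip once at the end.
cumsum_then_clip = np.clip(s0 + np.concatenate(([0.0], np.cumsum(inc))), LOWER, UPPER)

# 2) The question's semantics: clip after every step and carry the clipped value forward.
step_clipped = np.array(list(accumulate(
    inc,
    lambda prev, x: min(UPPER, max(LOWER, prev + x)),
    initial=s0,
)))

df["S_cumsum_then_clip"] = cumsum_then_clip
df["S_step_clipped"] = step_clipped
print(df[["S_cumsum_then_clip", "S_step_clipped"]])
```

On the six sample rows the two columns agree up to row 4; in the last row the cumsum version stays at 10, while the per-step version drops to about 8.8958 because it restarts from the capped value 10 instead of the unclipped total.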
