Conditionally set pandas dataframe column values
Question
This question is the same as the following ones, with one more twist:
- Pandas: Replacing column values in dataframe
- Conditional Substitution of values in pandas dataframe columns
So, I want to set, or conditionally set, pandas dataframe column values. The added complexity is that, instead of addressing the dataframe columns with a string constant (df['data1']), I need to address them with variables (df[var_for_data1]), because my df column names are constructed.
Here is a much simplified example to explain what I want:
import numpy as np
import pandas as pd

df = pd.DataFrame({'data1': np.random.randn(100), 'data2': np.random.randn(100)})
print(df.head())
Col = 'data1'
print(df[Col].head())
df.data1 = df.data1 + .1
print(df[Col].head())
# so far so good, now how to do above with variable dataframe column name `Col`
#df.Col = df.Col + .1
The question is in the code comment: so far so good, but how do I do the above with the variable dataframe column name Col?
The next question is how to add a condition to the above assignment, say to do it only if df.data1 >= .25 and df.data1 <= .35, again expressed using the variable dataframe column name Col.
Recommended answer
You can use square brackets to access a column by its name as a string rather than as an attribute. I also strongly recommend that you drop the habit of accessing columns by attribute, as it can lead to confusing behaviour: if you have a column named sum and you write df.sum, you get the DataFrame method sum rather than the column 'sum'.
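To make the name collision concrete, here is a minimal sketch. The small frame and its values are made up for illustration and are not from the question.

```python
import pandas as pd

# Hypothetical frame with a column whose name collides with a DataFrame method.
df = pd.DataFrame({"sum": [1, 2, 3], "data1": [0.1, 0.2, 0.3]})

attr = df.sum      # attribute lookup finds the DataFrame.sum method first
col = df["sum"]    # bracket lookup always returns the column as a Series

print(callable(attr))               # the method is callable
print(isinstance(col, pd.Series))   # the column is a Series
```

Bracket access also works with names that are not valid Python identifiers, which attribute access cannot handle at all.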
So df[Col] = df[Col] + 1 would work.
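As a short sketch of the same idea with a column name built at runtime: the way the name is constructed here (string concatenation) and the sample values are assumptions for illustration, and the increment of 0.1 follows the question rather than the + 1 above.

```python
import numpy as np
import pandas as pd

# Small deterministic frame standing in for the question's random data.
df = pd.DataFrame({"data1": np.arange(5, dtype=float), "data2": np.zeros(5)})

Col = "data" + str(1)      # column name constructed at runtime -> "data1"
before = df[Col].copy()    # keep the old values for comparison

df[Col] = df[Col] + 0.1    # same effect as df.data1 = df.data1 + .1
print(df[Col].head())
```

The augmented form df[Col] += 0.1 should behave the same way here.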
Regarding your second question: to combine element-wise comparisons of an array against scalar values, use the bitwise operators &, | and ~ in place of and, or and not respectively; these return an array of boolean values. To use more than one condition you need to wrap each condition in parentheses, because & has higher precedence than the comparison operators.
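The three operators can be sketched on a small Series. The values below are chosen for illustration only, so that the boundary cases .25 and .35 are visible.

```python
import pandas as pd

s = pd.Series([0.20, 0.25, 0.30, 0.35, 0.40])

# Each comparison is parenthesized so it is evaluated before &, | or ~.
both = (s >= 0.25) & (s <= 0.35)     # element-wise "and"
either = (s < 0.25) | (s > 0.35)     # element-wise "or"
negated = ~both                      # element-wise "not"

print(both.tolist())     # [False, True, True, True, False]
print(either.tolist())   # [True, False, False, False, True]
print(negated.tolist())  # [True, False, False, False, True]
```

Without the parentheses, Python would first try to evaluate 0.25 & s, which is not the intended comparison.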
So:
df[(df[Col] >= .25) & (df[Col] <= .35)]
should work; this masks df to only the rows where both conditions are met.
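The mask by itself only selects rows. To actually set values on the matching rows, one common approach is to pass the mask and the column name to df.loc. The following is a sketch under that assumption, with made-up sample values, and is not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "data1": [0.20, 0.25, 0.30, 0.35, 0.40],
    "data2": [1.0, 1.0, 1.0, 1.0, 1.0],
})
Col = "data1"

# Boolean mask: True where the value lies in [.25, .35].
mask = (df[Col] >= 0.25) & (df[Col] <= 0.35)

# Add .1 only on the rows selected by the mask; other rows are untouched.
df.loc[mask, Col] = df.loc[mask, Col] + 0.1
print(df)
```

For an inclusive range like this one, df[Col].between(0.25, 0.35) should produce the same mask.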