Pandas: How to apply a function to different columns
Question
Suppose this is my function:
def function(x):
    return x.str.lower()
And this is my DataFrame (df)
         A    B        C    D
0  1.67430  BAR  0.34380  FOO
1  2.16323  FOO -2.04643  BAR
2  0.19911  BAR -0.45805  FOO
3  0.91864  BAR -0.00718  BAR
4  1.33683  FOO  0.53429  FOO
5  0.97684  BAR -0.77363  BAR
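To reproduce the examples, the frame can be rebuilt with a small sketch like this one. The values are copied from the printout above; storing B and D explicitly as object dtype is an assumption that matches how older pandas versions hold strings by default.

```python
import pandas as pd

# Rebuild the example frame from the printout above.
# B and D are stored explicitly as object dtype (Python strings).
df = pd.DataFrame({
    'A': [1.67430, 2.16323, 0.19911, 0.91864, 1.33683, 0.97684],
    'B': pd.Series(['BAR', 'FOO', 'BAR', 'BAR', 'FOO', 'BAR'], dtype=object),
    'C': [0.34380, -2.04643, -0.45805, -0.00718, 0.53429, -0.77363],
    'D': pd.Series(['FOO', 'BAR', 'FOO', 'BAR', 'FOO', 'BAR'], dtype=object),
})

print(df.dtypes)
```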
I want to apply the function to just columns B and D. (Applying it to the full DataFrame isn't the answer, as that produces NaN values in the numeric columns.)
This is my basic idea: df.apply(function, axis=1)
But I cannot fathom how to select distinct columns to apply the function to. I've tried all manner of indexing by numeric position, name, etc.
I've spent quite a bit of time reading around this. This isn't a direct duplicate of any of these:
Recommended answer
Just subselect the columns from the df. By omitting the axis param we operate column-wise rather than row-wise, which will be significantly faster here, as you have more rows than columns:
df[['B','D']].apply(function)
This runs your function against each column:
In [186]: df[['B','D']].apply(function)
Out[186]:
     B    D
0  bar  foo
1  foo  bar
2  bar  foo
3  bar  bar
4  foo  foo
5  bar  bar
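The expression above returns a new DataFrame and leaves df itself unchanged. If the goal is to update df, one option (a sketch, not part of the original answer) is to assign the result back to the same column selection. The two-row frame here is a stand-in for the question's data, with the string columns assumed to be object dtype.

```python
import pandas as pd

def function(x):
    # x is one column (a Series of strings); lowercase every element
    return x.str.lower()

# Small stand-in for the question's frame: numeric A/C, string B/D
df = pd.DataFrame({
    'A': [1.67430, 2.16323],
    'B': pd.Series(['BAR', 'FOO'], dtype=object),
    'C': [0.34380, -2.04643],
    'D': pd.Series(['FOO', 'BAR'], dtype=object),
})

cols = ['B', 'D']
# apply() without axis calls function once per selected column;
# assigning to df[cols] writes the lowercased values back into df
df[cols] = df[cols].apply(function)
print(df)
```

The numeric columns A and C are never passed to function, so they stay as they were.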
You can also filter the df down to just the string (object dtype) columns:
In [189]: df.select_dtypes(include=['object']).apply(function)
Out[189]:
     B    D
0  bar  foo
1  foo  bar
2  bar  foo
3  bar  bar
4  foo  foo
5  bar  bar
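select_dtypes can also be used just to find the column names, so the string columns are updated in df while the numeric ones are left alone. A minimal sketch, assuming the text columns are stored with object dtype; newer pandas versions may store strings in a dedicated string dtype, in which case the include list would need adjusting.

```python
import pandas as pd

def function(x):
    # Lowercase every string in a Series
    return x.str.lower()

df = pd.DataFrame({
    'A': [0.19911, 0.91864],
    'B': pd.Series(['BAR', 'BAR'], dtype=object),
    'C': [-0.45805, -0.00718],
    'D': pd.Series(['FOO', 'BAR'], dtype=object),
})

# Names of the object-dtype (string) columns, here B and D
str_cols = df.select_dtypes(include=['object']).columns
# Apply column-wise to those columns only and write the results back
df[str_cols] = df[str_cols].apply(function)
print(df)
```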
Timings, row-wise vs. column-wise:
In [194]:
%timeit df.select_dtypes(include=['object']).apply(function, axis=1)
%timeit df.select_dtypes(include=['object']).apply(function)
100 loops, best of 3: 3.42 ms per loop
100 loops, best of 3: 2.37 ms per loop
The difference is small on this six-row frame, but for significantly larger dfs (more rows) the column-wise method will scale much better, because the function is called once per column instead of once per row.
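A rough way to check the scaling claim yourself is to time both forms on a frame with many more rows. This sketch uses a made-up 2,000-row frame (the row count is hypothetical, chosen only for illustration) and the standard-library timeit module; absolute numbers depend on the machine and pandas version, so none are given here.

```python
import timeit

import pandas as pd

def function(x):
    # Lowercase every string in a Series
    return x.str.lower()

n = 2_000  # hypothetical row count for illustration
df = pd.DataFrame({
    'B': pd.Series(['BAR', 'FOO'] * (n // 2), dtype=object),
    'D': pd.Series(['FOO', 'BAR'] * (n // 2), dtype=object),
})

def row_wise():
    # axis=1: function is called once per row (n calls)
    return df.apply(function, axis=1)

def col_wise():
    # default axis=0: function is called once per column (2 calls)
    return df.apply(function)

# Both forms should produce the same values
assert row_wise().values.tolist() == col_wise().values.tolist()

print('row-wise:   ', timeit.timeit(row_wise, number=3))
print('column-wise:', timeit.timeit(col_wise, number=3))
```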