How do I sum values in a column that match a given condition using pandas?
Question
Suppose I have a DataFrame like so:
a b
1 5
1 7
2 3
1 3
2 5
I want to sum up the values for b where a = 1, for example. This would give me 5 + 7 + 3 = 15.
How do I do this in pandas?
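The snippets in the answer assume a DataFrame named df that holds this data. A minimal way to build it, assuming the plain integer values shown above:

```python
import pandas as pd

# The example data from the question: column 'a' holds the values to test,
# column 'b' holds the numbers to add up.
df = pd.DataFrame({
    'a': [1, 1, 2, 1, 2],
    'b': [5, 7, 3, 3, 5],
})

print(df)
```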
The essential idea here is to select the data you want to sum, and then sum them. This selection of data can be done in several different ways, a few of which are shown below.
Boolean indexing
Arguably the most common way to select the values is to use Boolean indexing.
With this method, you find out where column 'a' is equal to 1 and then sum the corresponding rows of column 'b'. You can use loc to handle the indexing of rows and columns:
>>> df.loc[df['a'] == 1, 'b'].sum()
15
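Broken into separate steps, that one-liner first builds a Boolean mask (one True/False per row) and then sums only the rows the mask selects. A sketch on the question's data, with mask, selected and total as illustrative variable names:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 1, 2, 1, 2], 'b': [5, 7, 3, 3, 5]})

# Step 1: compare column 'a' with 1 to get a Boolean Series (the mask).
mask = df['a'] == 1
print(mask.tolist())  # [True, True, False, True, False]

# Step 2: the mask picks the rows, the label 'b' picks the column.
selected = df.loc[mask, 'b']
print(selected.tolist())  # [5, 7, 3]

# Step 3: add up what was selected.
total = selected.sum()
print(total)  # 15
```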
The Boolean indexing can be extended to other columns. For example, if df also contained a column 'c' and we wanted to sum the rows in 'b' where 'a' was 1 and 'c' was 2, we'd write:
df.loc[(df['a'] == 1) & (df['c'] == 2), 'b'].sum()
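The question's data has no column 'c', so the values below are made up purely for illustration. Note the parentheses around each comparison: & and | bind more tightly than ==, so leaving them out changes how the expression is parsed and typically ends in a ValueError about the truth value of a Series being ambiguous. The sketch shows both "and" (&) and "or" (|):

```python
import pandas as pd

# Hypothetical column 'c' added to the question's data, for illustration only.
df = pd.DataFrame({
    'a': [1, 1, 2, 1, 2],
    'b': [5, 7, 3, 3, 5],
    'c': [2, 1, 2, 2, 3],
})

# Both conditions must hold: a == 1 AND c == 2 (rows 0 and 3).
both = df.loc[(df['a'] == 1) & (df['c'] == 2), 'b'].sum()
print(both)  # 8

# Either condition may hold: a == 1 OR c == 2 (rows 0, 1, 2 and 3).
either = df.loc[(df['a'] == 1) | (df['c'] == 2), 'b'].sum()
print(either)  # 18
```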
Query
Another way to select the data is to use query to filter the rows you're interested in, select column 'b' and then sum:
>>> df.query("a == 1")['b'].sum()
15
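When the value to match is stored in a Python variable, a query string can refer to it with the @ prefix instead of formatting the value into the string. A small sketch, where target is an illustrative variable name:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 1, 2, 1, 2], 'b': [5, 7, 3, 3, 5]})

# Inside the query string, '@target' refers to the Python variable below,
# so the value to match is not hard-coded into the string.
target = 1
total = df.query("a == @target")['b'].sum()
print(total)  # 15
```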
Again, the method can be extended to make more complicated selections of the data:
df.query("a == 1 and c == 2")['b'].sum()
Note this is a little more concise than the Boolean indexing approach.
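To check that the two spellings select the same rows, here is a quick comparison on a small frame with a made-up column 'c' (the question's data has no such column). In query the keywords and/or stand in for &/|, and each comparison needs no parentheses of its own:

```python
import pandas as pd

# Hypothetical column 'c', made up for illustration only.
df = pd.DataFrame({
    'a': [1, 1, 2, 1, 2],
    'b': [5, 7, 3, 3, 5],
    'c': [2, 1, 2, 2, 3],
})

# query() version: plain 'and', no parentheses around each comparison.
via_query = df.query("a == 1 and c == 2")['b'].sum()

# Equivalent Boolean-indexing version: '&' with parenthesised comparisons.
via_loc = df.loc[(df['a'] == 1) & (df['c'] == 2), 'b'].sum()

print(via_query, via_loc)  # 8 8
```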
Groupby
Another approach is to use groupby to split the DataFrame into parts according to the value in column 'a'. You can then sum each part and pull out the value that the 1s added up to:
>>> df.groupby('a')['b'].sum().loc[1]
15
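One caveat when pulling a single label out of the grouped result: if no row has the requested value, plain indexing raises a KeyError, whereas the Boolean-indexing approach simply sums an empty selection to 0. Series.get with a default avoids the error; a sketch, with sums as an illustrative name:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 1, 2, 1, 2], 'b': [5, 7, 3, 3, 5]})

# One sum per distinct value of 'a', indexed by that value.
sums = df.groupby('a')['b'].sum()

# Series.get returns the default instead of raising KeyError when the
# label is missing, e.g. when no row has a == 3.
print(sums.get(1, 0))  # 15
print(sums.get(3, 0))  # 0
```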
This approach is likely to be slower than using Boolean indexing, but it is useful if you want to check the sums for other values in column a:
>>> df.groupby('a')['b'].sum()
a
1    15
2     8
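If you need totals for several values of a, it can be convenient to compute every group sum once and turn the result into a plain dict for repeated lookups. A sketch, with totals as an illustrative name:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 1, 2, 1, 2], 'b': [5, 7, 3, 3, 5]})

# Compute all group sums in one pass, then map each value of 'a' to its total.
totals = df.groupby('a')['b'].sum().to_dict()
print(totals)  # {1: 15, 2: 8}
```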