Combining rows in pandas
Question
I have a DataFrame with an index called city_id of cities in the format [city],[state] (e.g., new york,ny), containing integer counts in the columns. The problem is that I have multiple rows for the same city, and I want to collapse the rows sharing a city_id by adding their column values. I looked at groupby(), but it wasn't immediately obvious how to apply it to this problem.
An example: I'd like to change this:
city_id val1 val2 val3
houston,tx 1 2 0
houston,tx 0 0 1
houston,tx 2 1 1
to this:
city_id val1 val2 val3
houston,tx 3 3 2
if there are ~10-20k rows.
Recommended answer
Starting from
>>> df
val1 val2 val3
city_id
houston,tx 1 2 0
houston,tx 0 0 1
houston,tx 2 1 1
somewhere,ew 4 3 7
I'd probably do
>>> df.groupby(df.index).sum()
val1 val2 val3
city_id
houston,tx 3 3 2
somewhere,ew 4 3 7
or
>>> df.reset_index().groupby("city_id").sum()
val1 val2 val3
city_id
houston,tx 3 3 2
somewhere,ew 4 3 7
The first approach passes the index values (in this case, the city_id values) to groupby and tells it to use those as the group keys. The second resets the index, which turns city_id into an ordinary column, and then groups on that column. See the groupby section of the pandas docs for more examples. Note that DataFrameGroupBy objects have lots of other methods, too:
>>> df.groupby(df.index)
<pandas.core.groupby.DataFrameGroupBy object at 0x1045a1790>
>>> df.groupby(df.index).max()
val1 val2 val3
city_id
houston,tx 2 2 1
somewhere,ew 4 3 7
>>> df.groupby(df.index).mean()
val1 val2 val3
city_id
houston,tx 1 1 0.666667
somewhere,ew 4 3 7.000000
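
For anyone who wants to try this outside the REPL, here is a minimal self-contained sketch. It rebuilds the example frame from the values shown above and runs both approaches. The variable names (by_index, by_column, by_level) are only illustrative, and the level="city_id" spelling is an addition not used in the answer itself; it should group on the named index level and give the same result.

```python
import pandas as pd

# Rebuild the example frame; the values are copied from the answer above.
df = pd.DataFrame(
    {"val1": [1, 0, 2, 4], "val2": [2, 0, 1, 3], "val3": [0, 1, 1, 7]},
    index=pd.Index(
        ["houston,tx", "houston,tx", "houston,tx", "somewhere,ew"],
        name="city_id",
    ),
)

# Approach 1: use the index values themselves as the group keys.
by_index = df.groupby(df.index).sum()

# Approach 2: move the index into a column, then group on that column.
by_column = df.reset_index().groupby("city_id").sum()

# Assumption (not in the answer): group on the named index level directly.
by_level = df.groupby(level="city_id").sum()

print(by_index)
```

All three results should contain one row per city_id, so the choice is mostly a matter of taste. With only 10-20k rows, any of them should finish almost instantly.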