Listing unique value counts per group in a pandas dataframe
Question
I am new to pandas and python.
I am trying to group items by one column and list the information from the data frame per group.
My dataframe:
   B       C       D     E           F
1  Honda   USA     2000  Washington  New
2  Honda   USA     2001  Salt Lake   Used
3  Ford    Canada  2005  Washington  New
4  Toyota  USA     2010  Ney York    Used
5  Honda   USA     2001  Salt Lake   Used
6  Honda   Canada  2011  Salt Lake   Crashed
7  Ford    Italy   2014  Rome        New
I am trying to group my dataframe by column B and list how often each value in columns C, D, E and F occurs within each group. For example, column B contains 4 Honda rows, which I group together. For that group I want to list the following information: USA(3), Canada(1), 2000(1), 2001(2), 2011(1), Washington(1), Salt Lake(3), New(1), Used(2), Crashed(1), and then do the same for every group (car make) in column B:
   Car       Country    Year     City           Condition
1  Honda(4)  USA(3)     2000(1)  Washington(1)  New(1)
             Canada(1)  2001(2)  Salt Lake(3)   Used(2)
                        2011(1)                 Crashed(1)
2  Ford(2)   Canada(1)  2005(1)  Washington(1)  New(2)
             Italy(1)   2014(1)  Rome(1)
...
What I've tried so far:
df.groupby(['B'])
which gave me <pandas.core.groupby.generic.DataFrameGroupBy object at 0x11d559080>
At this point, I am not sure how to proceed from the grouped object to the desired result.
Thanks
Recommended answer
You need a custom function, applied to each group through a lambda, that processes every column separately with Series.value_counts and then joins the index values (the distinct values) with their counts:
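def f(x):
    # count how often each distinct value occurs in this column
    x = x.value_counts()
    # build labels such as 'USA(3)' from the values and their counts
    y = x.index.astype(str) + '(' + x.astype(str) + ')'
    # replace the value index with 0..n-1 so columns line up by position
    return y.reset_index(drop=True)

df1 = df.groupby(['B']).apply(lambda x: x.apply(f)).reset_index(drop=True)
print(df1)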
   B          C          D        E              F
0  Ford(2)    Italy(1)   2014(1)  Washington(1)  New(2)
1  NaN        Canada(1)  2005(1)  Rome(1)        NaN
2  Honda(4)   USA(3)     2001(2)  Salt Lake(3)   Used(2)
3  NaN        Canada(1)  2011(1)  Washington(1)  Crashed(1)
4  NaN        NaN        2000(1)  NaN            New(1)
5  Toyota(1)  USA(1)     2010(1)  Ney York(1)    Used(1)
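For anyone who wants to run this end to end, here is a minimal self-contained sketch. It rebuilds the dataframe from the question and produces the same kind of labels, but it is a variation on the answer rather than the answer's exact code: it iterates over the groups and combines the pieces with pd.concat instead of calling groupby.apply, because how groupby.apply treats the grouping column has changed across pandas versions. The names label_counts, summarize and summary are made up for this illustration.

```python
import pandas as pd

# Data copied from the question (the spelling "Ney York" is kept as given).
df = pd.DataFrame(
    {
        "B": ["Honda", "Honda", "Ford", "Toyota", "Honda", "Honda", "Ford"],
        "C": ["USA", "USA", "Canada", "USA", "USA", "Canada", "Italy"],
        "D": [2000, 2001, 2005, 2010, 2001, 2011, 2014],
        "E": ["Washington", "Salt Lake", "Washington", "Ney York",
              "Salt Lake", "Salt Lake", "Rome"],
        "F": ["New", "Used", "New", "Used", "Used", "Crashed", "New"],
    },
    index=range(1, 8),
)


def label_counts(column):
    """Return labels like 'USA(3)' for one column, most frequent first."""
    counts = column.value_counts()
    labels = [f"{value}({count})" for value, count in counts.items()]
    return pd.Series(labels, dtype="object")


def summarize(group):
    """Build one label column per original column of a single group.

    Columns with fewer distinct values are padded with NaN.
    """
    return pd.concat(
        {name: label_counts(group[name]) for name in group.columns}, axis=1
    )


# One block of rows per car make; the outer index level is the make,
# the inner level is the position of the label within that block.
summary = pd.concat({make: summarize(group) for make, group in df.groupby("B")})
print(summary)
```

The result keeps the car make as the outer index level, so summary.loc["Honda"] returns only the Honda block. Calling summary.reset_index(drop=True) gives the flat 0..n-1 index shown in the answer's output.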