Get unique values from pandas series of lists


Question

I have a column in a DataFrame containing lists of categories. For example:

0                                                    [Pizza]
1                                 [Mexican, Bars, Nightlife]
2                                  [American, New, Barbeque]
3                                                     [Thai]
4          [Desserts, Asian, Fusion, Mexican, Hawaiian, F...
6                                           [Thai, Barbeque]
7                           [Asian, Fusion, Korean, Mexican]
8          [Barbeque, Bars, Pubs, American, Traditional, ...
9                       [Diners, Burgers, Breakfast, Brunch]
11                                [Pakistani, Halal, Indian]

I am trying to do two things:

1) Get the unique categories. My approach is to start with an empty set, iterate through the series, and add the items of each list to it.

My code:

unique_categories = set()  # start from an empty set, as described above
for lst in restaurant_review_df['categories_arr']:
    unique_categories |= set(lst)  # union in the categories of each row

This gives me a set of the unique categories contained in all the lists in the column.

2) Generate a pie plot of category counts, where each restaurant can belong to multiple categories. For example, restaurant 11 belongs to the Pakistani, Indian and Halal categories. My approach is again to iterate through the categories, with one more iteration through the series to get the counts.

Are there simpler or more elegant ways of doing this?

Thanks.

Recommended answer

Update: with pandas 0.25.0+, use explode:

df['category'].explode().value_counts()

Output:

Barbeque       3
Mexican        3
Fusion         2
Thai           2
American       2
Bars           2
Asian          2
Hawaiian       1
New            1
Brunch         1
Pizza          1
Traditional    1
Pubs           1
Korean         1
Pakistani      1
Burgers        1
Diners         1
Indian         1
Desserts       1
Halal          1
Nightlife      1
Breakfast      1
Name: Places, dtype: int64
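
The same exploded series also covers the first part of the question (the unique categories). A minimal sketch, assuming a small made-up DataFrame whose `category` column holds lists; the sample rows are hypothetical and only modeled on the question's data:

```python
import pandas as pd

# Hypothetical sample data modeled on a few rows from the question
df = pd.DataFrame({
    "category": [
        ["Pizza"],
        ["Mexican", "Bars", "Nightlife"],
        ["Thai"],
        ["Thai", "Barbeque"],
    ]
})

# explode() gives every list element its own row (pandas 0.25.0+)
exploded = df["category"].explode()

# Part 1: unique categories, as a set or in first-seen order
unique_categories = set(exploded)
unique_in_order = exploded.unique().tolist()

# Part 2: counts per category; the index holds the same unique values
counts = exploded.value_counts()

print(sorted(unique_categories))
print(counts["Thai"])
```

Using `set(...)` matches the set built by the loop in the question, while `unique()` keeps the order in which categories first appear.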

And for plotting:

df['category'].explode().value_counts().plot.pie(figsize=(8,8))

Output: [pie chart of category counts]

For older versions of pandas before 0.25.0, try:

df['category'].apply(pd.Series).stack().value_counts()

Output:

Mexican        3
Barbeque       3
Thai           2
Fusion         2
American       2
Bars           2
Asian          2
Pubs           1
Burgers        1
Traditional    1
Brunch         1
Indian         1
Korean         1
Halal          1
Pakistani      1
Hawaiian       1
Diners         1
Pizza          1
Nightlife      1
New            1
Desserts       1
Breakfast      1
dtype: int64
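
To see why this chain works, it can help to look at the intermediate steps. A rough sketch on a tiny made-up DataFrame (the data and the variable names `wide` and `stacked` are illustrative assumptions, not part of the original answer):

```python
import pandas as pd

# Hypothetical sample data: one list of categories per row
df = pd.DataFrame({"category": [["Pizza"], ["Thai", "Barbeque"], ["Thai"]]})

# Each list becomes one row of a wide DataFrame; shorter lists are
# padded with NaN so that all rows have the same number of columns
wide = df["category"].apply(pd.Series)

# stack() moves the columns into the row index, giving one long Series.
# Depending on the pandas version it may or may not drop the NaN padding;
# value_counts() ignores NaN by default either way.
stacked = wide.stack()

counts = stacked.value_counts()
print(counts)
```

Building the wide intermediate frame is comparatively slow on large data, which is one reason to prefer `explode` where it is available.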

With plotting:

df['category'].apply(pd.Series).stack().value_counts().plot.pie()

Output: [pie chart of category counts]

Per @coldspeed's comments:

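from itertools import chain
from collections import Counter

import pandas as pd

# Flatten the lists, count each category, then sort by count (column 0)
pd.DataFrame.from_dict(Counter(chain.from_iterable(df['category'])), orient='index').sort_values(0, ascending=False)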

Output:

Barbeque     3
Mexican      3
Bars         2
American     2
Thai         2
Asian        2
Fusion       2
Pizza        1
Diners       1
Halal        1
Pakistani    1
Brunch       1
Breakfast    1
Burgers      1
Hawaiian     1
Traditional  1
Pubs         1
Korean       1
Desserts     1
New          1
Nightlife    1
Indian       1
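
The `Counter` itself already answers both parts of the question, even before it is wrapped in a DataFrame. A small sketch in plain Python, assuming a hypothetical list of category lists in place of the DataFrame column:

```python
from collections import Counter
from itertools import chain

# Hypothetical stand-in for df['category']: one list per restaurant
categories = [["Pizza"], ["Thai", "Barbeque"], ["Thai"]]

# Flatten the nested lists and count each category in a single pass
counter = Counter(chain.from_iterable(categories))

# The keys of the Counter are the unique categories
unique_categories = set(counter)

# most_common() returns (category, count) pairs sorted by count
top = counter.most_common(1)
print(unique_categories, top)
```

This avoids building any intermediate pandas object, which may matter when the column is large.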
