How do I count the values from a pandas column which is a list of strings?

Problem description

I have a DataFrame column in which each value is a list of strings:

df['colors']

0              ['blue','green','brown']
1              []
2              ['green','red','blue']
3              ['purple']
4              ['brown']

What I'm trying to get is:

'blue' 2
'green' 2
'brown' 2
'red' 1
'purple' 1
[] 1

Without knowing what I was doing, I even managed to count the characters in the entire column:

b 5
[ 5
] 5 

etc.

which I think was pretty cool, but the actual solution escapes me.

Solution

Best option: df.colors.explode().dropna().value_counts().

However, if you also want counts for empty lists ([]), use Method-1.B or Method-1.C, similar to what Quang Hoang suggested in the comments.

You can use either of the following two methods.

  • Method-1: Use pandas methods alone ⭐⭐⭐

    explode --> dropna --> value_counts

  • Method-2: Use list.extend --> pd.Series.value_counts

## Method-1 (Series.explode requires pandas 0.25 or later)
# A. If you don't want counts for empty []
df.colors.explode().dropna().value_counts()

# B. If you want counts for empty [] (classified as NaN)
df.colors.explode().value_counts(dropna=False)  # counts [] as NaN

# C. If you want counts for empty [] (classified as [])
df.colors.explode().fillna('[]').value_counts()  # counts [] as the string '[]'

## Method-2
colors = []
for e in df.colors:
    colors.extend(e)  # extending with an empty list adds nothing
pd.Series(colors).value_counts()

Output:

green     2
blue      2
brown     2
red       1
purple    1
# NaN     1  ## For Method-1.B
# []      1  ## For Method-1.C
dtype: int64
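
To check that the two methods agree, here is a minimal self-contained sketch that runs Method-1.A, Method-1.C, and Method-2 on the same sample frame. The variable names (counts_a, counts_c, counts_2, flat) are made up for this illustration. The results are compared as dictionaries because the order of rows with equal counts can vary between pandas versions.

```python
import pandas as pd

df = pd.DataFrame({'colors': [['blue', 'green', 'brown'],
                              [],
                              ['green', 'red', 'blue'],
                              ['purple'],
                              ['brown']]})

# Method-1.A: explode, drop the NaN left by the empty list, then count.
counts_a = df['colors'].explode().dropna().value_counts()

# Method-1.C: keep the empty list as its own '[]' category.
counts_c = df['colors'].explode().fillna('[]').value_counts()

# Method-2: flatten with list.extend, then count.
flat = []
for items in df['colors']:
    flat.extend(items)
counts_2 = pd.Series(flat).value_counts()

# Compare as plain dicts, since the order of tied counts may differ.
print(counts_a.to_dict() == counts_2.to_dict())
print(counts_c['[]'])
```

Method-1.A and Method-2 should give the same counts; Method-1.C adds one extra row for the empty list.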

Dummy Data

import pandas as pd

df = pd.DataFrame({'colors':[['blue','green','brown'],
                             [],
                             ['green','red','blue'],
                             ['purple'],
                             ['brown']]})
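
The character counts mentioned in the question (b, [, ]) would be consistent with cells that hold strings such as "['blue','green']" rather than real lists, which can happen when a frame is read back from a CSV file. That is only a guess about the asker's data, not something the question states. Under that assumption, a sketch of one way to recover the lists is to parse each cell with ast.literal_eval before exploding. The raw series below is hypothetical sample data.

```python
import ast
import pandas as pd

# Hypothetical data: each cell is a string that merely looks like a list.
raw = pd.Series(["['blue','green','brown']",
                 "[]",
                 "['green','red','blue']",
                 "['purple']",
                 "['brown']"], name='colors')

# Counting over the raw strings tallies characters, not colors.
char_counts = pd.Series(list(''.join(raw))).value_counts()

# Parse each string into a real Python list, then explode and count as before.
parsed = raw.apply(ast.literal_eval)
color_counts = parsed.explode().dropna().value_counts()

print(char_counts['['])      # one opening bracket per row
print(color_counts['blue'])
```

If the column already contains real lists, this step is unnecessary and ast.literal_eval would raise an error on non-string input.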
