How do I count the values from a pandas column which is a list of strings?
Question
I have a DataFrame column in which each value is a list of strings:
df['colors']
0 ['blue','green','brown']
1 []
2 ['green','red','blue']
3 ['purple']
4 ['brown']
What I'm trying to get is:
'blue' 2
'green' 2
'brown' 2
'red' 1
'purple' 1
[] 1
Without knowing what I'm doing, I even managed to count the characters in the entire column:
b 5
[ 5
] 5
etc.
which I think was pretty cool, but the solution to this escapes me.
Solution
Best option: df.colors.explode().dropna().value_counts().
However, if you also want counts for empty lists ([]), use Method-1.B/C, similar to what was suggested by Quang Hoang in the comments.
You can use either of the following two methods.
- Method-1: Use pandas methods alone ⭐⭐⭐
  explode --> dropna --> value_counts
- Method-2: Use list.extend --> pd.Series.value_counts
## Method-1
# A. If you don't want counts for empty []
df.colors.explode().dropna().value_counts()
# B. If you want counts for empty [] (classified as NaN)
df.colors.explode().value_counts(dropna=False)  # counts [] as NaN
# C. If you want counts for empty [] (classified as '[]')
df.colors.explode().fillna('[]').value_counts()  # counts [] under the label '[]'
## Method-2
colors = []
for e in df.colors:
    colors.extend(e)  # extending with an empty list adds nothing
pd.Series(colors).value_counts()
Output:
green 2
blue 2
brown 2
red 1
purple 1
# NaN 1 ## For Method-1.B
# [] 1 ## For Method-1.C
dtype: int64
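To see why the three Method-1 variants differ, it can help to look at what explode does to the empty list. The following is a minimal sketch that rebuilds the dummy data from this answer; the variable names (exploded, counts_a, counts_b, counts_c) are chosen here for illustration and are not part of the original answer.

```python
import pandas as pd

# Same dummy data as in the answer.
df = pd.DataFrame({'colors': [['blue', 'green', 'brown'],
                              [],
                              ['green', 'red', 'blue'],
                              ['purple'],
                              ['brown']]})

# explode() puts each list element on its own row; the empty list
# becomes a single NaN row, so it is still represented in the result.
exploded = df['colors'].explode()

counts_a = exploded.dropna().value_counts()       # A: empty list ignored
counts_b = exploded.value_counts(dropna=False)    # B: empty list counted as NaN
counts_c = exploded.fillna('[]').value_counts()   # C: empty list counted as '[]'

print(counts_a)
print(counts_b)
print(counts_c)
```

Variant C replaces NaN with the string '[]', so the label in the result is text rather than an actual empty list. It would also relabel any NaN that was already present in the column, which may or may not be what you want.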
Dummy Data
import pandas as pd
df = pd.DataFrame({'colors': [['blue', 'green', 'brown'],
                              [],
                              ['green', 'red', 'blue'],
                              ['purple'],
                              ['brown']]})
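As a cross-check on the pandas results, the same counts can be computed with the standard library. This is only a sketch of an alternative and not one of the methods from the answer; color_counts, empty_lists and counts are hypothetical names introduced for the example.

```python
from collections import Counter
from itertools import chain

import pandas as pd

df = pd.DataFrame({'colors': [['blue', 'green', 'brown'],
                              [],
                              ['green', 'red', 'blue'],
                              ['purple'],
                              ['brown']]})

# Flatten all lists into one stream of strings and count them.
color_counts = Counter(chain.from_iterable(df['colors']))

# Empty lists contribute nothing to the flattened stream,
# so count them separately.
empty_lists = sum(1 for item in df['colors'] if len(item) == 0)

# Wrap the result in a Series if a pandas object is needed downstream.
counts = pd.Series(color_counts).sort_values(ascending=False)

print(counts)
print('empty lists:', empty_lists)
```

Keeping the empty-list count as a separate number avoids mixing a placeholder label such as '[]' into the colour counts.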