Count occurrences of certain words in pandas dataframe
Question
I want to count the number of occurrences of certain words in a data frame. I know I can do this using "str.contains":
a = df2[df2['col1'].str.contains("sample")].groupby('col2').size()
n = a.apply(lambda x: 1).sum()
Currently I'm using the code above. Is there a method to match a regular expression and get the count of occurrences? In my case I have a large dataframe and I want to match around 100 strings.
Recommended answer
The str.contains method accepts a regular expression:
Definition: df.words.str.contains(self, pat, case=True, flags=0, na=nan)
Docstring:
Check whether given pattern is contained in each string in the array

Parameters
----------
pat : string
    Character sequence or regular expression
case : boolean, default True
    If True, case sensitive
flags : int, default 0 (no flags)
    re module flags, e.g. re.IGNORECASE
na : default NaN, fill value for missing values.
For example:
In [11]: df = pd.DataFrame(['hello', 'world'], columns=['words'])
In [12]: df
Out[12]:
words
0 hello
1 world
In [13]: df.words.str.contains(r'[hw]')
Out[13]:
0 True
1 True
Name: words, dtype: bool
In [14]: df.words.str.contains(r'he|wo')
Out[14]:
0 True
1 True
Name: words, dtype: bool
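The case and na parameters from the docstring matter on real data, which often has mixed case and missing values. The following is a minimal sketch under assumed data: the DataFrame contents are hypothetical and chosen only to show the effect of the two parameters.

```python
import numpy as np
import pandas as pd

# Hypothetical data with mixed case and one missing value.
df = pd.DataFrame({"words": ["Hello", "world", np.nan]})

# case=False makes the match case-insensitive, so "Hello" matches "he".
# na=False reports a missing value as "no match" instead of propagating it,
# so the result is a clean boolean Series that is safe to sum or index with.
mask = df["words"].str.contains(r"he|wo", case=False, na=False)

print(mask.tolist())
```

Passing na=False is usually what you want before using the result as a filter, since a mask containing missing values cannot be used for boolean indexing.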
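To count the occurrences, sum the boolean Series returned by str.contains (each True counts as 1). Note that this counts the rows containing a match, not the total number of matches: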
In [15]: df.words.str.contains(r'he|wo').sum()
Out[15]: 2
In [16]: df.words.str.contains(r'he').sum()
Out[16]: 1
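For the case in the question (around 100 strings), one possible approach is to build a single alternation pattern from the list of strings. The sketch below is an illustration, not part of the original answer: the data and the names search_terms, rows_matching, total_matches and rows_per_term are hypothetical. It uses re.escape so that terms containing regex metacharacters are matched literally, and Series.str.count when the total number of matches is wanted rather than the number of matching rows.

```python
import re
import pandas as pd

# Hypothetical data and search terms, for illustration only.
df = pd.DataFrame({"words": ["hello world", "hello hello", "goodbye"]})
search_terms = ["hello", "world", "c++"]

# Escape each term so metacharacters such as "+" are matched literally,
# then join the terms into one alternation: hello|world|c\+\+
pattern = "|".join(re.escape(term) for term in search_terms)

# Number of rows that contain at least one of the terms.
rows_matching = int(df["words"].str.contains(pattern).sum())

# Total number of matches over all rows; one row can contribute several.
total_matches = int(df["words"].str.count(pattern).sum())

# Number of matching rows for each term separately.
# regex=False treats the term as a plain substring.
rows_per_term = {
    term: int(df["words"].str.contains(term, regex=False).sum())
    for term in search_terms
}

print(rows_matching, total_matches, rows_per_term)
```

A single combined pattern scans the column once, whereas the per-term dictionary scans it once per term; which is preferable depends on whether an overall count or a per-term breakdown is needed.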