Splitting a column of strings and counting the number of words with pandas
Question
id string
0 31672;0
1 31965;0
2 0;78464
3 51462
4 31931;0
Hi, I have the table above. I would like to split the string column on ';' and store the number of resulting pieces in a new column. The final table should look like this:
id string word_count
0 31672;0 2
1 31965;0 2
2 0;78464 2
3 51462 1
4 31931;0 2
It would be nice if someone knows how to do this with Python.
Recommended answer
Option 1
Using str.split + str.len:
df['word_count'] = df['string'].str.split(';').str.len()
df
string word_count
id
0 31672;0 2
1 31965;0 2
2 0;78464 2
3 51462 1
4 31931;0 2
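For anyone who wants to reproduce this, here is a minimal self-contained sketch. The DataFrame construction is an assumption about how the question's table was built (an index named id and a single string column); only the word_count assignment comes from the answer itself.

```python
import pandas as pd

# Assumed reconstruction of the table from the question.
df = pd.DataFrame(
    {"string": ["31672;0", "31965;0", "0;78464", "51462", "31931;0"]},
    index=pd.Index(range(5), name="id"),
)

# Split each string on ';' into a list, then take the length of each list.
df["word_count"] = df["string"].str.split(";").str.len()
print(df)
```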
Option 2
The clever (efficient, less space-consuming) solution with str.count:
df['word_count'] = df['string'].str.count(';') + 1
df
string word_count
id
0 31672;0 2
1 31965;0 2
2 0;78464 2
3 51462 1
4 31931;0 2
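Caveat: this ascribes a word count of 1 even to an empty string. Note that option 1 does the same, because splitting an empty string on ';' yields a one-element list containing the empty string. If empty strings should count as 0, handle them explicitly.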
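Edge cases like this are easy to check directly. The sketch below runs both options on a made-up Series (the sample values are assumptions for illustration) that contains an empty string and a missing value. Splitting an empty string on ';' gives a one-element list, so both options report 1 for it; the mask call at the end is just one possible way to force empty strings to 0.

```python
import pandas as pd

# Made-up sample with an empty string and a missing value.
s = pd.Series(["31672;0", "51462", "", None])

by_split = s.str.split(";").str.len()  # option 1
by_count = s.str.count(";") + 1        # option 2

# One possible fix: empty strings get 0, missing values stay missing.
word_count = by_count.mask(s == "", 0)

print(pd.DataFrame({"split": by_split, "count": by_count, "fixed": word_count}))
```

Missing values propagate as NaN in all three columns, which is usually what you want; fill them separately if a numeric default is needed.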
If you want each word to occupy its own column, there's a quick and simple way: use tolist to load the splits into a new DataFrame, then concatenate the new DataFrame with the original using concat:
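# Build a frame of the split pieces, reusing df's index so the rows align in concat.
v = (
    pd.DataFrame(df['string'].str.split(';').tolist(), index=df.index)
    .rename(columns=lambda x: x + 1)   # 0, 1, ... -> 1, 2, ...
    .add_prefix('string_')
)
# axis must be passed by keyword in recent pandas versions.
pd.concat([df, v], axis=1)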
string word_count string_1 string_2
id
0 31672;0 2 31672 0
1 31965;0 2 31965 0
2 0;78464 2 0 78464
3 51462 1 51462 None
4 31931;0 2 31931 0
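As an alternative sketch (not part of the answer above), str.split accepts expand=True, which returns the pieces as a DataFrame directly and keeps the original index, so the tolist step can be skipped. The sample frame and the string_ column names are assumptions chosen to mirror the output above.

```python
import pandas as pd

# Assumed reconstruction of the table from the question.
df = pd.DataFrame(
    {"string": ["31672;0", "31965;0", "0;78464", "51462", "31931;0"]},
    index=pd.Index(range(5), name="id"),
)

# expand=True spreads the split pieces across integer-labelled columns 0, 1, ...
parts = df["string"].str.split(";", expand=True)
# Relabel the columns as string_1, string_2, ...
parts.columns = [f"string_{i + 1}" for i in parts.columns]

out = pd.concat([df, parts], axis=1)
print(out)
```

Rows with fewer pieces than the widest row are padded with missing values, just as in the tolist version.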