Adding a data frame column with len() of another column's values
Question
I'm trying to add a column that holds the character count of the string values in another column, and I haven't figured out how to do it efficiently.
for index in range(len(df)):
    # Fill char_length one row at a time (slow on large frames)
    df.loc[index, 'char_length'] = len(df.loc[index, 'string'])
This apparently involves first creating a column of nulls and then overwriting it, and it takes a really long time on my data set. So what is the most efficient way of getting something like
string  char_length
abcd    4
abcde   5
I've checked around quite a bit, but I haven't been able to figure it out.
Recommended answer
Pandas has a vectorised string method for this: str.len(). To create the new column you can write:
df['char_length'] = df['string'].str.len()
For example:
>>> df
  string
0   abcd
1  abcde
>>> df['char_length'] = df['string'].str.len()
>>> df
  string  char_length
0   abcd            4
1  abcde            5
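One detail that may matter on real data is how missing values behave. The sketch below assumes a column with one missing entry (the sample data is made up for illustration and is not from the question): str.len() passes the missing entry through as a missing value rather than raising an error, and fillna() can substitute a default if an integer column is wanted.

```python
import pandas as pd

# Hypothetical sample data with one missing entry
df = pd.DataFrame({'string': ['abcd', None, 'abcde']})

# Missing strings stay missing in the result instead of raising an error
df['char_length'] = df['string'].str.len()

# Optional: treat missing strings as length 0 and force an integer column
df['char_length_filled'] = df['string'].str.len().fillna(0).astype(int)

print(df)
```

Whether 0 is a sensible stand-in for a missing string depends on the data set, so treat the fillna() step as one possible choice.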
This should be considerably faster than looping over the DataFrame with a Python for loop.
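A rough way to check the speed difference on your own machine is sketched below. The helper names with_loop and vectorised and the 2,000-row sample frame are assumptions made for this illustration; actual timings will vary with data size and pandas version.

```python
import timeit

import pandas as pd

# Hypothetical data: 2,000 short strings
df = pd.DataFrame({'string': ['abcd', 'abcde'] * 1_000})


def with_loop(frame):
    # Row-by-row approach, similar in spirit to the loop in the question
    out = frame.copy()
    out['char_length'] = 0
    for index in range(len(out)):
        out.loc[index, 'char_length'] = len(out.loc[index, 'string'])
    return out


def vectorised(frame):
    # Single vectorised call over the whole column
    out = frame.copy()
    out['char_length'] = out['string'].str.len()
    return out


loop_time = timeit.timeit(lambda: with_loop(df), number=1)
vec_time = timeit.timeit(lambda: vectorised(df), number=1)
print(f"loop: {loop_time:.4f}s, str.len(): {vec_time:.4f}s")
```

Both functions produce the same char_length values; only the time taken differs.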
Many other familiar Python string methods are also available in Pandas through the .str accessor. For example, lower (for converting to lowercase letters), count (for counting occurrences of a particular substring), and replace (for swapping one substring with another).
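A short sketch of these three methods on a made-up two-row frame (the sample values are an assumption for illustration):

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({'string': ['Abcd', 'abcde']})

# Convert every value to lowercase
lowered = df['string'].str.lower()

# Count occurrences of 'b' in each value (the pattern is treated as a regex)
b_counts = df['string'].str.count('b')

# Swap the literal substring 'cd' for 'XY'; regex=False keeps it a plain match
replaced = df['string'].str.replace('cd', 'XY', regex=False)

print(lowered.tolist())
print(b_counts.tolist())
print(replaced.tolist())
```

Note that str.count interprets its argument as a regular expression, so characters such as '.' or '(' need escaping when a literal match is intended.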