Adding a data frame column with len() of another column's values

Problem description

I'm having a problem trying to get a character count column of the string values in another column, and haven't figured out how to do it efficiently.

for index in range(len(df)):
    df['char_length'][index] = len(df['string'][index])

This apparently involves first creating a column of nulls and then rewriting it, and it takes a really long time on my data set. So what's the most efficient way of getting something like this:

'string'     'char_length'
abcd          4
abcde         5

I've checked around quite a bit, but I haven't been able to figure it out.

Recommended answer

Pandas has a vectorised string method for this: str.len(). To create the new column you can write:

df['char_length'] = df['string'].str.len()

For example:

>>> df
  string
0   abcd
1  abcde

>>> df['char_length'] = df['string'].str.len()
>>> df
  string  char_length
0   abcd            4
1  abcde            5

This should be considerably faster than looping over the DataFrame with a Python for loop.
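To check the speed difference on your own machine, here is a rough timing sketch. The sample data, the helper names with_loop and with_str_len, and the repeat count are all made up for illustration. The loop writes through .at rather than the chained indexing from the question, so it only approximates that code. Actual timings depend on your data and pandas version.

```python
import timeit

import pandas as pd

# Hypothetical sample data: 10,000 short strings.
df = pd.DataFrame({"string": ["abcd", "abcde"] * 5_000})


def with_loop(frame):
    """Fill the column one row at a time, similar in spirit to the loop in the question."""
    out = frame.copy()
    out["char_length"] = 0
    for index in range(len(out)):
        out.at[index, "char_length"] = len(out.at[index, "string"])
    return out


def with_str_len(frame):
    """Fill the column with a single vectorised call."""
    out = frame.copy()
    out["char_length"] = out["string"].str.len()
    return out


# Time three runs of each approach and print the totals.
loop_seconds = timeit.timeit(lambda: with_loop(df), number=3)
vectorised_seconds = timeit.timeit(lambda: with_str_len(df), number=3)
print(f"row-by-row loop: {loop_seconds:.4f} s")
print(f"str.len():       {vectorised_seconds:.4f} s")
```

Both helpers copy the frame first, so the comparison does not mutate df and each run starts from the same state.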

Many other familiar Python string methods are also available in pandas through the same .str accessor. For example, lower (for converting to lowercase letters), count (for counting occurrences of a particular substring), and replace (for swapping one substring with another).
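As a small illustration of those three methods, here is a sketch on a made-up two-row frame (the data and variable names are assumptions, not from the question). Note that str.count treats its argument as a regular expression, and passing regex=False to str.replace asks for a literal substring swap.

```python
import pandas as pd

# Hypothetical sample data for illustration only.
df = pd.DataFrame({"string": ["Abcd", "abbcde"]})

# Convert every value to lowercase.
lowered = df["string"].str.lower()

# Count occurrences of "b" in each value (the pattern is a regular expression).
b_counts = df["string"].str.count("b")

# Swap the literal substring "cd" for "XY".
swapped = df["string"].str.replace("cd", "XY", regex=False)

print(lowered.tolist())   # ['abcd', 'abbcde']
print(b_counts.tolist())  # [1, 2]
print(swapped.tolist())   # ['AbXY', 'abbXYe']
```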
