Find length of longest string in Pandas dataframe column

Question

Is there a faster way to find the length of the longest string in a Pandas DataFrame column than what's shown in the example below?

import numpy as np
import pandas as pd

x = ['ab', 'bcd', 'dfe', 'efghik']
x = np.repeat(x, 10**7)  # the repeat count must be an integer, not the float 1e7
df = pd.DataFrame(x, columns=['col1'])

print(df.col1.map(lambda x: len(x)).max())
# result --> 6

It takes about 10 seconds to run df.col1.map(lambda x: len(x)).max() when timing it with IPython's %timeit.

Recommended answer

DSM's suggestion (df.col1.str.len().max()) seems to be about the best you're going to get without doing some manual micro-optimization:

%timeit -n 100 df.col1.str.len().max()
100 loops, best of 3: 11.7 ms per loop

%timeit -n 100 df.col1.map(lambda x: len(x)).max()
100 loops, best of 3: 16.4 ms per loop

%timeit -n 100 df.col1.map(len).max()
100 loops, best of 3: 10.1 ms per loop

Note that explicitly using the str.len() method doesn't seem to be much of an improvement: in the timings above it lands between map(len) and the lambda version. If you're not familiar with IPython, which is where that very convenient %timeit syntax comes from, I'd definitely suggest giving it a shot for quick testing of things like this.
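For readers without IPython, the comparison can be roughly reproduced with the standard-library timeit module. This is only a sketch, not the answerer's setup: the repeat count of 10,000, the names words, candidates, results and timings, and the number/repeat settings are all assumptions chosen so it finishes quickly, and the absolute timings will differ by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Assumption: a much smaller repeat count than the question's 1e7,
# so the sketch runs in well under a second.
words = ['ab', 'bcd', 'dfe', 'efghik']
df = pd.DataFrame(np.repeat(words, 10_000), columns=['col1'])

# The three approaches timed in the answer, wrapped as zero-argument callables.
candidates = {
    'str.len': lambda: df.col1.str.len().max(),
    'map(lambda)': lambda: df.col1.map(lambda s: len(s)).max(),
    'map(len)': lambda: df.col1.map(len).max(),
}

# Each approach should report the same maximum length.
results = {name: int(fn()) for name, fn in candidates.items()}

# Best of 3 runs, 5 calls per run, reported as seconds per call.
timings = {
    name: min(timeit.repeat(fn, number=5, repeat=3)) / 5
    for name, fn in candidates.items()
}

for name in candidates:
    print(f"{name:12s} max length = {results[name]}, "
          f"{timings[name] * 1e3:.2f} ms per call")
```

Taking the minimum over several repeats mirrors the "best of 3" figure that %timeit printed in the output above.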

