Find length of longest string in Pandas dataframe column
Question
Is there a faster way to find the length of the longest string in a Pandas DataFrame than what's shown in the example below?
import numpy as np
import pandas as pd
x = ['ab', 'bcd', 'dfe', 'efghik']
x = np.repeat(x, 10**7)  # integer repeat count
df = pd.DataFrame(x, columns=['col1'])
print(df.col1.map(lambda x: len(x)).max())
# result --> 6
It takes about 10 seconds to run df.col1.map(lambda x: len(x)).max() when timing it with IPython's %timeit.
Recommended answer
DSM's suggestion seems to be about the best you're going to get without doing some manual microoptimization:
%timeit -n 100 df.col1.str.len().max()
100 loops, best of 3: 11.7 ms per loop
%timeit -n 100 df.col1.map(lambda x: len(x)).max()
100 loops, best of 3: 16.4 ms per loop
%timeit -n 100 df.col1.map(len).max()
100 loops, best of 3: 10.1 ms per loop
Note that explicitly using the str.len() method doesn't seem to be much of an improvement. If you're not familiar with IPython, which is where that very convenient %timeit syntax comes from, I'd definitely suggest giving it a shot for quick testing of things like this.
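If you don't have IPython at hand, the same comparison can be approximated with the standard-library timeit module. This is only a rough sketch: the row count (10,000 repeats per string) and the `candidates` and `results` names are my own choices for illustration, not part of the original benchmark, and the timings will differ from machine to machine.

```python
import timeit

import numpy as np
import pandas as pd

# Assumed data size: much smaller than in the question so the script runs quickly.
values = np.repeat(['ab', 'bcd', 'dfe', 'efghik'], 10_000)
df = pd.DataFrame({'col1': values})

# The three approaches compared in the answer, wrapped as zero-argument callables.
candidates = {
    'str.len()': lambda: df.col1.str.len().max(),
    'map(lambda)': lambda: df.col1.map(lambda s: len(s)).max(),
    'map(len)': lambda: df.col1.map(len).max(),
}

# Check that every approach agrees on the answer before timing anything.
results = {name: fn() for name, fn in candidates.items()}

for name, fn in candidates.items():
    # Best of 3 repeats, 10 calls each, reported per call (similar in spirit to %timeit).
    best = min(timeit.repeat(fn, number=10, repeat=3)) / 10
    print(f'{name:12s} {best * 1e3:8.3f} ms per loop')
```

One practical difference worth keeping in mind when choosing between them: on an object column containing a missing value stored as a float NaN, map(len) raises a TypeError, whereas str.len() returns NaN for that row and max() then skips it.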