Pythonic way to calculate the length of lists in a pandas DataFrame column
Question
I have a dataframe like this:
CreationDate
2013-12-22 15:25:02 [ubuntu, mac-osx, syslinux]
2009-12-14 14:29:32 [ubuntu, mod-rewrite, laconica, apache-2.2]
2013-12-22 15:42:00 [ubuntu, nat, squid, mikrotik]
I am calculating the length of the lists in the CreationDate column and creating a new Length column like this:
df['Length'] = df.CreationDate.apply(lambda x: len(x))
Which gives me this:
CreationDate Length
2013-12-22 15:25:02 [ubuntu, mac-osx, syslinux] 3
2009-12-14 14:29:32 [ubuntu, mod-rewrite, laconica, apache-2.2] 4
2013-12-22 15:42:00 [ubuntu, nat, squid, mikrotik] 4
Is there a more Pythonic way to do this?
Recommended answer
You can use the str accessor for some list operations as well. In this example,
df['CreationDate'].str.len()
returns the length of each list. See the docs for str.len.
df['Length'] = df['CreationDate'].str.len()
df
Out:
CreationDate Length
2013-12-22 15:25:02 [ubuntu, mac-osx, syslinux] 3
2009-12-14 14:29:32 [ubuntu, mod-rewrite, laconica, apache-2.2] 4
2013-12-22 15:42:00 [ubuntu, nat, squid, mikrotik] 4
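For anyone who wants to reproduce this, here is a minimal self-contained sketch. The post does not show how the frame was built, so the construction below is an assumption, and the extra LengthMap column is a hypothetical name added only to compare with Series.map(len).

```python
import pandas as pd

# Hypothetical reconstruction of the sample data; the original post does not
# show how the frame was created, so this construction is an assumption.
df = pd.DataFrame({
    "CreationDate": [
        ["ubuntu", "mac-osx", "syslinux"],
        ["ubuntu", "mod-rewrite", "laconica", "apache-2.2"],
        ["ubuntu", "nat", "squid", "mikrotik"],
    ]
})

# The str accessor's len() works element-wise on lists as well as on strings.
df["Length"] = df["CreationDate"].str.len()

# Series.map(len) is a shorter spelling of apply(lambda x: len(x)).
# "LengthMap" is an illustrative column name, not from the original post.
df["LengthMap"] = df["CreationDate"].map(len)

lengths = df["Length"].tolist()
print(lengths)
```

Both columns should hold the same values; map(len) simply avoids the redundant lambda from the question.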
For these operations, plain Python is generally faster, although pandas handles NaNs. Here are timings:
import random
import string
import pandas as pd

ser = pd.Series([random.sample(string.ascii_letters,
                               random.randint(1, 20)) for _ in range(10**6)])
%timeit ser.apply(lambda x: len(x))
1 loop, best of 3: 425 ms per loop
%timeit ser.str.len()
1 loop, best of 3: 248 ms per loop
%timeit [len(x) for x in ser]
10 loops, best of 3: 84 ms per loop
%timeit pd.Series([len(x) for x in ser], index=ser.index)
1 loop, best of 3: 236 ms per loop
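The point about NaNs can be illustrated with a small sketch. The three-element series below is made up for the example and is not from the original answer.

```python
import numpy as np
import pandas as pd

# Illustrative data (an assumption): two lists and one missing value.
ser = pd.Series([["a", "b"], np.nan, ["c"]])

# The str accessor propagates the missing value instead of raising,
# so the result is a float Series with NaN in the second position.
via_str = ser.str.len()

# Plain len() cannot be applied to a float NaN, so the list
# comprehension raises TypeError on the second element.
try:
    via_python = [len(x) for x in ser]
except TypeError:
    via_python = None

print(via_str)
print(via_python)
```

So the faster list comprehension is only a drop-in replacement when the column is known to contain no missing values; otherwise the NaNs need to be filtered or filled first.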