Fastest way to iterate over a pandas Series/column


Question

I'm more used to for loops, but they can become slow in pandas once you have large data sets. I keep finding examples that use iterrows, iter..., etc., but I want to know if there's a faster way. What I currently have is:

newnames = []
names = df['name'].tolist()
for name in names:
    # replace each space with an underscore
    newnames.append(name.replace(' ', '_'))

I could then add the newnames list to df as a new pandas column, or should I overwrite the existing df['name'] values in place? I'm not too familiar with pandas best practices, so I welcome all feedback. Thanks.

Recommended answer

If you ultimately want to add the new names to df as a column, you can do it directly:

df['newnames'] = df['name'].str.replace(' ', '_')

If you just want to replace every space with _ in the name column, you can also apply the operation to the original column and overwrite it:

df['name'] = df['name'].str.replace(' ', '_')

In both cases we use pandas' vectorized string operation, Series.str.replace, which is applied to the whole column at once and has been optimized for faster execution, instead of an explicit Python loop, which has not been optimized and is slow.
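To make the comparison concrete, here is a minimal sketch that runs the loop from the question and the vectorized call from the answer side by side and checks that they agree. The sample DataFrame is hypothetical (only the column name 'name' comes from the question), and regex=False is an addition not present in the answer: it asks for a literal match, which gives the same result here because a single space has no special regex meaning.

```python
import pandas as pd

# Hypothetical sample data; only the column name 'name' comes from the question.
df = pd.DataFrame({"name": ["John Smith", "Mary Ann Lee", "Bob"]})

# Loop-based version, as in the question.
newnames = []
for name in df["name"].tolist():
    newnames.append(name.replace(" ", "_"))

# Vectorized version, as in the answer.
# regex=False (an addition for this sketch) treats the pattern as a literal string.
df["newnames"] = df["name"].str.replace(" ", "_", regex=False)

# Both approaches should produce the same values.
assert df["newnames"].tolist() == newnames
print(df)
```

How much faster the vectorized call is depends on the data and the pandas version, so it is worth timing both on your own DataFrame before relying on a particular speedup.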
