Pandas: Alternative to iterrows loops

Question

I have a small function I'm running in pandas that throws a ValueError when I run an if x in y statement. I saw similar-sounding problems recommending Boolean Indexing, .isin(), and where(), but I wasn't able to adapt any of the examples to my case. Any advice would be very much appreciated.

Additional note: groups is a list of lists containing strings, defined outside the dataframe. My goal with the function is to see which list an item from the dataframe is in, then return the index of that list. My first version of this in the notebook linked below uses iterrows to loop through the dataframe, but I understand that is sub-optimal in most cases.

Jupyter notebook with some fake data: https://github.com/amoebahlan61/sturdy-chainsaw/blob/master/Grouping%20Test_1.1.ipynb

Thanks!

Code:

def groupFinder(item):
    for group in groups:
        if item in group:
            return groups.index(group)

df['groupID2'] = groupFinder(df['item'])


---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-16-808ac3e51e1f> in <module>()
      4             return groups.index(group)
      5 
----> 6 df['groupID2'] = groupFinder(df['item'])

<ipython-input-16-808ac3e51e1f> in groupFinder(item)
      1 def groupFinder(item):
      2     for group in groups:
----> 3         if item in group:
      4             return groups.index(group)
      5 

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\generic.py in __nonzero__(self)
    953         raise ValueError("The truth value of a {0} is ambiguous. "
    954                          "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
--> 955                          .format(self.__class__.__name__))
    956 
    957     __bool__ = __nonzero__

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
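
The traceback points at `if item in group`. The function is handed the whole `df['item']` Series rather than one value, so the membership test compares a list element against the Series, gets a boolean Series back, and Python cannot reduce that to a single True or False. A minimal sketch that reproduces this, assuming made-up `groups` and `df` values (the notebook's real data is not shown here):

```python
import pandas as pd

# Hypothetical stand-ins for the notebook's data.
groups = [["apple", "pear"], ["kiwi", "plum"]]
df = pd.DataFrame({"item": ["kiwi", "apple", "plum"]})

def groupFinder(item):
    for group in groups:
        if item in group:
            return groups.index(group)

# A single string works: "kiwi" is in the second list.
single_result = groupFinder("kiwi")

# A whole Series does not: `item in group` evaluates `"apple" == item`,
# which yields a boolean Series, and bool() of a Series raises ValueError.
try:
    groupFinder(df["item"])
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(single_result)
print(error_message)
```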

Recommended answer

Solution

I came across some pandas blog posts and also got feedback from a Reddit user, which led me to a solution that avoids iterrows by using pandas' apply function:

df['groupID2'] = df.item.apply(groupFinder)
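
`apply` calls `groupFinder` once per value, so each call receives a single string and the membership test behaves as intended. The sketch below shows this on made-up data. It also shows a dictionary lookup with `Series.map`, which is an alternative not mentioned in the original post; the names `lookup` and `groupID_map` are hypothetical.

```python
import pandas as pd

# Hypothetical stand-ins for the notebook's data.
groups = [["apple", "pear"], ["kiwi", "plum"]]
df = pd.DataFrame({"item": ["kiwi", "apple", "fig"]})

def groupFinder(item):
    for group in groups:
        if item in group:
            return groups.index(group)

# apply passes one value at a time, so `item in group` compares strings.
df["groupID2"] = df["item"].apply(groupFinder)

# Alternative (not from the original post): build the lookup once, then map.
# setdefault keeps the first group an item appears in, like groupFinder does.
lookup = {}
for idx, group in enumerate(groups):
    for member in group:
        lookup.setdefault(member, idx)
df["groupID_map"] = df["item"].map(lookup)

print(df)
```

Items that are in no group (here "fig") come back as missing values in both columns. The dictionary version scans `groups` once up front instead of once per row, which may matter on larger dataframes.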

Thank you everyone for your help and responses.
