Pandas: Alternative to iterrow loops
Problem description
I have a small function I'm running in pandas that throws a ValueError when I run an if x in y statement. I saw similar-sounding problems recommending Boolean indexing, .isin(), and where(), but I wasn't able to adapt any of the examples to my case. Any advice would be very much appreciated.
Additional note: groups is a list of lists containing strings outside the dataframe. My goal with the function is to see which list an item from the dataframe is in, then return the index of that list. My first version of this in the notebook link below uses iterrows to loop through the dataframe, but I understand that is sub-optimal in most cases.
Jupyter notebook with some fake data: https://github.com/amoebahlan61/sturdy-chainsaw/blob/master/Grouping%20Test_1.1.ipynb
Thanks!
Code:
def groupFinder(item):
    for group in groups:
        if item in group:
            return groups.index(group)

df['groupID2'] = groupFinder(df['item'])
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-16-808ac3e51e1f> in <module>()
4 return groups.index(group)
5
----> 6 df['groupID2'] = groupFinder(df['item'])
<ipython-input-16-808ac3e51e1f> in groupFinder(item)
1 def groupFinder(item):
2 for group in groups:
----> 3 if item in group:
4 return groups.index(group)
5
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\generic.py in __nonzero__(self)
953 raise ValueError("The truth value of a {0} is ambiguous. "
954 "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
--> 955 .format(self.__class__.__name__))
956
957 __bool__ = __nonzero__
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
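For readers puzzling over the traceback, here is a minimal sketch of what appears to trigger it. The sample values are hypothetical and not taken from the notebook. When the whole column is passed in, item is a Series, so item in group compares each list element against the entire Series; each comparison yields a boolean Series, and Python then asks for its truth value, which pandas refuses to give.

```python
import pandas as pd

# Hypothetical stand-ins for df['item'] and one entry of groups.
items = pd.Series(["apple", "kale"])
group = ["apple", "pear"]

try:
    # list.__contains__ evaluates bool("apple" == items); the comparison
    # returns a boolean Series, and bool() on a Series raises ValueError.
    found = items in group
    raised = False
except ValueError as exc:
    raised = True
    message = str(exc)

print(raised)
```

So the function itself is fine for a single string; the problem is calling it once with the full column instead of once per element.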
Solution
I came across some pandas blog posts and also got some feedback from a reddit user which gave me a solution that skips using iterrows by using pandas' apply function.
df['groupID2'] = df.item.apply(groupFinder)
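As a rough end-to-end sketch of the line above, the following uses made-up groups and items (the real data lives in the notebook). It also shows one possible alternative that is not part of the original solution: building a dictionary from item to group index once and passing it to Series.map, which avoids rescanning the lists for every row. The names lookup and groupID3 are illustrative only.

```python
import pandas as pd

# Hypothetical sample data; not the contents of the linked notebook.
groups = [["apple", "pear"], ["kale", "leek"], ["rice"]]
df = pd.DataFrame({"item": ["kale", "apple", "rice", "fig"]})

def groupFinder(item):
    # Same logic as in the question: index of the first group holding item.
    for group in groups:
        if item in group:
            return groups.index(group)

# apply calls groupFinder once per element, so item is a plain string.
df["groupID2"] = df["item"].apply(groupFinder)

# Alternative (an assumption, not from the post): a one-time lookup table.
lookup = {}
for index, group in enumerate(groups):
    for member in group:
        lookup.setdefault(member, index)  # keep the first group on duplicates

# map looks each item up in the dict; unknown items become missing values.
df["groupID3"] = df["item"].map(lookup)

print(df)
```

In both columns an item that belongs to no group ("fig" here) ends up as a missing value, so it may be worth checking for those before treating the result as integers.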
Thank you everyone for your help and responses.