AttributeError: 'PandasExprVisitor' object has no attribute 'visit_Ellipsis', using pandas eval

Question

I have a Series of the form:

s

0    [133, 115, 3, 1]
1    [114, 115, 2, 3]
2      [51, 59, 1, 1]
dtype: object

Note that its elements are strings:

s[0]
'[133, 115, 3, 1]'

I'm trying to use pd.eval to parse this string into a column of lists. This works for this sample data.

pd.eval(s)

array([[133, 115, 3, 1],
       [114, 115, 2, 3],
       [51, 59, 1, 1]], dtype=object)

However, on much larger data, this fails miserably!

len(s)
300000

pd.eval(s)
AttributeError: 'PandasExprVisitor' object has no attribute 'visit_Ellipsis'

What am I missing here? Is there something wrong with the function or my data?

Answer

TL;DR
As of v0.21, this is a bug, and an open issue on GitHub. See GH16289.

Why am I getting this error?
This is (in all probability) a fault in pd.eval, which cannot parse a Series with more than 100 rows. Here's an example.

len(s)
300000

pd.eval(s.head(100))  # returns a parsed result

pd.eval(s.head(101))
AttributeError: 'PandasExprVisitor' object has no attribute 'visit_Ellipsis'

This issue persists, regardless of the parser or the engine.

What does this error mean?
When a Series with more than 100 rows is passed, pd.eval operates on the __repr__ of the Series rather than on the objects contained within it (which is the cause of this bug). The __repr__ truncates rows, replacing them with ... (an ellipsis). The engine then misinterprets this ellipsis as an Ellipsis object -

...
Ellipsis

pd.eval('...')
AttributeError: 'PandasExprVisitor' object has no attribute 'visit_Ellipsis'

This is precisely what causes the error.
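The two ingredients of this explanation can be checked in isolation. The sketch below does not call pd.eval (so it does not reproduce the error itself, and it does not show which internal truncation setting pd.eval relies on); it assumes Python 3.8+ and pandas' default display options, and only confirms that the text form of a long Series contains a literal "..." and that Python's parser reads a bare "..." as the Ellipsis singleton.

```python
import ast

import pandas as pd

# A Series much longer than the default display limit
s = pd.Series(['[133, 115, 3, 1]'] * 1000)

# The text form keeps only the first and last rows, with "..." in between
text = repr(s)
print('...' in text)

# Python's parser turns a bare "..." into a constant whose value is Ellipsis
# (Python 3.8+ represents it as an ast.Constant node)
node = ast.parse('...', mode='eval').body
print(isinstance(node, ast.Constant) and node.value is Ellipsis)
```

Both lines print True under the stated assumptions.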

What can I do to make this work?
Right now, there isn't a fix (the issue is still open as of 12/28/2017); however, there are a couple of workarounds.

Option 1
ast.literal_eval
This option should work out of the box if you can guarantee that you do not have any malformed strings.

from ast import literal_eval

s.apply(literal_eval)

0    [133, 115, 3, 1]
1    [114, 115, 2, 3]
2      [51, 59, 1, 1]
dtype: object 
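As a self-contained check that Option 1 is not affected by the 100-row limit, this sketch applies literal_eval to a Series of 300 rows. The data is synthetic: it simply repeats the three sample strings from the question.

```python
from ast import literal_eval

import pandas as pd

# Synthetic data: the three sample rows repeated to 300 rows (> 100)
s = pd.Series(['[133, 115, 3, 1]', '[114, 115, 2, 3]', '[51, 59, 1, 1]'] * 100)

# literal_eval parses each string on its own, so the length of s does not matter
parsed = s.apply(literal_eval)

print(len(parsed))
print(parsed.iloc[0])
```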

If there is a possibility of malformed data, you'll need to write a little error handling code. You can do that with a function -

import numpy as np

def safe_parse(x):
    """Parse a Python literal; fall back to a placeholder on malformed input."""
    try:
        return literal_eval(x)
    except (SyntaxError, ValueError):
        return np.nan  # replace with any suitable placeholder value

Pass this function to apply -

s.apply(safe_parse)

0    [133, 115, 3, 1]
1    [114, 115, 2, 3]
2      [51, 59, 1, 1]
dtype: object
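To see the error handling at work, here is a self-contained sketch in which two entries are deliberately malformed (hypothetical bad data): an unterminated list, which raises SyntaxError, and a bare word, which literal_eval rejects with ValueError. Both come back as NaN.

```python
from ast import literal_eval

import numpy as np
import pandas as pd


def safe_parse(x):
    """Parse a Python literal; return NaN for strings that are not valid literals."""
    try:
        return literal_eval(x)
    except (SyntaxError, ValueError):
        return np.nan


# The last two entries are malformed on purpose
s = pd.Series(['[133, 115, 3, 1]', '[114, 115, 2, 3]', '[51, 59, 1', 'abc'])
parsed = s.apply(safe_parse)

# Malformed rows show up as missing values
print(parsed.isna().tolist())
```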

ast works for any number of rows; it is slow, but reliable. You can also use json.loads (from the standard library; older pandas versions exposed it as pd.json.loads) for JSON data, applying the same ideas as with literal_eval.
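A minimal sketch of the JSON route, assuming the strings are valid JSON (lists of integers are). It uses json.loads from the standard library rather than pd.json.loads, which recent pandas versions no longer provide. Note that JSON syntax is stricter than Python literal syntax: single-quoted strings or True/False would be rejected.

```python
import json

import pandas as pd

# Synthetic data: the three sample rows repeated to 300 rows
s = pd.Series(['[133, 115, 3, 1]', '[114, 115, 2, 3]', '[51, 59, 1, 1]'] * 100)

# Each string is parsed independently, like literal_eval
parsed = s.apply(json.loads)
print(parsed.iloc[1])
```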

Option 2
yaml.load
Another great option for parsing simple data; I picked this up from @ayhan a while ago.

import yaml
# safe_load: in PyYAML 6+, yaml.load requires an explicit Loader argument
s.apply(yaml.safe_load)

0    [133, 115, 3, 1]
1    [114, 115, 2, 3]
2      [51, 59, 1, 1]
dtype: object

I haven't tested this on more complex structures, but this should work for almost any basic string representation of data.

You can find the documentation for PyYAML here. Scroll down a bit and you'll find more details on the load function.

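Note

If you are reading the data from a CSV file, you can parse it while reading:

# 'data.csv' is a placeholder path; converters expects a dict keyed by column
s = pd.read_csv('data.csv', converters={0: literal_eval}).squeeze('columns')

Here, the converters argument applies the given function to the column as it is read, so you don't have to deal with parsing later. (squeeze('columns') turns the single-column result into a Series; the older squeeze=True argument was removed in pandas 2.0.)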
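A runnable sketch of reading and parsing in one step. The CSV content and the column name lists are hypothetical, and io.StringIO stands in for a real file. The list-like values are quoted in the CSV because they contain commas.

```python
import io
from ast import literal_eval

import pandas as pd

# Hypothetical CSV content: one quoted column of list-like strings
csv_text = '''lists
"[133, 115, 3, 1]"
"[114, 115, 2, 3]"
"[51, 59, 1, 1]"
'''

# converters maps a column (by label or position) to a function applied to each cell
df = pd.read_csv(io.StringIO(csv_text), converters={'lists': literal_eval})

# squeeze('columns') turns the one-column DataFrame into a Series
s = df.squeeze('columns')
print(type(s).__name__, s.iloc[0])
```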

Continuing the point above, if you're working with a DataFrame, pass a dict (again, 'data.csv' is a placeholder path) -

df = pd.read_csv('data.csv', converters={'col': literal_eval})

Here, col is the column that needs to be parsed. You can also pass json.loads (for JSON data) or pd.eval.
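With a DataFrame, different columns can get different converters, and columns without one are parsed normally. In this sketch the CSV content is hypothetical: col holds Python-literal lists, meta holds JSON objects (the doubled quotes are standard CSV escaping), and id is an ordinary integer column.

```python
import io
import json
from ast import literal_eval

import pandas as pd

# Hypothetical CSV with one literal column, one JSON column and one plain column
csv_text = '''id,col,meta
1,"[133, 115, 3, 1]","{""a"": 1}"
2,"[114, 115, 2, 3]","{""a"": 2}"
'''

df = pd.read_csv(
    io.StringIO(csv_text),
    converters={'col': literal_eval, 'meta': json.loads},  # one function per column
)
print(df.loc[0, 'col'], df.loc[1, 'meta'])
```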

Credits to MaxU and Moondra for uncovering this issue.
