RuntimeError while reading a tab-separated text file into a pandas DataFrame

Problem description

I am reading a tab-separated text file into a pandas DataFrame, and I get a runtime error while reading it. I have gone through the posts related to this error, and all of them allude to the rule that one should not modify a dict while iterating over it. In my case, all I am doing is reading a file. How is this problem connected to iterating over a dict and changing it?

>>> import pandas as pd
>>> df=pd.read_csv("dummy_data.txt",header=None,chunksize=10000,error_bad_lines=False,warn_bad_lines=True,engine='c',sep="\t",encoding="latin-1")
Traceback (most recent call last):
  File "<input>", line 1, in <module>
    df=pd.read_csv("dummy_data.txt",header=None,chunksize=10000,error_bad_lines=False,warn_bad_lines=True,engine='c',sep="\t",encoding="latin-1")
  File "/home/avadhut/.virtualenvs/avadhut_virtual/lib/python3.5/site-packages/pandas/io/parsers.py", line 709, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/home/avadhut/.virtualenvs/avadhut_virtual/lib/python3.5/site-packages/pandas/io/parsers.py", line 431, in _read
    compression = _infer_compression(filepath_or_buffer, compression)
  File "/home/avadhut/.virtualenvs/avadhut_virtual/lib/python3.5/site-packages/pandas/io/common.py", line 270, in _infer_compression
    filepath_or_buffer = _stringify_path(filepath_or_buffer)
  File "/home/avadhut/.virtualenvs/avadhut_virtual/lib/python3.5/site-packages/pandas/io/common.py", line 157, in _stringify_path
    from py.path import local as LocalPath
  File "/home/avadhut/.virtualenvs/avadhut_virtual/lib/python3.5/site-packages/py/__init__.py", line 148, in <module>
    'Syslog'             : '._log.log:Syslog',
  File "/home/avadhut/.virtualenvs/avadhut_virtual/lib/python3.5/site-packages/py/_vendored_packages/apipkg.py", line 63, in initpkg
    for module in sys.modules.values():
RuntimeError: dictionary changed size during iteration
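The last frame of the traceback is `py/_vendored_packages/apipkg.py`, which loops over `sys.modules.values()` while pandas imports `py.path` inside `_stringify_path`. CPython raises exactly this RuntimeError when a dict changes size while it is being iterated, so something adds to (or removes from) `sys.modules` while that loop is running, for instance an import triggered during the loop or from another thread. The sketch below reproduces the mechanism with ordinary dicts standing in for `sys.modules`; the single-threaded insertion is an assumption standing in for whatever import actually happens, and iterating over a `list(...)` snapshot shows why that pattern avoids the error.

```python
def iterate_while_growing(d):
    """Walk d.values() and insert a key after the first item, mimicking a module
    that lands in sys.modules while a loop over sys.modules.values() is running."""
    try:
        for i, _ in enumerate(d.values()):
            if i == 0:
                d["late_import"] = object()  # the dict grows mid-iteration
        return "no error"
    except RuntimeError as exc:
        return str(exc)


def iterate_over_snapshot(d):
    """Same insertion, but the loop walks a list copy taken before it starts."""
    for i, _ in enumerate(list(d.values())):
        if i == 0:
            d["late_import"] = object()  # safe: the list copy is unaffected
    return "no error"


print(iterate_while_growing({"a": 1, "b": 2, "c": 3}))  # dictionary changed size during iteration
print(iterate_over_snapshot({"a": 1, "b": 2, "c": 3}))  # no error
```

Because the error depends on something else touching `sys.modules` at the wrong moment, it can appear in one environment and not in another even though `read_csv` itself never modifies a dict.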

Edit 1: When reading the file in interactive mode, I get the same error on the first two attempts. The third time, running the same line doesn't throw any error. What could be the reason for such unstable behaviour?

>>> df=pd.read_csv("product_name.txt",header=None,chunksize=10000,error_bad_lines=False,warn_bad_lines=True,engine='c',sep="\t",encoding="latin-1")
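Independently of the RuntimeError, note that with `chunksize` set, `read_csv` returns a `TextFileReader` iterator rather than a DataFrame, so `df` in the line above is not yet a DataFrame. A minimal sketch of consuming such a reader, using a small in-memory tab-separated sample (hypothetical, standing in for `product_name.txt`). It leaves out `error_bad_lines`/`warn_bad_lines`, since those flags were deprecated in pandas 1.3 in favour of `on_bad_lines` and removed in pandas 2.0, and it uses `str.format` so it also runs on Python 3.5.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real file: five tab-separated rows, no header row.
SAMPLE = "".join("{}\tproduct_{}\tY\n".format(i, i) for i in range(5))


def read_in_chunks(buf, chunksize=2):
    """Read tab-separated text chunk by chunk and combine the chunks into one DataFrame."""
    reader = pd.read_csv(buf, sep="\t", header=None, engine="c", chunksize=chunksize)
    # Iterating the reader yields DataFrames of at most `chunksize` rows each.
    chunks = [chunk for chunk in reader]
    return chunks, pd.concat(chunks, ignore_index=True)


chunks, df = read_in_chunks(io.StringIO(SAMPLE))
print(len(chunks), df.shape)  # 3 (5, 3): chunks of 2 + 2 + 1 rows
```

For a file the size of the one in the question, processing each chunk inside the loop (instead of collecting them all) keeps memory bounded; concatenating is only worthwhile when the whole table fits in memory anyway.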

Edit 2: To reproduce this error, here is a link to the 1000-line dataset: S3 link to the dataset

Edit 3: I found a link with a similar issue: Pandas CSV file with occasional extra column. But the flag mentioned there (error_bad_lines) doesn't seem to work in my case.

>>> df = pd.read_csv("unclean.csv", error_bad_lines=False, header=None)

Edit 4: I have written a script that loads the dummy data (mentioned in Edit 2) into a pandas DataFrame and then saves it to an HDF5 file. I ran this script 20 times and did not encounter a RuntimeError once. On the other hand, reading the file in interactive mode exposes the RuntimeError and the unstable behaviour. What could be the reason for the different behaviour of a Python script vs. interactive mode? I am using pandas==0.22.0, Python==3.5.2 and tables==3.4.4.

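import pandas as pd
import tables  # PyTables; pandas needs it installed for to_hdf/read_hdf

# Read the whole tab-separated file at once (no chunksize), with explicit column names
df=pd.read_csv("dummy.txt",header=None,error_bad_lines=False,warn_bad_lines=False,engine='c',sep="\t",encoding="latin-1",names=["product_name_id","current_product_name_id","product_n","active_f","create_d","create_user_n","change_d","change_user_n","ft_timestamp"])

# Write the frame under key "dummy" in the "table" format; mode="a" opens the file read/write, creating it if missing
df.to_hdf(path_or_buf="/home/avadhut/data_files/dummy_data.h5",key="dummy",mode="a",format="table")

# Read the stored frame back and show the first 100 rows
df=pd.read_hdf("/home/avadhut/data_files/dummy_data.h5",key="dummy")
print(df.head(100))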

Recommended answer

Run your code on the default Python interpreter and see if the error persists. It is most likely a bug in bpython, as I am not able to replicate the issue on the default Python interpreter.
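One way to follow this advice without leaving the bpython session is to run the same statement in a fresh, non-interactive child process started with `sys.executable`, which is the interpreter of the current environment, so the child sees the same virtualenv and installed packages. A minimal sketch; the helper name `run_in_plain_interpreter` is hypothetical, and it sticks to `subprocess.run` arguments that already exist in Python 3.5.

```python
import subprocess
import sys


def run_in_plain_interpreter(code):
    """Hypothetical helper: run `code` in a fresh, non-interactive interpreter process.

    The child inherits the current working directory, so relative file paths
    resolve the same way they do in the interactive session.
    """
    result = subprocess.run(
        [sys.executable, "-c", code],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode output to str (the `text=` alias needs 3.7+)
    )
    return result.returncode, result.stdout, result.stderr


returncode, out, err = run_in_plain_interpreter("print('hello from a plain interpreter')")
print(returncode, out.strip())
```

For example, passing the `read_csv` line from the question (with its `import pandas as pd`) as `code` and getting return code 0 with an empty stderr would point at the interactive front end rather than at pandas or the data file.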