RuntimeError while reading a tab-separated text file into a Pandas DataFrame
Problem description
I am reading a tab-separated text file into a pandas DataFrame and I get a RuntimeError while doing so. I have gone through the posts related to this error, and all of them point to the rule that one should not modify a dict while iterating over it. In my case, all I am doing is reading a file. How is this problem connected to iterating over a dict and changing it?
>>> import pandas as pd
>>> df=pd.read_csv("dummy_data.txt",header=None,chunksize=10000,error_bad_lines=False,warn_bad_lines=True,engine='c',sep="\t",encoding="latin-1")
Traceback (most recent call last):
File "<input>", line 1, in <module>
df=pd.read_csv("dummy_data.txt",header=None,chunksize=10000,error_bad_lines=False,warn_bad_lines=True,engine='c',sep="\t",encoding="latin-1")
File "/home/avadhut/.virtualenvs/avadhut_virtual/lib/python3.5/site-packages/pandas/io/parsers.py", line 709, in parser_f
return _read(filepath_or_buffer, kwds)
File "/home/avadhut/.virtualenvs/avadhut_virtual/lib/python3.5/site-packages/pandas/io/parsers.py", line 431, in _read
compression = _infer_compression(filepath_or_buffer, compression)
File "/home/avadhut/.virtualenvs/avadhut_virtual/lib/python3.5/site-packages/pandas/io/common.py", line 270, in _infer_compression
filepath_or_buffer = _stringify_path(filepath_or_buffer)
File "/home/avadhut/.virtualenvs/avadhut_virtual/lib/python3.5/site-packages/pandas/io/common.py", line 157, in _stringify_path
from py.path import local as LocalPath
File "/home/avadhut/.virtualenvs/avadhut_virtual/lib/python3.5/site-packages/py/__init__.py", line 148, in <module>
'Syslog' : '._log.log:Syslog',
File "/home/avadhut/.virtualenvs/avadhut_virtual/lib/python3.5/site-packages/py/_vendored_packages/apipkg.py", line 63, in initpkg
for module in sys.modules.values():
RuntimeError: dictionary changed size during iteration
Edit 1: When reading the file in interactive mode, I hit the same error on the first two attempts. Running the same line a third time does not raise any error. What could be the reason for such unstable behaviour?
>>> df=pd.read_csv("product_name.txt",header=None,chunksize=10000,error_bad_lines=False,warn_bad_lines=True,engine='c',sep="\t",encoding="latin-1")
Edit 2: To reproduce this error, here is a link to a 1000-row dataset: S3 link to the dataset
Edit 3: I found a question with a similar issue: "Pandas CSV file with occasional extra column". However, the flag mentioned in it (error_bad_lines) doesn't seem to work in my case.
>>> df = pd.read_csv("unclean.csv", error_bad_lines=False, header=None)
Edit 4: I wrote a script that loads the dummy data (mentioned in Edit 2) into a pandas DataFrame and then saves it to an HDF5 file. I ran this script 20 times and did not hit the RuntimeError once. Reading the file in interactive mode, on the other hand, raises the RuntimeError and behaves unstably. What could explain the different behaviour between a Python script and interactive mode? I am using pandas==0.22.0, Python==3.5.2 and tables==3.4.4.
import pandas as pd
import tables
df=pd.read_csv("dummy.txt",header=None,error_bad_lines=False,warn_bad_lines=False,engine='c',sep="\t",encoding="latin-1",names=["product_name_id","current_product_name_id","product_n","active_f","create_d","create_user_n","change_d","change_user_n","ft_timestamp"])
df.to_hdf(path_or_buf="/home/avadhut/data_files/dummy_data.h5",key="dummy",mode="a",format="table")
df=pd.read_hdf("/home/avadhut/data_files/dummy_data.h5",key="dummy")
print(df.head(100))
Recommended answer
Run your code in the default Python interpreter and see whether the error persists. This is most likely a bug in bpython, as I am not able to reproduce the issue in the default Python interpreter.
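As for how reading a file can hit a "dict changed" error at all: the traceback ends in apipkg looping over sys.modules.values(), and sys.modules is an ordinary dict. If anything imports a module while that loop is running, the dict grows and Python raises this exact RuntimeError. What does the importing in your session is an assumption on my part (possibly something bpython runs in the background), and the sketch below uses a plain dict as a stand-in for sys.modules rather than reproducing the bpython behaviour itself. The function names are made up for illustration.

```python
def iterate_while_mutating(modules):
    """Loop over a dict's values while adding a key (stand-in for sys.modules)."""
    try:
        for _ in modules.values():
            # Simulates a module being imported while the loop is running.
            modules["late_import"] = object()
    except RuntimeError as exc:
        return str(exc)
    return None


def iterate_over_snapshot(modules):
    """Loop over a list copy of the values, so later insertions are harmless."""
    visited = 0
    for _ in list(modules.values()):
        modules["late_import_%d" % visited] = object()
        visited += 1
    return visited


fake_modules = {"os": object(), "sys": object()}

error_message = iterate_while_mutating(dict(fake_modules))
visited = iterate_over_snapshot(dict(fake_modules))

print(error_message)
print(visited)
```

The first function fails the same way the traceback does, while the second completes because it iterates over a snapshot. That would also be consistent with the error being intermittent: it only appears when an import happens to land in the middle of the loop.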