文件名带有重音时使用pandas.read_csv进行编码 [英] Encoding with pandas.read_csv when file name has accents

查看:62
本文介绍了文件名带有重音时使用pandas.read_csv进行编码的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在尝试用熊猫加载CSV,但是如果文件名带有重音符,则会遇到问题.显然,这是一个编码问题,但是尽管read_csv允许您为文件中的文本设置编码,但是我不知道如何正确地编码文件名.

I'm trying to load a CSV with pandas, but am running into a problem if the file name has accents. It's clearly an encoding problem, but although read_csv lets you set encoding for text within the file, I can't figure out how to encode the file name properly.

input_file = r'C:\...\Datasets\%s\Provinces\Points\%s.csv' % (country, province)
self.locs = pandas.read_csv(input_file,sep=',',skipinitialspace=True)

CSV文件是Anzoátegui.csv.遇到错误时,

The CSV file is Anzoátegui.csv. When I'm getting errors,

input_file = 'C:\\...\Datasets\Venezuela\Provinces\Points\Anzoátegui.csv

错误代码:

OSError: File b'C:\\PF2\\QGIS Valmiera\\Datasets\\Venezuela\\Provinces\\Points\\Anzo\xc3\xa1tegui.csv' does not exist

那么也许是将我的字符串转换为字节?我也尝试使用io.StringIO(input_file),它将正确的文件名作为列标题放在空的DataFrame上:

So maybe it's converting my string to bytes? I tried using io.StringIO(input_file) as well, which puts the correct file name as a column header on an empty DataFrame:

Empty DataFrame
Columns: [C:\PF2\QGIS Valmiera\Datasets\Venezuela\Provinces\Points\Anzoátegui.csv]
Index: []

有关如何获取此文件的任何想法?不幸的是,我不能只删除重音符号,因为我必须与需要正确名称的软件进行交互,而且我要格式化的文件很多(而不仅仅是一个文件).谢谢!

Any ideas on how to get this file to load? Unfortunately I can't just strip out accents, as I have to interface with software that requires the proper name, and I have a ton of files to format (not just the one). Thanks!

完全错误

Traceback (most recent call last):
  File "C:\PF2\eclipse-standard-kepler-SR2-win32-x86_64\eclipse\plugins\org.python.pydev_3.3.3.201401272249\pysrc\pydevd_comm.py", line 891, in doIt
    result = pydevd_vars.evaluateExpression(self.thread_id, self.frame_id, self.expression, self.doExec)
  File "C:\PF2\eclipse-standard-kepler-SR2-win32-x86_64\eclipse\plugins\org.python.pydev_3.3.3.201401272249\pysrc\pydevd_vars.py", line 486, in evaluateExpression
    result = eval(compiled, updated_globals, frame.f_locals)
  File "<string>", line 1, in <module>
  File "C:\Python33\lib\site-packages\pandas\io\parsers.py", line 404, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "C:\Python33\lib\site-packages\pandas\io\parsers.py", line 205, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "C:\Python33\lib\site-packages\pandas\io\parsers.py", line 486, in __init__
    self._make_engine(self.engine)
  File "C:\Python33\lib\site-packages\pandas\io\parsers.py", line 594, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "C:\Python33\lib\site-packages\pandas\io\parsers.py", line 952, in __init__
    self._reader = _parser.TextReader(src, **kwds)
  File "parser.pyx", line 330, in pandas.parser.TextReader.__cinit__ (pandas\parser.c:3040)
  File "parser.pyx", line 557, in pandas.parser.TextReader._setup_parser_source (pandas\parser.c:5387)
OSError: File b'C:\\PF2\\QGIS Valmiera\\Datasets\\Venezuela\\Provinces\\Points\\Anzo\xc3\xa1tegui.csv' does not exist

推荐答案

好吧,我在依赖地狱中有些失落,但是事实证明,此问题已在熊猫0.14.0中修复.安装更新的版本以获取带有重音符号的文件以正确导入.

Ok folks, I got a little lost in dependency hell, but it turns out that this issue was fixed in pandas 0.14.0. Install the updated version to get files named with accents to import correctly.

在github上的评论.

感谢您的输入!

这篇关于文件名带有重音时使用pandas.read_csv进行编码的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆