pandas to_csv: ascii can't encode character

Views: 241 Published: 2020/5/24 0:12:22 Tags: python pandas unicode utf-8


Problem description

I'm trying to read a dataframe and write it to a pipe-delimited file. Some of the characters are not plain ASCII (´, ç, ñ, etc.), and the write fails when I try to output the accented characters as ASCII.

import pandas as pd

df = pd.read_csv('filename.txt', sep='|', encoding='utf-8')
# <do stuff that produces newdf>
newdf.to_csv('output.txt', sep='|', index=False, encoding='ascii')

-------

  File "<ipython-input-63-ae528ab37b8f>", line 21, in <module>
    newdf.to_csv(filename,sep='|',index=False, encoding='ascii')

  File "C:\Users\aliceell\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\core\frame.py", line 1344, in to_csv
    formatter.save()

  File "C:\Users\aliceell\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\formats\format.py", line 1551, in save
    self._save()

  File "C:\Users\aliceell\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\formats\format.py", line 1652, in _save
    self._save_chunk(start_i, end_i)

  File "C:\Users\aliceell\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\formats\format.py", line 1678, in _save_chunk
    lib.write_csv_rows(self.data, ix, self.nlevels, self.cols, self.writer)

  File "pandas\lib.pyx", line 1075, in pandas.lib.write_csv_rows (pandas\lib.c:19767)

UnicodeEncodeError: 'ascii' codec can't encode character '\xb4' in position 7: ordinal not in range(128)

If I change to_csv to use utf-8 encoding, then I can't read the file back in properly:

newdf.to_csv('output.txt',sep='|',index=False,encoding='utf-8')
pd.read_csv('output.txt', sep='|')

> UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb4 in position 2: invalid start byte

My goal is to have a pipe-delimited file that retains the accents and special characters.

Also, is there an easy way to figure out which line read_csv is breaking on? Right now I don't know how to get it to show me the bad character(s).

Recommended answer

Your data contains characters that are not ASCII, so they cannot be encoded as ASCII, which is what the to_csv call is attempting. I would just use utf-8, as suggested in a comment.
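As a minimal sketch of that suggestion, the round trip below writes a pipe-delimited file as UTF-8 and reads it back with the same encoding stated explicitly. The sample dataframe, its column names, and the temporary file path are assumptions made up for illustration; they are not from the question.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data containing the kinds of characters from the question.
newdf = pd.DataFrame({"name": ["Fran´cois", "façade", "señor"], "value": [1, 2, 3]})

# Write to a temporary directory so the sketch does not touch real files.
path = os.path.join(tempfile.mkdtemp(), "output.txt")

# UTF-8 can represent every Unicode character, so the accents survive the write.
newdf.to_csv(path, sep="|", index=False, encoding="utf-8")

# Read the file back, naming the same encoding that was used to write it.
roundtrip = pd.read_csv(path, sep="|", encoding="utf-8")

print(roundtrip)
```

Naming the encoding on both the write and the read keeps the two sides in agreement instead of relying on defaults.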

To find the rows that are causing the problem, you can try something like this:

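def is_not_ascii(string):
    # Non-string cells (None, NaN, numbers) are skipped instead of raising.
    return isinstance(string, str) and any(ord(s) >= 128 for s in string)

# Keep only the rows whose value in column col contains a non-ASCII character.
df[df[col].apply(is_not_ascii)]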

You'll need to set col to the name of the column you are testing.
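The check above works on a dataframe that has already been loaded. For the other half of the question, finding which line of the file read_csv cannot decode, one possible approach is to read the file as raw bytes and try to decode each line separately. This is only a sketch: find_undecodable_lines and the demo file are hypothetical names and data introduced here, not part of pandas or of the original answer.

```python
import os
import tempfile


def find_undecodable_lines(path, encoding="utf-8"):
    """Return (line_number, raw_bytes, error) for each line that fails to decode."""
    bad_lines = []
    with open(path, "rb") as handle:
        # Binary mode yields raw bytes per line, so nothing is decoded implicitly.
        for line_number, raw in enumerate(handle, start=1):
            try:
                raw.decode(encoding)
            except UnicodeDecodeError as exc:
                bad_lines.append((line_number, raw, exc))
    return bad_lines


# Hypothetical demo file: line 2 holds a lone 0xb4 byte, which is not valid UTF-8.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(demo_path, "wb") as handle:
    handle.write(b"id|name\n1|caf\xb4\n2|ok\n")

bad = find_undecodable_lines(demo_path)
for line_number, raw, exc in bad:
    print(line_number, raw, exc.reason)
```

A lone 0xb4 byte is how ´ is stored in Latin-1 and Windows-1252, so a file that fails this way may have been saved in one of those encodings rather than UTF-8. That is a guess about the file, not something the traceback proves.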
