UnicodeEncodeError in Python 3 when redirection is used

Problem description

What I want to do: extract the text from a PDF file and redirect it to a txt file.

What I did:

pip install pdfminer

pdf2txt.py file.pdf > output.txt

What I got:

UnicodeEncodeError: 'gbk' codec can't encode character '\u2022' in position 0: illegal multibyte sequence

My observations:

\u2022 is the bullet point character.

pdf2txt.py works well without redirection: the bullet point character is written to stdout without any error.

My questions:

Why does redirection cause a Python error? As far as I know, redirection is an OS job that simply copies the output after the program has finished.

How can I fix this error? I cannot modify pdf2txt.py because it's not my code.

Recommended answer

Redirection causes an error because the default encoding used by Python does not support one of the characters you're trying to output. Nothing is copied after the program finishes: the shell connects the program's stdout to the file before it starts, so Python itself has to turn the text into bytes as it writes. In your case you're trying to output the bullet character using the GBK codec. This probably means you're using a Chinese version of Windows.
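
To make the failure concrete, here is a minimal sketch that encodes the bullet character with two codecs. It assumes Python's built-in gbk codec behaves like the default encoding on the asker's machine, which is what the error message suggests; the helper name try_encode is only for illustration.

```python
# Minimal sketch: compare how two codecs handle the bullet character.
# Assumption: Python's built-in "gbk" codec matches the default encoding
# reported in the error message above.
BULLET = "\u2022"


def try_encode(text, encoding):
    """Return the encoded bytes, or None if the codec cannot represent the text."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return None


gbk_result = try_encode(BULLET, "gbk")
utf8_result = try_encode(BULLET, "utf-8")

print("gbk:", gbk_result)
print("utf-8:", utf8_result)
```

GBK has no byte sequence for this character, while UTF-8 can represent every Unicode character, which is why switching the output encoding resolves the problem.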

Python 3.6 or later works fine when writing to the terminal window on Windows, because it sends Unicode text to the console directly and bypasses the character encoding entirely. Only when the output is redirected to a file must the Unicode text be encoded to a byte stream.
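
If you want to check which situation a script is in, a small diagnostic along these lines may help. It is only a sketch: describe_stream is a hypothetical helper, and the GBK-backed wrapper merely simulates what a redirected stdout could look like on a Chinese-locale Windows system.

```python
import io
import sys


def describe_stream(stream):
    """Report whether a text stream is a terminal and which encoding it uses."""
    return {
        "isatty": stream.isatty(),
        "encoding": getattr(stream, "encoding", None),
    }


# Report on the real stdout. Writing to stderr keeps the report visible
# even when stdout itself is redirected to a file.
print(describe_stream(sys.stdout), file=sys.stderr)

# Simulated redirected stdout: a text layer over a byte buffer, using GBK.
fake_file = io.TextIOWrapper(io.BytesIO(), encoding="gbk")
fake_info = describe_stream(fake_file)
print(fake_info, file=sys.stderr)
```

Running the script once normally and once with `> output.txt` shows how the reported values differ between a terminal and a file.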

You can set the environment variable PYTHONIOENCODING to change the encoding used for stdio. If you use UTF-8, it is guaranteed to work with any Unicode character.

set PYTHONIOENCODING=utf-8
pdf2txt.py file.pdf > output.txt
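
The effect of the variable can be reproduced without pdf2txt.py. The sketch below uses a one-line child process as a hypothetical stand-in for the real tool and captures its stdout through a pipe, which behaves like a redirect in that Python must encode the text itself. The run_child helper is illustrative only.

```python
import os
import subprocess
import sys

# Hypothetical stand-in for pdf2txt.py: a child process that prints one bullet.
# The escape is doubled so the child's source code stays pure ASCII.
CHILD_CODE = "print('\\u2022')"


def run_child(io_encoding):
    """Run the child with PYTHONIOENCODING set and capture stdout as raw bytes."""
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    return subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        env=env,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )


gbk_run = run_child("gbk")      # mimics the failing setup from the question
utf8_run = run_child("utf-8")   # mimics the fix

print("gbk exit code:", gbk_run.returncode)
print("utf-8 exit code:", utf8_run.returncode)
print("utf-8 output bytes:", utf8_run.stdout.strip())
```

With GBK the child exits with an error and a UnicodeEncodeError traceback on stderr; with UTF-8 it succeeds and the captured bytes are the UTF-8 encoding of the bullet. Note that the resulting output.txt is then UTF-8 encoded, so it should be opened as UTF-8.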
