python - Reading all kinds of files in different encodings


Question

I built a Python steganographer that hides UTF-8 text in images, and it works fine for that. I was wondering whether I could also encode complete files in images. For this, the program needs to read all kinds of files. The problem is that not all files are encoded in UTF-8, so you have to read them with:

file = open('somefile.docx', encoding='utf-8', errors='surrogateescape')

But if I write that content to a new file and try to open the copy, it is reported as unreadable. I need a way to read all kinds of files and later write them back so that they still work. Is there a way to do this in Python 3?

Thanks.

Recommended answer

Change your perspective. You don't "hide UTF-8 text in images"; you hide bytes in images.

Those bytes might, purely by coincidence, be interpretable as UTF-8-encoded text. In reality they could be anything.
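To make that concrete, here is a minimal sketch of the idea. The variable names are made up for illustration, and the actual hiding step of the steganographer is not shown because the question does not describe it. Text becomes bytes through an explicit encode step, while a binary file already is bytes:

```python
# Text only becomes "hideable" once it is encoded to bytes.
message = "héllo, wörld"
payload = message.encode("utf-8")      # the bytes you would actually hide

# After extracting the bytes from the image, decoding is a separate,
# optional step that only makes sense if the payload really was text.
recovered = payload.decode("utf-8")

# A binary payload needs no encoding at all. These four bytes are the
# usual start of a ZIP container, which is what a .docx file is.
binary_payload = bytes([0x50, 0x4B, 0x03, 0x04])

print(type(payload).__name__, type(binary_payload).__name__)
```

Both payloads have the same type, so the hiding code does not need to know whether the bytes came from text or from a file.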

Reading a file as text with open("...", encoding="...") includes a hidden step: the file's bytes are decoded into a string. That is convenient when you want to treat the file contents as a string in your program.
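The hidden decoding step can be made visible with a small experiment. This is only a sketch: the file name sample.bin and the sample bytes are assumptions chosen for illustration, not part of the question.

```python
import os
import tempfile

# The 8-byte PNG signature. The first byte, 0x89, cannot start a UTF-8 sequence.
sample = b"\x89PNG\r\n\x1a\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.bin")   # hypothetical file name
    with open(path, "wb") as f:
        f.write(sample)

    # Text mode: open() decodes the bytes while reading, and strict
    # decoding fails on input that is not valid UTF-8.
    try:
        with open(path, encoding="utf-8") as f:
            f.read()
        decoded_ok = True
    except UnicodeDecodeError:
        decoded_ok = False

    # Binary mode: no decoding happens, the bytes come back unchanged.
    with open(path, "rb") as f:
        raw = f.read()

print(decoded_ok, raw == sample)
```

Text mode also translates line endings by default, which is another way a text-mode round trip can alter a binary file.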

Skip that hidden decoding step and read the file as bytes: open("...", "rb"). To write the bytes back out later, use binary mode as well: open("...", "wb").
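A byte-for-byte round trip might look like the following sketch. The helper names read_bytes and write_bytes and the file names are hypothetical, and the step that hides the payload in an image is left as a comment.

```python
import os
import tempfile


def read_bytes(path):
    """Return the raw contents of a file without any decoding."""
    with open(path, "rb") as f:
        return f.read()


def write_bytes(path, data):
    """Write raw bytes to a file without any encoding or newline translation."""
    with open(path, "wb") as f:
        f.write(data)


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "original.bin")   # hypothetical input file
    dst = os.path.join(tmp, "copy.bin")       # hypothetical restored file

    # Every possible byte value plus some line endings, to show nothing is altered.
    original = bytes(range(256)) + b"\r\n\r\n"
    write_bytes(src, original)

    payload = read_bytes(src)
    # ... hide `payload` in the image here, and extract it again later ...
    write_bytes(dst, payload)

    restored = read_bytes(dst)
    round_trip_ok = restored == original

print(round_trip_ok)
```

Because no decoding or encoding takes place, the same two helpers work for .docx files, images, plain text, or anything else.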
