What is causing this garbage when writing to a file

Question

I am trying to figure out what is happening in this situation. I am on Windows 7 64-bit and I was experimenting with Unicode in Python.

With the following Python code

#aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
#aaaaaa

x = [u'\xa3']

f = open('file_garbage.txt', 'w+')
for s in x:
    if s in f.read():
        continue
    else:
        f.write(s.encode('utf-8'))
f.close()

I get no error message and file_garbage.txt contains

£

When I add another item to x, like so:

#aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
#aaaaaa

x = [u'\xa3',
     u'\xa3']

f = open('file_garbage.txt', 'w+')
for s in x:
    if s in f.read():
        continue
    else:
        f.write(s.encode('utf-8'))
f.close()

I get a UnicodeDecodeError

Traceback (most recent call last):
  File "file_garbage.py", line 9, in <module>
    if s in f.read():
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe0 in position 2: ordinal not in range(128)

file_garbage.txt will contain either around 250 lines of bytes like this

c2a3 4b02 e0a6 5400 6161 6161 6161 6161
6161 6161 6161 6161 6161 6161 6161 6161
6161 6161 6161 6161 6161 610d 0a23 6161
6161 6161 0d0a 0d0a 7820 3d20 5b75 275c
7861 3327 2c0d 0a20 2020 2020 7527 5c78
6133 275d 0d0a 0d0a 6620 3d20 6f70 656e
2827 6669 6c65 5f67 6172 6261 6765 2e74

or garbage like this

£Kà¦éaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
#aaaaaa

x = [u'\xa3',
     u'\xa3']

f = open('file_garbage.txt', 'w+')
for s in x:
    if s in f.read():
        continue
    else:
        f.write(s.encode('utf-8'))
f.close()
 Python Character Mapping Codec cp1252 generated from 'MAPPINGS/VENDORS/MICSFT/WINDOWS/CP1252.TXT' with gencodec.py.

iÿÿÿÿNt

followed by a bunch of ENQ, DC2, SOH, STX, NUL symbols and links to:

 C:\Python27\lib\encodings\cp1252.py

I am guessing that this has to do with encoding and/or the way I am handling files, but I am confused about what exactly is happening and why the results seem to differ.

The garbage seems to be generated only when those seemingly random comment lines are at the top of the file; otherwise, the bytes are always generated.

If it helps, my system encodings are set as follows:

sys.stdout.encoding            :  cp850
sys.stdout.isatty()            :  True
locale.getpreferredencoding()  :  cp1252
sys.getfilesystemencoding()    :  mbcs
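For comparison on another machine, the four values above can be collected with the standard sys and locale modules. A minimal sketch; the function name encoding_report is hypothetical, and the getattr/hasattr guards are an assumption to cope with a replaced sys.stdout (for example under an IDE or test runner):

```python
import locale
import sys


def encoding_report():
    """Collect the encoding-related settings listed in the question."""
    out = sys.stdout  # may be replaced by an IDE or test runner
    return {
        'sys.stdout.encoding': getattr(out, 'encoding', None),
        'sys.stdout.isatty()': out.isatty() if hasattr(out, 'isatty') else None,
        'locale.getpreferredencoding()': locale.getpreferredencoding(),
        'sys.getfilesystemencoding()': sys.getfilesystemencoding(),
    }


if __name__ == '__main__':
    for name, value in sorted(encoding_report().items()):
        print('%-31s: %s' % (name, value))
```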


Recommended answer

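It is possible that the file is being corrupted because it is not closed properly: in the second example, the UnicodeDecodeError is raised before f.close() is ever reached. I've never seen this particular behavior, but it's within the realm of possibility. Try changing your code to use a with statement, which closes the file even when an exception is raised: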

with open('file_garbage.txt', 'w+') as f:
    # do your stuff here; f is closed automatically when this block exits
    pass
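A small sketch of the difference between the two styles: an exception raised before a manual f.close() leaves the file open, while a with block closes it on the way out. ValueError stands in for the UnicodeDecodeError from the question, and both function names are hypothetical.

```python
import os
import tempfile


def closed_after_error_manual(path):
    """Open/close by hand; an exception skips the close() call."""
    f = open(path, 'w+')
    try:
        # Stand-in for the UnicodeDecodeError raised inside the loop
        raise ValueError('simulated failure')
        f.close()  # never reached, just like in the question's script
    except ValueError:
        pass
    was_closed = f.closed
    f.close()  # clean up after the demonstration
    return was_closed


def closed_after_error_with(path):
    """Same failure inside a with block; the file is closed on exit."""
    f = None
    try:
        with open(path, 'w+') as f:
            raise ValueError('simulated failure')
    except ValueError:
        pass
    return f.closed


if __name__ == '__main__':
    path = os.path.join(tempfile.mkdtemp(), 'demo.txt')
    print(closed_after_error_manual(path))  # False
    print(closed_after_error_with(path))    # True
```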

The cause of the exception is that x contains unicode strings, but when you read from f you are reading bytes. When you check s in f.read(), Python 2 has to compare a unicode string with a byte string, so it implicitly decodes the bytes with the default 'ascii' codec; that fails because the file contains non-ASCII bytes (0xe0 in the traceback). You need to explicitly decode the contents of the file back into unicode, using the encoding they were written with (UTF-8 here).
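A minimal sketch of that decoding step, assuming the file is UTF-8 (as written by s.encode('utf-8') in the question). The helper name file_contains is hypothetical; io.open is used so the same code runs on Python 2.7 and 3.

```python
import io
import os
import tempfile


def file_contains(path, text, encoding='utf-8'):
    """Return True if the unicode string `text` occurs in the file at `path`.

    The file is read as raw bytes and decoded explicitly, so the membership
    test compares unicode with unicode instead of relying on Python 2's
    implicit (and here failing) ASCII decode.
    """
    with io.open(path, 'rb') as f:
        raw = f.read()
    return text in raw.decode(encoding)


if __name__ == '__main__':
    path = os.path.join(tempfile.mkdtemp(), 'demo.txt')
    with io.open(path, 'wb') as f:
        f.write(u'\xa3'.encode('utf-8'))  # the bytes c2 a3, as in the question
    print(file_contains(path, u'\xa3'))  # True
```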

Your code has a few other problems that are somewhat outside the scope of this question. For starters, using f.read() in a loop like that won't work, because the first read will read the whole file, and subsequent reads will return nothing. Instead, read (and decode) the file into a variable first, then do your comparison against that variable. Also, I'm not sure if reading and writing the file in w+ mode will do what you want. (I'm not actually sure what you want your code to do.) As documented, w+ truncates the file, so you won't be able to "update" it by adding to what's already there.
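Putting those suggestions together, here is a minimal sketch, assuming the goal is to append each string from x that is not already in the file and that the file is UTF-8. The helper name add_missing is hypothetical. It reads and decodes the file once, compares against that text, and then writes in a separate append ('a') pass instead of using w+, so existing content is not truncated and reads and writes are never interleaved on the same file object.

```python
import io
import os
import tempfile


def add_missing(path, items, encoding='utf-8'):
    """Append each unicode string in `items` that is not already in the file."""
    existing = u''
    if os.path.exists(path):
        # Read and decode the whole file once, up front
        with io.open(path, 'r', encoding=encoding) as f:
            existing = f.read()
    # Append mode keeps what is already there (w+ would truncate it)
    with io.open(path, 'a', encoding=encoding) as f:
        for s in items:
            if s in existing:
                continue
            f.write(s)
            existing += s  # so duplicates within `items` are skipped too


if __name__ == '__main__':
    path = os.path.join(tempfile.mkdtemp(), 'file_garbage.txt')
    add_missing(path, [u'\xa3', u'\xa3'])
    with io.open(path, 'rb') as f:
        print(repr(f.read()))
```

Starting from no file, add_missing(path, [u'\xa3', u'\xa3']) leaves a single £ (the bytes c2 a3) in the file, and calling it again with the same list adds nothing.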
