open() and codecs.open() in Python 2.7 behave strangely differently

Problem description

I have a text file whose first line contains Unicode characters and whose remaining lines are all ASCII. I want to read the first line into one variable and all the other lines into another. However, when I use the following code:

# -*- coding: utf-8 -*-
import codecs
import os

filename = '1.txt'

# Read with codecs.open(): first line as names, the rest as data.
f = codecs.open(filename, 'r', encoding='utf-8')
print f
names_f = f.readline().split(' ')
data_f = f.readlines()
print len(names_f)
print len(data_f)
f.close()

print 'And now for something completely different:'

# Same steps with the built-in open().
g = open(filename, 'r')
names_g = g.readline().split(' ')
print g
data_g = g.readlines()
print len(names_g)
print len(data_g)
g.close()

I get the following output:

<open file '1.txt', mode 'rb' at 0x01235230>
28
7
And now for something completely different:
<open file '1.txt', mode 'r' at 0x017875A0>
28
77

If I don't use readlines(), the whole file is read, not just the first 7 lines, with both codecs.open() and open().

Why does this happen? And why does codecs.open() read the file in binary mode even though the 'r' mode is passed?

Update: this is the original file: http://www1.datafilehost.com/d/0792d687

Solution

Because you called .readline() first, the file object returned by codecs.open() has already filled a line buffer; the subsequent call to .readlines() returns only the buffered lines.

If you call .readlines() again, the rest of the lines are returned:

>>> f = codecs.open(filename, 'r', encoding='utf-8')
>>> line = f.readline()
>>> len(f.readlines())
7
>>> len(f.readlines())
71
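
The buffering described above can be mimicked with a small toy reader. This is only a sketch under stated assumptions: `ToyLineBufferedReader`, its `chunk_size` of 32 and the sample text are all hypothetical, and the class is a deliberately simplified model, not the actual `codecs.StreamReader` source. It only illustrates how a read-ahead in `readline()` can leave lines in a buffer that a later `readlines()` call hands back on their own.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function
import io


class ToyLineBufferedReader(object):
    """Hypothetical, simplified reader; NOT the real codecs.StreamReader."""

    def __init__(self, stream, chunk_size=32):
        self.stream = stream
        self.chunk_size = chunk_size  # assumed value, chosen for the demo
        self.linebuffer = []

    def readline(self):
        if not self.linebuffer:
            # Read ahead one chunk, then read on to the next line break
            # so that the buffer only ever holds whole lines.
            chunk = self.stream.read(self.chunk_size) + self.stream.readline()
            self.linebuffer = chunk.splitlines(True)
        return self.linebuffer.pop(0) if self.linebuffer else u''

    def readlines(self):
        # Models the reported behaviour: leftover buffered lines are
        # returned alone, without reading the rest of the stream.
        if self.linebuffer:
            lines, self.linebuffer = self.linebuffer, []
            return lines
        return self.stream.read().splitlines(True)


# Sample data: one non-ASCII header line followed by 20 ASCII rows.
text = u'\u0438\u043c\u044f value\n' + u''.join(
    u'row %02d\n' % i for i in range(20))

reader = ToyLineBufferedReader(io.StringIO(text))
header = reader.readline()
first = reader.readlines()   # only the lines left over in the buffer
second = reader.readlines()  # the remainder of the stream
print(len(first), len(second))
```

With this sample text the first `readlines()` call returns 4 lines and the second returns the remaining 16, which mirrors the 7 / 71 split shown in the session above.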

The work-around is to not mix .readline() and .readlines():

f = codecs.open(filename, 'r', encoding='utf-8')
data_f = f.readlines()
names_f = data_f.pop(0).split(' ')  # take the first line.

This behaviour is really a bug; the Python devs are aware of it, see issue 8260.

The other option is to use io.open() instead of codecs.open(); the io library is what Python 3 uses to implement the built-in open() function and is a lot more robust and versatile than the codecs module.
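
A minimal sketch of the io.open() option might look like the following. The helper name `read_header_and_rows` and the temporary sample file are assumptions made for illustration; the real `1.txt` from the question is not used. The code is written to run on both Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function
import io
import os
import shutil
import tempfile


def read_header_and_rows(path, encoding='utf-8'):
    """Hypothetical helper: return (first line split on spaces, other lines)."""
    with io.open(path, 'r', encoding=encoding) as f:
        names = f.readline().split(u' ')
        data = f.readlines()  # continues right after the first line
    return names, data


# Build a throwaway sample file: a non-ASCII header plus 77 ASCII rows.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'sample.txt')
with io.open(path, 'w', encoding='utf-8') as out:
    out.write(u'\u0438\u043c\u044f \u0432\u043e\u0437\u0440\u0430\u0441\u0442\n')
    for i in range(77):
        out.write(u'row %d\n' % i)

names, data = read_header_and_rows(path)
print(len(names), len(data))

shutil.rmtree(tmp_dir)  # remove the temporary sample file
```

Here mixing .readline() and .readlines() is fine, because io.open() returns a text-mode file object that decodes as it reads. By contrast, the Python documentation for codecs.open() states that the underlying file is always opened in binary mode, which is consistent with the mode 'rb' shown in the question's output.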
