open() and codecs.open() in Python 2.7 behave strangely differently
Problem description
I have a text file whose first line contains Unicode characters and whose remaining lines are all ASCII. I am trying to read the first line into one variable and all the other lines into another. However, when I use the following code:
```python
# -*- coding: utf-8 -*-
import codecs

filename = '1.txt'

# codecs.open() decodes the UTF-8 text while reading
f = codecs.open(filename, 'r', encoding='utf-8')
print f
names_f = f.readline().split(' ')  # first line: space-separated names
data_f = f.readlines()             # all remaining lines
print len(names_f)
print len(data_f)
f.close()

print 'And now for something completely different:'
# built-in open() returns undecoded byte strings in Python 2
g = open(filename, 'r')
names_g = g.readline().split(' ')
print g
data_g = g.readlines()
print len(names_g)
print len(data_g)
g.close()
```
I get the following output:
```
<open file '1.txt', mode 'rb' at 0x01235230>
28
7
And now for something completely different:
<open file '1.txt', mode 'r' at 0x017875A0>
28
77
```
If I don't call readline() first, readlines() reads the whole file, not just the first 7 lines, with both codecs.open() and open().
Why does this happen? And why does codecs.open() open the file in binary mode even though I passed 'r' as the mode?
Update: this is the original file: http://www1.datafilehost.com/d/0792d687
Answer
Because you used .readline() first, the codecs.open() file has already filled a line buffer; the subsequent call to .readlines() returns only the buffered lines. If you call .readlines() again, the rest of the lines are returned:
```python
>>> f = codecs.open(filename, 'r', encoding='utf-8')
>>> line = f.readline()
>>> len(f.readlines())
7
>>> len(f.readlines())
71
```
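To make the mechanism concrete, here is a deliberately simplified, hypothetical model of the read-ahead flaw, written for Python 3. The class name ToyBufferedReader and its chunk_size parameter are illustrative only; this is not the real codecs source. readline() reads fixed-size chunks ahead, and the flawed read() returns only the leftover buffer whenever it is non-empty instead of continuing to the end of the stream:

```python
class ToyBufferedReader:
    """Hypothetical, simplified model of the codecs read-ahead flaw.

    Not the real codecs.StreamReader: it only mimics the pattern where
    read() stops early if decoded text is already buffered.
    """

    def __init__(self, text, chunk_size=72):
        self._text = text              # stands in for the underlying stream
        self._pos = 0                  # how far the "stream" has been consumed
        self._chunk_size = chunk_size  # read-ahead size used by readline()
        self._buffer = ""              # text read ahead but not yet returned

    def _read_chunk(self):
        chunk = self._text[self._pos:self._pos + self._chunk_size]
        self._pos += len(chunk)
        return chunk

    def readline(self):
        # Read ahead chunk by chunk until the buffer holds a complete line.
        while "\n" not in self._buffer:
            chunk = self._read_chunk()
            if not chunk:
                break
            self._buffer += chunk
        line, sep, self._buffer = self._buffer.partition("\n")
        return line + sep

    def read(self):
        # The flaw: a non-empty buffer is returned on its own,
        # and the rest of the stream is left unread.
        if self._buffer:
            data, self._buffer = self._buffer, ""
            return data
        data = self._text[self._pos:]
        self._pos = len(self._text)
        return data

    def readlines(self):
        return self.read().splitlines(True)


text = "a b c\n" + "".join("row %d\n" % i for i in range(10))
reader = ToyBufferedReader(text, chunk_size=16)
first = reader.readline()        # "a b c\n"
batch1 = reader.readlines()      # only the buffered leftovers
batch2 = reader.readlines()      # the rest of the stream
print(len(batch1), len(batch2))  # 2 9
```

Because the toy's buffer happens to end mid-line, "row 1" is split across the two calls, so it reports 2 + 9 = 11 lines instead of 10. The question's counts show a similar surplus (7 + 71 = 78, one more than the 77 lines that open() returns), which is consistent with a line being cut at the buffer boundary.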
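The workaround is not to mix .readline() and .readlines(); instead, read all lines in one call and take the first line off the list:
```python
import codecs

with codecs.open(filename, 'r', encoding='utf-8') as f:
    data_f = f.readlines()              # read every line in one call
names_f = data_f.pop(0).split(' ')      # take the first line
```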
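As a self-contained Python 3 sketch of that workaround, assume a UTF-8 file whose first line holds space-separated names; the helper name read_names_and_data and the temporary sample file are hypothetical. It reads every line with a single readlines() call and only then splits off the first line. Unlike the snippet above, it also strips the trailing newline before splitting, so the last name does not end in "\n":

```python
import codecs
import os
import tempfile


def read_names_and_data(path, encoding="utf-8"):
    """Hypothetical helper: return (names, data_lines) without mixing
    readline() and readlines() on a codecs reader."""
    with codecs.open(path, "r", encoding=encoding) as f:
        lines = f.readlines()  # one call reads the whole file
    names = lines.pop(0).rstrip("\r\n").split(" ")  # first line -> names
    return names, lines


# Build a small sample file: a non-ASCII header plus 77 ASCII data rows.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w", encoding="utf-8") as out:
    out.write("имя фамилия возраст\n")
    out.writelines("%d,%d\n" % (i, i * i) for i in range(77))

names, data = read_names_and_data(path)
os.remove(path)
print(len(names), len(data))  # 3 77
```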
This behaviour is a genuine bug, and the Python developers are aware of it; see issue 8260 on the Python bug tracker.
The other option is to use io.open() instead of codecs.open(); the io library is what Python 3 uses to implement the built-in open() function, and it is a lot more robust and versatile than the codecs module.
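A minimal sketch of the io.open() route, written for Python 3 and assuming the same layout (a UTF-8 header line followed by data lines; the temporary file is only for illustration). io.open() is also available in Python 2.6 and later. With io.open(), calling readline() and then readlines() is safe because both work off the same TextIOWrapper buffer. The sketch also touches the question's second point: the codecs documentation notes that the underlying file is always opened in binary mode (codecs.open() adds 'b' to the mode when an encoding is given, so no automatic newline conversion happens), whereas io.open() returns a text-mode wrapper:

```python
import codecs
import io
import os
import tempfile

# Hypothetical sample file: one header line, then 77 data lines.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w", encoding="utf-8") as out:
    out.write("имя фамилия возраст\n")
    out.writelines("row %d\n" % i for i in range(77))

# io.open() (the built-in open() in Python 3): readline() then readlines()
# works as expected because both use the same TextIOWrapper buffer.
with io.open(path, "r", encoding="utf-8") as g:
    names_g = g.readline().split(" ")
    data_g = g.readlines()
    io_mode = g.mode           # text mode, as requested

# codecs.open() opens the underlying file in binary mode regardless.
with codecs.open(path, "r", encoding="utf-8") as f:
    codecs_mode = f.mode       # delegated to the wrapped binary file

os.remove(path)
print(len(names_g), len(data_g), io_mode, codecs_mode)  # 3 77 r rb
```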