How to read records terminated by a custom separator from a file in Python?
Question
I would like a way to do `for line in file` in Python, where the end of line is redefined to be any string that I want. Put another way, I want to read records from a file rather than lines, and I want that to be as fast and convenient as reading lines.
This is the Python equivalent of setting Perl's `$/` input record separator, or using `Scanner` in Java. It doesn't necessarily have to use `for line in file` (in particular, the iterator may not be a file object). I just need something equivalent that avoids reading too much data into memory.
Recommended answer
There is nothing in the Python 2.x `file` object, or the Python 3.3 `io` classes, that lets you specify a custom delimiter for `readline`. (The `for line in file` loop ultimately uses the same code as `readline`.)
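As a quick illustration of that limit, here is a minimal sketch assuming Python 3. The `newline` argument of the `io` text classes only accepts `None`, `''`, `'\n'`, `'\r'` and `'\r\n'`, so it cannot be used as a record separator. The helper name `accepts_newline` is made up for this example.

```python
import io

def accepts_newline(value):
    """Return True if io.StringIO accepts `value` as its newline argument."""
    try:
        io.StringIO('abcZZZdef', newline=value)
    except ValueError:
        # Raised for anything other than None, '', '\n', '\r' or '\r\n'.
        return False
    return True

print(accepts_newline('\n'))   # True
print(accepts_newline('ZZZ'))  # False
```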
But it's pretty easy to build it yourself. For example:
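def delimited(file, delimiter='\n', bufsize=4096):
    """Yield records from a text-mode file, split on delimiter."""
    buf = ''
    while True:
        newbuf = file.read(bufsize)
        if not newbuf:
            # End of file: whatever is left is the final record.
            yield buf
            return
        buf += newbuf
        lines = buf.split(delimiter)
        # Every piece except the last is a complete record.
        for line in lines[:-1]:
            yield line
        # The last piece may be incomplete; keep it for the next read.
        buf = lines[-1]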
Here's a stupid example of it in action:
>>> import io
>>> s = io.StringIO('abcZZZdefZZZghiZZZjklZZZmnoZZZpqr')
>>> d = delimited(s, 'ZZZ', bufsize=2)
>>> list(d)
['abc', 'def', 'ghi', 'jkl', 'mno', 'pqr']
If you want to get it right for both binary and text files, especially in 3.x, it's a bit trickier. But if it only has to work for one or the other (and one language or the other), you can ignore that.
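One possible way to cover both cases, as a rough sketch assuming Python 3 only: take the empty starting buffer from the delimiter itself, so that it is `''` for a `str` delimiter and `b''` for a `bytes` delimiter. The name `delimited_any` is made up for this example, and the caller still has to pass a delimiter whose type matches what `file.read()` returns.

```python
import io

def delimited_any(file, delimiter, bufsize=4096):
    """Yield records from a text or binary file, split on delimiter."""
    # An empty slice has the delimiter's own type: '' for str, b'' for bytes.
    buf = delimiter[:0]
    while True:
        chunk = file.read(bufsize)
        if not chunk:
            # End of file: whatever is left is the final record.
            yield buf
            return
        buf += chunk
        # All pieces but the last are complete; the last may be partial.
        *records, buf = buf.split(delimiter)
        yield from records

print(list(delimited_any(io.StringIO('a|b|c'), '|', bufsize=2)))
print(list(delimited_any(io.BytesIO(b'a\x00b\x00c'), b'\x00', bufsize=2)))
```

Mixing the two (for example a `bytes` delimiter with a text-mode file) fails with a `TypeError` on the first non-empty read rather than silently producing wrong records.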
Likewise, if you're using Python 3.x (or using `io` objects in Python 2.x), and want to make use of the buffers that are already being maintained in a `BufferedIOBase` instead of just putting a buffer on top of the buffer, that's trickier. The `io` docs do explain how to do everything… but I don't know of any simple examples, so you're really going to have to read at least half of that page and skim the rest. (Of course, you could just use the raw files directly… but not if you want to find Unicode delimiters…)
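For one narrow case, reading bytes and still splitting on a Unicode delimiter may be workable. The sketch below assumes Python 3 and assumes the file is UTF-8, where an encoded delimiter cannot match in the middle of another character; other encodings would need different handling. It still keeps its own buffer on top of the file's, and the name `delimited_utf8` is made up for this example.

```python
import io

def delimited_utf8(binary_file, delimiter, bufsize=4096):
    """Yield str records from a binary file that holds UTF-8 text."""
    sep = delimiter.encode('utf-8')
    buf = b''
    while True:
        chunk = binary_file.read(bufsize)
        if not chunk:
            # End of file: decode whatever is left as the final record.
            yield buf.decode('utf-8')
            return
        buf += chunk
        # Only complete records are decoded, so a multi-byte character
        # split across two reads stays in buf until it is whole.
        *records, buf = buf.split(sep)
        for record in records:
            yield record.decode('utf-8')

data = io.BytesIO('α→β→γ'.encode('utf-8'))
print(list(delimited_utf8(data, '→', bufsize=1)))  # ['α', 'β', 'γ']
```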