Python recognizing \r as a line delimiter
Problem description
I'm using import fileinput in a Python script running on an Ubuntu box.
I'm running the script on the command line with something along the lines of python myscript.py firstinputfile.txt secondinputfile.txt and, inside myscript.py, I am using for line in fileinput.input() to iterate over the lines. The problem I'm running into is that firstinputfile.txt and secondinputfile.txt both use Macintosh (\r) line endings, and fileinput.input() does not seem to recognize \r as a line delimiter.
Is there any way to force fileinput to recognize \r as a line delimiter?
I've considered preprocessing firstinputfile.txt and secondinputfile.txt to use \n line endings, but I'm hesitant for two reasons: i) I don't really want to create additional files that I then have to manage, and ii) I still want the input to fileinput to come from file arguments (not from stdin after piping commands) so that I can use fileinput.filename() and fileinput.filelineno().
Any suggestions?
It turns out fileinput.input() supports an optional openhook parameter:
You can control how files are opened by providing an opening hook via the openhook parameter to fileinput.input() or FileInput(). The hook must be a function that takes two arguments, filename and mode, and returns an accordingly opened file-like object. Two useful hooks are already provided by this module.
Furthermore, the documentation on universal newline support indicates that a file can be opened with the rU mode to handle Windows, Unix, and Macintosh newlines:
Opening a file with the mode 'U' or 'rU' will open a file for reading in universal newline mode. All three line ending conventions will be translated to a "\n" in the strings returned by the various file methods such as read() and readline().
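To make the translation described in that quote concrete, here is a small sketch. It assumes Python 3, where the 'U' flag is no longer needed (it was removed in Python 3.11) and the same behaviour comes from text mode with newline=None. The helper name read_with_universal_newlines is made up for this illustration.

```python
import os
import tempfile


def read_with_universal_newlines(raw_bytes):
    """Write raw bytes to a temporary file, then read it back in text mode.

    Assumption: Python 3. newline=None (the default for text mode) turns on
    universal newlines, so "\r", "\r\n", and "\n" all come back as "\n".
    """
    fd, path = tempfile.mkstemp()
    try:
        # Binary mode so the line endings reach the disk untouched.
        with os.fdopen(fd, "wb") as handle:
            handle.write(raw_bytes)
        with open(path, "r", newline=None, encoding="utf-8") as handle:
            return handle.readlines()
    finally:
        os.remove(path)


# One line per convention: Macintosh (\r), Windows (\r\n), Unix (\n).
print(read_with_universal_newlines(b"one\rtwo\r\nthree\n"))
# prints ['one\n', 'two\n', 'three\n']
```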
So, you can write a little function to pass as the openhook argument that will open the file in a way that supports universal newlines:
def univ_file_read(name, mode):
    # WARNING: ignores mode argument passed to this function
    # 'rU' = read with universal newlines (Python 2; the 'U' flag was removed in Python 3.11)
    return open(name, 'rU')
Then, instead of:
    for line in fileinput.input():
Use:
    for line in fileinput.input(openhook=univ_file_read):
This seems to have done the trick for me, and \r is now being recognized as a line delimiter.
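For anyone on Python 3, here is a rough sketch of the same idea. It is not the code from the answer above: it assumes Python 3, swaps the 'rU' mode for newline=None, and adds a hypothetical read_lines helper to show that the per-file name and line number stay available.

```python
import fileinput


def univ_file_read(name, mode):
    # Assumption: Python 3. newline=None enables universal newlines in text
    # mode, which is what 'rU' provided in Python 2.
    # Like the original hook, this ignores the mode argument.
    return open(name, "r", newline=None)


def read_lines(paths):
    """Return (filename, line number within that file, line text) tuples.

    Hypothetical helper for illustration; a script would normally just
    loop over fileinput.input(openhook=univ_file_read) directly.
    """
    rows = []
    with fileinput.input(files=paths, openhook=univ_file_read) as stream:
        for line in stream:
            rows.append((stream.filename(), stream.filelineno(), line.rstrip("\n")))
    return rows
```

Leaving out the files argument makes fileinput fall back to the command-line arguments, so python myscript.py firstinputfile.txt secondinputfile.txt would keep working as before. On Python 3 a plain fileinput.input() already opens files in text mode with universal newlines, so the hook mainly makes that choice explicit.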