Removing newlines from a CSV file
Question
I am trying to process a CSV file in Python that has a ^M character in the middle of each row/line, which acts as a newline. I can't open the file in any mode other than 'rU'.
If I do open the file in the 'rU' mode, it reads in the newline and splits the file (creating a newline) and gives me twice the number of rows.
I want to remove the newline altogether. How?
Recommended answer
Note that, as the docs say:
csvfile can be any object which supports the iterator protocol and returns a string each time its next() method is called; file objects and list objects are both suitable.
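To illustrate the quoted sentence, here is a minimal sketch that feeds csv.reader a plain list of strings instead of a file. The sample rows are made up for this example.

```python
import csv

# Hypothetical sample data: any iterable that yields one string per line works.
lines = ['name,qty', 'apple,3', 'pear,5']

rows = list(csv.reader(lines))
print(rows)  # [['name', 'qty'], ['apple', '3'], ['pear', '5']]
```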
So, you can always stick a filter on the file before handing it to your reader or DictReader. Instead of this:
import csv
with open('myfile.csv', 'rU') as myfile:
    for row in csv.reader(myfile):
        pass  # process each row here
do this:
import csv
with open('myfile.csv', 'rU') as myfile:
    filtered = (line.replace('\r', '') for line in myfile)  # strip every ^M
    for row in csv.reader(filtered):
        pass  # process each row here
That '\r' is the Python (and C) way of spelling ^M. So, this just strips all ^M characters out, no matter where they appear, by replacing each one with an empty string.
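On Python 3 the same idea needs one adjustment. The 'rU' mode was removed in Python 3.11, and the default text mode translates a bare '\r' into '\n' before the generator ever sees it. The sketch below assumes the real rows end in '\n' and the stray '\r' sits inside them, so it opens the file with newline='\n'. The function name read_rows_without_cr and the demo file are invented for illustration.

```python
import csv
import os
import tempfile

def read_rows_without_cr(path):
    """Yield CSV rows from path with every carriage return stripped out."""
    # newline='\n' splits lines on '\n' only and leaves '\r' untranslated,
    # so a '\r' in the middle of a row reaches the filter intact.
    with open(path, newline='\n') as f:
        filtered = (line.replace('\r', '') for line in f)
        for row in csv.reader(filtered):
            yield row

# Demo on a throwaway file whose rows each contain a stray '\r'.
fd, demo_path = tempfile.mkstemp(suffix='.csv')
with os.fdopen(fd, 'wb') as f:
    f.write(b'a,b\r,c\n1,2\r,3\n')

rows = list(read_rows_without_cr(demo_path))
os.remove(demo_path)
print(rows)  # [['a', 'b', 'c'], ['1', '2', '3']]
```

If the file uses '\r' as its only row terminator as well, this filter would merge everything into one row, so check what actually ends a row before relying on it.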
I guess I want to modify the file permanently as opposed to filtering it.
First, if you want to modify the file before running your Python script on it, why not do that from outside of Python? sed, tr, many text editors, etc. can all do this for you. Here's a GNU sed example:
gsed -i'' 's/\r//g' myfile.csv
But if you want to do it in Python, it's not that much more verbose, and you might find it more readable, so:
Note that you can't really modify a file in place if you want to insert or delete from the middle. The usual solution is to write a new file, and then either move the new file over the old one (Unix only) or delete the old one (cross-platform).
The cross-platform version:
import os
os.rename('myfile.csv', 'myfile.csv.bak')  # keep the original as a backup
with open('myfile.csv.bak', 'rU') as infile, open('myfile.csv', 'w') as outfile:
    for line in infile:
        outfile.write(line.replace('\r', ''))  # drop every ^M
os.remove('myfile.csv.bak')
The less-clunky, but Unix-only, version:
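import os
import tempfile
from contextlib import closing
# Text mode, and in the current directory so the rename stays on one filesystem
temp = tempfile.NamedTemporaryFile(mode='w', dir='.', delete=False)
with open('myfile.csv', 'rU') as myfile, closing(temp):
    for line in myfile:
        temp.write(line.replace('\r', ''))  # drop every ^M
os.rename(temp.name, 'myfile.csv')  # overwrites the target on Unix only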
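If you are on Python 3.3 or later, os.replace overwrites the destination on both Unix and Windows, so the two versions above can collapse into one. The following is only a sketch under that assumption. The name strip_cr_in_place is invented here, and the function works on raw bytes so that no newline translation or text decoding gets in the way.

```python
import os
import tempfile

def strip_cr_in_place(path):
    """Rewrite path without any carriage returns, going through a temp file."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file next to the target so the final replace
    # stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, 'wb') as out, open(path, 'rb') as src:
            for line in src:
                out.write(line.replace(b'\r', b''))
        # Swap the cleaned copy over the original.
        os.replace(tmp_path, path)
    except BaseException:
        # Something failed before the swap: discard the partial copy.
        os.remove(tmp_path)
        raise
```

Calling strip_cr_in_place('myfile.csv') would then clean the file from the question. Because mkstemp creates files that only the current user can read and write, the rewritten file ends up with those permissions; restore them afterwards if that matters.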