Removing newline from a csv file

Question

I am trying to process a csv file in Python that has a ^M character in the middle of each row/line, which acts as a newline. I can't open the file in any mode other than 'rU'.

If I do open the file in 'rU' mode, it reads the ^M as a newline and splits each line there, which gives me twice the number of rows.

I want to remove the newline altogether. How?

Recommended answer

Note that, as the documentation says:

csvfile can be any object which supports the iterator protocol and returns a string each time its next() method is called — file objects and list objects are both suitable.

So, you can always stick a filter on the file before handing it to your reader or DictReader. Instead of this:

import csv

with open('myfile.csv', 'rU') as myfile:
    for row in csv.reader(myfile):
        print(row)  # process each row here

Do this:

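import csv

with open('myfile.csv', 'rU') as myfile:
    # Generator that strips every '\r' from each line before csv sees it.
    filtered = (line.replace('\r', '') for line in myfile)
    for row in csv.reader(filtered):
        print(row)  # process each row here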

That '\r' is the Python (and C) way of spelling ^M. So, this just strips all ^M characters out, no matter where they appear, by replacing each one with an empty string.
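For readers on Python 3, here is a rough sketch of the same filtering idea. It is an assumption on my part rather than part of the original answer: the code above targets Python 2, and the 'U' mode flag was removed in Python 3.11. Opening the file with newline='\n' means only '\n' ends a line, so a stray '\r' stays inside the line where the filter can strip it. The helper names strip_cr and read_rows are hypothetical.

```python
import csv


def strip_cr(lines):
    """Yield each line with every carriage return removed."""
    for line in lines:
        yield line.replace('\r', '')


def read_rows(path):
    """Return the CSV rows of `path` as lists of strings, with carriage returns dropped."""
    # Assumption about the file: real rows end in '\n'.
    # newline='\n' makes only '\n' terminate a line and disables translation,
    # so a '\r' in the middle of a row is still present when strip_cr runs.
    with open(path, newline='\n') as f:
        return list(csv.reader(strip_cr(f)))
```

Calling read_rows('myfile.csv') should then give one list per real row, with the text on either side of each removed ^M joined together.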

I guess I want to modify the file permanently as opposed to filtering it.

First, if you want to modify the file before running your Python script on it, why not do that from outside of Python? sed, tr, many text editors, etc. can all do this for you. Here's a GNU sed example:

gsed -i'' 's/\r//g' myfile.csv

But if you want to do it in Python, it's not that much more verbose, and you might find it more readable, so:

First, you can't really modify a file in-place if you want to insert or delete from the middle. The usual solution is to write a new file, and either move the new file over the old one (Unix only) or delete the old one (cross-platform).

The cross-platform version:

import os

# Move the original aside, then write a cleaned copy under the original name.
os.rename('myfile.csv', 'myfile.csv.bak')
with open('myfile.csv.bak', 'rU') as infile, open('myfile.csv', 'w') as outfile:
    for line in infile:
        outfile.write(line.replace('\r', ''))
os.remove('myfile.csv.bak')

The less-clunky, but Unix-only, version:

import os
import tempfile
from contextlib import closing

temp = tempfile.NamedTemporaryFile(delete=False)
with open('myfile.csv', 'rU') as myfile, closing(temp):
    for line in myfile:
        temp.write(line.replace('\r', ''))
os.rename(temp.name, 'myfile.csv')
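On Python 3.3 or later, os.replace overwrites the destination on both Unix and Windows, so the write-then-move approach no longer has to be Unix-only. The following is a minimal sketch under that assumption, not the original answer's code. The function name remove_cr_in_place is hypothetical, and the file is assumed to use '\n' at the end of each real row.

```python
import os
import tempfile


def remove_cr_in_place(path):
    """Rewrite `path` with every carriage return removed."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file next to the original so the final
    # os.replace() never has to cross a filesystem boundary.
    fd, temp_path = tempfile.mkstemp(dir=directory)
    try:
        # newline='' on write: lines are written exactly as given.
        # newline='\n' on read: only '\n' ends a line, nothing is translated.
        with os.fdopen(fd, 'w', newline='') as dst, \
                open(path, newline='\n') as src:
            for line in src:
                dst.write(line.replace('\r', ''))
        # Swap the cleaned copy into place over the original.
        os.replace(temp_path, path)
    except BaseException:
        # Clean up the temporary file if anything went wrong.
        os.remove(temp_path)
        raise
```

Calling remove_cr_in_place('myfile.csv') rewrites the file under its original name. One caveat: the result takes on the temporary file's permissions (mkstemp creates it readable and writable only by the current user), so restore the original mode afterwards if that matters.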
