Handling extra newlines (carriage returns) in csv files parsed with Python?
Question
I have a CSV file with fields that contain newlines, e.g.:
    A, B, C, D, E, F
    123, 456, tree
    , very, bla, indigo
(In this case, the third field in the second row is "tree\n".)
I tried the following:
    import csv
    catalog = csv.reader(open('test.csv', 'rU'), delimiter=",", dialect=csv.excel_tab)
    for row in catalog:
        print "Length: ", len(row), row
and the result I got was this:
    Length: 6 ['A', ' B', ' C', ' D', ' E', ' F']
    Length: 3 ['123', ' 456', ' tree']
    Length: 4 [' ', ' very', ' bla', ' indigo']
Does anyone have any idea how I can quickly remove extraneous newlines?
Thanks!
Solution
Suppose you have this Excel spreadsheet:
         A    B    C            D
    1    A1   B1   C1,+comma    D1
    2         B2   line 1       D2
                   line 2
    3              C3           D3,+comma
    4                           D4 space
Note:
- the multi-line cell in C2;
- embedded comma in C1 and D3;
- blank cells, and cell with a space in D4.
Saving that as CSV in Excel, you will get this csv file:
    A1,B1,"C1,+comma",D1
    ,B2,"line 1
    line 2",D2
    ,,C3,"D3,+comma"
    ,,,D4 space
Presumably, you will want to read that into Python with the blank cells still having meaning and the embedded commas treated correctly.
So, this:
    with open("test.csv", 'rU') as csvIN:
        outCSV = (line for line in csv.reader(csvIN, dialect='excel'))
        for row in outCSV:
            print("Length: ", len(row), row)
correctly produces the 4x4 list-of-lists matrix represented in Excel:
    Length: 4 ['A1', 'B1', 'C1,+comma', 'D1']
    Length: 4 ['', 'B2', 'line 1\nline 2', 'D2']
    Length: 4 ['', '', 'C3', 'D3,+comma']
    Length: 4 ['', '', '', 'D4 space']
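Recent Python 3 releases no longer accept the 'rU' mode, and the csv module documentation recommends opening files with newline='' instead. A minimal sketch of the same read under that assumption follows; the in-memory io.StringIO and the names data, csv_in and rows are stand-ins introduced here so the example runs without a test.csv on disk:

```python
import csv
import io

# Stand-in for the Excel-exported test.csv. With a real file, use
# open("test.csv", newline="") in place of io.StringIO(...).
data = (
    'A1,B1,"C1,+comma",D1\r\n'
    ',B2,"line 1\nline 2",D2\r\n'
    ',,C3,"D3,+comma"\r\n'
    ',,,D4 space\r\n'
)

with io.StringIO(data, newline="") as csv_in:
    # The quoted field keeps its embedded newline and stays in one row.
    rows = list(csv.reader(csv_in, dialect="excel"))

for row in rows:
    print("Length:", len(row), row)
```

Because the multi-line cell is quoted, the reader has no ambiguity to resolve: every row comes back with four fields.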
The example CSV file you posted lacks quotes around the field with an 'extra newline', which makes the meaning of that newline ambiguous. Is it a new row or a multi-line field?
Therefore, you can only interpret this csv file:
    A, B, C, D, E, F
    123, 456, tree
    , very, bla, indigo
as a one-dimensional list, like so:
    with open("test.csv", 'rU') as csvIN:
        outCSV = [field.strip() for row in csv.reader(csvIN, delimiter=',')
                  for field in row if field]
which produces this one-dimensional list:
['A', 'B', 'C', 'D', 'E', 'F', '123', '456', 'tree', 'very', 'bla', 'indigo']
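A minimal Python 3 sketch of the same flattening, run against an in-memory copy of the sample data. One deliberate difference from the comprehension above, and an assumption on my part: it filters on field.strip() rather than field, so a field holding only whitespace is dropped as well. The names data, csv_in and flat are illustrative.

```python
import csv
import io

# Stand-in for the unquoted test.csv from the question.
data = "A, B, C, D, E, F\n123, 456, tree\n, very, bla, indigo\n"

with io.StringIO(data, newline="") as csv_in:
    flat = [
        field.strip()
        for row in csv.reader(csv_in, delimiter=",")
        for field in row
        if field.strip()  # assumption: whitespace-only fields count as empty
    ]

print(flat)
```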
This can then be interpreted and regrouped into any sub grouping as you wish.
The idiomatic regrouping method in Python uses zip, like so:
    >>> zip(*[iter(outCSV)]*6)
    [('A', 'B', 'C', 'D', 'E', 'F'), ('123', '456', 'tree', 'very', 'bla', 'indigo')]
Or, if you want a list of lists, this is also idiomatic:
    >>> [outCSV[i:i+6] for i in range(0, len(outCSV), 6)]
    [['A', 'B', 'C', 'D', 'E', 'F'], ['123', '456', 'tree', 'very', 'bla', 'indigo']]
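On Python 3, zip returns an iterator rather than a list, so the first form needs a list() call to display the groups. A short sketch of both regroupings; the names flat, width, as_tuples and as_lists are illustrative, and the width of 6 is assumed from the header row:

```python
flat = ['A', 'B', 'C', 'D', 'E', 'F',
        '123', '456', 'tree', 'very', 'bla', 'indigo']
width = 6  # number of columns, assumed from the header row

# zip-based grouping yields tuples; a trailing incomplete group
# would be silently dropped.
as_tuples = list(zip(*[iter(flat)] * width))

# Slice-based grouping yields lists; a trailing incomplete group
# would be kept as a shorter final list.
as_lists = [flat[i:i + width] for i in range(0, len(flat), width)]

print(as_tuples)
print(as_lists)
```

The two forms differ when the list length is not a multiple of the width, so the slice form is the safer choice if dropping leftover fields would hide a parsing problem.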
If you can change how your CSV file is created, so that fields containing newlines are quoted, it will be less ambiguous to interpret.
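If the file is produced from Python, csv.writer already does this: with its default QUOTE_MINIMAL setting it quotes any field containing the delimiter, the quote character, or a line-terminator character. A minimal round-trip sketch, assuming the intended third field really is "tree\n" as the question states; the names rows, buffer, text and round_trip are illustrative:

```python
import csv
import io

rows = [
    ["A", "B", "C", "D", "E", "F"],
    ["123", "456", "tree\n", "very", "bla", "indigo"],
]

# Write to an in-memory buffer; with a real file, use
# open("test.csv", "w", newline="") instead.
buffer = io.StringIO(newline="")
csv.writer(buffer).writerows(rows)  # default quoting is QUOTE_MINIMAL
text = buffer.getvalue()

# Reading the text back recovers the rows, embedded newline included.
round_trip = list(csv.reader(io.StringIO(text, newline="")))

print(repr(text))
print(round_trip)
```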