Handling extra newlines (carriage returns) in csv files parsed with Python?

Views: 3657 | Published: 2017/2/24 17:26:35 | Tags: python, csv, newline


Problem description

 
I have a CSV file that has fields that contain newlines, e.g.:

    A, B, C, D, E, F
    123, 456, tree
    , very, bla, indigo

(In this case the third field in the second row is "tree\n".)

I tried the following:

    import csv

    # newline='' lets the csv module handle line endings itself
    with open('test.csv', newline='') as f:
        catalog = csv.reader(f, delimiter=",", dialect=csv.excel_tab)
        for row in catalog:
            print("Length: ", len(row), row)

and the result I got was this:

    Length:  6 ['A', ' B', ' C', ' D', ' E', ' F']
    Length:  3 ['123', ' 456', ' tree']
    Length:  4 ['   ', ' very', ' bla', ' indigo']

Does anyone have any idea how I can quickly remove extraneous newlines?

Thanks!
Solution
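Suppose you have this Excel spreadsheet:

         A     B     C                  D
    1    A1    B1    C1,+comma          D1
    2          B2    line 1 / line 2    D2
    3                C3                 D3,+comma
    4                                   D4 space

Note:

- the multi-line cell in C2 ("line 1" and "line 2" sit on separate lines within the cell);
- embedded commas in C1 and D3;
- blank cells, and a cell with a space in D4.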
Saving that as CSV in Excel, you will get this csv file:

    A1,B1,"C1,+comma",D1
    ,B2,"line 1
    line 2",D2
    ,,C3,"D3,+comma"
    ,,,D4 space

Presumably, you will want to read that into Python with the blank cells still having meaning and the embedded commas treated correctly.

So, this:

    import csv

    # newline='' lets the csv module handle line endings,
    # including those inside quoted fields
    with open("test.csv", newline='') as csvIN:
        for row in csv.reader(csvIN, dialect='excel'):
            print("Length: ", len(row), row)

correctly produces the 4x4 list-of-lists matrix represented in Excel:

    Length:  4 ['A1', 'B1', 'C1,+comma', 'D1']
    Length:  4 ['', 'B2', 'line 1\nline 2', 'D2']
    Length:  4 ['', '', 'C3', 'D3,+comma']
    Length:  4 ['', '', '', 'D4 space']

The example CSV file you posted lacks quotes around the field with an 'extra newline', rendering the meaning of that newline ambiguous. Is it a new row or a multi-line field?
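To make the ambiguity concrete, here is a small sketch that parses the same record twice, once with the newline inside a quoted field and once without quotes. The two sample strings are made up for illustration (they are not the asker's actual file), and `io.StringIO` stands in for a file opened with `newline=''`.

```python
import csv
import io

# Hypothetical sample data: the same six values, with and without quotes
# around the field that contains the newline.
quoted = '123,456,"tree\n",very,bla,indigo\n'
unquoted = '123,456,tree\n,very,bla,indigo\n'

# With quotes, the newline is part of the third field: one row of six fields.
rows_quoted = list(csv.reader(io.StringIO(quoted, newline='')))

# Without quotes, the reader can only treat the newline as a row break:
# one row of three fields, then one row of four (the first one empty).
rows_unquoted = list(csv.reader(io.StringIO(unquoted, newline='')))

print(rows_quoted)
print(rows_unquoted)
```

Only the quoted form tells the reader that the newline belongs to the field.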

Therefore, you can only interpret this csv file:

    A, B, C, D, E, F
    123, 456, tree
    , very, bla, indigo

as a one-dimensional list, like so:

    import csv

    with open("test.csv", newline='') as csvIN:
        # flatten every row, skipping empty fields and stripping whitespace
        outCSV = [field.strip() for row in csv.reader(csvIN, delimiter=',')
                  for field in row if field]

which produces this one-dimensional list:

    ['A', 'B', 'C', 'D', 'E', 'F', '123', '456', 'tree', 'very', 'bla', 'indigo']

This can then be interpreted and regrouped into any sub-grouping you wish.

The idiomatic regrouping method in Python uses zip, like so:

    >>> list(zip(*[iter(outCSV)]*6))
    [('A', 'B', 'C', 'D', 'E', 'F'), ('123', '456', 'tree', 'very', 'bla', 'indigo')]

Or, if you want a list of lists, this is also idiomatic:

    >>> [outCSV[i:i+6] for i in range(0, len(outCSV), 6)]
    [['A', 'B', 'C', 'D', 'E', 'F'], ['123', '456', 'tree', 'very', 'bla', 'indigo']]
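The flatten-then-regroup steps can be combined into one helper. The sketch below is only one way to do it: `regroup` and its `width` parameter are hypothetical names, and it assumes every record has exactly `width` non-empty fields. Unlike the comprehension above, it also drops whitespace-only fields, and it would silently discard cells that are genuinely blank, so it only suits data like the question's sample.

```python
import csv
import io


def regroup(lines, width):
    """Flatten all non-blank fields, then slice them into rows of `width`."""
    fields = [field.strip()
              for row in csv.reader(lines)
              for field in row
              if field.strip()]          # skip empty and whitespace-only fields
    if len(fields) % width:
        raise ValueError("field count is not a multiple of the row width")
    return [fields[i:i + width] for i in range(0, len(fields), width)]


# Sample data mirroring the question; StringIO stands in for the open file.
sample = io.StringIO(
    "A, B, C, D, E, F\n123, 456, tree\n, very, bla, indigo\n", newline='')
rows = regroup(sample, 6)
print(rows)
```

Raising on a leftover partial row is a deliberate choice here, since a field count that does not divide evenly usually means the width assumption is wrong.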
If you can change how your CSV file is created, it will be less ambiguous to interpret. 
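If the file is produced by Python, one way to get unambiguous output is to let `csv.writer` do the quoting. The sketch below uses made-up rows and an in-memory buffer in place of a real file. With the default `QUOTE_MINIMAL` setting, the writer quotes any field containing a newline, so reading the text back gives the original rows.

```python
import csv
import io

# Hypothetical rows; the third field of the second row ends with a newline.
rows = [["A", "B", "C", "D", "E", "F"],
        ["123", "456", "tree\n", "very", "bla", "indigo"]]

# newline='' keeps the buffer from translating line endings, as with a real file.
buffer = io.StringIO(newline='')
csv.writer(buffer).writerows(rows)
text = buffer.getvalue()

# Read the same text back; the quoted newline stays inside its field.
buffer.seek(0)
round_trip = list(csv.reader(buffer))

print(repr(text))
print(round_trip == rows)
```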
