Double-quoted elements in a CSV can't be read with pandas

Views: 958 Published: 2017/2/24 22:38:48 Tags: python csv pandas

Problem description

I have an input CSV file in which every value is stored as a string, with each entry enclosed in double quotes.

Example file:

"column1","column2", "column3", "column4", "column5", "column6"
"AM", "07", "1", "SD", "SD", "CR"
"AM", "08", "1,2,3", "PR,SD,SD", "PR,SD,SD", "PR,SD,SD"
"AM", "01", "2", "SD", "SD", "SD"

There are only six columns. What options do I need to pass to pandas read_csv to read this correctly?

I am currently trying:

import pandas as pd
df = pd.read_csv(file, quotechar='"')

but this gives me the error message: CParserError: Error tokenizing data. C error: Expected 6 fields in line 3, saw 14

This apparently means that it is ignoring the '"' and treating every comma as a field separator. However, in line 3, columns 3 through 6 should be strings with commas in them ("1,2,3", "PR,SD,SD", "PR,SD,SD", "PR,SD,SD").

How do I get pandas.read_csv to parse this correctly?

Thanks.
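
The field counts in the error message can be reproduced without pandas. The following is a minimal sketch, assuming Python's standard csv module treats the space after each comma the same way the pandas C parser does here; SAMPLE and field_counts are illustrative names that do not appear in the original post. It suggests that the leading space, not the quotechar setting, is what keeps the double quotes from being recognized.

```python
import csv
import io

# The example file from the question, held in memory for illustration.
SAMPLE = '''"column1","column2", "column3", "column4", "column5", "column6"
"AM", "07", "1", "SD", "SD", "CR"
"AM", "08", "1,2,3", "PR,SD,SD", "PR,SD,SD", "PR,SD,SD"
"AM", "01", "2", "SD", "SD", "SD"
'''


def field_counts(text, **reader_options):
    """Return the number of fields csv.reader finds on each line."""
    return [len(row) for row in csv.reader(io.StringIO(text), **reader_options)]


# Default settings: a quote that follows a space is treated as a literal
# character, so the commas inside "1,2,3" split the field.
default_counts = field_counts(SAMPLE)

# skipinitialspace=True drops the space after each delimiter, so every
# field starts at its opening quote.
skip_counts = field_counts(SAMPLE, skipinitialspace=True)

print(default_counts)  # [6, 6, 14, 6]
print(skip_counts)     # [6, 6, 6, 6]
```

The third line yields 14 fields with the default settings, which matches the "Expected 6 fields in line 3, saw 14" message.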

Solution

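This will work. It falls back to the Python parser because the separators are irregular (a comma, sometimes followed by a space), so sep has to be a regex. If you only had commas, it would use the C parser and be much faster. Note that in the output below the double quotes remain part of each value, and "column1","column2" is read as a single header name because no space follows that comma, so the first field of each row becomes the index.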

In [1]: import csv
   ...: import pandas as pd

In [2]: !cat test.csv
"column1","column2", "column3", "column4", "column5", "column6"
"AM", "07", "1", "SD", "SD", "CR"
"AM", "08", "1,2,3", "PR,SD,SD", "PR,SD,SD", "PR,SD,SD"
"AM", "01", "2", "SD", "SD", "SD"

In [3]: pd.read_csv('test.csv', sep=r',\s+', quoting=csv.QUOTE_ALL)
pandas/io/parsers.py:637: ParserWarning: Falling back to the 'python' engine because the 'c' engine does not support regex separators; you can avoid this warning by specifying engine='python'.
  ParserWarning)
Out[3]: 
     "column1","column2" "column3"   "column4"   "column5"   "column6"
"AM"                "07"       "1"        "SD"        "SD"        "CR"
"AM"                "08"   "1,2,3"  "PR,SD,SD"  "PR,SD,SD"  "PR,SD,SD"
"AM"                "01"       "2"        "SD"        "SD"        "SD"

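An alternative worth trying, which is not part of the answer above: read_csv has a skipinitialspace option that skips spaces after the delimiter, so each field starts at its opening quote and the default C parser can handle the quoting. The following is a minimal sketch, assuming the sample data from the question is held in an in-memory string (SAMPLE is an illustrative name) and that every column should stay a string, hence dtype=str.

```python
import io

import pandas as pd

# The example file from the question, held in memory for illustration.
SAMPLE = '''"column1","column2", "column3", "column4", "column5", "column6"
"AM", "07", "1", "SD", "SD", "CR"
"AM", "08", "1,2,3", "PR,SD,SD", "PR,SD,SD", "PR,SD,SD"
"AM", "01", "2", "SD", "SD", "SD"
'''

# skipinitialspace=True ignores the space after each comma, so the
# default quotechar ('"') is recognized at the start of every field.
# dtype=str keeps values such as "07" as strings instead of integers.
df = pd.read_csv(io.StringIO(SAMPLE), skipinitialspace=True, dtype=str)

print(df.shape)               # (3, 6)
print(df.loc[1, "column3"])   # 1,2,3
```

With this approach the quotes are stripped from the values and the header yields six separate column names. For a file on disk, the path would be passed in place of io.StringIO(SAMPLE).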