Pandas read_csv skips some lines

Question

This follows up on an old question of mine. I have finally identified what is happening.

I have a CSV file that uses \t as the separator, and I read it with the following command:

df = pd.read_csv(r'C:\..\file.csv', sep='\t', encoding='unicode_escape')

The resulting length is, for example, 800,000.

The problem is that the original file has around 1,400,000 lines. I also know where the issue occurs: one column (say columnA) contains the following entry:

"HILFE FüR DIE Alten

Do you have any idea what is happening? When I delete that row, I get the correct number of lines (length). What is Python doing here?

Recommended answer

According to the pandas documentation: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html


sep : str, default ','. Delimiter to use. If sep is None, the C engine cannot automatically detect the separator, but the Python parsing engine can, meaning the latter will be used and automatically detect the separator by Python's builtin sniffer tool, csv.Sniffer. In addition, separators longer than 1 character and different from '\s+' will be interpreted as regular expressions and will also force the use of the Python parsing engine. Note that regex delimiters are prone to ignoring quoted data. Regex example: '\r\t'.

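The issue is probably the double-quote character. An unmatched opening quote makes the parser treat everything up to the next quote as part of the same field, so the lines in between are merged into one value. A regex separator forces the Python parsing engine, which, as the documentation notes, tends to ignore quoting. Try this instead: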

df = pd.read_csv(r'C:\..\file.csv', sep='\\t', encoding='unicode_escape', engine='python')

Or this:

df = pd.read_csv(r'C:\..\file.csv', sep=r'\t', encoding='unicode_escape')
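If the quote characters in the file are plain data rather than field delimiters, another option may be to switch quote handling off with the `quoting` parameter of `read_csv`. The sketch below reproduces the effect on a small in-memory sample and compares the two settings. The sample rows and the names `sample`, `df_default`, and `df_fixed` are made up for illustration; they are not taken from the original file, and whether `csv.QUOTE_NONE` is appropriate depends on how that file was written.

```python
import csv
import io

import pandas as pd

# Hypothetical tab-separated sample: two values start with a double quote
# that is never closed, similar to the entry in columnA.
sample = (
    "columnA\tcolumnB\n"
    "foo\t1\n"
    '"HILFE FüR DIE Alten\t2\n'
    "bar\t3\n"
    '"another entry\t4\n'
    "baz\t5\n"
)

# Default quoting: the first stray quote opens a quoted field that runs
# until the next quote, so the lines in between end up inside one value.
df_default = pd.read_csv(io.StringIO(sample), sep="\t")

# csv.QUOTE_NONE makes the parser treat the quote character as ordinary data.
df_fixed = pd.read_csv(io.StringIO(sample), sep="\t", quoting=csv.QUOTE_NONE)

print(len(df_default), len(df_fixed))
```

With `quoting=csv.QUOTE_NONE` all five data rows are kept and the leading quote stays in the value, whereas the default call returns fewer rows. The trade-off is that fields that really are quoted (for example, to protect an embedded tab) would no longer be unwrapped.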
