pandas read_csv: ignoring newlines


Question

I have a dataset (for the compbio people out there, it's a FASTA file) that is littered with newlines that do not act as delimiters of the data.

Is there a way to make pandas ignore newlines when importing, using any of the pandas read functions?

Sample data:

>ERR899297.10000174
TGTAATATTGCCTGTAGCGGGAGTTGTTGTCTCAGGATCAGCATTATATATCTCAATTGCATGAATCATCGTATTAATGC
TATCAAGATCAGCCGATTCT

Every entry starts with ">". The sequence data is split across lines by newlines (nominally wrapped at 80 characters per line, although that limit is not respected everywhere).

Recommended answer

You need another marker that tells pandas when you actually want to start a new row.

For example, here I create a file in which the end of each row is encoded by a pipe (|):

csv = """
col1,col2, col3, col4|
first_col_first_line,2nd_col_first_line,
3rd_col_first_line

de,4rd_col_first_line|
"""
with open("test.csv", "w") as f:
    f.write(csv)  # write() takes the whole string; writelines() expects an iterable of lines

Then read it with the C engine, specifying the pipe as the lineterminator:

import pandas as pd
pd.read_csv("test.csv", lineterminator="|", engine="c")

This gives me a DataFrame whose rows end at each pipe rather than at each newline.
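For the FASTA data in the question, the ">" at the start of each entry can already serve as that record marker, so one option is to join the wrapped sequence lines in plain Python before handing the records to pandas. The following is only a minimal sketch under that assumption: `parse_fasta` is a hypothetical helper name, not a pandas function, and it assumes every header line starts with ">" and that the sequence lines contain no other markup.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text.

    Hypothetical helper for illustration; it is not part of pandas.
    """
    records = []
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # blank lines carry no data
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:].strip()
            chunks = []
        elif header is not None:
            # Sequence lines are joined, so the line breaks disappear.
            chunks.append(line)
    # Flush the last record, which has no following header to close it.
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


sample = """>ERR899297.10000174
TGTAATATTGCCTGTAGCGGGAGTTGTTGTCTCAGGATCAGCATTATATATCTCAATTGCATGAATCATCGTATTAATGC
TATCAAGATCAGCCGATTCT
"""

records = parse_fasta(sample)
for header, sequence in records:
    print(header, len(sequence))
```

The resulting list of tuples can then be passed to `pd.DataFrame(records, columns=["id", "sequence"])`, where the column names are again just illustrative choices.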
