Rows are lost when reading this tab-separated file with pandas read_csv


Problem description

I have a .text file in the following format, where the fields (index number, name, and message) are separated by tabs (\t):

712 ben     Battle of the Books
713 james   i used to be in TOM
714 tomy    i was in BOB once
715 ben Tournaments of Minds
716 tommy    Also the Lion in the upcoming school play
717 tommy   Can you guess
718 tommy    P
...

I read it into a data frame with read_csv:

 chat = pd.read_csv("f.text", sep = "\t", header = None, usecols = [2])

But the data frame has only 9812 rows, while the original file has more than 12428 lines (only 21 of them empty). This is quite strange. Do you have any idea why? Thanks.

Recommended answer

I think you need to add the quoting parameter, so that double quotes in the messages are treated as ordinary text instead of field delimiters:

import csv
import pandas as pd

# QUOTE_NONE tells the parser not to give the " character any special meaning
chat = pd.read_csv("f.text", sep="\t", header=None, usecols=[2], quoting=csv.QUOTE_NONE)
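The question does not show which lines trigger the problem, so the following is only a sketch of the likely mechanism, using made-up sample data. By default, read_csv treats a double quote at the start of a field as the opening of a quoted field and keeps reading, across newlines, until it finds a closing quote. Every line in between is swallowed into one cell. The quote characters placed on lines 714 and 716 below are an assumption added for illustration; they are not taken from the original file.

```python
import csv
import io

import pandas as pd

# Hypothetical sample: line 714 opens a quote and line 716 closes it.
sample = (
    '712\tben\tBattle of the Books\n'
    '713\tjames\ti used to be in TOM\n'
    '714\ttomy\t"i was in BOB once\n'
    '715\tben\tTournaments of Minds\n'
    '716\ttommy\tAlso the Lion in the upcoming school play"\n'
    '717\ttommy\tCan you guess\n'
)

# Default quoting: lines 714 to 716 are merged into a single row.
default = pd.read_csv(io.StringIO(sample), sep="\t", header=None, usecols=[2])

# QUOTE_NONE: the quote character is kept as plain text, one row per line.
no_quotes = pd.read_csv(
    io.StringIO(sample),
    sep="\t",
    header=None,
    usecols=[2],
    quoting=csv.QUOTE_NONE,
)

print(len(default), len(no_quotes))
```

With this sample the default call should return fewer rows than the file has lines, while the QUOTE_NONE call returns one row per line. If the real file behaves the same way, searching it for lines that contain a double quote is a quick way to confirm the cause.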
