Read tab-delimited fields with pandas when some lines contain more than one tab

Views: 118 | Published: 2021/6/13 20:19:38 | Tag: pandas

Problem description

I am trying to read a tab-separated txt file using pandas. The file looks like this:

14.38   14.21   0.8951  5.386   3.312   2.462   4.956   1
14.69   14.49   0.8799  5.563   3.259   3.586   5.219   1
14.11   14.12   0.8911  5.422   3.302   2.723           5        1

Some lines have extra tabs. If I use read_csv or read_fwf and specify sep='\t', I get results that look like this:

0   15.26\t14.84\t0.871\t5.763\t3.312\t2.221\t5.22\t1
1   14.88\t14.57\t0.8811\t5.554\t3.333\t1.018\t4.9

Do you have any suggestions as to which parameters I could specify to deal with this problem? Thanks.

Solution:

Use pd.read_csv(filename, delim_whitespace=True)

Recommended answer

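pandas read_csv is very versatile. Passing delim_whitespace=True makes it treat any run of whitespace (spaces or tabs) as a single delimiter, so a variable number of tabs between fields is handled. Note that delim_whitespace is deprecated as of pandas 2.2; sep=r'\s+' is the equivalent spelling.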

import pandas as pd
df = pd.read_csv(filename, delim_whitespace=True)
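As a minimal sketch of the whitespace approach, the snippet below parses three rows shaped like the sample in the question from an in-memory string. It uses sep=r"\s+", which the pandas documentation describes as equivalent to delim_whitespace=True. Two things here are assumptions rather than facts from the question: the exact position of the extra tabs in the third row, and that the file has no header row (hence header=None).

```python
import io

import pandas as pd

# Rows modelled on the question's sample; the third row carries runs of
# extra tabs (their exact placement is an assumption).
raw = (
    "14.38\t14.21\t0.8951\t5.386\t3.312\t2.462\t4.956\t1\n"
    "14.69\t14.49\t0.8799\t5.563\t3.259\t3.586\t5.219\t1\n"
    "14.11\t14.12\t0.8911\t5.422\t3.302\t2.723\t\t\t5\t\t1\n"
)

# sep=r"\s+" collapses any run of spaces or tabs into one delimiter.
# header=None assumes the file has no header row, so columns are numbered 0..7.
df = pd.read_csv(io.StringIO(raw), sep=r"\s+", header=None)

print(df.shape)
print(df.iloc[2].tolist())
```

With a real file, the io.StringIO(raw) argument would be replaced by the file path. This approach only suits data whose fields never contain spaces, because a space inside a field would also be treated as a delimiter.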

Option 2: Use the separator argument with a regular expression that matches one or more tabs

df = pd.read_csv(filename, sep=r'\t+', engine='python')
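The following sketch illustrates when the tab-only pattern might be preferable to splitting on all whitespace: if a field itself contains spaces, r"\t+" keeps it intact. The data and the "type A" / "type B" labels are invented for illustration and do not come from the question. engine="python" is passed explicitly because regular-expression separators other than \s+ are handled by the Python parsing engine.

```python
import io

import pandas as pd

# Hypothetical data: the last field contains a space, and the second row
# has a run of three tabs between its first two fields.
raw = (
    "14.38\t14.21\ttype A\n"
    "14.11\t\t\t14.12\ttype B\n"
)

# r"\t+" treats one or more consecutive tabs as a single delimiter,
# while spaces inside a field are left alone.
df = pd.read_csv(io.StringIO(raw), sep=r"\t+", engine="python", header=None)

print(df.shape)
print(df.iloc[1].tolist())
```

One caveat of this design: collapsing repeated tabs means a genuinely empty field (two tabs in a row meaning "missing value") can no longer be told apart from stray padding, so it fits only files where every field is always present.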
