How to make separator in pandas read_csv more flexible wrt whitespace, for irregular separators?

Views: 31 Published: 2021/12/3 8:42:49 Tags: python csv pandas dataframe whitespace

Problem description

I need to create a data frame by reading data from a file using the read_csv method. However, the separators are not very regular: some columns are separated by tabs (\t), others by spaces. Moreover, some columns can be separated by 2, 3, or more spaces, or even by a combination of spaces and tabs (for example, 3 spaces, two tabs, and then 1 space).

Is there a way to tell pandas to treat these files properly?

By the way, I do not have this problem if I use plain Python. I use:

for line in open(file_name):
    fld = line.split()

And it works perfectly. It does not care whether there are 2 or 3 spaces between the fields. Even combinations of spaces and tabs do not cause any problem. Can pandas do the same?
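For reference, here is a minimal sketch of the plain-Python behaviour described above. The sample line is made up for illustration and is not taken from the asker's file.

```python
# Hypothetical line mixing tabs and runs of spaces between fields.
line = "a\t  b\tc   1 \t 2\n"

# With no arguments, str.split() splits on any run of whitespace
# and ignores leading/trailing whitespace, including the newline.
fields = line.split()
print(fields)
```

This is the behaviour the question asks pandas to reproduce: any mix of spaces and tabs counts as a single separator.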

Solution

From the documentation, you can use either a regex or delim_whitespace:

>>> import pandas as pd
>>> for line in open("whitespace.csv"):
...     print(repr(line))
...
'a\t  b\tc 1 2\n'
'd\t  e\tf 3 4\n'
>>> pd.read_csv("whitespace.csv", header=None, delimiter=r"\s+")
   0  1  2  3  4
0  a  b  c  1  2
1  d  e  f  3  4
>>> pd.read_csv("whitespace.csv", header=None, delim_whitespace=True)
   0  1  2  3  4
0  a  b  c  1  2
1  d  e  f  3  4
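
The same idea can be tried without creating a file. The sketch below is only an illustration: it assumes the two sample lines from the session above and feeds them to read_csv through io.StringIO. It uses the regex form, since, as far as I know, delim_whitespace is deprecated in recent pandas releases (2.2 and later) in favour of sep=r"\s+".

```python
import io

import pandas as pd

# Sample data assumed for illustration: tabs and spaces mixed between fields.
raw = "a\t  b\tc 1 2\nd\t  e\tf 3 4\n"

# sep=r"\s+" treats any run of whitespace (spaces and/or tabs) as one separator.
df = pd.read_csv(io.StringIO(raw), header=None, sep=r"\s+")
print(df)
```

Note the raw string with the backslash: r"\s+" matches one or more whitespace characters, whereas "s+" would match the letter s.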
