Specify converter for Pandas index column in read_csv


Question

I am attempting to read in a CSV file with hexadecimal numbers in the index column:

InputBits, V0, V1, V2, V3
7A, 0.000594457716, 0.000620631282, 0.000569834178, 0.000625374384, 
7B, 0.000601155649, 0.000624282078, 0.000575955914, 0.000632111367, 
7C, 0.000606026872, 0.000629149805, 0.000582689823, 0.000634561234, 
7D, 0.000612115902, 0.000634625998, 0.000584526357, 0.000638235952, 
7E, 0.000615769413, 0.000637668328, 0.000590648093, 0.00064987256, 
7F, 0.000620640637, 0.000643144494, 0.000594933308, 0.000650485013, 

I can read it in with the following code:

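import pandas as pd

# Parse InputBits as hexadecimal, then promote it to the index in a second step.
df = pd.read_csv('data.csv', index_col=False,
                 converters={'InputBits': lambda x: int(x, 16)})
df.set_index('InputBits', inplace=True)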

The problem is that this seems unnecessarily clunky. Is there a way to do something equivalent to the following?

df = pd.read_csv('data.csv', converters={'InputBits': lambda x: int(x, 16)})

This fails because InputBits now refers to the first data column, and the converter raises

ValueError: invalid literal for int() with base 16: ' 0.000594457716'

Recommended answer

As @root pointed out here, the issue in this example is that the header is misaligned with the data: the rows all end with a trailing comma, so the column names no longer line up with the column values. In fact, the documentation deals with this specific scenario:

If you have a malformed file with delimiters at the end of each line, you might consider index_col=False to force pandas to not use the first column as the index (row names)
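To make the misalignment concrete, here is a minimal sketch that uses plain string splitting rather than pandas itself. The `header` and `row` strings are copied from the sample file; the variable names are only illustrative. When a data row has one more field than the header, pandas treats the first field as the index, so the name InputBits ends up labelling the second field, which is what the converter then receives.

```python
# First header line and first data row from the sample file (note the trailing ", ").
header = "InputBits, V0, V1, V2, V3"
row = "7A, 0.000594457716, 0.000620631282, 0.000569834178, 0.000625374384, "

header_fields = header.split(",")
row_fields = row.split(",")

# The trailing comma gives the data row one more field than the header.
print(len(header_fields), len(row_fields))

# With the first field taken as an implicit index, 'InputBits' labels the
# second field, so the hexadecimal converter is applied to a decimal string.
value_seen_by_converter = row_fields[1]
try:
    int(value_seen_by_converter, 16)
except ValueError as exc:
    error_message = str(exc)
    print(error_message)
```

The message printed at the end matches the ValueError reported in the question.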

The solution here was first to run

sed -i 's/, \r$//' data.csv

to get rid of the final commas (and Windows line endings). Then, the expected command works almost out of the box:

pd.read_csv('data.csv', index_col='InputBits',
             converters={'InputBits': lambda x: int(x, 16)})
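For readers who would rather not modify the file on disk, the same idea can be sketched entirely in Python. This is an illustration under assumptions, not part of the original answer: `raw` is a hypothetical in-memory copy of the first two rows of data.csv, the regular expression stands in for the sed command, and `skipinitialspace=True` is added so that the column names come out as 'V0' rather than ' V0'.

```python
import io
import re

import pandas as pd

# Hypothetical in-memory copy of the start of data.csv, trailing commas included.
raw = (
    "InputBits, V0, V1, V2, V3\n"
    "7A, 0.000594457716, 0.000620631282, 0.000569834178, 0.000625374384, \n"
    "7B, 0.000601155649, 0.000624282078, 0.000575955914, 0.000632111367, \n"
)

# Drop a trailing comma and any whitespace after it at the end of each line
# (roughly what the sed command does to the file).
cleaned = re.sub(r",[ \t\r]*$", "", raw, flags=re.MULTILINE)

df = pd.read_csv(
    io.StringIO(cleaned),
    index_col="InputBits",
    skipinitialspace=True,  # strip the space that follows each comma
    converters={"InputBits": lambda x: int(x, 16)},  # parse the index as hex
)

print(df.index.tolist())
print(list(df.columns))
```

Once the rows have the same number of fields as the header, the converter is applied to the InputBits column before it becomes the index, so no separate set_index call is needed.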
