Specify converter for Pandas index column in read_csv
Problem description
I am attempting to read in a CSV file with hexadecimal numbers in the index column:
InputBits, V0, V1, V2, V3
7A, 0.000594457716, 0.000620631282, 0.000569834178, 0.000625374384,
7B, 0.000601155649, 0.000624282078, 0.000575955914, 0.000632111367,
7C, 0.000606026872, 0.000629149805, 0.000582689823, 0.000634561234,
7D, 0.000612115902, 0.000634625998, 0.000584526357, 0.000638235952,
7E, 0.000615769413, 0.000637668328, 0.000590648093, 0.00064987256,
7F, 0.000620640637, 0.000643144494, 0.000594933308, 0.000650485013,
I can get it working with the following code:
import pandas as pd

df = pd.read_csv('data.csv', index_col=False,
                 converters={'InputBits': lambda x: int(x, 16)})
df.set_index('InputBits', inplace=True)
The problem is that this seems unnecessarily clunky. Is there a way to do something equivalent to the following?
df = pd.read_csv('data.csv', converters={'InputBits': lambda x: int(x, 16)})
This fails because InputBits is now treated as the first data column, raising:
ValueError: invalid literal for int() with base 16: ' 0.000594457716'
Recommended answer
As @root pointed out, the issue in this example is that the header row is misaligned with the data rows: every data row ends with a trailing comma, so it has one more field than the header. In fact, the documentation deals with this specific scenario:
If you have a malformed file with delimiters at the end of each line, you might consider index_col=False to force pandas to not use the first column as the index (row names)
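To make the misalignment concrete, here is a minimal sketch using a hypothetical two-row sample in the same shape as data.csv (the values are shortened for illustration and are not from the original file). It assumes pandas' default behaviour of using the first field as the index when data rows have one more field than the header:

```python
import io

import pandas as pd

# Hypothetical sample shaped like data.csv: the header has 5 names,
# but each data row has 6 fields because of the trailing comma.
raw = (
    "InputBits, V0, V1, V2, V3\n"
    "7A, 0.1, 0.2, 0.3, 0.4,\n"
    "7B, 0.5, 0.6, 0.7, 0.8,\n"
)

# With no index_col given, pandas uses the extra leading field as the index,
# so the name 'InputBits' ends up labelling the first data column.
shifted = pd.read_csv(io.StringIO(raw))
print(shifted.index.tolist())
print(shifted.columns.tolist())

# index_col=False keeps the first field as a regular 'InputBits' column.
aligned = pd.read_csv(io.StringIO(raw), index_col=False)
print(aligned["InputBits"].tolist())
```

This is why the converter in the question received ' 0.000594457716' instead of '7A'.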
The solution here was first to run
sed -i 's/, \r$//' data.csv
to get rid of the final commas (and Windows line endings). Then, the expected command works almost out of the box:
pd.read_csv('data.csv', index_col='InputBits',
            converters={'InputBits': lambda x: int(x, 16)})
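If editing the file in place with sed is not an option, the same cleanup could be done in memory before parsing. The following is only a sketch of that alternative, not part of the original answer: the helper name read_hex_indexed_csv and the shortened sample values are hypothetical.

```python
import io

import pandas as pd


def read_hex_indexed_csv(text):
    """Parse CSV text whose rows end in a stray comma, using hex InputBits as the index.

    Hypothetical helper for illustration; it mirrors the sed cleanup in Python.
    """
    # Strip trailing whitespace (including '\r') and then the trailing comma.
    cleaned = "\n".join(line.rstrip().rstrip(",") for line in text.splitlines())
    return pd.read_csv(
        io.StringIO(cleaned),
        index_col="InputBits",
        converters={"InputBits": lambda x: int(x, 16)},
    )


# Hypothetical sample with Windows line endings and trailing commas.
sample = (
    "InputBits, V0, V1, V2, V3\r\n"
    "7A, 0.1, 0.2, 0.3, 0.4,\r\n"
    "7B, 0.5, 0.6, 0.7, 0.8,\r\n"
)
df = read_hex_indexed_csv(sample)
print(df.index.tolist())
```

For a real file, the text could be obtained with open('data.csv').read() and passed to the helper; the original file is left unmodified.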