How to read index data as string with pandas.read_csv()?


Question

I'm trying to read a CSV file into a DataFrame with pandas, and I want the index column to be read as strings. However, because the index values contain only digits, pandas treats them as integers. How can I read them as strings?

Here are my CSV file and code:

[sample.csv]    
    uid,f1,f2,f3
    01,0.1,1,10
    02,0.2,2,20
    03,0.3,3,30

[code]
import pandas as pd
df = pd.read_csv('sample.csv', index_col="uid", dtype=float)
print(df.index.values)

The result: df.index holds integers, not strings:

>>> [1 2 3]

But I want df.index to hold strings:

>>> ['01', '02', '03']

One additional condition: the remaining (non-index) columns must be numeric, and there are too many of them to list by name.

Recommended answer

Pass the dtype parameter to specify the dtype of the uid column:

In [159]:
import pandas as pd
import io
t="""uid,f1,f2,f3
01,0.1,1,10
02,0.2,2,20
03,0.3,3,30"""
df = pd.read_csv(io.StringIO(t), dtype={'uid':str})
df.set_index('uid', inplace=True)
df.index

Out[159]:
Index(['01', '02', '03'], dtype='object', name='uid')

So in your case the following should work:

df = pd.read_csv('sample.csv', dtype={'uid':str})
df.set_index('uid', inplace=True)

The one-line equivalent doesn't work because of a pandas bug, still outstanding when this answer was written, in which the dtype parameter is ignored for columns that are used as the index:

df = pd.read_csv('sample.csv', dtype={'uid':str}, index_col='uid')
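
Whether the one-line form loses the leading zeros may depend on the installed pandas version, so it can be worth checking directly. The snippet below is a minimal sketch of such a check; the names one_line, two_step and one_line_ok are illustrative and not part of the original answer, and no particular outcome for the one-line form is assumed.

```python
import io
import pandas as pd

t = """uid,f1,f2,f3
01,0.1,1,10
02,0.2,2,20
03,0.3,3,30"""

# One-line form: dtype and index_col passed together.
one_line = pd.read_csv(io.StringIO(t), dtype={'uid': str}, index_col='uid')

# Two-step form from the answer: read with dtype first, then set the index.
two_step = pd.read_csv(io.StringIO(t), dtype={'uid': str}).set_index('uid')

print(one_line.index.tolist())
print(two_step.index.tolist())

# True only if the one-line form kept the same string labels as the two-step form.
one_line_ok = one_line.index.tolist() == two_step.index.tolist()
print(one_line_ok)
```

If one_line_ok prints False on your installation, stick with the two-step form.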

If we assume the first column is the index column, you can build the dtypes dynamically:

In [171]:
t="""uid,f1,f2,f3
01,0.1,1,10
02,0.2,2,20
03,0.3,3,30"""
cols = pd.read_csv(io.StringIO(t), nrows=1).columns.tolist()
index_col_name = cols[0]
dtypes = dict(zip(cols[1:], [float]* len(cols[1:])))
dtypes[index_col_name] = str
df = pd.read_csv(io.StringIO(t), dtype=dtypes)
df.set_index(index_col_name, inplace=True)
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 3 entries, 01 to 03
Data columns (total 3 columns):
f1    3 non-null float64
f2    3 non-null float64
f3    3 non-null float64
dtypes: float64(3)
memory usage: 96.0+ bytes

In [172]:
df.index

Out[172]:
Index(['01', '02', '03'], dtype='object', name='uid')

Here we read only the header and the first data row (nrows=1), which is enough to get the column names:

cols = pd.read_csv(io.StringIO(t), nrows=1).columns.tolist()

We then build a dict that maps the column names to the desired dtypes:

index_col_name = cols[0]
dtypes = dict(zip(cols[1:], [float]* len(cols[1:])))
dtypes[index_col_name] = str

We take the index name, assuming it is the first entry, create a dict from the remaining columns with float as the desired dtype, and then add the index column with dtype str. You can then pass this dict as the dtype parameter to read_csv.
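
The steps above can be collected into a small reusable function. This is only a sketch under the same assumption as the answer (the first column is the index and every other column is numeric). The function name read_csv_with_str_index is hypothetical, and it takes the CSV content as a string purely to keep the example self-contained. It uses nrows=0 so that only the header line is parsed.

```python
import io
import pandas as pd

def read_csv_with_str_index(csv_text):
    """Hypothetical helper: first column becomes a string index, the rest floats."""
    # nrows=0 parses only the header line, which is enough for the column names.
    cols = pd.read_csv(io.StringIO(csv_text), nrows=0).columns.tolist()
    index_col_name = cols[0]
    # Every non-index column is read as float; the index column is read as str.
    dtypes = {name: float for name in cols[1:]}
    dtypes[index_col_name] = str
    df = pd.read_csv(io.StringIO(csv_text), dtype=dtypes)
    # Set the index in a second step, as in the answer above.
    return df.set_index(index_col_name)

sample = """uid,f1,f2,f3
01,0.1,1,10
02,0.2,2,20
03,0.3,3,30"""

df = read_csv_with_str_index(sample)
index_values = df.index.tolist()
print(index_values)
print(df.dtypes)
```

For a file on disk, the same two read_csv calls could take the file path in place of io.StringIO(csv_text).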
