Setting column types while reading csv with pandas

Question

I am trying to read a csv file into a pandas dataframe with the following call:

dp = pd.read_csv('products.csv', header = 0,  dtype = {'name': str,'review': str,
                                                      'rating': int,'word_count': dict}, engine = 'c')
print dp.shape
for col in dp.columns:
    print 'column', col,':', type(col[0])
print type(dp['rating'][0])
dp.head(3)

This is the output:

(183531, 4)
column name : <type 'str'>
column review : <type 'str'>
column rating : <type 'str'>
column word_count : <type 'str'>
<type 'numpy.int64'>

I can sort of understand that pandas might find it difficult to convert a string representation of a dictionary into a dictionary. But how can the content of the "rating" column be both str and numpy.int64?

By the way, tweaks such as not specifying an engine or a header do not change anything.

Thanks and regards

Recommended answer

In your loop you are doing

for col in dp.columns:
    print 'column', col,':', type(col[0])

and you are correctly seeing str as the output everywhere, because col[0] is the first letter of the column's name, which is a string.

For example, if you run this loop:

for col in dp.columns:
    print 'column', col,':', col[0]

you will see the first letter of each column name printed out; this is what col[0] is.

Your loop only iterates over the column names, not over the series data.
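To make the point above concrete, here is a minimal sketch written for Python 3 (the code in the question uses Python 2 print statements). The two-row CSV is made-up stand-in data for products.csv, read from memory so the snippet is self-contained; only the column names come from the question.

```python
import io
import pandas as pd

# Hypothetical stand-in for products.csv (made-up rows, for illustration only).
csv_text = """name,review,rating
Widget,Works well,5
Gadget,Broke fast,1
"""

dp = pd.read_csv(io.StringIO(csv_text),
                 dtype={'name': str, 'review': str, 'rating': int})

# Iterating over dp.columns yields the column labels, which are plain strings.
labels = list(dp.columns)
first_letters = [col[0] for col in dp.columns]
print(labels)         # ['name', 'review', 'rating']
print(first_letters)  # ['n', 'r', 'r']

# The data itself lives in dp[col], which is a Series, not in the label.
print(type(dp['rating']).__name__)  # Series
```

So type(col[0]) describes one character of a label and can never reflect the dtype passed to read_csv.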

What you really want is to check the type of each column's data (not its header or part of its header) in a loop.

So do this instead to get the types of the column data (non-header data):

for col in dp.columns:
    print 'column', col,':', type(dp[col][0])

This is similar to what you did when printing the type of the rating column separately.
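As a rough Python 3 sketch of the same check, the snippet below again uses a small in-memory CSV as a hypothetical stand-in for products.csv. It also prints dp.dtypes, which is not part of the answer above but reports every column's dtype in one call.

```python
import io
import pandas as pd

# Hypothetical stand-in for products.csv (made-up rows, for illustration only).
csv_text = """name,review,rating
Widget,Works well,5
Gadget,Broke fast,1
"""

dp = pd.read_csv(io.StringIO(csv_text), header=0,
                 dtype={'name': str, 'review': str, 'rating': int})

# Type of the first value in each column, which is what the corrected loop inspects.
value_types = {col: type(dp[col].iloc[0]).__name__ for col in dp.columns}
for col, type_name in value_types.items():
    print('column', col, ':', type_name)

# dp.dtypes lists the storage dtype of every column without looping.
print(dp.dtypes)
```

Here .iloc[0] selects the first row by position, whereas dp[col][0] looks up the index label 0; the two agree for the default integer index that read_csv creates. The exact integer width reported for rating (for example int64) can depend on the platform.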
