Converting non-numeric to numeric values using pandas


Question

I am learning pandas and ran into an interesting question. I have a DataFrame like this:

COL1    COL2      COL3
a     9/8/2016     2
b     12/4/2016    23
         ...
n     1/1/2015     21

COL1 is a string, COL2 is a timestamp, and COL3 is a number. I need to run some analysis on this DataFrame, so I want to convert all the non-numeric data to numeric. I tried using DictVectorizer() to convert COL1 and COL2 to numeric values, but first, I am not sure this is the best way to do it, and second, I don't know what to do with the timestamp. When I use DictVectorizer, the output looks like this:

{u'COL3': {0: 2, 1: 23, ..., n: 21}, 'COL1': {0: u'a', 1: 'b', ..., n: 'n'}, 'COL2': {0: u'9/8/2016', 1: u'12/4/2016', ..., n: u'1/1/2016'}}

But from what I have learned, it should look like this, or at least I know I need something like this:

 {COL1:'a', COL2: '9/8/2016' , COL3: 2  and so on}   

So, my questions: 1. What is the best way to convert non-numeric data (including dates) to numeric values for use with sklearn? 2. What is the right way to use DictVectorizer()?

Any help would be appreciated.

Recommended answer

To encode non-numeric data as numbers, you can use scikit-learn's LabelEncoder. It maps each category, such as COL1's a, b, c, to an integer.

Assuming df is your DataFrame, try:

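from sklearn.preprocessing import LabelEncoder

enc = LabelEncoder()
# Learn the distinct categories in COL1
enc.fit(df['COL1'])
# Column labels are case-sensitive, so use 'COL1' here as well (not 'col1')
df['COL1'] = enc.transform(df['COL1'])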
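
The answer above only covers COL1. For the date column and the DictVectorizer part of the question, here is a minimal sketch rather than a definitive recipe. It assumes the dates are in month/day/year order, uses a three-row toy frame with illustrative values, and the derived column names year, month and day are made up for the example. The main point is that DictVectorizer expects one dict per row, which is the shape the question was aiming for, and pandas can produce that with to_dict(orient="records").

```python
import pandas as pd
from sklearn.feature_extraction import DictVectorizer

# Toy frame mirroring the question's layout (values are illustrative).
df = pd.DataFrame({
    "COL1": ["a", "b", "n"],
    "COL2": ["9/8/2016", "12/4/2016", "1/1/2015"],
    "COL3": [2, 23, 21],
})

# Parse the date strings; the month/day/year order is an assumption.
dates = pd.to_datetime(df["COL2"], format="%m/%d/%Y")

# Replace the date with numeric parts (hypothetical column names).
df["year"] = dates.dt.year
df["month"] = dates.dt.month
df["day"] = dates.dt.day
df = df.drop(columns="COL2")

# DictVectorizer wants one dict per row, not one dict per column.
records = df.to_dict(orient="records")

# String values are expanded into one indicator column per category;
# numeric values are passed through unchanged.
vec = DictVectorizer(sparse=False)
X = vec.fit_transform(records)
feature_names = vec.feature_names_
```

Whether year/month/day is the right numeric form for a date depends on the analysis; a single number such as days elapsed since a reference date may suit some models better.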
