Converting non-numeric values to numeric values using pandas
Question
I am learning pandas and came across an interesting question. I have a DataFrame like this:
COL1 COL2 COL3
a 9/8/2016 2
b 12/4/2016 23
...
n 1/1/2015 21
COL1 is a string, COL2 is a timestamp, and COL3 is a number. I need to run some analysis on this DataFrame, so I want to convert all the non-numeric data to numeric. I tried using DictVectorizer() to convert COL1 and COL2 to numeric, but first, I am not sure this is the best way to do it, and second, I don't know what to do with the timestamp. When I use DictVectorizer, the output looks like this:
{u'COL3': {0: 2, 1: 23, ..., n: 21}, 'COL1': {0: u'a', 1: 'b', ..., n: 'n'}, 'COL2': {0: u'9/8/2016', 1: u'12/4/2016', ..., n: u'1/1/2015'}}
But from what I have learned, it should look like this, or at least I know I need something like this:
{COL1:'a', COL2: '9/8/2016' , COL3: 2 and so on}
So, my questions are:
1. What is the best way to convert non-numeric data (including dates) to numeric values for use with sklearn?
2. What is the right way to use DictVectorizer()?
Any help would be appreciated.
Recommended answer
To encode non-numeric data as numeric, you can use scikit-learn's LabelEncoder. It encodes each category, such as COL1's a, b, and c, as an integer.
Assuming df is your DataFrame, try:
from sklearn.preprocessing import LabelEncoder

enc = LabelEncoder()
enc.fit(df['COL1'])  # learn the mapping from each category to an integer
df['COL1'] = enc.transform(df['COL1'])  # replace each category with its integer code
- enc.fit() creates the corresponding integer values.
- enc.transform() applies the encoding to the df values.
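The fit and transform steps can be tried end to end on a small frame. The following is a minimal sketch, assuming made-up sample rows shaped like the table in the question; it uses fit_transform(), which combines the two calls, and inverse_transform() to recover the original labels.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical sample data shaped like the frame in the question
df = pd.DataFrame({
    "COL1": ["a", "b", "n", "a"],
    "COL2": ["9/8/2016", "12/4/2016", "1/1/2015", "9/8/2016"],
    "COL3": [2, 23, 21, 5],
})

enc = LabelEncoder()
# fit_transform() is equivalent to calling fit() and then transform()
df["COL1"] = enc.fit_transform(df["COL1"])

codes = df["COL1"].tolist()   # integer codes, assigned in sorted label order
labels = list(enc.classes_)   # original labels, indexed by their code
restored = list(enc.inverse_transform(df["COL1"]))  # map codes back to labels

print(codes, labels, restored)
```

Keep in mind that the integers carry an ordering (a < b < n) that may mean nothing for your data. scikit-learn documents LabelEncoder as a tool for target labels, so for input features a one-hot encoding may be a better fit, depending on the model.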
For the second column, pandas' to_datetime() function should do the trick, as @quinn-weber mentioned. Try:
import pandas as pd
df['COL2'] = pd.to_datetime(df['COL2'])
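Note that to_datetime() produces datetime64 values rather than plain numbers, so one more step is usually needed before passing the column to sklearn. The sketch below shows two possible ways to do that. It assumes the strings are in month/day/year order, and the new column names and the reference date are arbitrary choices made for illustration.

```python
import pandas as pd

# Hypothetical sample data using the dates shown in the question
df = pd.DataFrame({"COL2": ["9/8/2016", "12/4/2016", "1/1/2015"]})

# Assumption: month/day/year; an explicit format avoids ambiguous parsing
df["COL2"] = pd.to_datetime(df["COL2"], format="%m/%d/%Y")

# Option 1: split the date into separate integer features
df["COL2_year"] = df["COL2"].dt.year
df["COL2_month"] = df["COL2"].dt.month
df["COL2_day"] = df["COL2"].dt.day

# Option 2: one number per row, days elapsed since an arbitrary reference date
reference = pd.Timestamp("2015-01-01")
df["COL2_days"] = (df["COL2"] - reference).dt.days

print(df)
```

Which option works better depends on the analysis: separate year, month, and day columns expose seasonality, while a single elapsed-days column preserves ordering and distance between dates.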