Decode pandas dataframe


Problem description

I have an encoded dataframe. I encoded it with scikit-learn's LabelEncoder, created a machine learning model and made some predictions. But now I cannot decode the values in the output pandas dataframe. I tried several times with inverse_transform as described in the docs, but every time I still get errors like

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

That's what my dataframe looks like:

    0   147 14931   9   0   0   1   0   0   0   4   ... 0   0   242 677 0   94  192 27  169 20
    1   146 14955   15  1   0   0   0   0   0   0   ... 0   1   63  42  0   94  192 27  169 20
    2   145 15161   25  1   0   0   0   1   0   5   ... 0   0   242 677 0   94  192 27  169 20

Here's the code I used to encode it, in case it's needed:

from sklearn import preprocessing

labelEncoder = preprocessing.LabelEncoder()
for col in b.columns:
    b[col] = labelEncoder.fit_transform(b[col])

The column names don't matter here. I also tried the lambda function shown in another question here, but it still doesn't work. What am I doing wrong? Thanks for your help!

Edit: After implementing Vivek Kumar's code, I get the following error:

KeyError: 'Predicted_Values'

That's a column I added to the dataframe just to hold the predicted values. I do that in the following way:

b = pd.concat([X_test, y_test], axis=1)  # test features and actual target values
b['Predicted_Values'] = y_predict

This is how I drop the target column (the one used as y) from the dataframe and fit the estimator:

from sklearn import tree
from sklearn.model_selection import train_test_split  # formerly sklearn.cross_validation

X = b.drop(['Activity_Profile'], axis=1)
y = b['Activity_Profile']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
model = tree.DecisionTreeClassifier()
model = model.fit(X_train, y_train)

Solution

You can look at my answer here for the proper usage of LabelEncoder with multiple columns:

Why does sklearn preprocessing LabelEncoder inverse_transform apply from only one column?

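The reason is that LabelEncoder only supports one-dimensional input. If you reuse a single encoder, each call to fit_transform replaces the mapping (classes_) learned from the previous column, so afterwards the encoder only knows the mapping of the last column. You therefore need a separate LabelEncoder object for each column. Each of those objects can then inverse-transform only its own column.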
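The sketch below shows what goes wrong when one shared encoder is used in a loop, as in the question's encoding code. It uses a made-up two-column frame; the column names and values are hypothetical. After the loop, the encoder only remembers the classes of the last column. Decoding any other column with it then silently uses the wrong mapping.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy frame, for illustration only
df = pd.DataFrame({
    "color": ["red", "green", "red"],
    "size": ["S", "M", "L"],
})

shared = LabelEncoder()
encoded = df.copy()
for col in df.columns:
    # Each fit_transform call re-learns classes_ from this column alone
    encoded[col] = shared.fit_transform(df[col])

# Only the classes of the last fitted column ("size") survive
print(list(shared.classes_))  # ['L', 'M', 'S']

# "color" was encoded as [1, 0, 1], but decoding it now uses the "size" mapping
print(list(shared.inverse_transform(encoded["color"])))  # ['M', 'L', 'M']
```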

You can use a dictionary of LabelEncoder objects to convert multiple columns. Something like this:

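from sklearn import preprocessing

labelencoder_dict = {}
for col in b.columns:
    # Fit a separate encoder per column and keep it for decoding later
    labelEncoder = preprocessing.LabelEncoder()
    b[col] = labelEncoder.fit_transform(b[col])
    labelencoder_dict[col] = labelEncoder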

While decoding, you can just use:

for col in b.columns:
    b[col] = labelencoder_dict[col].inverse_transform(b[col])
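The encode and decode loops above can be checked as a round trip on toy data. This is a sketch under assumptions: the frame contents and the `city` column are invented, and only `labelencoder_dict`, `Activity_Profile` and the loop structure come from the question and answer.

```python
import pandas as pd
from sklearn import preprocessing

# Hypothetical toy data standing in for the question's dataframe `b`
b = pd.DataFrame({
    "city": ["Berlin", "Paris", "Berlin", "Rome"],
    "Activity_Profile": ["low", "high", "medium", "low"],
})
original = b.copy()

# Encode: one fitted LabelEncoder per column, keyed by column name
labelencoder_dict = {}
for col in b.columns:
    labelEncoder = preprocessing.LabelEncoder()
    b[col] = labelEncoder.fit_transform(b[col])
    labelencoder_dict[col] = labelEncoder

encoded = b.copy()
print(encoded["city"].tolist())  # [0, 1, 0, 2]

# Decode: each column is inverted with the encoder that was fitted on it
for col in b.columns:
    b[col] = labelencoder_dict[col].inverse_transform(b[col])

# The decoded values match the original strings column by column
for col in b.columns:
    assert b[col].tolist() == original[col].tolist()
```

Classes are sorted by LabelEncoder, so "Berlin", "Paris" and "Rome" become 0, 1 and 2 regardless of the order in which they appear.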

Update:

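Now that you have added the column you are using as y, here's how you can decode it, assuming you have added the 'Predicted_Values' column to the dataframe. The KeyError occurs because labelencoder_dict only contains encoders for the columns that were encoded, and 'Predicted_Values' was never one of them. Skip that column in the loop. The predictions are encoded Activity_Profile labels, so decode them with the Activity_Profile encoder: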

for col in b.columns:
    # Skip the predicted column here; it has no encoder of its own
    if col != 'Predicted_Values':
        b[col] = labelencoder_dict[col].inverse_transform(b[col])

# Use the original y (Activity_Profile) encoder on the predicted data
b['Predicted_Values'] = labelencoder_dict['Activity_Profile'].inverse_transform(
    b['Predicted_Values'])
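Below is a self-contained sketch of the whole flow from the question and the update, built on toy data. The `weekday` column and all values are hypothetical; `Activity_Profile` and `Predicted_Values` are the question's names. It imports `train_test_split` from `sklearn.model_selection`, because `sklearn.cross_validation` was removed in scikit-learn 0.20. Which rows land in the test set, and therefore the exact predictions, depend on the split.

```python
import pandas as pd
from sklearn import preprocessing, tree
from sklearn.model_selection import train_test_split

# Hypothetical toy data; only 'Activity_Profile' comes from the question
data = pd.DataFrame({
    "weekday": ["Mon", "Tue", "Mon", "Sat", "Sun", "Sat", "Tue", "Sun"],
    "Activity_Profile": ["busy", "busy", "busy", "calm", "calm", "calm", "busy", "calm"],
})

# One encoder per column, stored by column name
labelencoder_dict = {}
encoded = data.copy()
for col in encoded.columns:
    le = preprocessing.LabelEncoder()
    encoded[col] = le.fit_transform(encoded[col])
    labelencoder_dict[col] = le

X = encoded.drop(["Activity_Profile"], axis=1)
y = encoded["Activity_Profile"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)

model = tree.DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
y_predict = model.predict(X_test)  # predictions are encoded Activity_Profile labels

b = pd.concat([X_test, y_test], axis=1)
b["Predicted_Values"] = y_predict

# Decode the original columns; 'Predicted_Values' has no encoder of its own
for col in b.columns:
    if col != "Predicted_Values":
        b[col] = labelencoder_dict[col].inverse_transform(b[col])

# Decode the predictions with the target column's encoder
b["Predicted_Values"] = labelencoder_dict["Activity_Profile"].inverse_transform(
    b["Predicted_Values"])
print(b)
```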
