Decode pandas dataframe

Published: 2020/10/19 20:04:32 · Tags: python, pandas, scikit-learn, decode


Problem description

I have an encoded dataframe. I encoded it with the LabelEncoder from scikit-learn, created a machine learning model, and made some predictions. But now I cannot decode the values in the output pandas dataframe. I tried several times with inverse_transform from the docs, but every time I still get errors like

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

This is what my dataframe looks like:

    0   147 14931   9   0   0   1   0   0   0   4   ... 0   0   242 677 0   94  192 27  169 20
    1   146 14955   15  1   0   0   0   0   0   0   ... 0   1   63  42  0   94  192 27  169 20
    2   145 15161   25  1   0   0   0   1   0   5   ... 0   0   242 677 0   94  192 27  169 20

This is the code I use to encode it, in case it is relevant:

labelEncoder = preprocessing.LabelEncoder()
for col in b.columns:
    b[col] = labelEncoder.fit_transform(b[col])

The column names are not important. I also tried the lambda function shown in another question here, but it still doesn't work. What am I doing wrong? Thanks for your help!

Edit: After implementing Vivek Kumar's code, I get the following error:

KeyError: 'Predicted_Values'

That is a column I added to the dataframe just to hold the predicted values. I add it in the following way:

b = pd.concat([X_test, y_test], axis=1)  # features and actual target values
b['Predicted_Values'] = y_predict

This is how I drop the column that will be used as y from the dataframe and fit the estimator:

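from sklearn import tree
# sklearn.cross_validation was removed in scikit-learn 0.20; use model_selection instead
from sklearn.model_selection import train_test_split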
X = b.drop(['Activity_Profile'],axis=1)
y = b['Activity_Profile']
X_train, X_test, y_train, y_test = train_test_split(X,y, test_size = 0.3, random_state=0)
model = tree.DecisionTreeClassifier()
model = model.fit(X_train, y_train)

Solution

See my answer here for the proper usage of LabelEncoder on multiple columns:

Why does sklearn preprocessing LabelEncoder inverse_transform apply from only one column?

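The explanation is that LabelEncoder only supports one-dimensional input. In your loop, a single encoder is refit on every column, so after the loop it only remembers the classes of the last column. You therefore need a separate LabelEncoder object for each column, which can then be used to inverse transform that particular column only.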
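To illustrate why a single shared encoder cannot decode earlier columns, here is a minimal sketch. The toy dataframe and its column names are made up for illustration and are not the asker's data.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy data, not the asker's dataframe
toy = pd.DataFrame({"color": ["red", "blue", "red"],
                    "size": ["S", "M", "L"]})

shared_encoder = LabelEncoder()
encoded = toy.copy()
for col in toy.columns:
    # Each fit_transform call discards the classes learned for the previous column
    encoded[col] = shared_encoder.fit_transform(toy[col])

# Only the classes of the last column ("size") are still stored
remaining_classes = list(shared_encoder.classes_)
print(remaining_classes)

# Decoding "color" with the shared encoder returns sizes, not colors
wrongly_decoded = list(shared_encoder.inverse_transform(encoded["color"]))
print(wrongly_decoded)
```

In this sketch the decode does not raise an error; it silently returns labels from the wrong column, which is arguably worse than an exception.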

You can use a dictionary of LabelEncoder objects for converting multiple columns. Something like this:

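from sklearn import preprocessing

labelencoder_dict = {}
for col in b.columns:
    # Fit a separate encoder per column and keep it for decoding later
    labelEncoder = preprocessing.LabelEncoder()
    b[col] = labelEncoder.fit_transform(b[col])
    labelencoder_dict[col] = labelEncoder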

While decoding, you can just use:

for col in b.columns:
    b[col] = labelencoder_dict[col].inverse_transform(b[col])
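As a quick check of the encode and decode loops together, the following sketch runs a full round trip. The toy dataframe is hypothetical and only stands in for `b`.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy data standing in for the dataframe `b`
original = pd.DataFrame({"color": ["red", "blue", "red"],
                         "size": ["S", "M", "L"]})

b = original.copy()
labelencoder_dict = {}
for col in b.columns:
    encoder = LabelEncoder()
    b[col] = encoder.fit_transform(b[col])
    labelencoder_dict[col] = encoder

# Decode each column with the encoder that was fitted on it
decoded = b.copy()
for col in decoded.columns:
    decoded[col] = labelencoder_dict[col].inverse_transform(decoded[col])

# Compare cell by cell with the data before encoding
round_trip_ok = decoded.values.tolist() == original.values.tolist()
print(round_trip_ok)
```

Keeping the encoders keyed by column name means the decode step needs no knowledge of column order.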

Update:

Now that you have added the column which you are using as y, here's how you can decode it (assuming you have added the 'Predicted_Values' column to the dataframe):

for col in b.columns:
    # Skip the predicted column here
    if col != 'Predicted_Values':
        b[col] = labelencoder_dict[col].inverse_transform(b[col])

# Use the original `y (Activity_Profile)` encoder on predicted data
b['Predicted_Values'] = labelencoder_dict['Activity_Profile'].inverse_transform(
    b['Predicted_Values'])
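An end-to-end sketch of this update might look as follows. Only the column names `Activity_Profile` and `Predicted_Values` come from the question; the `device` feature and all values are hypothetical. To keep it short, the model predicts on the same rows it was trained on, which is an assumption of this sketch rather than what the question does.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data; only the Activity_Profile name comes from the question
raw = pd.DataFrame({
    "device": ["phone", "watch", "phone", "watch", "phone", "watch"],
    "Activity_Profile": ["walk", "run", "walk", "run", "walk", "run"],
})

b = raw.copy()
labelencoder_dict = {}
for col in b.columns:
    encoder = LabelEncoder()
    b[col] = encoder.fit_transform(b[col])
    labelencoder_dict[col] = encoder

X = b.drop(["Activity_Profile"], axis=1)
y = b["Activity_Profile"]
model = DecisionTreeClassifier(random_state=0).fit(X, y)

# Predictions are encoded target labels, so they share the target's encoder
b["Predicted_Values"] = model.predict(X)

for col in b.columns:
    if col != "Predicted_Values":  # no encoder was fitted under this name
        b[col] = labelencoder_dict[col].inverse_transform(b[col])
b["Predicted_Values"] = labelencoder_dict["Activity_Profile"].inverse_transform(
    b["Predicted_Values"])
print(b)
```

Looking up `labelencoder_dict['Predicted_Values']` would raise the KeyError from the question, because no encoder was ever fitted under that name.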
