Decode pandas dataframe
Problem description
I have an encoded dataframe. I encoded it with the LabelEncoder from scikit-learn, created a machine learning model, and made some predictions. But now I cannot decode the values in the pandas dataframe for the outputs. I tried several times with inverse_transform from the docs, but every time I get errors like:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
This is what my dataframe looks like:
0 147 14931 9 0 0 1 0 0 0 4 ... 0 0 242 677 0 94 192 27 169 20
1 146 14955 15 1 0 0 0 0 0 0 ... 0 1 63 42 0 94 192 27 169 20
2 145 15161 25 1 0 0 0 1 0 5 ... 0 0 242 677 0 94 192 27 169 20
This is the code I use to encode it, in case it's relevant:
from sklearn import preprocessing
labelEncoder = preprocessing.LabelEncoder()
for col in b.columns:
    b[col] = labelEncoder.fit_transform(b[col])
The column names are irrelevant here. I also tried the lambda-function approach shown in another question on this site, but it still doesn't work. What am I doing wrong? Thanks for the help!
Edit: After implementing Vivek Kumar's code, I get the following error:
KeyError: 'Predicted_Values'
That's a column I added to the dataframe just to hold the predicted values. I add it like this:
b = pd.concat([X_test, y_test], axis=1)  # test features and actual target values
b['Predicted_Values'] = y_predict
This is how I drop the target column (the one used as y) from the dataframe and fit the estimator:
from sklearn import tree
from sklearn.model_selection import train_test_split  # formerly sklearn.cross_validation (removed in 0.20)
X = b.drop(['Activity_Profile'], axis=1)
y = b['Activity_Profile']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
model = tree.DecisionTreeClassifier()
model = model.fit(X_train, y_train)
Answer
See my answer here for the proper usage of LabelEncoder with multiple columns:
Why does sklearn preprocessing LabelEncoder inverse_transform apply from only one column?
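The reason is that LabelEncoder only supports one-dimensional input. So you need a separate LabelEncoder object for each column, and each one can then be used to inverse-transform only its own column. Reusing a single encoder in a loop, as in the question, refits it on every column, so after the loop it only remembers the classes of the last column.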
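To see why one shared encoder is not enough, here is a minimal sketch on hypothetical toy data (the `color` and `size` columns are made up for illustration). After the loop, the encoder's `classes_` belong to the last column only, so decoding any other column can silently return wrong labels instead of raising an error.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy data: two categorical columns with different labels
df = pd.DataFrame({"color": ["red", "green", "red"],
                   "size": ["S", "L", "M"]})

# Anti-pattern from the question: one encoder refit on every column
shared = LabelEncoder()
encoded = df.copy()
for col in encoded.columns:
    encoded[col] = shared.fit_transform(encoded[col])

# Only the last column's classes survive the loop
print(shared.classes_.tolist())  # ['L', 'M', 'S']

# "size" decodes correctly, but "color" comes back with size labels
print(shared.inverse_transform(encoded["size"]).tolist())   # ['S', 'L', 'M']
print(shared.inverse_transform(encoded["color"]).tolist())  # ['M', 'L', 'M']
```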
You can keep a dictionary of LabelEncoder objects, one per column, to convert multiple columns. Something like this:
from sklearn import preprocessing
labelencoder_dict = {}
for col in b.columns:
    labelEncoder = preprocessing.LabelEncoder()
    b[col] = labelEncoder.fit_transform(b[col])
    labelencoder_dict[col] = labelEncoder
When decoding, you can then use:
for col in b.columns:
    b[col] = labelencoder_dict[col].inverse_transform(b[col])
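A self-contained round trip with the dictionary approach might look like this, assuming a small hypothetical dataframe in place of the question's `b`. Each column is encoded and later decoded by its own encoder, so the decoded frame matches the original.

```python
import pandas as pd
from sklearn import preprocessing

# Hypothetical example data standing in for the question's dataframe `b`
b = pd.DataFrame({"color": ["red", "green", "blue", "green"],
                  "size": ["S", "L", "M", "S"]})
original = b.copy()

# Encode: fit one LabelEncoder per column and keep it for decoding later
labelencoder_dict = {}
for col in b.columns:
    labelEncoder = preprocessing.LabelEncoder()
    b[col] = labelEncoder.fit_transform(b[col])
    labelencoder_dict[col] = labelEncoder

encoded = b.copy()
print(encoded["color"].tolist())  # [2, 1, 0, 1]  (classes: blue, green, red)

# Decode: invert each column with the encoder that was fit on it
for col in b.columns:
    b[col] = labelencoder_dict[col].inverse_transform(b[col])

print((b == original).all().all())  # True
```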
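Update:
Now that you have added the column you are using as y, here's how you can decode it (assuming you have added the 'Predicted_Values' column to the dataframe). The KeyError occurs because labelencoder_dict has no encoder for 'Predicted_Values', so skip that column in the loop and decode it with the target's encoder instead: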
for col in b.columns:
    # Skip the predicted column here; it has no encoder of its own
    if col != 'Predicted_Values':
        b[col] = labelencoder_dict[col].inverse_transform(b[col])
# Use the original y (Activity_Profile) encoder on the predicted data
b['Predicted_Values'] = labelencoder_dict['Activity_Profile'].inverse_transform(
    b['Predicted_Values'])
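Putting the pieces together, here is a rough end-to-end sketch of the workflow from the question. The toy columns `weekday` and `channel`, their values, and `random_state=0` on the classifier are assumptions for illustration; only `Activity_Profile`, `Predicted_Values`, and the concat/split steps come from the question.

```python
import pandas as pd
from sklearn import preprocessing, tree
from sklearn.model_selection import train_test_split

# Hypothetical toy data; only the target name 'Activity_Profile' comes from the question
df = pd.DataFrame({
    "weekday": ["Mon", "Tue", "Mon", "Wed", "Tue", "Wed", "Mon", "Tue", "Wed", "Mon"],
    "channel": ["web", "app", "app", "web", "web", "app", "web", "app", "web", "app"],
    "Activity_Profile": ["low", "high", "high", "low", "low",
                         "high", "low", "high", "low", "high"],
})

# One encoder per column, including the target
labelencoder_dict = {}
for col in df.columns:
    labelencoder_dict[col] = preprocessing.LabelEncoder()
    df[col] = labelencoder_dict[col].fit_transform(df[col])

X = df.drop(["Activity_Profile"], axis=1)
y = df["Activity_Profile"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
model = tree.DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Same layout as in the question: test features, actual target, predictions
b = pd.concat([X_test, y_test], axis=1)
b["Predicted_Values"] = model.predict(X_test)

for col in b.columns:
    # 'Predicted_Values' has no encoder of its own, so skip it here
    if col != "Predicted_Values":
        b[col] = labelencoder_dict[col].inverse_transform(b[col])

# Predictions are target codes, so decode them with the target's encoder
b["Predicted_Values"] = labelencoder_dict["Activity_Profile"].inverse_transform(
    b["Predicted_Values"])
print(b)
```

Because the encoders are fit on the full dataframe before the split, every code that appears in the test set or in the predictions maps back to a known label.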