How to handle string data in ML classification

Views: 97 Published: 2020/4/25 10:21:34 Tags: python machine-learning keras


Question

Hello, I am a beginner in machine learning. I have previously worked on some binary ML tasks where the data was numerical. Now I am facing a problem where I have to find the probability of a particular combination. I cannot disclose the dataset or the code at this point. My data is a dataframe of 10 columns. I have to train my model on 8 columns and predict the probability of the last 2 columns; that is, my labels are a combination of the last 2 columns. The problem I am facing is that these column values are not numerical. I have tried everything I came across but can't find a suitable way of converting them to numerical values. I have tried LabelEncoder from sklearn, which works on the labels but throws a memory error if I use it again. I have tried to_numeric from pandas, which reads all the values as NaN. The values are in the form '2be74fad-4d4'. Any suggestions on how to handle this would be highly appreciated.

Recommended answer

To convert categorical data to numerical values, you can try these approaches in sklearn:

Label Encoding
Label Binarizer
OneHot Encoding
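
As a rough illustration of how the three approaches differ, here is a minimal sketch on a toy column. The ID-like strings are made up for this example (only '2be74fad-4d4' comes from the question). It assumes scikit-learn and NumPy are installed.

```python
import numpy as np
from sklearn.preprocessing import LabelBinarizer, LabelEncoder, OneHotEncoder

# Hypothetical ID-like strings, similar in shape to the '2be74fad-4d4' values in the question.
values = np.array(["2be74fad-4d4", "a91c03ee-7b2", "2be74fad-4d4", "f00dbabe-123"])

# LabelEncoder: one integer per distinct value (classes are sorted first).
label_codes = LabelEncoder().fit_transform(values)

# LabelBinarizer: one indicator column per distinct value (for 3 or more classes).
binarized = LabelBinarizer().fit_transform(values)

# OneHotEncoder: expects a 2-D feature matrix and returns a sparse matrix by default.
one_hot = OneHotEncoder().fit_transform(values.reshape(-1, 1)).toarray()

print(label_codes)
print(binarized.shape, one_hot.shape)
```

Note that integer codes imply an ordering that arbitrary ID strings do not have, while one-hot style encodings grow by one column per distinct value, so a column with many distinct IDs can become very wide.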

For your problem, you can use LabelEncoder. But there is a catch. With most other sklearn transformers, you can create the object once and then use it to fit and transform several columns together.

LabelEncoder works on one column at a time: you have to fit_transform it on a column of the train data and then transform the same column of the test data. Then you repeat the same process for the next categorical column.

You can iterate over a list of categorical columns to make this simple. Consider the snippet below:

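from sklearn.preprocessing import LabelEncoder

cat_cols = ['Item_Identifier', 'Item_Fat_Content', 'Item_Type', 'Outlet_Identifier',
            'Outlet_Size', 'Outlet_Location_Type', 'Outlet_Type', 'Item_Type_Combined']
encoders = {}  # one fitted encoder per column, so each mapping can be reused or inverted

for col in cat_cols:
    # Cast to str so missing values and mixed types are encoded consistently.
    train[col] = train[col].astype('str')
    test[col] = test[col].astype('str')
    # Fit on the train column only, then apply the same mapping to the test column.
    # transform() raises a ValueError if test contains a value not seen in train.
    enc = LabelEncoder()
    train[col] = enc.fit_transform(train[col])
    test[col] = enc.transform(test[col])
    encoders[col] = enc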
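
One limitation of the loop above is that transform raises an error when the test data contains a value that never appeared in the train data, which is likely with ID-like strings. As an alternative that the original answer does not use, sklearn's OrdinalEncoder can encode several feature columns in one call and map unseen values to a placeholder. The sketch below assumes scikit-learn 0.24 or later (where handle_unknown="use_encoded_value" is available); the toy frames, their values, and the choice of -1 as the placeholder are made up for illustration.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical toy frames; column names and values are placeholders.
train = pd.DataFrame({"Item_Type": ["a1", "b2", "a1"], "Outlet_Size": ["s", "m", "s"]})
test = pd.DataFrame({"Item_Type": ["b2", "zz"], "Outlet_Size": ["m", "s"]})
cat_cols = ["Item_Type", "Outlet_Size"]

# Values unseen during fit are encoded as -1 instead of raising an error.
enc = OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1)

# Fit on train only, then reuse the same mapping for test.
train_enc = enc.fit_transform(train[cat_cols].astype(str))
test_enc = enc.transform(test[cat_cols].astype(str))

print(train_enc)
print(test_enc)  # 'zz' was not in train, so it becomes -1
```

Whether -1 is a sensible code for unseen values depends on the model; tree-based models usually tolerate it, while linear models may need a different treatment.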
