"ValueError: could not convert string to float" error in scikit-learn


Problem description

I am running the following script:

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.preprocessing import OneHotEncoder
dataset = pd.read_csv('data/50_Startups.csv')
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, 4].values
onehotencoder = OneHotEncoder(categorical_features=3, 
handle_unknown='ignore')
onehotencoder.fit(X)

The head of the data looks like the sample rows listed further down.

I get this error:

ValueError: could not convert string to float: 'New York'

I read the answers to similar questions and then opened the scikit-learn documentation, but as you can see there, the scikit-learn authors don't have any issues with spaces in strings.

I know that I can use LabelEncoder from sklearn.preprocessing and then use OHE, and it works well, but in that case the following warning message appears:

In case you used a LabelEncoder before this OneHotEncoder to convert the categories to integers, then you can now use the OneHotEncoder directly.
warnings.warn(msg, FutureWarning)

You can use the full CSV file or these rows:

[[165349.2, 136897.8, 471784.1, 'New York', 192261.83],
[162597.7, 151377.59, 443898.53, 'California', 191792.06],
[153441.51, 101145.55, 407934.54, 'Florida', 191050.39],
[144372.41, 118671.85, 383199.62, 'New York', 182901.99],
[142107.34, 91391.77, 366168.42, 'Florida', 166187.94]]

These are the first 5 rows, which are enough to test this code.

Recommended answer

It is categorical_features=3 that hurts you. You cannot use categorical_features with string data. Remove this option, and luck will be with you. Also, you probably need fit_transform rather than fit.

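# X is assumed to be the full 5-column sample array from the question (state in column 3)
onehotencoder = OneHotEncoder(handle_unknown='ignore')
# Encode only the state column; X[:, [3]] keeps it 2-D, as the encoder requires
transformed = onehotencoder.fit_transform(X[:, [3]]).toarray()
# Keep columns 0-2, insert the one-hot columns, then append the remaining column
X1 = np.concatenate([X[:, :3], transformed, X[:, 4:]], axis=1)
#array([[165349.2, 136897.8, 471784.1, 0.0, 0.0, 1.0, 192261.83],
#       [162597.7, 151377.59, 443898.53, 1.0, 0.0, 0.0, 191792.06],
#       [153441.51, 101145.55, 407934.54, 0.0, 1.0, 0.0, 191050.39],
#       [144372.41, 118671.85, 383199.62, 0.0, 0.0, 1.0, 182901.99],
#       [142107.34, 91391.77, 366168.42, 0.0, 1.0, 0.0, 166187.94]])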
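
On newer scikit-learn releases the categorical_features argument no longer exists (it was deprecated in 0.20 and removed in 0.22), so selecting the column to encode is usually done with ColumnTransformer instead. The following is a minimal sketch of that approach, not part of the original answer. It assumes the five sample rows from the question as input; the names ct and X_encoded are illustrative only.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# First 5 rows from the question; column 3 holds the state name.
X = np.array([
    [165349.2, 136897.8, 471784.1, 'New York', 192261.83],
    [162597.7, 151377.59, 443898.53, 'California', 191792.06],
    [153441.51, 101145.55, 407934.54, 'Florida', 191050.39],
    [144372.41, 118671.85, 383199.62, 'New York', 182901.99],
    [142107.34, 91391.77, 366168.42, 'Florida', 166187.94],
], dtype=object)

# One-hot encode only column 3 and pass the other columns through unchanged.
# sparse_threshold=0 requests a dense array as the result.
ct = ColumnTransformer(
    transformers=[('state', OneHotEncoder(handle_unknown='ignore'), [3])],
    remainder='passthrough',
    sparse_threshold=0,
)
X_encoded = ct.fit_transform(X)

# The one-hot columns come first, followed by the passthrough columns.
print(X_encoded.shape)
print(ct.named_transformers_['state'].categories_)
```

Unlike the manual np.concatenate version, this places the encoded columns at the front of the result rather than at the position of the original column, so downstream code that indexes columns by position needs to account for that.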
