active_features_ attribute in OneHotEncoder


Question

I am new to machine learning and am trying to understand what OneHotEncoder does. I can tell it apart from other tools such as LabelEncoder, but I find the documentation on active_features_ particularly confusing.

http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html#sklearn.preprocessing.OneHotEncoder

feature_indices_ :
array of shape (n_features,)
Indices to feature ranges. Feature i in the original data is mapped to features from feature_indices_[i] to feature_indices_[i+1] (and then potentially masked by active_features_ afterwards)

What does this mean? What is the mask for here?

Thanks!

Recommended answer

OneHotEncoder encodes categorical features, that is, features whose values come from an unordered set. For example, a feature "vehicle" might take values from the set {"car", "motorcycle", "truck", ...}. You treat a feature as categorical when there is no ordering among its values: a car is not comparable with a motorcycle or a truck. Even if you encode the set {"car", "motorcycle", "truck"} as integers, you want to train an estimator that does not assume any relationship between those values.

To turn such a feature into binary (or, more generally, numeric) features while keeping its values unordered, you can use one-hot encoding. It is a very common technique: each categorical feature in the original dataset is replaced by n new binary features, where n is the number of unique values of that feature. To find out where those n binary features are located in the resulting dataset, use the feature_indices_ attribute: before the active_features_ mask is applied, all the new binary features for categorical feature i of the original dataset are in columns feature_indices_[i]:feature_indices_[i+1] of the new dataset.
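To make the column ranges concrete, here is a minimal pure-Python sketch of the idea, not scikit-learn's actual implementation. The helper names `feature_offsets` and `one_hot_row` and the sample `rows` are hypothetical, and the sketch assumes each feature is integer-coded starting at 0, so a feature with maximum value m gets m + 1 columns.

```python
from itertools import accumulate

def feature_offsets(rows):
    """Return (n_values, offsets) for integer-coded categorical columns.

    Assumption: values start at 0, so each column needs max + 1 slots.
    offsets plays the role that feature_indices_ plays in the text above.
    """
    n_values = [max(col) + 1 for col in zip(*rows)]
    offsets = [0] + list(accumulate(n_values))
    return n_values, offsets

def one_hot_row(row, offsets):
    """Encode one row as a dense 0/1 list, with no columns dropped."""
    encoded = [0] * offsets[-1]
    for i, value in enumerate(row):
        encoded[offsets[i] + value] = 1
    return encoded

rows = [[0, 1], [2, 0], [1, 3]]
n_values, offsets = feature_offsets(rows)
encoded = one_hot_row(rows[2], offsets)

print(n_values)                        # [3, 4]
print(offsets)                         # [0, 3, 7]
print(encoded)                         # [0, 1, 0, 0, 0, 0, 1]
# Columns that belong to original feature 1:
print(encoded[offsets[1]:offsets[2]])  # [0, 0, 0, 1]
```

The slice `offsets[i]:offsets[i + 1]` is the same kind of range that feature_indices_[i]:feature_indices_[i+1] describes for feature i.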

OneHotEncoder determines the range of each categorical feature from the values that feature takes in the dataset passed to fit. Consider this example:

from sklearn.preprocessing import OneHotEncoder

# Note: n_values_, feature_indices_ and active_features_ belong to the legacy
# OneHotEncoder API (deprecated in scikit-learn 0.20, removed in 0.22).

dataset = [[0, 0],
           [1, 1],
           [2, 4],
           [0, 5]]

# The first categorical feature takes values in the range [0, 2], and the dataset contains every value in that range.
# The second feature takes values in the range [0, 5], but the values 2 and 3 never occur.
# If the categories were encoded with that integer range, 2 and 3 should appear somewhere; otherwise it is a kind of error.
# OneHotEncoder therefore removes the columns for values 2 and 3 from the resulting dataset.
enc = OneHotEncoder()
enc.fit(dataset)

print(enc.n_values_)
# prints [3 6]
# the first feature has 3 possible values, i.e. 3 columns in the resulting dataset
# the second feature has 6 possible values
print(enc.feature_indices_)
# prints [0 3 9]
# the first feature is decomposed into 3 columns (0, 1, 2) and the second into 6 (3, 4, 5, 6, 7, 8)
print(enc.active_features_)
# prints [0 1 2 3 4 7 8]
# values 2 and 3 of the second feature never occurred, so active_features_ does not list
# their columns (5, 6), and the resulting dataset does not contain those columns either
enc.transform(dataset).toarray()
# returns this array:
# array([[ 1.,  0.,  0.,  1.,  0.,  0.,  0.],
#        [ 0.,  1.,  0.,  0.,  1.,  0.,  0.],
#        [ 0.,  0.,  1.,  0.,  0.,  1.,  0.],
#        [ 1.,  0.,  0.,  0.,  0.,  0.,  1.]])
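The masking step in the example above can be reproduced by hand. The following is a minimal pure-Python sketch, not scikit-learn's code: it assumes the mask simply keeps the columns that are non-zero somewhere in the fitted data, and the variable names (`full`, `masked`, and so on) are illustrative only.

```python
from itertools import accumulate

dataset = [[0, 0],
           [1, 1],
           [2, 4],
           [0, 5]]

# One slot per possible value (max + 1), whether or not the value occurs.
n_values = [max(col) + 1 for col in zip(*dataset)]
feature_indices = [0] + list(accumulate(n_values))

# Full-width encoding: 9 columns, including the never-used ones.
full = []
for row in dataset:
    encoded = [0] * feature_indices[-1]
    for i, value in enumerate(row):
        encoded[feature_indices[i] + value] = 1
    full.append(encoded)

# The mask: indices of columns that contain at least one 1.
active_features = [j for j in range(feature_indices[-1])
                   if any(row[j] for row in full)]

# Applying the mask drops columns 5 and 6 (values 2 and 3 of feature 1).
masked = [[row[j] for j in active_features] for row in full]

print(n_values)          # [3, 6]
print(feature_indices)   # [0, 3, 9]
print(active_features)   # [0, 1, 2, 3, 4, 7, 8]
for row in masked:
    print(row)
```

The rows of `masked` match the 4 x 7 array shown above: the full encoding has 9 columns, and the mask removes the two that are all zeros.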
