Encoding string features in pandas

Views: 78 | Published: 2020/5/24 2:58:16 | Tags: python, pandas, scikit-learn


Question

I have a dataframe like the following:

train_df
'type', 'manufacturer', 'year', 'num_doors'
sedan, bmw, 2012, 4
couple, audi, 2014, 2
and so on

test_df has a similar format. All the features are categorical (some are strings, some are ints), and I want to encode them as categorical variables.

What is a good way to handle these categorical variables in pandas/sklearn? Also, once the transformation has been applied to train_df, how can I encode test_df using the same encodings?

Recommended answer

When reading your data, specify dtype='category' so that every column is read as categorical.

import pandas as pd

df = pd.read_csv('file.csv', dtype='category')
df

     type manufacturer  year num_doors
0   sedan          bmw  2012         4
1  couple         audi  2014         2

df.dtypes

type            category
manufacturer    category
year            category
num_doors       category
dtype: object
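
To make the effect of the category dtype more concrete, here is a minimal sketch that reads the two sample rows from an in-memory string and looks at the integer codes behind each column. The inline CSV text is an assumption standing in for file.csv, and the variable names are only illustrative.

```python
import io
import pandas as pd

# Inline stand-in for 'file.csv' (assumed contents, copied from the sample rows).
csv_text = """type,manufacturer,year,num_doors
sedan,bmw,2012,4
couple,audi,2014,2
"""

df = pd.read_csv(io.StringIO(csv_text), dtype='category')

# A categorical column stores its distinct values once ...
type_categories = list(df['type'].cat.categories)

# ... and each row holds an integer code that indexes into that list.
type_codes = df['type'].cat.codes.tolist()

# Replacing every column by its codes gives a purely numeric frame.
encoded = df.apply(lambda col: col.cat.codes)
print(encoded)
```

Note that with dtype='category' the numeric-looking columns (year, num_doors) are parsed as string categories, not as integers.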

If you only want to convert a specific subset of columns, something like this would do:

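# Map each column to convert to the 'category' dtype (the dtype name is 'category', not 'categorical').
f = dict.fromkeys(['type', 'manufacturer', ...], 'category')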

Then pass f as the dtype argument:

df = pd.read_csv('file.csv', dtype=f)
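
The question also asks how to encode test_df with the same encodings as train_df, which the answer above does not address. One possible approach, sketched here and not taken from the original answer, is to reuse the categorical dtype learned from each training column when converting the matching test column. The two frames and their values are hypothetical, made up only to illustrate the idea.

```python
import pandas as pd

# Hypothetical train/test frames modeled on the sample in the question.
train_df = pd.DataFrame({
    'type': ['sedan', 'couple', 'sedan'],
    'manufacturer': ['bmw', 'audi', 'audi'],
})
test_df = pd.DataFrame({
    'type': ['couple', 'suv'],        # 'suv' never appears in train_df
    'manufacturer': ['bmw', 'bmw'],
})

cat_cols = ['type', 'manufacturer']

for col in cat_cols:
    # Learn the categories from the training data.
    train_df[col] = train_df[col].astype('category')
    # Reuse the training dtype so equal values get equal codes in both frames.
    test_df[col] = test_df[col].astype(train_df[col].dtype)

train_codes = train_df[cat_cols].apply(lambda c: c.cat.codes)
test_codes = test_df[cat_cols].apply(lambda c: c.cat.codes)
print(test_codes)
```

Values that never occurred in the training data (here 'suv') become missing in test_df and receive the code -1, so they need separate handling before being passed to a model.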
