Convert array of string (category) to array of int from a pandas dataframe

Views: 121 · Posted: 2020/5/18 19:16:04 · Tags: python numpy pandas

Question

I am trying to do something very similar to a previous question, but I get an error. I have a pandas dataframe containing features and a label, and I need to do some conversion in order to send the features and the label variable to a machine learning object:

import pandas
import milk
from scikits.statsmodels.tools import categorical

Then I have:

trainedData=bigdata[bigdata['meta']<15]
untrained=bigdata[bigdata['meta']>=15]
#print trainedData
#extract two columns from trainedData
#convert to numpy array
features=trainedData.ix[:,['ratio','area']].as_matrix(['ratio','area'])
un_features=untrained.ix[:,['ratio','area']].as_matrix(['ratio','area'])
print 'features'
print features[:5]
##label is a string:single, touching,nuclei,dust
print 'labels'

labels=trainedData.ix[:,['type']].as_matrix(['type'])
print labels[:5]
#convert single to 0, touching to 1, nuclei to 2, dusts to 3
#
tmp=categorical(labels,drop=True)
targets=categorical(labels,drop=True).argmax(1)
print targets

The console output first shows:

features
[[ 0.38846334  0.97681855]
[ 3.8318634   0.5724734 ]
[ 0.67710876  1.01816444]
[ 1.12024943  0.91508699]
[ 7.51749674  1.00156707]]
labels
[[single]
[touching]
[single]
[single]
[nuclei]]

Then I get the following error:

Traceback (most recent call last):
File "/home/claire/Applications/ProjetPython/projet particule et objet/karyotyper/DAPI-Trainer02-MILK.py", line 83, in <module>
tmp=categorical(labels,drop=True)
File "/usr/local/lib/python2.6/dist-packages/scikits.statsmodels-0.3.0rc1-py2.6.egg/scikits/statsmodels/tools/tools.py", line 206, in categorical
tmp_dummy = (tmp_arr[:,None]==data).astype(float)
AttributeError: 'bool' object has no attribute 'astype'

Is it possible to convert the categorical variable 'type' within the dataframe to an int type? 'type' can take the values 'single', 'touching', 'nuclei', 'dusts', and I need to convert them to int values such as 0, 1, 2, 3.

Answer

If you have a vector of strings or other objects and you want to give it categorical labels, you can use the Factor class (available in the pandas namespace):

In [1]: s = Series(['single', 'touching', 'nuclei', 'dusts', 'touching', 'single', 'nuclei'])

In [2]: s
Out[2]: 
0    single
1    touching
2    nuclei
3    dusts
4    touching
5    single
6    nuclei
Name: None, Length: 7

In [4]: Factor(s)
Out[4]: 
Factor:
array([single, touching, nuclei, dusts, touching, single, nuclei], dtype=object)
Levels (4): [dusts nuclei single touching]

The factor has the attributes labels and levels:

In [7]: f = Factor(s)

In [8]: f.labels
Out[8]: array([2, 3, 1, 0, 3, 2, 1], dtype=int32)

In [9]: f.levels
Out[9]: Index([dusts, nuclei, single, touching], dtype=object)
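
In current pandas releases the Factor class is no longer available; to my knowledge it was renamed Categorical, with labels exposed as codes and levels as categories. A minimal sketch of the same session, assuming a recent pandas version:

```python
import pandas as pd

s = pd.Series(['single', 'touching', 'nuclei', 'dusts',
               'touching', 'single', 'nuclei'])

# Categorical plays the role of Factor; by default the categories
# are the sorted unique values of the input.
c = pd.Categorical(s)

codes = c.codes            # one integer per element (was f.labels)
categories = c.categories  # the distinct values (was f.levels)

print(list(codes))
print(list(categories))
```

As in the Factor output above, the integers follow the sorted order of the category names, not the order in which the values first appear.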

This is intended for 1D vectors, so I am not sure whether it can be applied to your problem right away, but have a look.
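
A single dataframe column is already one-dimensional, so the same idea should carry over to the 'type' column. The sketch below uses a small made-up frame in place of bigdata (its values are illustrative, not the asker's data) and passes the category order explicitly so that single, touching, nuclei, dusts map to 0, 1, 2, 3 as requested. It assumes a recent pandas in which Categorical and to_numpy are available:

```python
import pandas as pd

# Hypothetical stand-in for the asker's `bigdata` frame.
bigdata = pd.DataFrame({
    'ratio': [0.39, 3.83, 0.68, 1.12, 7.52],
    'area': [0.98, 0.57, 1.02, 0.92, 1.00],
    'type': ['single', 'touching', 'single', 'single', 'nuclei'],
    'meta': [1, 2, 3, 4, 5],
})

trained = bigdata[bigdata['meta'] < 15]

# 2D float array of the feature columns.
features = trained[['ratio', 'area']].to_numpy()

# Fix the category order so the codes match the requested mapping.
order = ['single', 'touching', 'nuclei', 'dusts']
targets = pd.Categorical(trained['type'], categories=order).codes

print(targets)
```

Any value in the column that is not listed in order receives the code -1, which is worth checking for before training.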

BTW, I recommend that you ask these questions on the statsmodels and/or scikit-learn mailing lists, since most of us are not frequent SO users.
