What is `target` in `ClassificationDataSet` good for?
Question
I've tried to find out what the parameter `target` of `ClassificationDataSet` can be used for, but I'm still not clear about that.
>>> from pybrain.datasets import ClassificationDataSet
>>> help(ClassificationDataSet)
Help on class ClassificationDataSet in module pybrain.datasets.classification:
class ClassificationDataSet(pybrain.datasets.supervised.SupervisedDataSet)
| Specialized data set for classification data. Classes are to be numbered from 0 to nb_classes-1.
|
| Method resolution order:
| ClassificationDataSet
| pybrain.datasets.supervised.SupervisedDataSet
| pybrain.datasets.dataset.DataSet
| pybrain.utilities.Serializable
| __builtin__.object
|
| Methods defined here:
|
| __add__(self, other)
| Adds the patterns of two datasets, if dimensions and type match.
|
| __init__(self, inp, target=1, nb_classes=0, class_labels=None)
| Initialize an empty dataset.
|
| `inp` is used to specify the dimensionality of the input. While the
| number of targets is given by implicitly by the training samples, it can
| also be set explicity by `nb_classes`. To give the classes names, supply
| an iterable of strings as `class_labels`.
|
| __reduce__(self)
As this does not contain information about `target` (except that it defaults to 1), I took a look at the source code of ClassificationDataSet:
class ClassificationDataSet(SupervisedDataSet):
""" Specialized data set for classification data. Classes are to be numbered from 0 to nb_classes-1. """
def __init__(self, inp, target=1, nb_classes=0, class_labels=None):
"""Initialize an empty dataset.
`inp` is used to specify the dimensionality of the input. While the
number of targets is given by implicitly by the training samples, it can
also be set explicity by `nb_classes`. To give the classes names, supply
an iterable of strings as `class_labels`."""
# FIXME: hard to keep nClasses synchronized if appendLinked() etc. is used.
SupervisedDataSet.__init__(self, inp, target)
self.addField('class', 1)
self.nClasses = nb_classes
if len(self) > 0:
# calculate class histogram, if we already have data
self.calculateStatistics()
self.convertField('target', int)
if class_labels is None:
self.class_labels = list(set(self.getField('target').flatten()))
else:
self.class_labels = class_labels
# copy classes (may be changed into other representation)
self.setField('class', self.getField('target'))
It's still not clear, so I've looked at SupervisedDataSet:
class SupervisedDataSet(DataSet):
"""SupervisedDataSets have two fields, one for input and one for the target.
"""
def __init__(self, inp, target):
"""Initialize an empty supervised dataset.
Pass `inp` and `target` to specify the dimensions of the input and
target vectors."""
DataSet.__init__(self)
if isscalar(inp):
# add input and target fields and link them
self.addField('input', inp)
self.addField('target', target)
else:
self.setField('input', inp)
self.setField('target', target)
self.linkFields(['input', 'target'])
# reset the index marker
self.index = 0
# the input and target dimensions
self.indim = self.getDimension('input')
self.outdim = self.getDimension('target')
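The role `target` plays in `SupervisedDataSet.__init__` can be illustrated without PyBrain itself. A minimal NumPy sketch of the array shapes involved (this mimics only the dimensions, not PyBrain's actual field machinery; the variable names are mine):

```python
import numpy as np

# Mimic SupervisedDataSet(inp=2, target=1): each sample has a
# 2-dimensional input vector and a 1-dimensional target vector.
inputs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
targets = np.array([[0], [1], [1], [0]], dtype=float)

indim = inputs.shape[1]    # corresponds to self.indim
outdim = targets.shape[1]  # corresponds to self.outdim -- this is `target`

print(indim, outdim)  # 2 1
```

So `target` is a per-sample vector width, not a count of distinct classes.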
It seems to be about the output dimension. But shouldn't `target` then be `nb_classes`?
Answer
The `target` argument is the dimensionality of a training sample's output vector. To fully understand the difference between it and `nb_classes`, let's look at the `_convertToOneOfMany` method:
def _convertToOneOfMany(self, bounds=(0, 1)):
"""Converts the target classes to a 1-of-k representation, retaining the
old targets as a field `class`.
To supply specific bounds, set the `bounds` parameter, which consists of
target values for non-membership and membership."""
if self.outdim != 1:
# we already have the correct representation (hopefully...)
return
if self.nClasses <= 0:
self.calculateStatistics()
oldtarg = self.getField('target')
newtarg = zeros([len(self), self.nClasses], dtype='Int32') + bounds[0]
for i in range(len(self)):
newtarg[i, int(oldtarg[i])] = bounds[1]
self.setField('target', newtarg)
self.setField('class', oldtarg)
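The loop in `_convertToOneOfMany` is a one-hot (1-of-k) encoding. A standalone NumPy sketch of the same transformation (the function name is mine, not PyBrain's):

```python
import numpy as np

def to_one_of_k(targets, n_classes, bounds=(0, 1)):
    """One-hot encode a column of integer class labels, the way
    ClassificationDataSet._convertToOneOfMany does: fill with
    bounds[0], then set the column matching the class to bounds[1]."""
    targets = np.asarray(targets, dtype=int).ravel()
    out = np.full((len(targets), n_classes), bounds[0], dtype=int)
    out[np.arange(len(targets)), targets] = bounds[1]
    return out

labels = [0, 1, 1, 0]
print(to_one_of_k(labels, 2))
# class 0 -> [1, 0], class 1 -> [0, 1]
```

Note that after this conversion the dataset's output dimension (`target`) becomes `n_classes` wide, while `nb_classes` itself is unchanged.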
So, theoretically speaking, `target` is the dimensionality of the output, while `nb_classes` is the number of classification classes. This is useful for data transformation. For example, let's say we have data for training a network on the `xor` function, like so:
IN OUT
[0,0],0
[0,1],1
[1,0],1
[1,1],0
So the dimension of the output is equal to one, but there are two output classes: 0 and 1. So we can change our data to:
IN OUT
[0,0],(0,1)
[0,1],(1,0)
[1,0],(1,0)
[1,1],(0,1)
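A quick NumPy check that this hand-converted table encodes the same information as the original 1-dimensional targets (the decoding rule below matches the ordering chosen in the table, where the first slot stands for class 1):

```python
import numpy as np

# The hand-converted XOR targets above: first slot encodes class 1 (True),
# second slot encodes class 0 (False).
encoded = np.array([[0, 1], [1, 0], [1, 0], [0, 1]])
original = np.array([0, 1, 1, 0])

# Decoding: the class is 1 where the first slot is set, 0 otherwise.
decoded = encoded[:, 0]
print((decoded == original).all())  # True
```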
Now the first component of the output is the value for `True` and the second is the value for `False`.
This is common practice with more classes, for example in handwriting recognition.
Hope that clears this up a little bit for you.