What is `target` in `ClassificationDataSet` good for?


Question

I've tried to find out what the parameter target of ClassificationDataSet can be used for, but I'm still not clear about that.

>>> from pybrain.datasets import ClassificationDataSet
>>> help(ClassificationDataSet)
Help on class ClassificationDataSet in module pybrain.datasets.classification:

class ClassificationDataSet(pybrain.datasets.supervised.SupervisedDataSet)
 |  Specialized data set for classification data. Classes are to be numbered from 0 to nb_classes-1.
 |  
 |  Method resolution order:
 |      ClassificationDataSet
 |      pybrain.datasets.supervised.SupervisedDataSet
 |      pybrain.datasets.dataset.DataSet
 |      pybrain.utilities.Serializable
 |      __builtin__.object
 |  
 |  Methods defined here:
 |  
 |  __add__(self, other)
 |      Adds the patterns of two datasets, if dimensions and type match.
 |  
 |  __init__(self, inp, target=1, nb_classes=0, class_labels=None)
 |      Initialize an empty dataset. 
 |      
 |      `inp` is used to specify the dimensionality of the input. While the 
 |      number of targets is given by implicitly by the training samples, it can
 |      also be set explicity by `nb_classes`. To give the classes names, supply
 |      an iterable of strings as `class_labels`.
 |  
 |  __reduce__(self)

As this does not contain information about target (except that it's 1 by default), I took a look at the source code of ClassificationDataSet:

class ClassificationDataSet(SupervisedDataSet):
    """ Specialized data set for classification data. Classes are to be numbered from 0 to nb_classes-1. """

    def __init__(self, inp, target=1, nb_classes=0, class_labels=None):
        """Initialize an empty dataset.

        `inp` is used to specify the dimensionality of the input. While the
        number of targets is given by implicitly by the training samples, it can
        also be set explicity by `nb_classes`. To give the classes names, supply
        an iterable of strings as `class_labels`."""
        # FIXME: hard to keep nClasses synchronized if appendLinked() etc. is used.
        SupervisedDataSet.__init__(self, inp, target)
        self.addField('class', 1)
        self.nClasses = nb_classes
        if len(self) > 0:
            # calculate class histogram, if we already have data
            self.calculateStatistics()
        self.convertField('target', int)
        if class_labels is None:
            self.class_labels = list(set(self.getField('target').flatten()))
        else:
            self.class_labels = class_labels
        # copy classes (may be changed into other representation)
        self.setField('class', self.getField('target'))

It's still not clear, so I looked at SupervisedDataSet:

class SupervisedDataSet(DataSet):
    """SupervisedDataSets have two fields, one for input and one for the target.
    """

    def __init__(self, inp, target):
        """Initialize an empty supervised dataset.

        Pass `inp` and `target` to specify the dimensions of the input and
        target vectors."""
        DataSet.__init__(self)
        if isscalar(inp):
            # add input and target fields and link them
            self.addField('input', inp)
            self.addField('target', target)
        else:
            self.setField('input', inp)
            self.setField('target', target)

        self.linkFields(['input', 'target'])

        # reset the index marker
        self.index = 0

        # the input and target dimensions
        self.indim = self.getDimension('input')
        self.outdim = self.getDimension('target')

It seems to be about the output dimension. But shouldn't target then be nb_classes?

Answer

The target argument is the dimension of the training sample's output. To fully understand the difference between it and nb_classes, let's look at the _convertToOneOfMany method:

def _convertToOneOfMany(self, bounds=(0, 1)):
    """Converts the target classes to a 1-of-k representation, retaining the
    old targets as a field `class`.

    To supply specific bounds, set the `bounds` parameter, which consists of
    target values for non-membership and membership."""
    if self.outdim != 1:
        # we already have the correct representation (hopefully...)
        return
    if self.nClasses <= 0:
        self.calculateStatistics()
    oldtarg = self.getField('target')
    newtarg = zeros([len(self), self.nClasses], dtype='Int32') + bounds[0]
    for i in range(len(self)):
        newtarg[i, int(oldtarg[i])] = bounds[1]
    self.setField('target', newtarg)
    self.setField('class', oldtarg)
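
The same 1-of-k mapping can be sketched standalone, without pybrain (a minimal plain-Python version of the logic above; `to_one_of_many` is an illustrative name, not part of the library):

```python
# Minimal sketch of the 1-of-k conversion that _convertToOneOfMany
# performs, assuming integer class targets in range(nb_classes).

def to_one_of_many(targets, nb_classes, bounds=(0, 1)):
    """Turn a list of class indices into 1-of-k rows.

    Each row has nb_classes entries set to bounds[0] (non-membership),
    except the entry at the class index, which is set to bounds[1].
    """
    rows = []
    for t in targets:
        row = [bounds[0]] * nb_classes
        row[int(t)] = bounds[1]
        rows.append(row)
    return rows

print(to_one_of_many([0, 1, 1, 0], nb_classes=2))
# [[1, 0], [0, 1], [0, 1], [1, 0]]
```

Note that the `bounds` parameter lets you pick other values for membership and non-membership, e.g. `bounds=(-1, 1)` for networks with tanh output units.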

So, theoretically speaking, target is the dimension of the output, while nb_classes is the number of classification classes. This distinction is useful for data transformation. For example, say we have data for training a network on the XOR function, like so:

 IN   OUT
[0,0],0
[0,1],1
[1,0],1
[1,1],0

So the dimension of the output equals one, but there are two output classes: 0 and 1. So we can change our data to:

 IN    OUT
[0,0],(1,0)
[0,1],(0,1)
[1,0],(0,1)
[1,1],(1,0)

Now each class maps to one output unit: the unit whose index equals the old class value is set to 1 and the others to 0, which is exactly what `newtarg[i, int(oldtarg[i])] = bounds[1]` does above. This is common practice when there are more classes, for example in handwriting recognition.
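
Putting the XOR example together (a plain-Python sketch, no pybrain; variable names are illustrative): target=1 means each sample carries one output number, nb_classes=2 means that number takes two distinct values, and the conversion grows the output dimension from target to nb_classes.

```python
# XOR training data: 2-dimensional inputs, 1-dimensional class targets.
samples = [([0, 0], [0]),
           ([0, 1], [1]),
           ([1, 0], [1]),
           ([1, 1], [0])]

target = len(samples[0][1])                   # 1: one output column per sample
nb_classes = len({t[0] for _, t in samples})  # 2: distinct class values

# 1-of-k conversion: class c becomes a row with a 1 at index c.
converted = []
for inp, (c,) in samples:
    row = [0] * nb_classes
    row[c] = 1
    converted.append((inp, row))

outdim_before, outdim_after = target, len(converted[0][1])
print(outdim_before, outdim_after)  # 1 2
```

After the conversion, a network trained on this data needs nb_classes output units instead of one, and the predicted class is recovered as the index of the largest output.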

Hope that clears this up a little for you.
