Memory issues when transforming np.array using to_categorical

Question

I have a numpy array like this:

[[0. 1. 1. ... 0. 0. 1.]
 [0. 0. 0. ... 0. 0. 1.]
 [0. 0. 1. ... 0. 0. 0.]
 ...
 [0. 0. 0. ... 0. 0. 1.]
 [0. 0. 0. ... 0. 0. 1.]
 [0. 0. 0. ... 1. 0. 1.]]

I transform it like this to reduce the memory demand:

x_val = x_val.astype(np.int)

Which results in:

[[0 1 1 ... 0 0 1]
 [0 0 0 ... 0 0 1]
 [0 0 1 ... 0 0 0]
 ...
 [0 0 0 ... 0 0 1]
 [0 0 0 ... 0 0 1]
 [0 0 0 ... 1 0 1]]

However, when I do this:

x_val = to_categorical(x_val)

I get:

in to_categorical
    categorical = np.zeros((n, num_classes), dtype=np.float32)
MemoryError

Any ideas why? Ultimately, the numpy array contains the labels for a binary classification problem. So far, I have used it as float32, as is, in a Keras ANN; it worked fine and I achieved pretty good performance. So is it actually necessary to run to_categorical?

Recommended answer

You don't need to use to_categorical, since I guess you are doing multi-label classification. To avoid any confusion once and for all(!), let me explain this.
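Before getting to that, here is a rough sketch of why the call runs out of memory in the first place. It assumes that to_categorical treats every entry of the array as a class index and flattens the input, so the buffer in the traceback, np.zeros((n, num_classes), dtype=np.float32), gets one row per entry rather than one row per sample. The array shape below is made up for illustration. Note also that np.int was only an alias for Python's int (it was removed in NumPy 1.24), so that cast does not produce a smaller array on most platforms; np.int8 is used here instead.

```python
import numpy as np

# Hypothetical stand-in for x_val: 1000 samples with 50 binary labels each.
rng = np.random.default_rng(0)
x_val = rng.integers(0, 2, size=(1000, 50)).astype(np.float32)

# One byte per entry is enough to hold 0/1 labels.
x_small = x_val.astype(np.int8)

# Assumed behaviour of to_categorical: each entry is a class index,
# so the one-hot buffer has one row per entry of the flattened array.
n = x_val.size                          # rows after flattening
num_classes = int(x_val.max()) + 1      # 2 for 0/1 labels
one_hot_bytes = n * num_classes * np.dtype(np.float32).itemsize

print("float32 labels:", x_val.nbytes, "bytes")
print("int8 labels:   ", x_small.nbytes, "bytes")
print("one-hot buffer:", one_hot_bytes, "bytes")
```

Under that assumption the one-hot buffer is twice the size of the float32 label array, so the call makes the memory situation worse instead of better.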

If you are doing binary classification, meaning each sample belongs to exactly one of two classes, e.g. cat vs dog, happy vs sad, or positive review vs negative review, then:

  • The labels should look like [0 1 0 0 1 ... 0] with a shape of (n_samples,), i.e. each sample has a label of one (e.g. cat) or zero (e.g. dog).
  • The activation function used for the last layer is usually sigmoid (or any other function that outputs a value in the range [0,1]).
  • The loss function usually used is binary_crossentropy.
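
As a minimal sketch of the binary case, here is the label shape together with a sigmoid output and the binary cross-entropy, written in plain NumPy rather than Keras. The logits are made-up numbers standing in for a model's last-layer outputs, and the helper functions are illustrative, not the Keras implementations.

```python
import numpy as np

def sigmoid(z):
    """Squash raw scores into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    y_true = np.asarray(y_true, dtype=np.float64)
    y_prob = np.clip(np.asarray(y_prob, dtype=np.float64), eps, 1.0 - eps)
    losses = -(y_true * np.log(y_prob) + (1.0 - y_true) * np.log(1.0 - y_prob))
    return float(np.mean(losses))

# One 0/1 label per sample: shape (n_samples,)
y_true = np.array([0, 1, 0, 0, 1], dtype=np.float32)

# Made-up raw outputs of a single-unit last layer
logits = np.array([-2.0, 3.0, -1.0, 0.5, 2.0])
y_prob = sigmoid(logits)

loss = binary_crossentropy(y_true, y_prob)
print(y_true.shape, y_prob.round(3), round(loss, 4))
```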

If you are doing multi-class classification, meaning each sample belongs to exactly one of many classes, e.g. cat vs dog vs lion, happy vs neutral vs sad, or positive review vs neutral review vs negative review, then:

  • The labels should either be one-hot encoded, i.e. [1, 0, 0] corresponds to cat, [0, 1, 0] to dog and [0, 0, 1] to lion, in which case the labels have a shape of (n_samples, n_classes); or they can be integers (i.e. sparse labels) counted from zero, i.e. 0 for cat, 1 for dog and 2 for lion, in which case the labels have a shape of (n_samples,). The to_categorical function converts sparse labels to one-hot encoded labels, if you wish to do so.
  • The activation function used is usually softmax.
  • The loss function used depends on the format of the labels: if they are one-hot encoded, categorical_crossentropy is used, and if they are sparse, sparse_categorical_crossentropy is used.
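
To make the two label formats concrete, here is a small NumPy sketch. It builds one-hot labels from sparse ones with np.eye (a stand-in for what to_categorical is meant to do, not its actual code) and checks that the two cross-entropy forms give the same value. The logits are made up for illustration.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Sparse labels, shape (n_samples,): 0 = cat, 1 = dog, 2 = lion
sparse = np.array([0, 2, 1, 0])
n_classes = 3

# One-hot labels, shape (n_samples, n_classes)
one_hot = np.eye(n_classes, dtype=np.float32)[sparse]

# Made-up raw outputs of a 3-unit last layer
logits = np.array([[2.0, 0.1, -1.0],
                   [0.0, 0.5, 3.0],
                   [0.2, 1.5, 0.1],
                   [1.0, 1.0, 1.0]])
probs = softmax(logits)

# Cross-entropy from one-hot labels
cce = -np.mean(np.sum(one_hot * np.log(probs), axis=1))
# Cross-entropy from sparse labels: pick the true-class probability directly
scce = -np.mean(np.log(probs[np.arange(len(sparse)), sparse]))

print(one_hot.shape, round(float(cce), 4), round(float(scce), 4))
```

Both forms compute the same quantity; the sparse form simply avoids materialising the (n_samples, n_classes) label array.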

If you are doing multi-label classification, meaning each sample may belong to zero, one or more than one class, e.g. an image may contain both a cat and a dog, then:

  • The labels should look like [[1 0 0 1 ... 0], ..., [0 0 1 0 ... 1]] with a shape of (n_samples, n_classes). For example, a label [1 1] means that the corresponding sample belongs to both classes (e.g. cat and dog).
  • The activation function used is sigmoid, since presumably each class is independent of the others.
  • The loss function used is binary_crossentropy.
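
A minimal NumPy sketch of the multi-label case follows. The 0/1 label matrix is used as is, each class gets its own sigmoid, and the binary cross-entropy is averaged over all entries. The label matrix and logits are made up, and the 0.5 threshold is an assumption for illustration.

```python
import numpy as np

def sigmoid(z):
    """Squash raw scores into the range (0, 1), independently per entry."""
    return 1.0 / (1.0 + np.exp(-z))

# Multi-label targets, shape (n_samples, n_classes); a row may hold
# any number of ones, including none.
y_true = np.array([[1, 0, 1],
                   [0, 0, 0],
                   [1, 1, 0]], dtype=np.float64)

# Made-up raw outputs of a 3-unit last layer
logits = np.array([[2.0, -1.5, 0.7],
                   [-3.0, -0.2, -1.0],
                   [1.2, 0.4, -2.2]])
y_prob = sigmoid(logits)   # rows do not need to sum to 1

# Binary cross-entropy averaged over every (sample, class) entry
eps = 1e-7
p = np.clip(y_prob, eps, 1.0 - eps)
loss = float(np.mean(-(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p))))

# Threshold each class independently (0.5 is an arbitrary choice here)
y_pred = (y_prob >= 0.5).astype(np.int8)
print(y_pred, round(loss, 4))
```

Since the labels already have the (n_samples, n_classes) layout, no to_categorical step is involved.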
