Memory issues when transforming np.array using to_categorical


Problem description

I have a numpy array like this:

[[0. 1. 1. ... 0. 0. 1.]
 [0. 0. 0. ... 0. 0. 1.]
 [0. 0. 1. ... 0. 0. 0.]
 ...
 [0. 0. 0. ... 0. 0. 1.]
 [0. 0. 0. ... 0. 0. 1.]
 [0. 0. 0. ... 1. 0. 1.]]

I transform it like this to reduce the memory demand:

x_val = x_val.astype(int)  # np.int was an alias of the builtin int and was removed in NumPy 1.24

which results in:

[[0 1 1 ... 0 0 1]
 [0 0 0 ... 0 0 1]
 [0 0 1 ... 0 0 0]
 ...
 [0 0 0 ... 0 0 1]
 [0 0 0 ... 0 0 1]
 [0 0 0 ... 1 0 1]]

However, when I do this:

x_val = to_categorical(x_val)

I get:

in to_categorical
    categorical = np.zeros((n, num_classes), dtype=np.float32)
MemoryError

Any ideas why? Ultimately, the numpy array contains the labels for a binary classification problem. So far, I have used it as float32 as is in a Keras ANN and it worked fine and I achieved pretty good performance. So is it actually necessary to run to_categorical?

Recommended answer

You don't need to use to_categorical since I guess you are doing multi-label classification. To avoid any confusion once and for all(!), let me explain this.

If you are doing binary classification, meaning each sample may belong to only one of two classes e.g. cat vs dog or happy vs sad or positive review vs negative review, then:

  • The labels should look like [0 1 0 0 1 ... 0] with shape (n_samples,), i.e. each sample has a single label that is either one (e.g. cat) or zero (e.g. dog).
  • The activation function used for the last layer is usually sigmoid (or any other function that outputs a value in range [0,1]).
  • The loss function usually used is binary_crossentropy.
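
The binary setup above can be sketched in plain NumPy. This is only an illustration of the label shape, the sigmoid output and the loss; it is not Keras's implementation, and the labels, scores and helper names (sigmoid, binary_crossentropy) are made up for the example.

```python
import numpy as np

def sigmoid(z):
    """Map real-valued scores to probabilities in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy; clipping by eps avoids log(0)."""
    y_prob = np.clip(y_prob, eps, 1.0 - eps)
    per_sample = -(y_true * np.log(y_prob) + (1.0 - y_true) * np.log(1.0 - y_prob))
    return float(np.mean(per_sample))

# Hypothetical labels with shape (n_samples,): 1 = cat, 0 = dog.
y_true = np.array([0, 1, 0, 0, 1], dtype=np.float32)

# Hypothetical raw scores from the last layer, one per sample.
logits = np.array([-2.0, 3.0, -1.0, 0.5, 2.0])

y_prob = sigmoid(logits)                  # one probability per sample
loss = binary_crossentropy(y_true, y_prob)
y_pred = (y_prob >= 0.5).astype(np.int8)  # threshold at 0.5 to get a class
```

Note that no one-hot step is involved: a single 0/1 value per sample is all the loss needs.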

If you are doing multi-class classification, meaning each sample may belong to only one of many classes e.g. cat vs dog vs lion or happy vs neutral vs sad or positive review vs neutral review vs negative review, then:

  • The labels should be either one-hot encoded, i.e. [1, 0, 0] corresponds to cat, [0, 1, 0] to dog and [0, 0, 1] to lion, in which case the labels have a shape of (n_samples, n_classes); or they can be integers (i.e. sparse labels) starting from zero, i.e. 0 for cat, 1 for dog and 2 for lion, in which case the labels have a shape of (n_samples,). The to_categorical function converts sparse labels to one-hot encoded labels, if you wish to do so.
  • The activation function used is usually softmax.
  • The loss function used depends on the format of labels: if they are one-hot encoded, categorical_crossentropy is used and if they are sparse then sparse_categorical_crossentropy is used.
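
To make the two label formats concrete, here is a small NumPy sketch. It mimics what to_categorical does for a 1-D label vector and shows that the one-hot and sparse forms of the loss give the same value. The helper names (one_hot, softmax), the zero-based class indices and the scores are assumptions for illustration, not Keras code.

```python
import numpy as np

def one_hot(sparse_labels, n_classes):
    """Turn integer labels of shape (n_samples,) into rows of shape (n_classes,)."""
    return np.eye(n_classes, dtype=np.float32)[sparse_labels]

def softmax(logits):
    """Row-wise softmax; subtracting the row max keeps exp() stable."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

# Hypothetical sparse labels, zero-based: 0 = cat, 1 = dog, 2 = lion.
sparse = np.array([0, 1, 2, 1])
onehot = one_hot(sparse, n_classes=3)     # shape (4, 3)

# Hypothetical raw scores from the last layer, one row per sample.
logits = np.array([[2.0, 0.5, 0.1],
                   [0.2, 1.5, 0.3],
                   [0.1, 0.2, 3.0],
                   [1.0, 1.2, 0.4]])
probs = softmax(logits)

# Categorical cross-entropy with one-hot labels.
cce = float(np.mean(-np.sum(onehot * np.log(probs), axis=1)))

# Sparse categorical cross-entropy with integer labels: pick the
# probability of the true class directly, no one-hot array needed.
scce = float(np.mean(-np.log(probs[np.arange(len(sparse)), sparse])))
```

Since both losses agree, the sparse form is usually the cheaper choice when memory is tight, because the (n_samples, n_classes) one-hot array is never materialised.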

If you are doing multi-label classification, meaning each sample may belong to zero, one or more than one class, e.g. an image may contain both a cat and a dog, then:

  • The labels should look like [[1 0 0 1 ... 0], ..., [0 0 1 0 ... 1]] with shape (n_samples, n_classes). For example, a label [1 1] means that the corresponding sample belongs to both classes (e.g. cat and dog).
  • The activation function used is sigmoid since presumably each class is independent of another class.
  • The loss function used is binary_crossentropy.
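
As for the MemoryError itself, a rough sketch of the arithmetic may help. As far as I understand it, to_categorical flattens its input and returns an array of shape input_shape + (num_classes,), so a 0/1 matrix of shape (n_samples, n_classes) becomes (n_samples, n_classes, 2) in float32. The array size and the helper name to_categorical_nbytes below are hypothetical, chosen only to show the ratios.

```python
import numpy as np

def to_categorical_nbytes(label_shape, num_classes=2, dtype=np.float32):
    """Estimate the bytes needed for an array of shape label_shape + (num_classes,)."""
    n_elements = int(np.prod(label_shape)) * num_classes
    return n_elements * np.dtype(dtype).itemsize

# Hypothetical multi-label matrix: 1000 samples, 50 independent 0/1 labels.
n_samples, n_classes = 1000, 50
y = np.random.randint(0, 2, size=(n_samples, n_classes)).astype(np.float32)

as_float32 = y.nbytes                    # 4 bytes per element
as_int64 = y.astype(np.int64).nbytes     # 8 bytes per element
as_int8 = y.astype(np.int8).nbytes       # 1 byte per element

# Size of the float32 array that one-hot encoding every entry would need.
expanded = to_categorical_nbytes(y.shape)
```

Two things follow from this. Casting float32 labels to a 64-bit integer type (what a plain int cast gives on most 64-bit platforms) doubles the memory instead of reducing it, whereas int8 would cut it to a quarter. And one-hot encoding a matrix that is already in multi-label form doubles it again without adding information, so the labels can be passed to the model as they are.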
