Keras: Making a neural network to find a number's modulus

Question

I'm an experienced Python developer, but a complete newbie in machine learning. This is my first attempt to use Keras. Can you tell what I'm doing wrong?

I'm trying to make a neural network that takes a number in binary form and outputs its modulus when divided by 7. (My goal was to pick a very simple task, just to see that everything works.)

In the code below I define the network and I train it on 10,000 random numbers. Then I test it on 500 random numbers.

For some reason the accuracy that I get is around 1/7, which is the accuracy you'd expect from a completely random algorithm, i.e. my neural network isn't doing anything.

Can anyone help me find the problem?

import keras.models
import numpy as np
from python_toolbox import random_tools

RADIX = 7

def _get_number(vector):
    return sum(x * 2 ** i for i, x in enumerate(vector))

def _get_mod_result(vector):
    return _get_number(vector) % RADIX

def _number_to_vector(number):
    binary_string = bin(number)[2:]
    if len(binary_string) > 20:
        raise NotImplementedError
    bits = (((0,) * (20 - len(binary_string))) +
            tuple(map(int, binary_string)))[::-1]
    assert len(bits) == 20
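    # indexing np.c_ with a tuple of bits yields a single (1, 20) row vector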
    return np.c_[bits]


def get_mod_result_vector(vector):
    return _number_to_vector(_get_mod_result(vector))


def main():
    model = keras.models.Sequential(
        (
            keras.layers.Dense(
                units=20, activation='relu', input_dim=20
            ),
            keras.layers.Dense(
                units=20, activation='relu'
            ),
            keras.layers.Dense(
                units=20, activation='softmax'
            )
        )
    )
    model.compile(optimizer='sgd',
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])

    data = np.random.randint(2, size=(10000, 20))
    labels = np.vstack([get_mod_result_vector(row) for row in data])

    model.fit(data, labels, epochs=10, batch_size=50)
    def predict(number):
        foo = model.predict(_number_to_vector(number))
        return _get_number(tuple(map(round, foo[0])))
    def is_correct_for_number(x):
        return bool(predict(x) == x % RADIX)
    predict(7)
    sample = random_tools.shuffled(range(2 ** 20))[:500]
    print('Total accuracy:')
    print(sum(map(is_correct_for_number, sample)) / len(sample))
    print(f'(Accuracy of random algorithm is {1/RADIX:.2f})')


if __name__ == '__main__':
    main()

Answer

UPD

After some tinkering I was able to get to a reasonably good solution using RNNs. It trains on less than 5% of all possible unique inputs and gives >90% accuracy on a random test sample. You can increase the number of epochs from 40 to 100 to make it a bit more accurate (though in some runs there is a chance the model won't converge to the right answer - here that chance is higher than usual). I have switched to the Adam optimizer here and had to increase the number of samples to 50K (10K led to overfitting for me).

Please understand that this solution is a bit tongue-in-cheek, because it is based on the task-domain knowledge that our target function can be defined by a simple recurrent formula over the sequence of input bits (an even simpler formula if you reverse the input bit sequence, though using go_backwards=True in the LSTM didn't help here).

If you reverse the input bit order (so that we always start with the most significant bit), then the recurrent formula for the target function is just F_n = G(F_{n-1}, x_n), where F_n = MOD([x_1,...,x_n], 7) and G(x, y) = MOD(2*x + y, 7). G has only 49 distinct inputs and 7 possible outputs, so the model essentially has to learn an initial state plus this G update function. For a sequence starting with the least significant bit, the recurrent formula is slightly more complicated, because it also needs to keep track of the current MOD(2**n, 7) at each step, but it seems this extra difficulty doesn't matter for training.
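
To make the recurrence concrete, here is a minimal standalone sketch (plain Python, my own illustration, not part of the original answer) that applies G bit by bit starting from the most significant bit and checks the result against the built-in % operator:

def mod7_msb_first(bits):
    # F_n = G(F_{n-1}, x_n) with G(x, y) = (2*x + y) % 7 and F_0 = 0
    state = 0
    for bit in bits:
        state = (2 * state + bit) % 7
    return state

# sanity check against the % operator
for n in range(5000):
    bits = [int(b) for b in bin(n)[2:]]  # most significant bit first
    assert mod7_msb_first(bits) == n % 7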

Please note - these formulas are only there to explain why an RNN works here. The net below is just a plain LSTM layer plus softmax, with the original input bits treated as a sequence.

Full code for the answer using an RNN layer:

import keras.models
import numpy as np
from python_toolbox import random_tools

RADIX = 7
FEATURE_BITS = 20

def _get_number(vector):
    return sum(x * 2 ** i for i, x in enumerate(vector))

def _get_mod_result(vector):
    return _get_number(vector) % RADIX

def _number_to_vector(number):
    binary_string = bin(number)[2:]
    if len(binary_string) > FEATURE_BITS:
        raise NotImplementedError
    bits = (((0,) * (FEATURE_BITS - len(binary_string))) +
            tuple(map(int, binary_string)))[::-1]
    assert len(bits) == FEATURE_BITS
    return np.c_[bits]


def get_mod_result_vector(vector):
    v = np.repeat(0, 7)
    v[_get_mod_result(vector)] = 1
    return v


def main():
    model = keras.models.Sequential(
        (
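            # reshape each flat FEATURE_BITS-bit sample to (1, FEATURE_BITS),
            # i.e. a (timesteps, features) input for the LSTM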
            keras.layers.Reshape(
                (1, -1)
            ),
            keras.layers.LSTM(
                units=100,
            ),
            keras.layers.Dense(
                units=7, activation='softmax'
            )
        )
    )
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=0.01),
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])

    data = np.random.randint(2, size=(50000, FEATURE_BITS))
    labels = np.vstack([get_mod_result_vector(row) for row in data])

    model.fit(data, labels, epochs=40, batch_size=50)
    def predict(number):
        foo = model.predict(_number_to_vector(number))
        return np.argmax(foo)
    def is_correct_for_number(x):
        return bool(predict(x) == x % RADIX)
    sample = random_tools.shuffled(range(2 ** FEATURE_BITS))[:500]
    print('Total accuracy:')
    print(sum(map(is_correct_for_number, sample)) / len(sample))
    print(f'(Accuracy of random algorithm is {1/RADIX:.2f})')


if __name__ == '__main__':
    main()

Original answer

I'm not sure how it happened, but the particular task you chose to check your code is extremely difficult for an NN. I think the best explanation is that NNs are not really good when features are interconnected in such a way that changing one feature always changes the target output completely. One way to look at it is to consider the sets of inputs for which you expect a certain answer - in your case they look like unions of a very large number of parallel hyperplanes in 20-dimensional space - and for each of the 7 categories these sets of planes are "nicely" interleaved and left for the NN to distinguish.
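
To see concretely how interleaved these sets are (a quick check of my own, not part of the original answer): flipping any single input bit changes the number by ±2**i, and 2**i % 7 cycles through 1, 2, 4 and is never 0, so every one-bit change moves the input into a different mod-7 class.

# every single-bit flip of a 20-bit number lands in a different mod-7 class
for n in (0, 1, 123456, 999999):
    for i in range(20):
        flipped = n ^ (1 << i)
        assert flipped % 7 != n % 7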

That said - if your number of examples is large, say 10K, and the number of possible inputs is smaller - say your inputs are only 8 bits wide (so only 256 unique inputs are possible) - the network should "learn" the right function quite well (by "remembering" the correct answer for every input, without generalization). In your case that doesn't happen, because the code has the following bug.

Your labels were 20-dimensional vectors containing the bits of a 0-6 integer (your actual desired label) - so I guess you were pretty much trying to teach the NN to learn the bits of the answer as separate classifiers (with only 3 bits ever able to be non-zero). I changed that to what I assume you actually wanted - vectors of length 7 with one value being 1 and the others 0 (the so-called one-hot encoding, which is what keras actually expects for categorical_crossentropy). If you wanted to learn each bit separately, you definitely shouldn't have used a 20-unit softmax in the last layer, because such an output generates probabilities over 20 classes that sum to 1 (in that case you should have trained 20 - or rather 3 - binary classifiers instead). Since your code didn't give keras correct input, the model you got in the end was essentially random, and with the rounding you applied it ended up outputting the same value for 95%-100% of inputs.
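
For reference, the same length-7 one-hot labels can be produced with keras.utils.to_categorical (available in both standalone Keras and tf.keras); a small sketch:

import numpy as np
from keras.utils import to_categorical

RADIX = 7
numbers = np.array([0, 6, 7, 13])
labels = to_categorical(numbers % RADIX, num_classes=RADIX)
print(labels)
# [[1. 0. 0. 0. 0. 0. 0.]
#  [0. 0. 0. 0. 0. 0. 1.]
#  [1. 0. 0. 0. 0. 0. 0.]
#  [0. 0. 0. 0. 0. 0. 1.]]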

The slightly changed code below trains a model which can more or less correctly guess the mod 7 answer for every number from 0 to 255 (again, it pretty much remembers the correct answer for every input). If you try to increase FEATURE_BITS, you will see a large degradation of the results. If you actually want to train an NN to learn this task as-is with 20 or more bits of input (and without supplying the NN with all possible inputs and infinite time to train), you will need to apply some task-specific feature transformations and/or some layers carefully designed to be good at exactly the task you want to achieve, as others have already mentioned in comments to your question (a sketch of one such transformation follows the code below).

import keras.models
import numpy as np
from python_toolbox import random_tools

RADIX = 7
FEATURE_BITS = 8

def _get_number(vector):
    return sum(x * 2 ** i for i, x in enumerate(vector))

def _get_mod_result(vector):
    return _get_number(vector) % RADIX

def _number_to_vector(number):
    binary_string = bin(number)[2:]
    if len(binary_string) > FEATURE_BITS:
        raise NotImplementedError
    bits = (((0,) * (FEATURE_BITS - len(binary_string))) +
            tuple(map(int, binary_string)))[::-1]
    assert len(bits) == FEATURE_BITS
    return np.c_[bits]


def get_mod_result_vector(vector):
    v = np.repeat(0, 7)
    v[_get_mod_result(vector)] = 1
    return v


def main():
    model = keras.models.Sequential(
        (
            keras.layers.Dense(
                units=20, activation='relu', input_dim=FEATURE_BITS
            ),
            keras.layers.Dense(
                units=20, activation='relu'
            ),
            keras.layers.Dense(
                units=7, activation='softmax'
            )
        )
    )
    model.compile(optimizer='sgd',
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])

    data = np.random.randint(2, size=(10000, FEATURE_BITS))
    labels = np.vstack([get_mod_result_vector(row) for row in data])

    model.fit(data, labels, epochs=100, batch_size=50)
    def predict(number):
        foo = model.predict(_number_to_vector(number))
        return np.argmax(foo)
    def is_correct_for_number(x):
        return bool(predict(x) == x % RADIX)
    sample = random_tools.shuffled(range(2 ** FEATURE_BITS))[:500]
    print('Total accuracy:')
    print(sum(map(is_correct_for_number, sample)) / len(sample))
    print(f'(Accuracy of random algorithm is {1/RADIX:.2f})')


if __name__ == '__main__':
    main()
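
As a hypothetical illustration of the task-specific feature transformations mentioned above (my own sketch, not something from the original answer): since 2**i % 7 cycles through 1, 2, 4, the 20 input bits can be collapsed into three small counts whose weighted sum determines the answer, shrinking the effective input space from 2**20 bit vectors to a few hundred count triples:

import numpy as np

FEATURE_BITS = 20
# positional weights 2**i % 7 repeat with period 3: 1, 2, 4, 1, 2, 4, ...
WEIGHTS = np.array([2 ** i % 7 for i in range(FEATURE_BITS)])

def bit_counts_by_weight(bits):
    # collapse a bit vector (least significant bit first) into counts of
    # bits whose positional weight mod 7 is 1, 2 and 4 respectively
    bits = np.asarray(bits)
    return np.array([bits[WEIGHTS == w].sum() for w in (1, 2, 4)])

def mod7_from_counts(c1, c2, c4):
    # the target is now a simple function of three small integers
    return (c1 + 2 * c2 + 4 * c4) % 7

# sanity check on a random input
bits = np.random.randint(2, size=FEATURE_BITS)
number = sum(int(b) << i for i, b in enumerate(bits))
assert mod7_from_counts(*bit_counts_by_weight(bits)) == number % 7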
