Keras: Making a neural network to find a number's modulus


Problem description

I'm an experienced Python developer, but a complete newbie in machine learning. This is my first attempt at using Keras. Can you tell me what I'm doing wrong?

I'm trying to make a neural network that takes a number in binary form and outputs its modulus when divided by 7. (My goal was to pick a very simple task, just to verify that everything works.)

In the code below I define the network and train it on 10,000 random numbers. Then I test it on 500 random numbers.

For some reason, the accuracy I get is around 1/7, which is the accuracy you would expect from a completely random algorithm, i.e. my neural network isn't doing anything.

Can anyone help me figure out what the problem is?

import keras.models
import numpy as np
from python_toolbox import random_tools

RADIX = 7

def _get_number(vector):
    return sum(x * 2 ** i for i, x in enumerate(vector))

def _get_mod_result(vector):
    return _get_number(vector) % RADIX

def _number_to_vector(number):
    binary_string = bin(number)[2:]
    if len(binary_string) > 20:
        raise NotImplementedError
    bits = (((0,) * (20 - len(binary_string))) +
            tuple(map(int, binary_string)))[::-1]
    assert len(bits) == 20
    return np.c_[bits]


def get_mod_result_vector(vector):
    return _number_to_vector(_get_mod_result(vector))


def main():
    model = keras.models.Sequential(
        (
            keras.layers.Dense(
                units=20, activation='relu', input_dim=20
            ),
            keras.layers.Dense(
                units=20, activation='relu'
            ),
            keras.layers.Dense(
                units=20, activation='softmax'
            )
        )
    )
    model.compile(optimizer='sgd',
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])

    data = np.random.randint(2, size=(10000, 20))
    labels = np.vstack(map(get_mod_result_vector, data))

    model.fit(data, labels, epochs=10, batch_size=50)
    def predict(number):
        foo = model.predict(_number_to_vector(number))
        return _get_number(tuple(map(round, foo[0])))
    def is_correct_for_number(x):
        return bool(predict(x) == x % RADIX)
    predict(7)
    sample = random_tools.shuffled(range(2 ** 20))[:500]
    print('Total accuracy:')
    print(sum(map(is_correct_for_number, sample)) / len(sample))
    print(f'(Accuracy of random algorithm is {1/RADIX:.2f}')


if __name__ == '__main__':
    main()

Solution


UPD

After some tinkering I was able to get to a reasonably good solution using RNNs. It trains on less than 5% of all possible unique inputs and gives >90% accuracy on a random test sample. You can increase the number of epochs from 40 to 100 to make it a bit more accurate (though in some runs there is a chance the model won't converge to the right answer - here that chance is higher than usual). I have switched to the Adam optimizer here and had to increase the number of samples to 50K (10K led to overfitting for me).

Please understand that this solution is a bit tongue-in-cheek, because it is based on task-domain knowledge: our target function can be defined by a simple recurring formula over the sequence of input bits (an even simpler formula if you reverse the input bit sequence, though using go_backwards=True in the LSTM didn't help here).

If you reverse the input bit order (so that we always start with the most significant bit), then the recurring formula for the target function is just F_n = G(F_{n-1}, x_n), where F_n = MOD([x_1,...,x_n], 7) and G(x, y) = MOD(2*x + y, 7) - a function with only 49 distinct inputs and 7 possible outputs. So the model more or less has to learn an initial state plus this G update function. For a sequence starting with the least significant bit, the recurring formula is slightly more complicated, because it also needs to keep track of the current MOD(2**n, 7) at each step, but it seems this extra difficulty doesn't matter for training.
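
As a quick sanity check of this recurrence (my own minimal sketch, not part of the original answer), the following folds G over the bits from the most significant one and compares the result against the built-in modulus:

RADIX = 7

def mod_by_recurrence(bits_msb_first):
    # Fold F_n = G(F_{n-1}, x_n) with G(x, y) = (2*x + y) % 7 over the bits.
    f = 0  # initial state
    for bit in bits_msb_first:
        f = (2 * f + bit) % RADIX  # the G update function
    return f

for n in (0, 6, 7, 100, 2 ** 20 - 1):
    bits = list(map(int, bin(n)[2:]))  # most significant bit first
    assert mod_by_recurrence(bits) == n % RADIX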

Please note - these formulas are only meant to explain why an RNN works here. The net below is just a plain LSTM layer + softmax, with the original input bits treated as a sequence.

Full code for the answer using the RNN layer:

import keras.models
import numpy as np
from python_toolbox import random_tools

RADIX = 7
FEATURE_BITS = 20

def _get_number(vector):
    return sum(x * 2 ** i for i, x in enumerate(vector))

def _get_mod_result(vector):
    return _get_number(vector) % RADIX

def _number_to_vector(number):
    binary_string = bin(number)[2:]
    if len(binary_string) > FEATURE_BITS:
        raise NotImplementedError
    bits = (((0,) * (FEATURE_BITS - len(binary_string))) +
            tuple(map(int, binary_string)))[::-1]
    assert len(bits) == FEATURE_BITS
    return np.c_[bits]


def get_mod_result_vector(vector):
    # One-hot encode the mod-7 result over the 7 residue classes.
    v = np.repeat(0, 7)
    v[_get_mod_result(vector)] = 1
    return v


def main():
    model = keras.models.Sequential(
        (
            keras.layers.Reshape(
                (1, -1)  # (batch, 20) -> (batch, 1, 20) before the LSTM
            ),
            keras.layers.LSTM(
                units=100,
            ),
            keras.layers.Dense(
                units=7, activation='softmax'
            )
        )
    )
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=0.01),
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])

    data = np.random.randint(2, size=(50000, FEATURE_BITS))
    labels = np.vstack(map(get_mod_result_vector, data))

    model.fit(data, labels, epochs=40, batch_size=50)
    def predict(number):
        foo = model.predict(_number_to_vector(number))
        return np.argmax(foo)
    def is_correct_for_number(x):
        return bool(predict(x) == x % RADIX)
    sample = random_tools.shuffled(range(2 ** FEATURE_BITS))[:500]
    print('Total accuracy:')
    print(sum(map(is_correct_for_number, sample)) / len(sample))
    print(f'(Accuracy of random algorithm is {1/RADIX:.2f})')


if __name__ == '__main__':
    main()

ORIGINAL ANSWER

I'm not sure how it happened, but the particular task you chose to check your code with is extremely difficult for a NN. I think the best explanation is that NNs are not really good when features are interconnected in such a way that changing one feature always completely changes the value of the target output. One way to look at it is to consider the sets of inputs that produce a certain answer - in your case they look like unions of a very large number of parallel hyperplanes in 20-dimensional space - and for each of the 7 categories these sets of planes are "nicely" interleaved and left for the NN to distinguish.
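
To see just how interleaved these sets are, here is a small check of my own (not from the original answer): since 2**i % 7 is never 0, flipping any single input bit always changes the target class.

import random

RADIX = 7

for _ in range(1000):
    x = random.randrange(2 ** 20)
    i = random.randrange(20)
    # Flipping bit i changes x by +/- 2**i, and 2**i % 7 cycles through
    # 1, 2, 4 - never 0 - so the residue class always changes.
    assert (x ^ (1 << i)) % RADIX != x % RADIX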

That said - if your number of examples is large, say 10K, and the number of possible inputs is smaller - say your input numbers are only 8 bits wide (so only 256 unique inputs are possible) - the network should "learn" the right function quite well (by "remembering" the correct answer for every input, without generalization). In your case that doesn't happen, because the code has the following bug.

Your labels were 20-dimensional vectors holding the bits of a 0-6 integer (your actual desired label) - so I guess you were pretty much trying to teach the NN to learn the bits of the answer as separate classifiers (with only 3 bits ever able to be non-zero). I changed that to what I assume you actually wanted - vectors of length 7 with exactly one value being 1 and the others 0 (so-called one-hot encoding, which keras actually expects for categorical_crossentropy according to this). If you wanted to try to learn each bit separately, you definitely shouldn't have used a softmax over 20 units in the last layer, because such an output generates probabilities over 20 classes which sum up to 1 (in that case you should have trained 20 - or rather 3 - binary classifiers instead). Since your code didn't give keras correct input, the model you got in the end was kind of random, and with the rounding you applied it output the same value for 95%-100% of inputs.
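
To make the label fix concrete, here is a minimal sketch of my own contrasting the two encodings for the answer 2 (i.e. 100 % 7):

import numpy as np

# Buggy labels: the answer 2 spread over 20 bit positions, which
# categorical_crossentropy treats as 20 competing classes.
bits_label = np.zeros(20, dtype=int)
bits_label[1] = 1  # binary 10 = 2, least significant bit first

# Fixed labels: one-hot over the 7 residue classes.
one_hot_label = np.zeros(7, dtype=int)
one_hot_label[2] = 1  # [0 0 1 0 0 0 0]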

The slightly changed code below trains a model which can more or less correctly guess the mod-7 answer for every number from 0 to 255 (again, it pretty much remembers the correct answer for every input). If you try to increase FEATURE_BITS you will see a large degradation of the results. If you actually want to train a NN to learn this task as-is with 20 or more bits of input (and without supplying the NN with all possible inputs and infinite time to train), you will need to apply some task-specific feature transformations and/or some layers carefully designed to be good at exactly the task you want to achieve, as others already mentioned in the comments to your question.
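
As an aside before the fixed code (my own illustration, not code from the answer): one such task-specific transformation could exploit the fact that bit i contributes 2**i % 7 to the residue, so grouping the 20 bits by that per-bit weight collapses them into 3 counts from which the target is a simple function.

import numpy as np

RADIX = 7

def transform(bits):
    # Count the bits whose weight 2**i % 7 equals 1, 2 and 4 respectively.
    weights = np.array([2 ** i % RADIX for i in range(len(bits))])  # 1,2,4,1,2,4,...
    return np.array([bits[weights == w].sum() for w in (1, 2, 4)])

bits = np.random.randint(2, size=20)
counts = transform(bits)
assert (counts @ np.array([1, 2, 4])) % RADIX == \
       sum(int(b) * 2 ** i for i, b in enumerate(bits)) % RADIX

With these 3 counts the target reduces to a weighted sum mod 7 - the kind of head start a plain dense net is unlikely to discover on its own.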

import keras.models
import numpy as np
from python_toolbox import random_tools

RADIX = 7
FEATURE_BITS = 8

def _get_number(vector):
    return sum(x * 2 ** i for i, x in enumerate(vector))

def _get_mod_result(vector):
    return _get_number(vector) % RADIX

def _number_to_vector(number):
    binary_string = bin(number)[2:]
    if len(binary_string) > FEATURE_BITS:
        raise NotImplementedError
    bits = (((0,) * (FEATURE_BITS - len(binary_string))) +
            tuple(map(int, binary_string)))[::-1]
    assert len(bits) == FEATURE_BITS
    return np.c_[bits]


def get_mod_result_vector(vector):
    # One-hot encode the mod-7 result over the 7 residue classes.
    v = np.repeat(0, 7)
    v[_get_mod_result(vector)] = 1
    return v


def main():
    model = keras.models.Sequential(
        (
            keras.layers.Dense(
                units=20, activation='relu', input_dim=FEATURE_BITS
            ),
            keras.layers.Dense(
                units=20, activation='relu'
            ),
            keras.layers.Dense(
                units=7, activation='softmax'
            )
        )
    )
    model.compile(optimizer='sgd',
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])

    data = np.random.randint(2, size=(10000, FEATURE_BITS))
    labels = np.vstack(map(get_mod_result_vector, data))

    model.fit(data, labels, epochs=100, batch_size=50)
    def predict(number):
        foo = model.predict(_number_to_vector(number))
        return np.argmax(foo)
    def is_correct_for_number(x):
        return bool(predict(x) == x % RADIX)
    sample = random_tools.shuffled(range(2 ** FEATURE_BITS))[:500]
    print('Total accuracy:')
    print(sum(map(is_correct_for_number, sample)) / len(sample))
    print(f'(Accuracy of random algorithm is {1/RADIX:.2f})')


if __name__ == '__main__':
    main()
