Data Encoding for Training a Neural Network

Problem Description

I have converted 349,900 words from a dictionary file to MD5 hashes. Samples are below:

74b87337454200d4d33f80c4663dc5e5
594f803b380a41396ed63dca39503542
0b4e7a0e5fe84ad35fb5f95b9ceeac79
5d793fc5b00a2348c3fb9ab59e5ca98a
3dbe00a167653a1aaee01d93e77e730e
ffc32e9606a34d09fca5d82e3448f71f
2fa9f0700f68f32d2d520302906e65ce
1c9b32ff1b53bd892b87578a11cbd333
26a10043bba821303408ebce568a2746
c3c32ff3481e9745e10defa7ce5b511e 
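
For reference, a dictionary word can be turned into a digest like the ones above with Python's standard `hashlib` module. This is a minimal sketch; the sample words are hypothetical and UTF-8 encoding is assumed. It also shows that each 32-character hex digest carries 128 bits (4 bits per hex digit), which would be the size of a purely binary input layer.

```python
import hashlib


def md5_hex(word: str) -> str:
    """Return the MD5 digest of a word as 32 lowercase hex characters."""
    return hashlib.md5(word.encode("utf-8")).hexdigest()


def hex_to_bits(digest: str) -> list[int]:
    """Expand each hex digit into its 4 bits, most significant bit first."""
    bits = []
    for ch in digest:
        value = int(ch, 16)
        bits.extend((value >> shift) & 1 for shift in (3, 2, 1, 0))
    return bits


if __name__ == "__main__":
    # Hypothetical sample words; a real run would read the dictionary file.
    for word in ["apple", "cat"]:
        digest = md5_hex(word)
        print(word, digest, len(digest), len(hex_to_bits(digest)))
```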

I want to train a neural network to decrypt a hash using a simple architecture such as a multilayer perceptron. Since every hash value has length 32, I was thinking that the number of input nodes should be 32, but the problem is the number of output nodes. Since the outputs are words in the dictionary, they don't have any specific length; they can be of various lengths. That is why I'm confused about how many output nodes I should have.

How should I encode my data so that I can have a specific number of output nodes?
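
One common way to get a fixed number of output nodes is to fix a maximum word length and pad shorter words with a filler character. A minimal sketch, assuming a maximum length of 4 and the NUL character `\0` as padding (both arbitrary, hypothetical choices); words longer than the limit are rejected rather than silently truncated.

```python
PAD = "\0"  # hypothetical padding symbol; any character absent from the dictionary works


def pad_word(word: str, max_len: int = 4) -> str:
    """Right-pad a word to exactly max_len characters."""
    if len(word) > max_len:
        raise ValueError(f"{word!r} is longer than {max_len} characters")
    return word + PAD * (max_len - len(word))


def unpad_word(padded: str) -> str:
    """Strip the padding added by pad_word."""
    return padded.rstrip(PAD)
```

With every word padded to the same length, the output size becomes `max_len` times the number of nodes used per character.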

I found a paper at this link that actually decrypts a hash using a neural network. The paper said:

The input to the neural network is the encrypted text that is to be decoded. This is fed into the neural network either in bipolar or binary format. This then traverses through the hidden layer to the final output layer which is also in the bipolar or binary format (as given in the input). This is then converted back to the plain text for further process.
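
The binary and bipolar formats mentioned in the paper can be illustrated as follows: each character becomes 8 bits (its byte value), and the bipolar form maps bit 0 to -1 and bit 1 to +1. The paper does not spell out its exact scheme, so the 8-bit-per-character (ASCII) assumption and the function names here are illustrative only.

```python
def text_to_binary(text: str) -> list[int]:
    """Encode each character as 8 bits, most significant bit first (assumes ASCII)."""
    bits = []
    for byte in text.encode("ascii"):
        bits.extend((byte >> shift) & 1 for shift in range(7, -1, -1))
    return bits


def binary_to_bipolar(bits: list[int]) -> list[int]:
    """Map 0 -> -1 and 1 -> +1."""
    return [2 * b - 1 for b in bits]


def bipolar_to_binary(values: list[float]) -> list[int]:
    """Threshold (network) outputs at 0 to recover bits."""
    return [1 if v > 0 else 0 for v in values]


def binary_to_text(bits: list[int]) -> str:
    """Decode groups of 8 bits back into ASCII characters."""
    chars = []
    for i in range(0, len(bits), 8):
        byte = 0
        for b in bits[i:i + 8]:
            byte = (byte << 1) | b
        chars.append(chr(byte))
    return "".join(chars)
```

Thresholding at 0 in `bipolar_to_binary` is what turns real-valued outputs from a tanh-style output layer back into bits before decoding to plain text.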

How would I implement what the paper describes? I am thinking of limiting the number of characters to decrypt. Initially, I can limit it to 4 characters (just for test purposes).
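
Restricting the task to short words amounts to filtering the dictionary before building training pairs. A minimal sketch, assuming the dictionary is available as an iterable of strings (for example the lines of a file opened with `open()`); `build_pairs` is a hypothetical helper name.

```python
import hashlib


def build_pairs(words, max_len=4):
    """Keep words of 1..max_len characters and pair each with its MD5 hex digest."""
    pairs = []
    for word in words:
        word = word.strip()  # drop trailing newlines from file lines
        if 0 < len(word) <= max_len:
            digest = hashlib.md5(word.encode("utf-8")).hexdigest()
            pairs.append((digest, word))  # (network input, target output)
    return pairs
```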

My input layer will have 32 nodes, one for each character of the hash. Each input node will hold the ASCII value of its hash character divided by 256. My output layer will also have 32 nodes, representing the word in binary format. Since 8 bits (8 nodes) represent one character, my network will only be able to decrypt words of up to 4 characters, because 32/8 = 4. (I can increase this if I want to.) I'm planning to use 33 hidden nodes. Is my 32 x 33 x 32 network architecture feasible? If not, why? Please guide me.
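
The proposed 32 x 33 x 32 layout can be sketched end to end: each hex character is scaled as ord(c)/256, the hidden layer has 33 sigmoid units, and the 32 sigmoid outputs are thresholded at 0.5 and read back 8 bits per character. This is a shape-level illustration with random, untrained weights and no training loop, so the decoded text is noise until the network is trained (for example with backpropagation). Note that hex characters only span ASCII 48-57 and 97-102, so ord(c)/256 squeezes every input into roughly 0.19-0.40.

```python
import math
import random

N_IN, N_HIDDEN, N_OUT = 32, 33, 32  # layer sizes proposed in the question


def encode_hash(digest: str) -> list[float]:
    """Scale each of the 32 hex characters as ASCII value / 256."""
    assert len(digest) == N_IN
    return [ord(c) / 256 for c in digest]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def init_layer(n_in: int, n_out: int, rng: random.Random):
    """Random weights (n_out rows of n_in) and zero biases."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def layer_forward(inputs, weights, biases):
    """Fully connected layer followed by a sigmoid activation."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def decode_bits(outputs: list[float]) -> str:
    """Threshold at 0.5 and read 8 bits per character (4 characters for 32 outputs)."""
    bits = [1 if o > 0.5 else 0 for o in outputs]
    chars = []
    for i in range(0, len(bits), 8):
        byte = 0
        for b in bits[i:i + 8]:
            byte = (byte << 1) | b
        chars.append(chr(byte))
    return "".join(chars)


rng = random.Random(0)
hidden_layer = init_layer(N_IN, N_HIDDEN, rng)
output_layer = init_layer(N_HIDDEN, N_OUT, rng)

x = encode_hash("74b87337454200d4d33f80c4663dc5e5")  # first sample hash from the question
h = layer_forward(x, *hidden_layer)
y = layer_forward(h, *output_layer)
print(len(x), len(h), len(y), repr(decode_bits(y)))  # untrained: decoded text is noise
```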

Recommended Answer

You could map the words in the dictionary into a vector space (e.g. bag of words, word2vec, ...). In that case, the words are encoded with a fixed length, and the number of neurons in the output layer will match that length.
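
As one concrete fixed-length mapping, a vocabulary index with one-hot vectors (the degenerate bag-of-words case for a single word) gives one output neuron per dictionary word, and the predicted word is the neuron with the highest activation. With the full 349,900-word dictionary that output layer would be just as wide, which is one reason dense embeddings such as word2vec (commonly a few hundred dimensions) are attractive; decoding then picks the dictionary word whose vector is nearest to the network output. This sketch uses a tiny hypothetical vocabulary and made-up 2-D vectors in place of trained word2vec embeddings.

```python
import math


def build_vocab(words: list[str]) -> dict[str, int]:
    """Assign each distinct word a fixed index."""
    return {w: i for i, w in enumerate(sorted(set(words)))}


def one_hot(word: str, vocab: dict[str, int]) -> list[int]:
    """Fixed-length target vector: one output neuron per vocabulary word."""
    vec = [0] * len(vocab)
    vec[vocab[word]] = 1
    return vec


def decode_one_hot(outputs: list[float], vocab: dict[str, int]) -> str:
    """Pick the word whose output neuron has the highest activation."""
    index_to_word = {i: w for w, i in vocab.items()}
    best = max(range(len(outputs)), key=lambda i: outputs[i])
    return index_to_word[best]


def nearest_word(output: list[float], embeddings: dict[str, list[float]]) -> str:
    """For dense embeddings: return the word with the highest cosine similarity."""
    def cosine(a, b):
        dot = sum(p * q for p, q in zip(a, b))
        norm = math.sqrt(sum(p * p for p in a)) * math.sqrt(sum(q * q for q in b))
        return dot / norm if norm else 0.0
    return max(embeddings, key=lambda w: cosine(output, embeddings[w]))


vocab = build_vocab(["cat", "dog", "sun"])  # hypothetical tiny dictionary
print(one_hot("dog", vocab))                   # [0, 1, 0]
print(decode_one_hot([0.1, 0.2, 0.7], vocab))  # sun

toy_embeddings = {"cat": [1.0, 0.0], "dog": [0.8, 0.6], "sun": [0.0, 1.0]}  # made-up vectors
print(nearest_word([0.1, 0.9], toy_embeddings))  # sun
```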
