Neural Network in TensorFlow works worse than Random Forest and predicts the same label each time


Problem description

I am new to DNNs and TensorFlow. I have a problem using a NN for binary classification.

As input data I have a text dataset, which was transformed by TF-IDF into numerical vectors.

The training dataset has 43,000 rows and 4,235 features.

I tried the TFLearn library and then Keras, but the result is the same: the NN predicts only one label, 0 or 1, and gives worse accuracy than Random Forest.

Below is the script I use to build the NN. Please tell me what is wrong with it.

from keras.models import Sequential
from keras.layers import Dense

# Build the model: 4235 TF-IDF features in, one sigmoid unit out
model = Sequential()
model.add(Dense(100, input_dim=4235, kernel_initializer='uniform', activation='relu'))
model.add(Dense(4235, kernel_initializer='uniform', activation='relu'))
model.add(Dense(1, kernel_initializer='uniform', activation='sigmoid'))
# Compile model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
# Fit the model (in Keras 1.x these arguments were spelled init/nb_epoch)
model.fit(X_train, y_train, epochs=100, batch_size=10, verbose=2)

Answer

There are many possible reasons given just the information you provided, and many things you could try to improve, but at a high level here are the most important items in my experience. I apologize if you have already checked most of these:

Deep learning can actually perform worse than "classical" ML (e.g. trees, SVMs) when there is not enough data. How much is enough depends on the task, but as a loose rule of thumb you may want the number of model parameters to be around the same order of magnitude as the amount of data you have. In the model you posted you have 100 x 4235 + 100 x 4235 + 4235 x 1 = 851,235 weights (855,571 parameters once bias terms are counted) against only 43,000 training rows.
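As a quick sanity check (a minimal sketch, not part of the original answer), you can let Keras count the parameters for you:

from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(100, input_dim=4235, activation='relu'))
model.add(Dense(4235, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.summary()  # reports 855,571 trainable parameters (biases included)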

From the code you posted, it seems you are not using any regularization (e.g. dropout or L2) nor a validation set to measure the quality of the model outside the training set. Your model could be overfitting the training set; see the sketch below.
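A minimal sketch of both suggestions, assuming the same X_train/y_train from the question (the dropout rate and split fraction are illustrative, not prescriptive):

from keras.models import Sequential
from keras.layers import Dense, Dropout

model = Sequential()
model.add(Dense(100, input_dim=4235, activation='relu'))
model.add(Dropout(0.5))  # randomly zero half the activations during training
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
# validation_split holds out 20% of the data; if loss keeps falling
# while val_loss rises, the model is overfitting
model.fit(X_train, y_train, epochs=100, batch_size=10, validation_split=0.2, verbose=2)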

For modeling text it is typical to use RNNs (e.g. LSTM or GRU) or CNNs instead of Dense/fully connected layers. RNNs and CNNs contain architectural constraints for modeling sequences that are absent from Dense layers. In other words, Dense layers lack prior knowledge about the type of data, so they will potentially need much more data and training time to attain similar performance. There are plenty of examples of this in the Keras repo: https://github.com/fchollet/keras/tree/master/examples

One such example is this IMDB text (binary) classification with an LSTM: https://github.com/fchollet/keras/blob/master/examples/imdb_lstm.py
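A minimal sketch in the spirit of that example (not a drop-in replacement: it assumes the text is re-encoded as padded sequences of integer word indices, here a hypothetical X_seq with a vocabulary of 20,000 words and sequence length 200, rather than TF-IDF vectors):

from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense

model = Sequential()
model.add(Embedding(input_dim=20000, output_dim=128, input_length=200))
model.add(LSTM(128, dropout=0.2, recurrent_dropout=0.2))  # sequence model replaces Dense stack
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X_seq, y_train, epochs=5, batch_size=32, validation_split=0.2)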

Another very common tool in deep learning is to encode text as a sequence of word vectors (and sometimes one-hot characters). These can be initialized either as random vectors or with pre-trained vectors (e.g. GloVe or word2vec). The example above uses the former approach.
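A minimal sketch of the pre-trained route (embedding_matrix is a hypothetical (vocab_size, 100) array you would fill by looking up each word of your tokenizer's vocabulary in a GloVe file):

import numpy as np
from keras.layers import Embedding

vocab_size, embed_dim = 20000, 100
embedding_matrix = np.zeros((vocab_size, embed_dim))  # fill rows from GloVe vectors

embedding_layer = Embedding(input_dim=vocab_size, output_dim=embed_dim,
                            weights=[embedding_matrix],   # seed with pre-trained vectors
                            trainable=False)              # freeze, or True to fine-tune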

