Why do TensorFlow tf.learn classification results vary a lot?


Problem Description

I use the TensorFlow high-level API tf.learn to train and evaluate a DNN classifier for a series of binary text classifications (actually, I need multi-label classification, but at the moment I check every label separately). My code is very similar to the tf.learn Tutorial:

import tensorflow as tf

# DNN with one hidden layer of 10 units and 10% dropout;
# feature columns are inferred from the real-valued training data.
classifier = tf.contrib.learn.DNNClassifier(
    hidden_units=[10],
    n_classes=2,
    dropout=0.1,
    feature_columns=tf.contrib.learn.infer_real_valued_columns_from_input(training_set.data))
classifier.fit(x=training_set.data, y=training_set.target, steps=100)
val_accuracy_score = classifier.evaluate(x=validation_set.data, y=validation_set.target)["accuracy"]

The accuracy score varies roughly from 54% to 90%, even though the 21 documents in the validation (test) set are always the same.

What does this very significant deviation mean? I understand there are some random factors (e.g. dropout), but to my understanding the model should converge towards an optimum.
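
As a side note on the random factors mentioned above: a minimal sketch, assuming the TF 1.x tf.contrib.learn API, of pinning the graph-level random seed so that dropout masks and weight initialization are reproducible between runs. The seed value 42 is arbitrary; this isolates run-to-run training noise from the variance caused by the data itself.

import tensorflow as tf

# Fix the graph-level seed so repeated runs use the same dropout
# masks and initial weights (TF 1.x contrib API).
run_config = tf.contrib.learn.RunConfig(tf_random_seed=42)
classifier = tf.contrib.learn.DNNClassifier(
    hidden_units=[10],
    n_classes=2,
    dropout=0.1,
    feature_columns=tf.contrib.learn.infer_real_valued_columns_from_input(training_set.data),
    config=run_config)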

I use words (lemmas), bigrams and trigrams, sentiment scores, and LIWC scores as features, so I have a very high-dimensional feature space with only 28 training and 21 validation documents. Can this cause problems? How can I consistently improve the results apart from collecting more training data?

Update: To clarify, I generate a dictionary of the words and n-grams that occur and discard those that occur only once, so I only use words (n-grams) that actually exist in the corpus.
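
A minimal sketch of that vocabulary-building step, assuming each document is a plain list of lemmas (the names build_vocabulary, corpus, and min_count are illustrative, not from the original code):

from collections import Counter

def build_vocabulary(corpus, min_count=2):
    """Count unigram/bigram/trigram occurrences across all documents
    and keep only n-grams seen at least min_count times."""
    counts = Counter()
    for tokens in corpus:  # each document is a list of lemmas
        for n in (1, 2, 3):
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i:i + n])] += 1
    return {ngram for ngram, c in counts.items() if c >= min_count}

vocab = build_vocabulary([["good", "movie"], ["good", "plot"]])
# vocab == {("good",)}: only "good" occurs twice; every other
# n-gram occurs once and is discarded.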

Answer

This has nothing to do with TensorFlow. This dataset is ridiculously small, so you can obtain almost any result. You have 28 + 21 points in a space with a practically "infinite" number of dimensions: there are around 1,000,000 English words, hence on the order of (10^6)^3 = 10^18 possible trigrams. Some of those do not exist, and certainly most do not appear in your 49 documents, but you still have at least 1,000,000 dimensions. For such a problem you have to expect huge variance in the results.

How can I consistently improve the results apart from collecting more training data?

You pretty much cannot. This is simply way too small a sample to do any meaningful statistical analysis.

Consequently, the best you can do is change the evaluation scheme: instead of a fixed 28/21 split, do 10-fold cross-validation. With ~50 points this means running 10 experiments, each with ~45 training documents and ~4 test documents, and averaging the results. This is the only thing you can do to reduce the variance of the estimate; however, remember that even with CV, such a small dataset gives you no guarantees about how well your model will actually behave "in the wild" (once applied to never-before-seen data).
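
A minimal sketch of that evaluation scheme, assuming scikit-learn is available and that X and y are numpy arrays holding the ~49 pooled documents as feature vectors and labels; make_classifier is a hypothetical factory that returns a fresh, untrained model for each fold:

import numpy as np
from sklearn.model_selection import KFold

# Pool all ~49 documents and evaluate with 10-fold cross-validation:
# each fold trains on ~44-45 documents and tests on the remaining ~4-5.
kf = KFold(n_splits=10, shuffle=True, random_state=0)
scores = []
for train_idx, test_idx in kf.split(X):
    clf = make_classifier()  # hypothetical: builds a fresh model per fold
    clf.fit(X[train_idx], y[train_idx])
    scores.append(clf.score(X[test_idx], y[test_idx]))

print("mean accuracy: %.3f +/- %.3f" % (np.mean(scores), np.std(scores)))

Reporting the standard deviation alongside the mean makes the remaining fold-to-fold variance visible instead of hiding it behind a single split.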
