Positives/negatives proportion in train set

Published: 2020/5/4 9:55:27 · Tags: machine-learning, information-retrieval

Question

I'm trying to get the Rocchio algorithm for relevance feedback to work. I have a query and a few documents labelled as positives and negatives; for example, 60 positives and 337 negatives. I want to train my model (in this case, adjust the query) on part of this dataset and test it on the rest. But with such an imbalanced dataset, I'm not sure how many negatives and how many positives to include in the training set.
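For reference, here is a minimal sketch of the standard Rocchio update, which moves the query vector toward the centroid of the relevant (positive) documents and away from the centroid of the non-relevant (negative) ones. It assumes documents and the query are already dense term-weight vectors of equal length (e.g. tf-idf); the weights alpha=1.0, beta=0.75, gamma=0.15 are assumed textbook defaults, not values from the question, and clipping negative weights to zero is one common convention.

```python
from typing import List, Sequence

Vector = List[float]


def centroid(docs: Sequence[Vector], dim: int) -> Vector:
    """Component-wise mean of the document vectors (zero vector if there are none)."""
    if not docs:
        return [0.0] * dim
    return [sum(d[i] for d in docs) / len(docs) for i in range(dim)]


def rocchio(query: Vector,
            relevant: Sequence[Vector],
            non_relevant: Sequence[Vector],
            alpha: float = 1.0,   # assumed default weight of the original query
            beta: float = 0.75,   # assumed default weight of the positive centroid
            gamma: float = 0.15,  # assumed default weight of the negative centroid
            ) -> Vector:
    """Return q' = alpha*q + beta*centroid(relevant) - gamma*centroid(non_relevant).

    Negative term weights are clipped to 0.
    """
    dim = len(query)
    rel_c = centroid(relevant, dim)
    non_c = centroid(non_relevant, dim)
    return [max(0.0, alpha * q + beta * r - gamma * n)
            for q, r, n in zip(query, rel_c, non_c)]


# Toy 3-term vocabulary: one original query, two positives, one negative.
new_q = rocchio([1.0, 0.0, 0.0],
                relevant=[[0.0, 1.0, 0.0], [0.0, 1.0, 1.0]],
                non_relevant=[[0.0, 0.0, 1.0]])
print(new_q)  # approximately [1.0, 0.75, 0.225]
```

Because each side enters the update only through its centroid, the positive and negative sets are averaged separately, so the 60:337 imbalance does not by itself swamp the update; the relative influence is set by beta and gamma.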

Another problem is that, depending on the positives/negatives proportion in the test dataset, I get misleading precision, recall and F1 results. With 49 positives and 17 negatives in the test dataset I get Precision=0.742, Recall=1.000 and F1=0.852, with TP=49, FP=17, TN=0, FN=0.
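The reported numbers can be reproduced directly from the confusion-matrix counts. The sketch below uses only the standard definitions of precision, recall and F1; the function name is hypothetical.

```python
from typing import Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Compute precision, recall and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Counts reported in the question: TP=49, FP=17, TN=0, FN=0.
p, r, f = precision_recall_f1(tp=49, fp=17, fn=0)
print(f"P={p:.3f} R={r:.3f} F1={f:.3f}")  # P=0.742 R=1.000 F1=0.852
```

Since TP + FP = 66 covers the whole test set (TN = FN = 0), every document was predicted relevant. These are exactly the scores a trivial "everything is relevant" baseline would get, and the precision of 49/66 is simply the share of positives in this test set, which is why the metrics shift whenever the test proportion changes.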

The distribution of the positives/negatives proportion for other queries doesn't give me any hint about which proportion to choose for my model.

So what I'm asking for is some advice on working with imbalanced datasets to get correct results.
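One common way to avoid picking an arbitrary proportion is a stratified split: positives and negatives are split separately, so both the training and the test set keep the overall 60:337 ratio. The following is a minimal sketch under that assumption; the function name, the 30% test fraction and the fixed seed are hypothetical choices for illustration.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def stratified_split(positives: Sequence[T],
                     negatives: Sequence[T],
                     test_fraction: float = 0.3,  # assumed test share
                     seed: int = 0,               # fixed for reproducibility
                     ) -> Tuple[List[T], List[T], List[T], List[T]]:
    """Split each class separately so train and test keep the same class ratio.

    Returns (train_pos, train_neg, test_pos, test_neg).
    """
    rng = random.Random(seed)

    def split(items: Sequence[T]) -> Tuple[List[T], List[T]]:
        shuffled = list(items)
        rng.shuffle(shuffled)
        n_test = round(len(shuffled) * test_fraction)
        return shuffled[n_test:], shuffled[:n_test]

    train_pos, test_pos = split(positives)
    train_neg, test_neg = split(negatives)
    return train_pos, train_neg, test_pos, test_neg


# Placeholder IDs standing in for the 60 positive and 337 negative documents.
pos_docs = [f"pos{i}" for i in range(60)]
neg_docs = [f"neg{i}" for i in range(337)]
train_pos, train_neg, test_pos, test_neg = stratified_split(pos_docs, neg_docs)
print(len(train_pos), len(train_neg), len(test_pos), len(test_neg))  # 42 236 18 101
```

If scikit-learn is available, `sklearn.model_selection.train_test_split` with its `stratify` argument does the same job on a labelled list.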

Thanks in advance, sorry for such a noob(-ish?) question :-)