How to apply Information Gain in RapidMiner with a separate test set?


Question

I am working on text classification in RapidMiner, and I have separate training and test splits. I have applied Information Gain to a dataset using n-fold cross-validation, but I am confused about how to apply it to a separate test set. My process is shown in the attached image.

In the figure, I have connected the word list output of the first "Process Documents From Files" operator (used for training) to the second "Process Documents From Files" operator (used for testing). However, I want the second operator to use the reduced feature set, which presumably should come from the "Select By Weight" operator (reduced dimensions). That operator returns weights, though, which I cannot feed into the second "Process Documents From Files". I have searched a lot but have not found anything that meets my need.

Is it actually possible in RapidMiner to have separate train/test splits and still apply feature selection?

Is there any way to convert these weights into a word list? Please don't suggest writing to the repository (I can't do that).

In a scenario like this, where I have separate train/test splits and need to apply feature selection, how do I make sure that the training and test vectors have the same dimensions?

I am really stuck on this, so any help is appreciated.

Recommended answer

Insert a new Select By Weight operator immediately after the lower Process Documents operator and before Apply Model. Use a Multiply operator to copy the weights from the Weight By Information Gain operator, and connect one copy to the input of the new Select By Weight operator. The test data is then reduced to the same attributes that were selected on the training data, so both sets end up with vectors of the same dimension.
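The idea behind this wiring can be sketched outside RapidMiner. The plain-Python example below is only an illustration, not RapidMiner code: the function names (`information_gain`, `select_terms`, `vectorize`) and the toy documents are hypothetical. It computes information gain weights on the training documents only, keeps the top-k terms, and then reuses that same term list for the test documents, which is the role the copied weights play in the second Select By Weight operator.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(docs, labels, term):
    """Information gain of the binary feature 'term occurs in the document'."""
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    conditional = ((len(with_term) / n) * entropy(with_term)
                   + (len(without_term) / n) * entropy(without_term))
    return entropy(labels) - conditional


def select_terms(train_docs, train_labels, k):
    """Rank the training vocabulary by information gain and keep the top k."""
    vocabulary = sorted({t for d in train_docs for t in d})
    weights = {t: information_gain(train_docs, train_labels, t)
               for t in vocabulary}
    # Highest weight first; ties are broken alphabetically for determinism.
    ranked = sorted(vocabulary, key=lambda t: (-weights[t], t))
    return ranked[:k]


def vectorize(doc, terms):
    """Binary occurrence vector over a fixed, ordered list of terms."""
    return [1 if t in doc else 0 for t in terms]


# Toy data (hypothetical): each document is a set of tokens.
train_docs = [
    {"cheap", "pills", "offer"},
    {"cheap", "offer", "now"},
    {"meeting", "agenda", "now"},
    {"meeting", "notes", "offer"},
]
train_labels = ["spam", "spam", "ham", "ham"]
test_docs = [{"cheap", "meeting", "unseen"}, {"agenda"}]

# Weights come from the training data only ...
selected = select_terms(train_docs, train_labels, k=2)

# ... and the same selection is applied to both splits.
train_vectors = [vectorize(d, selected) for d in train_docs]
test_vectors = [vectorize(d, selected) for d in test_docs]

print(selected)
print(train_vectors)
print(test_vectors)
```

Because the term list is fixed before the test documents are seen, terms that appear only in the test set (such as `unseen` here) are dropped, and every vector has exactly `k` entries regardless of which split it came from.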
