How to convert a text file into ARFF format?


Question

I'm using the WEKA tool for text classification, and I have to convert plain text files into ARFF format. However, I don't know how to do that. Can anyone please help me convert a text file into ARFF format?

Thank you, Renklauf, for your response.

I didn't understand this point: "Since text editors like Notepad only allow a limited number of columns, you'll need to get something like Notepad++ to fit everything on one line." Could you please explain it briefly?

Suppose the text data is a simple sports article such as:

"Basketball is a team sport, the objective being to shoot a ball through a basket horizontally positioned to score points while following a set of rules. Usually, two teams of five players play on a marked rectangular court with a basket at each width end. Basketball is one of the world's most popular and widely viewed sports" ...

This is my text document, and I want to convert it to ARFF format. After that, I need to use the ARFF file for SVM text classification.

Recommended answer

For a document classification task, each document is one instance whose full text is the value of a string attribute, so it must be enclosed in quotes. Suppose you have a corpus of 10 sports articles, each tagged as either pro-Yankees or pro-Red Sox, and you want a classifier that automatically sorts sports articles into those two classes. Take each document, enclose it in quotes, place it on a single line, and then put its class value (one of {yankees, red_sox}) after the quoted string, separated by a comma.

 @relation yankeesOrRedSox
 @attribute article string
 @attribute yankeesOrSox { yankees, red_sox }
 @data

 "text of article 1 here", yankees
 .
 .
 .
 "text of article 10 here", red_sox

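The same layout can also be generated by a script instead of by hand. The following is a minimal Python sketch, not part of the original answer: the function names `quote_arff_string` and `build_arff` are hypothetical, and it assumes that embedded double quotes and backslashes can be escaped with a backslash, which is the convention Weka uses when it writes quoted strings. It also assumes each article is already on a single line and rejects any that are not.

```python
def quote_arff_string(text):
    """Wrap text in double quotes, escaping backslashes and embedded quotes.

    Assumption: backslash escaping, as Weka does when writing quoted strings.
    """
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def build_arff(relation, labelled_articles, labels):
    """Return ARFF text with one string attribute and one nominal class attribute."""
    lines = [
        "@relation " + relation,
        "@attribute article string",
        "@attribute yankeesOrSox { " + ", ".join(labels) + " }",
        "@data",
    ]
    for text, label in labelled_articles:
        if "\n" in text or "\r" in text:
            raise ValueError("each article must be on a single line")
        if label not in labels:
            raise ValueError("unknown label: " + label)
        # One instance per line: quoted article text, comma, class label.
        lines.append(quote_arff_string(text) + ", " + label)
    return "\n".join(lines) + "\n"


# Placeholder articles, mirroring the example above.
articles = [
    ("text of article 1 here", "yankees"),
    ("text of article 10 here", "red_sox"),
]
print(build_arff("yankeesOrRedSox", articles, ["yankees", "red_sox"]))
```

Writing the returned string to a file with an `.arff` extension should give something Weka can open, under the assumptions stated above.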
It's key that each article is placed on a single line. When I began using Weka for text classification, this point caused me a lot of frustration. Since text editors like Notepad only allow a limited number of columns per line, you'll need something like Notepad++ to fit everything on one line. Notepad++ has a Join Lines function that lets you merge a large block of text into a single line.
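If an editor is not convenient, the line joining can also be scripted. This is a small Python sketch of that idea, offered as an alternative rather than as the answer's own method: the helper name `join_lines` is hypothetical, and it simply collapses every run of whitespace, including line breaks, into a single space.

```python
import re


def join_lines(text):
    """Collapse every run of whitespace (including line breaks) into one space."""
    return re.sub(r"\s+", " ", text).strip()


# A multi-line excerpt of the article from the question.
article = """Basketball is a team sport, the objective being
to shoot a ball through a basket
horizontally positioned to score points."""

print(join_lines(article))
```

For a real document, the text would first be read from the file (for example with `open(path, encoding="utf-8").read()`, where `path` is whatever file holds the article), passed through `join_lines`, and then quoted and placed on its own row in the `@data` section.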

Hope this helps.
