Text mining data set

Question

I want to work with text mining methods on essays.
How can I find a data set for this guid?
Please help me.

Recommended answer

In general - regarding this question and the others you have posted (which I just reviewed briefly) - you will save quite a bit of time if you research your problem before posting a question. Not only will this save other people time, it will save you time as well (so you don't have to clarify your question several times and still receive no answer). You would also have greater success if you spent some time learning more about the technologies you plan to use and building a solid grasp of the fundamentals.

If you want a good answer to your question or a solution to your problem, you need to give more specifics about the problem and your situation, because people can't read your mind and the details you've given aren't sufficient for anyone to even attempt an answer. For instance, what will you need to do with this text-mining system? (When you respond, please don't say "mine text" - tell us what you mean by that.) Are you building the entire system, or merely a single part of it? What parts are currently in place that you must work around - data sources, user interface, etc.? You mention a GUID - what do you mean? I assume you are simply referring to the user interface - is there a user interface and, if so, what kind is it - Desktop, WinForms, WPF, Web, Silverlight, ASP.NET, etc.? Where are the essays going on the user interface? Do you need to choose controls to put the essays into, or do you have controls for them already? If so, what controls are you trying to put the essays into? What is the format of the essays - text files, Word documents, PDFs, HTML, etc.? Where are they coming from - the internet, files on your desktop, or a database of some sort? Do you need to save them to disk, or how do you plan to store them? You don't need to answer every question I've posed above, but you should at least address the ones that relate to your question (which is most of them).

The quick answer to your question is that there is no practical way in which DataSets can hold lengthy strings such as essays (I'm not saying it's impossible, just that it wouldn't do much for you). DataSets hold relational database objects - DataTables - that each have columns and rows.
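
To make the point above concrete, here is a minimal C# sketch of what putting an essay into a DataSet looks like. The table name "Essay", the column names "Id" and "Body", and the class and method names are hypothetical, chosen only for illustration. The essay does fit in a string column, but the DataSet treats it as one opaque cell and offers nothing for working with the text itself.

```csharp
using System;
using System.Data;

public static class DataSetSketch
{
    // Builds a DataSet with one table that stores each essay in a single string cell.
    // Table and column names are assumptions made for this example.
    public static DataSet BuildEssayDataSet()
    {
        var dataSet = new DataSet("Essays");
        DataTable table = dataSet.Tables.Add("Essay");
        table.Columns.Add("Id", typeof(int));
        table.Columns.Add("Body", typeof(string));

        // One row per essay: the whole text goes into the "Body" column.
        table.Rows.Add(1, "Full text of the first essay ...");
        return dataSet;
    }

    public static void Main()
    {
        DataSet dataSet = BuildEssayDataSet();
        string body = (string)dataSet.Tables["Essay"].Rows[0]["Body"];
        Console.WriteLine(body.Length);
    }
}
```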

Consider the design of your project and try to identify a suitable structure. Here's a suggestion: if you don't need to preserve formatting, try putting each essay into a string. Your collection of essays could be held in a generic collection, such as a List (i.e., List<string>). Alternatively, you could look at using a content management system (CMS) to hold the essays. Research those topics and see if you can come up with a usable idea.
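
As a rough sketch of the List<string> suggestion, the C# below reads each essay into one string and collects them in a generic list. It assumes the essays are plain .txt files sitting in a single folder, which the question does not confirm, and the names EssayCollectionSketch and LoadEssays are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class EssayCollectionSketch
{
    // Reads every .txt file in the given folder into one string per essay.
    // Assumption: essays are plain-text files; formatting is not preserved.
    public static List<string> LoadEssays(string folder)
    {
        var essays = new List<string>();
        foreach (string path in Directory.GetFiles(folder, "*.txt"))
        {
            essays.Add(File.ReadAllText(path));
        }
        return essays;
    }

    public static void Main(string[] args)
    {
        // Use the folder given on the command line, or the current directory.
        string folder = args.Length > 0 ? args[0] : ".";
        List<string> essays = LoadEssays(folder);
        Console.WriteLine($"Loaded {essays.Count} essays.");
    }
}
```

If the essays turn out to be Word documents, PDFs, or HTML, the text would have to be extracted first, and that step is not shown here.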

Best of luck!