Text mining - extract name of band from unstructured text

Question

I'm aware that this is a fairly general, open-ended question. I'm essentially looking for help in deciding on a way forward, and perhaps for some reading material.

I'm working on an algorithm that does unstructured text mining, and I'm trying to extract something specific from that text: the names of bands (single artists, bands, etc.). The text itself has no predictable structure, but it is relatively short (one or two lines of text).

Some examples (not real events):

Concert Green Day At Wembley Stadium
Extraordinary representation - Norah Jones in Poland - at the Polish Opera

Right now I'm thinking of trying out a classifier, but the text seems too small to provide any real training information for it. There are probably several other text mining techniques, heuristics, or algorithms that may yield good results for this kind of problem (or perhaps no algorithm will).

Recommended answer

Because of the structure of your data, a pre-trained model will probably perform poorly. Besides, the general organization, location, and person categories will probably not be useful to you.

I don't think the texts themselves are too small; most NER systems work on one sentence at a time. So training a NER library on your own training set will probably work well, for example Stanford NER: http://nlp.stanford.edu/ner/index.shtml
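To make the training-set idea concrete, here is a minimal sketch of turning hand-annotated lines into token-per-line training data. It assumes a tab-separated "token, label" layout with a blank line between examples, which is the shape CRF-based NER trainers such as Stanford NER commonly accept; check the library's documentation for the exact format it expects. The `BAND` label, the function names, and the whitespace tokenization are all assumptions made for illustration.

```python
def to_token_labels(text, band, label="BAND"):
    """Split `text` on whitespace and label the tokens that spell out `band`.

    Tokens belonging to the band name get `label`; all others get "O".
    Only the first occurrence of the band name is labelled.
    """
    tokens = text.split()
    band_tokens = band.split()
    labels = ["O"] * len(tokens)
    n = len(band_tokens)
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == band_tokens:
            labels[i:i + n] = [label] * n
            break
    return list(zip(tokens, labels))


def to_tsv(examples):
    """Render (text, band) pairs as one token<TAB>label per line,
    with a blank line between examples."""
    blocks = []
    for text, band in examples:
        rows = to_token_labels(text, band)
        blocks.append("\n".join(f"{tok}\t{lab}" for tok, lab in rows))
    return "\n\n".join(blocks) + "\n"


# The two sample lines from the question, annotated by hand.
EXAMPLES = [
    ("Concert Green Day At Wembley Stadium", "Green Day"),
    ("Extraordinary representation - Norah Jones in Poland - at the Polish Opera",
     "Norah Jones"),
]

if __name__ == "__main__":
    print(to_tsv(EXAMPLES))
```

A real training set would need many more annotated lines than these two, and a tokenizer that matches the one the NER library uses at prediction time.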

If you don't want to create a training set, you will need a dictionary containing all the bands/artists. In that case you obviously can't find bands/artists that are missing from the dictionary.
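The dictionary approach can be sketched as a simple lookup. The snippet below is only an illustration, assuming a small in-memory set of known names (`KNOWN` is a hypothetical stand-in for a real artist database) and case-insensitive matching on word boundaries; longer names are tried first so that a multi-word name wins over a shorter entry it contains.

```python
import re


def build_matcher(names):
    """Compile one regex that matches any known band/artist name.

    Names are sorted longest-first so the alternation prefers the
    longest dictionary entry at a given position.
    """
    if not names:
        raise ValueError("the dictionary of names must not be empty")
    ordered = sorted(names, key=len, reverse=True)
    alternation = "|".join(re.escape(name) for name in ordered)
    # (?<!\w) and (?!\w) keep matches from starting or ending inside a word.
    return re.compile(r"(?<!\w)(?:" + alternation + r")(?!\w)", re.IGNORECASE)


def find_bands(text, matcher):
    """Return every dictionary name found in `text`, as written in the text."""
    return [m.group(0) for m in matcher.finditer(text)]


# Hypothetical dictionary; a real one would be loaded from a music database.
KNOWN = {"Green Day", "Norah Jones"}
matcher = build_matcher(KNOWN)

if __name__ == "__main__":
    print(find_bands("Concert Green Day At Wembley Stadium", matcher))
    print(find_bands("An evening with Some Unknown Band", matcher))
```

The second call shows the limitation mentioned above: a band that is not in the dictionary is simply not found.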
