Named entities as a feature in text categorization?

Question

With existing supervised text categorization techniques, why don't we treat the named entities (NEs) in the text as features during training and testing? Do you think using NEs as features could improve precision?

Recommended answer

It depends heavily on the domain you are working in, and you have to define your features with that domain in mind. Say you are working on a learning-to-rank problem in a search engine, generating a dynamic rank: NEs won't give you any benefit there. It also depends on the output categorization labels you have defined (this being supervised learning).

Now say you are classifying documents as Soccer, Movie, Politics and so on. In this case named entities can work. As an example, suppose you are using a neural network that categorizes documents into Soccer, Movie, Politics, etc., and this document comes in: "Lionel Messi was invited to attend the premiere of 'The Social Network'; also present were the cast and crew, including Jesse Eisenberg, Andrew Garfield and Justin Timberlake." Here the connection between the named entities (the input features) and Movie (the defined output) will be stronger than the connection to Soccer, so the document will be classified as a Movie document.
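To make the idea of "named entities as input features" concrete, here is a minimal sketch of how such features could sit next to ordinary bag-of-words features. It is an illustration under assumptions, not the answerer's method: the `GAZETTEER` lookup table is a hypothetical stand-in for a real named entity recognizer, and the entity types, the `extract_features` helper and the `NE=`, `NE_TYPE=` and `WORD=` feature names are all made up for this example.

```python
import re
from collections import Counter

# Hypothetical gazetteer: a stand-in for a real named entity recognizer.
# The names come from the example document; the types are assumptions.
GAZETTEER = {
    "Lionel Messi": "FOOTBALLER",
    "Jesse Eisenberg": "ACTOR",
    "Andrew Garfield": "ACTOR",
    "Justin Timberlake": "ACTOR",
    "The Social Network": "MOVIE_TITLE",
}


def extract_features(text):
    """Return word counts plus counts per entity and per entity type."""
    features = Counter()
    remaining = text
    for name, entity_type in GAZETTEER.items():
        occurrences = remaining.count(name)
        if occurrences:
            features["NE=" + name] += occurrences
            features["NE_TYPE=" + entity_type] += occurrences
            # Remove the mention so its words are not counted a second time.
            remaining = remaining.replace(name, " ")
    for token in re.findall(r"[a-z]+", remaining.lower()):
        features["WORD=" + token] += 1
    return features


doc = (
    'Lionel Messi was invited to attend the premiere of "The Social Network"; '
    "also present were the cast and crew, including Jesse Eisenberg, "
    "Andrew Garfield and Justin Timberlake."
)
features = extract_features(doc)
```

In this sketch the document contributes three `NE_TYPE=ACTOR` counts and one `NE_TYPE=MOVIE_TITLE` count against a single `NE_TYPE=FOOTBALLER` count, which is the kind of signal that would pull a classifier towards Movie.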

As another example, say our document is "Tom Cruise is portraying the character of Lionel Messi in the movie 'The last soccer game'." Here is where the benefit comes in: suppose your neural network has learnt that when an actor and a footballer appear together in one document, there is a high probability that the document is about a movie. Again, this depends on the data and the training, and it could turn out the other way round (but that is what learning is all about: generalizing from past data).
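The point that the learned association depends on the training data can be sketched with a toy model. The code below swaps the neural network for a simple multiclass perceptron, purely to keep the example short. The two tiny corpora, the entity-type names and the helper functions are hypothetical and exist only for illustration.

```python
from collections import defaultdict

LABELS = ["Soccer", "Movie", "Politics"]


def score(weights, label, features):
    """Dot product between one label's weights and the feature counts."""
    return sum(weights[label][name] * value for name, value in features.items())


def predict(weights, features):
    """Pick the best-scoring label (ties go to the earliest label in LABELS)."""
    return max(LABELS, key=lambda label: score(weights, label, features))


def train(examples, epochs=10):
    """Multiclass perceptron: on a mistake, move weights towards the gold label."""
    weights = {label: defaultdict(float) for label in LABELS}
    for _ in range(epochs):
        for features, gold in examples:
            guess = predict(weights, features)
            if guess != gold:
                for name, value in features.items():
                    weights[gold][name] += value
                    weights[guess][name] -= value
    return weights


# Hypothetical training documents, already reduced to entity-type counts.
BASE = [
    ({"ACTOR": 2, "MOVIE_TITLE": 1}, "Movie"),
    ({"FOOTBALLER": 2, "CLUB": 1}, "Soccer"),
    ({"POLITICIAN": 2}, "Politics"),
]
# Corpus A: documents mixing actors and footballers were film news.
film_corpus = BASE + [({"ACTOR": 1, "FOOTBALLER": 1}, "Movie")]
# Corpus B: documents mixing them were match reports.
match_corpus = BASE + [({"ACTOR": 1, "FOOTBALLER": 2}, "Soccer")]

# A new document mentioning one actor and one footballer.
query = {"ACTOR": 1, "FOOTBALLER": 1}
prediction_film = predict(train(film_corpus), query)
prediction_match = predict(train(match_corpus), query)
```

With these toy corpora the same query is labelled Movie by the model trained on `film_corpus` and Soccer by the one trained on `match_corpus`, which is the "other way round" case described above.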

So my answer would be: try it out. Nobody is stopping you from using named entities as features, and it might help in the domain you are working in.
