Feature selection for Named entity using SVM


Question

I have some user comment data from which I want to extract the names of consumer electronics brands. For instance, consider these ne_chunked example sentences, which mention "PS4", "nokia 720 lumia", "apple ipad", and "sony bravia":

In [52]: nltk.ne_chunk(nltk.pos_tag(nltk.word_tokenize('When is the PS4 releasing')))
Out[52]: Tree('S', [('When', 'WRB'), ('is', 'VBZ'), ('the', 'DT'), Tree('ORGANIZATION', [('PS4', 'NNP')]), ('releasing', 'NN')])

In [53]: nltk.ne_chunk(nltk.pos_tag(nltk.word_tokenize('I couldnt find the nokia 720 lumia in stores')))
Out[53]: Tree('S', [('I', 'PRP'), ('couldnt', 'VBP'), ('find', 'JJ'), ('the', 'DT'), ('nokia', 'NN'), ('720', 'CD'), ('lumia', 'NN'), ('in', 'IN'), ('stores', 'NNS')])

In [54]: nltk.ne_chunk(nltk.pos_tag(nltk.word_tokenize('I just bought apple ipad and its really awesome')))
Out[54]: Tree('S', [('I', 'PRP'), ('just', 'RB'), ('bought', 'VBD'), ('apple', 'JJ'), ('ipad', 'NN'), ('and', 'CC'), ('its', 'PRP$'), ('really', 'RB'), ('awesome', 'JJ')])

In [55]: nltk.ne_chunk(nltk.pos_tag(nltk.word_tokenize('I would like to buy 1 Sony bravia led television')))
Out[55]: Tree('S', [('I', 'PRP'), ('would', 'MD'), ('like', 'VB'), ('to', 'TO'), ('buy', 'VB'), ('1', 'CD'), ('Sony', 'NNP'), ('bravia', 'IN'), ('led', 'VBN'), ('television', 'NN')])  

The problem is how to represent the data for the SVM to learn from. I have read dozens of research papers, but none of them disclose how they represented the feature data for the SVM. Can anybody please help?

Recommended answer

What I would do is put all the electronics brands you care about in a list and then, so that each entry is unique, use each entry's index in the list as a feature.

e.g. ['Nokia', 'Apple', 'Microsoft']

then: Nokia => 0, Apple => 1, Microsoft => 2, etc.

This gives each brand a unique representation, which can then serve as one feature for the SVM among others, I presume.
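
To make the idea concrete, here is a minimal sketch of turning tokens into numeric feature vectors. It is only an illustration under stated assumptions: the `BRANDS` list, the `token_features` helper, the 0-based list positions, and the two extra surface cues (capitalization and digits) are all hypothetical and not part of the answer above. The brand index is encoded as a one-hot vector rather than a raw integer, since a raw index would suggest an ordering between brands that does not exist.

```python
# Hypothetical brand list; in practice this would hold every brand of interest.
BRANDS = ["Nokia", "Apple", "Microsoft"]

# Map each lowercased brand name to its 0-based position in the list.
BRAND_INDEX = {name.lower(): i for i, name in enumerate(BRANDS)}


def token_features(token):
    """Return a fixed-length numeric feature vector for one token."""
    # One slot per known brand; all zeros when the token is not in the list.
    one_hot = [0] * len(BRANDS)
    idx = BRAND_INDEX.get(token.lower())
    if idx is not None:
        one_hot[idx] = 1
    # Extra surface cues (assumptions for illustration, not from the answer).
    extras = [
        int(token[:1].isupper()),                # starts with a capital letter
        int(any(ch.isdigit() for ch in token)),  # contains a digit
    ]
    return one_hot + extras


sentence = "I couldnt find the nokia 720 lumia in stores".split()
X = [token_features(tok) for tok in sentence]  # one row per token
```

Each row of `X`, paired with a per-token label (for example brand / not brand), could then be passed to an SVM implementation such as scikit-learn's `LinearSVC`. Tokens like "lumia" that are missing from the list get an all-zero brand slot, so the list alone will not find unseen names; the other features have to carry those cases.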
