Removing stop words from a single string
Question
My query is string = 'Alligator in water', where 'in' is a stop word. How can I remove it so that I get stop_remove = 'Alligator water' as output? I have tried it with ismember, but it returns an integer value for the matching word, and I want the remaining words as output.
'in' is just an example; I'd like to remove all possible stop words.
Recommended answer
Use the following code to remove all stop words.
Code
% Stop-word list source: http://norm.al/2009/04/14/list-of-english-stop-words/
stopwords_cellstring={'a', 'about', 'above', 'across', 'after', ...
'afterwards', 'again', 'against', 'all', 'almost', 'alone', 'along', ...
'already', 'also','although','always','am','among', 'amongst', 'amoungst', ...
'amount', 'an', 'and', 'another', 'any','anyhow','anyone','anything','anyway', ...
'anywhere', 'are', 'around', 'as', 'at', 'back','be','became', 'because','become',...
'becomes', 'becoming', 'been', 'before', 'beforehand', 'behind', 'being', 'below',...
'beside', 'besides', 'between', 'beyond', 'bill', 'both', 'bottom','but', 'by',...
'call', 'can', 'cannot', 'cant', 'co', 'con', 'could', 'couldnt', 'cry', 'de',...
'describe', 'detail', 'do', 'done', 'down', 'due', 'during', 'each', 'eg', 'eight',...
'either', 'eleven','else', 'elsewhere', 'empty', 'enough', 'etc', 'even', 'ever', ...
'every', 'everyone', 'everything', 'everywhere', 'except', 'few', 'fifteen', 'fifty',...
'fill', 'find', 'fire', 'first', 'five', 'for', 'former', 'formerly', 'forty', 'found',...
'four', 'from', 'front', 'full', 'further', 'get', 'give', 'go', 'had', 'has', 'hasnt',...
'have', 'he', 'hence', 'her', 'here', 'hereafter', 'hereby', 'herein', 'hereupon', ...
'hers', 'herself', 'him', 'himself', 'his', 'how', 'however', 'hundred', 'ie', 'if',...
'in', 'inc', 'indeed', 'interest', 'into', 'is', 'it', 'its', 'itself', 'keep', 'last',...
'latter', 'latterly', 'least', 'less', 'ltd', 'made', 'many', 'may', 'me', 'meanwhile',...
'might', 'mill', 'mine', 'more', 'moreover', 'most', 'mostly', 'move', 'much', 'must',...
'my', 'myself', 'name', 'namely', 'neither', 'never', 'nevertheless', 'next', 'nine',...
'no', 'nobody', 'none', 'noone', 'nor', 'not', 'nothing', 'now', 'nowhere', 'of', 'off',...
'often', 'on', 'once', 'one', 'only', 'onto', 'or', 'other', 'others', 'otherwise',...
'our', 'ours', 'ourselves', 'out', 'over', 'own','part', 'per', 'perhaps', 'please',...
'put', 'rather', 're', 'same', 'see', 'seem', 'seemed', 'seeming', 'seems', 'serious',...
'several', 'she', 'should', 'show', 'side', 'since', 'sincere', 'six', 'sixty', 'so',...
'some', 'somehow', 'someone', 'something', 'sometime', 'sometimes', 'somewhere', ...
'still', 'such', 'system', 'take', 'ten', 'than', 'that', 'the', 'their', 'them',...
'themselves', 'then', 'thence', 'there', 'thereafter', 'thereby', 'therefore', ...
'therein', 'thereupon', 'these', 'they', 'thick', 'thin', 'third', 'this', 'those',...
'though', 'three', 'through', 'throughout', 'thru', 'thus', 'to', 'together', 'too',...
'top', 'toward', 'towards', 'twelve', 'twenty', 'two', 'un', 'under', 'until', 'up',...
'upon', 'us', 'very', 'via', 'was', 'we', 'well', 'were', 'what', 'whatever', 'when',...
'whence', 'whenever', 'where', 'whereafter', 'whereas', 'whereby', 'wherein',...
'whereupon', 'wherever', 'whether', 'which', 'while', 'whither', 'who', 'whoever',...
'whole', 'whom', 'whose', 'why', 'will', 'with', 'within', 'without', 'would', 'yet',...
'you', 'your', 'yours', 'yourself', 'yourselves'};
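str1 = 'Alligator in water of the pool'
% Split on runs of whitespace so repeated spaces do not produce empty tokens
split1 = regexp(str1,'\s+','split');
% ismember returns a logical mask marking the tokens found in the stop-word list;
% lower() makes the comparison case-insensitive (e.g. 'In' is removed as well)
out_str1 = strjoin(split1(~ismember(lower(split1),stopwords_cellstring)),' ')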
Output
str1 =
Alligator in water of the pool
out_str1 =
Alligator water pool
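The key step is the first output of ismember, which is a logical mask rather than an index; the second output holds the indices that the question may have been seeing. A minimal sketch on the question's own example, assuming a small illustrative stop-word subset (stopwords here is a hypothetical stand-in for the full list) and a MATLAB release that has strjoin:

```matlab
% Sketch: inspect both outputs of ismember on the question's example.
words = {'Alligator', 'in', 'water'};
stopwords = {'in', 'of', 'the'};      % hypothetical subset of the full stop-word list
[tf, loc] = ismember(words, stopwords);
% tf  is a logical mask, true where the word is a stop word: [false true false]
% loc is the position of each match in stopwords (0 when there is no match): [0 1 0]
kept = words(~tf);                    % logical indexing keeps the non-stop words
stop_remove = strjoin(kept, ' ')      % join the remaining words with single spaces
```

Negating tf and using it to index the cell array is what turns the match result into the remaining words.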
Note: this code uses strjoin from the MathWorks File Exchange. In MATLAB R2013a and later, strjoin is also available as a built-in function.
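If neither the built-in strjoin nor the File Exchange version is available, the kept words can be joined with sprintf instead. A minimal sketch, assuming a small illustrative stop-word subset in place of the full list:

```matlab
% Sketch: remove stop words and join the result without strjoin.
str1 = 'Alligator in water of the pool';
stopwords = {'in', 'of', 'the'};      % hypothetical subset of the full stop-word list
words = regexp(str1, '\s+', 'split');
kept = words(~ismember(lower(words), stopwords));
% sprintf reuses its format for each cell element, writing every word followed by a space
out_str1 = sprintf('%s ', kept{:});
out_str1 = strtrim(out_str1)          % strip the trailing space
```

strtrim is used rather than dropping the last character so the result is also correct when no words remain.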