Create word forms using Python
Problem description
How can I get the different forms of a word using Python? I want to create a list like the following:
Work=['Work','Working','Works']
My code:
raw = nltk.clean_html(html)
cleaned = re.sub(r'& ?(ld|rd)quo ?[;\]]', '\"', raw)
tokens = nltk.wordpunct_tokenize(cleaned)
stemmer = PorterStemmer()
t = [stemmer.stem(t) if t in Words else t for t in tokens]
text = nltk.Text(t)
word = words(n)
Words = [stemmer.stem(e) for e in word]
find = ' '.join(str(e) for e in Words)
search_words = set(find.split(' '))
sents = ' '.join([s.lower() for s in text])
blob = TextBlob(sents.decode('ascii','ignore'))
matches = [map(str, blob.sentences[i-1:i+2]) # from prev to after next
for i, s in enumerate(blob.sentences) # i is index, e is element
if search_words & set(s.words)]
#return list(itertools.chain(' '.join (str(y).replace('& rdquo','').replace('& rsquo','') for y in matches))
return list(itertools.chain(*matches))
Recommended answer
It was a little tricky. I looked at the stemmed forms in the text and mapped them against the stemmed word list. I also converted the tokens to lower case, since tokenizing doesn't do that, and then the mapping worked perfectly. The updated code is below.
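To see the mapping idea in isolation before the full code, here is a minimal sketch that lower-cases every token, reduces it to a stem, and collects the surface forms that share a stem with a target word. It uses only the standard library: `toy_stem` is a hypothetical, deliberately crude stand-in for `PorterStemmer.stem`, and `forms_by_stem` and the sample text are likewise assumptions made for illustration, not part of the original answer.

```python
import re
from collections import defaultdict


def toy_stem(token):
    """Crude stand-in for a real stemmer: strip one common suffix."""
    for suffix in ("ing", "ed", "s"):
        # Keep at least three characters so short words like "is" survive.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[:-len(suffix)]
    return token


def forms_by_stem(text, targets):
    """Map each target stem to the sorted surface forms found in text."""
    wanted = {toy_stem(w.lower()) for w in targets}
    found = defaultdict(set)
    for token in re.findall(r"[A-Za-z]+", text):
        lowered = token.lower()  # tokenizing alone does not normalize case
        stem = toy_stem(lowered)
        if stem in wanted:
            found[stem].add(lowered)
    return {stem: sorted(forms) for stem, forms in found.items()}


sample = "Work is fine. She works late and is Working on it. It worked."
print(forms_by_stem(sample, ["work"]))
# {'work': ['work', 'worked', 'working', 'works']}
```

Without the `lower()` call, "Work" and "Working" would not compare equal to the lower-case stem, which is the mismatch the answer describes. With a real stemmer in place of `toy_stem`, the same grouping should carry over, though the exact stems will differ.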
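import itertools
import re

import nltk
from nltk.stem import PorterStemmer
from textblob import TextBlob

# Stem the search terms first so the token pass below can refer to them.
# words(n) is the asker's own helper that returns the terms to look for.
stemmer = PorterStemmer()
word = words(n)
Words = [stemmer.stem(e) for e in word]
find = ' '.join(str(e) for e in Words)
search_words = set(find.split(' '))

# Strip the markup and tokenize. nltk.clean_html only works in NLTK 2.x;
# NLTK 3 dropped its implementation in favour of BeautifulSoup.
raw = nltk.clean_html(html)
cleaned = re.sub(r'& ?(ld|rd)quo ?[;\]]', '\"', raw)
tokens = nltk.wordpunct_tokenize(cleaned)
# wordpunct_tokenize keeps the original case, so lower-case explicitly.
lower = [w.lower() for w in tokens]
t = [stemmer.stem(t) if t in Words else t for t in lower]
text = nltk.Text(t)

sents = ' '.join([s.lower() for s in text])
blob = TextBlob(sents.decode('ascii','ignore'))  # Python 2 byte string
# Collect each matching sentence together with the one before and after it.
# max() keeps the slice start from going negative for the first sentence.
matches = [map(str, blob.sentences[max(i-1, 0):i+2])
           for i, s in enumerate(blob.sentences)  # i is the index, s the sentence
           if search_words & set(s.words)]
#return list(itertools.chain(' '.join (str(y).replace('& rdquo','').replace('& rsquo','') for y in matches))
# This fragment is the body of a function, hence the return.
return list(itertools.chain(*matches))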
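The last step of the snippet, returning every sentence that contains a search word together with its neighbours, can be sketched in Python 3 with the standard library alone. This is an illustration under stated assumptions, not the answer's own code: the regex sentence splitter is a rough stand-in for `TextBlob(...).sentences`, and the name `sentences_with_context` and the sample text are hypothetical.

```python
import itertools
import re


def sentences_with_context(text, search_words):
    """Return each sentence containing a search word, plus its neighbours."""
    # Naive splitter: break after ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    matches = []
    for i, sentence in enumerate(sentences):
        tokens = set(re.findall(r"[a-z]+", sentence.lower()))
        if search_words & tokens:
            # max() stops the slice start from going negative, which would
            # otherwise yield an empty window when the first sentence matches.
            matches.append(sentences[max(i - 1, 0):i + 2])
    return list(itertools.chain(*matches))


sample = "Work starts early. Lunch is at noon. The team goes home. Nobody stays late."
print(sentences_with_context(sample, {"team"}))
# ['Lunch is at noon.', 'The team goes home.', 'Nobody stays late.']
```

As in the original, windows around adjacent matches overlap, so a sentence can appear more than once in the result. Deduplicate afterwards if that matters.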