Highlight verb phrases using spaCy and HTML

Question

I have written code that colors verb phrases red and writes the result out as HTML.

from __future__ import unicode_literals
import spacy,en_core_web_sm
import textacy
import codecs
nlp = en_core_web_sm.load()
sentence = 'The author is writing a new book. The dog is barking.'
pattern = r'<VERB>?<ADV>*<VERB>+'
doc = textacy.Doc(sentence, lang='en_core_web_sm')
lists = textacy.extract.pos_regex_matches(doc, pattern)
with open("my.html","w") as fp:
    for list in lists:
        search_word = (list.text)
        fp.write(sentence.replace(search_word, '<span style="color: red">{}</span>'.format(search_word)))

Current output:

The author **is writing** a new book. The dog is barking.The author is writing a new book. The dog **is barking.**

The whole text is written twice: the first copy highlights only "is writing" and the second copy highlights only "is barking".

Expected output:

The author **is writing** a new book. The dog **is barking.**

Do I have to split the text into sentences before checking it against the list of matches? Any help is appreciated.

Recommended answer

I found an alternative, more logical approach. Instead of running the replacement on the whole text for every match, run it only on the sentence that contains the matched phrase.

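with open("my.html","w") as fp:
    for _list in lists:
        search_word = (_list.text)
        # Pick the first '.'-delimited sentence that contains the matched phrase
        containing_sentence = [i for i in sentence.split('.') if str(search_word) in str(i)][0]
        # Write only that sentence, with the phrase wrapped in a red span
        fp.write(containing_sentence.replace(search_word, '<span style="color: red">{}</span>'.format(search_word)))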

The code above writes each sentence separately, and because split('.') drops the periods, they are missing from the output. To write everything as one text, add the period back to each sentence, append the modified sentences to a list, and join them before writing to the file, as below.

mod_sentence = []
for _list in lists:
    search_word = (_list.text)
    containing_sentence = [i for i in sentence.split('.') if str(search_word) in str(i)][0]+'.'
    mod_sentence.append(containing_sentence.replace(search_word, '<span style="color: red">{}</span>'.format(search_word)))
with open("my.html","w") as fp:
    fp.write(''.join(mod_sentence))
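
One caveat with the snippets above: str.replace changes every occurrence of the phrase, and splitting on '.' can break on abbreviations or decimals. A possible alternative is to highlight by character offsets in a single pass. The sketch below is plain Python; highlight_spans is a hypothetical helper, not part of spaCy or textacy, and the offsets are computed by hand for illustration. With spaCy, each matched Span exposes start_char and end_char, so the spans could instead be built from those attributes (that wiring is an assumption and is not shown here).

```python
from html import escape


def highlight_spans(text, spans,
                    open_tag='<span style="color: red">',
                    close_tag='</span>'):
    """Wrap each (start, end) character span of text in the given tags.

    Hypothetical helper: spans are half-open character offsets into text.
    """
    pieces = []
    cursor = 0
    for start, end in sorted(spans):
        if start < cursor:
            # Skip a span that overlaps one already emitted.
            continue
        # Copy the untouched text before the span, escaped for HTML.
        pieces.append(escape(text[cursor:start]))
        # Emit the highlighted span itself.
        pieces.append(open_tag + escape(text[start:end]) + close_tag)
        cursor = end
    # Copy whatever remains after the last span.
    pieces.append(escape(text[cursor:]))
    return ''.join(pieces)


sentence = 'The author is writing a new book. The dog is barking.'
# Offsets found by hand here; with spaCy they could come from
# (span.start_char, span.end_char) for each matched Span.
phrases = ['is writing', 'is barking']
spans = [(sentence.index(p), sentence.index(p) + len(p)) for p in phrases]

result = highlight_spans(sentence, spans)
print(result)
```

Because the text is walked once from left to right, each match is highlighted exactly once and the original punctuation is preserved without re-adding periods. The resulting string can be written to my.html in one fp.write call.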

Hope this helps! Cheers!
