Python - Extracting sentences from paragraphs


Question

I am new to Python and could use some help:

This is just an example:

I have a list of dictionaries, each with the same keys:

list_dummy = [{'a': 1, 'b':"The house is great. I loved it.",'e':"loved,the"}, {'a': 3, 'b': "Building is white in colour. I liked it.",'e':"colour"}, {'a': 5, 'b': "She is looking pretty. She is in my college",'e':"pretty"}]

'b' - consists of body text
'e' - consists of words (can be more than one)

I want to extract the sentences from 'b' that contain one or more of the words in 'e'.

I first need to split the text into sentences with sent_tokenize and then extract the matching ones. sent_tokenize takes only a string as input. How should I proceed?

Recommended answer

I can't get the nltk module working to test this, but as long as sent_tokenize() returns a list of sentence strings, something like this should do what you're hoping for (if I understood correctly):

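from nltk.tokenize import sent_tokenize

ans = []
for d in list_dummy:
    tmp = sent_tokenize(d['b'])  # split the body text into sentences
    # keep sentences containing any word from 'e' (case-insensitive substring match)
    s = [x for x in tmp if any(w.upper() in x.upper() for w in d['e'].split(","))]
    ans += s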

This assumes that 'e' is always a comma-separated list and that you want a case-insensitive search. The ans variable ends up as a flat list of the sentences that contain at least one word from that dictionary's 'e' value.
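
To try the loop above without installing NLTK, here is a minimal sketch that swaps in a naive regex-based splitter for sent_tokenize. The `naive_sent_tokenize` helper is hypothetical and only an assumption for illustration: it splits after `.`, `!` or `?` followed by whitespace and will mishandle abbreviations, so NLTK's tokenizer is likely the better choice for real text.

```python
import re

list_dummy = [
    {'a': 1, 'b': "The house is great. I loved it.", 'e': "loved,the"},
    {'a': 3, 'b': "Building is white in colour. I liked it.", 'e': "colour"},
    {'a': 5, 'b': "She is looking pretty. She is in my college", 'e': "pretty"},
]

def naive_sent_tokenize(text):
    # Hypothetical stand-in for nltk's sent_tokenize (an assumption, not NLTK code):
    # split on whitespace that follows sentence-ending punctuation.
    return [s for s in re.split(r'(?<=[.!?])\s+', text.strip()) if s]

ans = []
for d in list_dummy:
    # Tolerate stray spaces and empty entries in the comma-separated words.
    words = [w.strip() for w in d['e'].split(",") if w.strip()]
    for sentence in naive_sent_tokenize(d['b']):
        # Case-insensitive substring match, as in the answer above.
        if any(w.upper() in sentence.upper() for w in words):
            ans.append(sentence)

print(ans)
```

With the sample data this keeps four of the six sentences; "I liked it." and "She is in my college" are dropped because they contain none of their dictionary's words.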

Edit

If you prefer regular expressions, you could use the re module:

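import re
from nltk.tokenize import sent_tokenize

ans = []
for d in list_dummy:
    b = sent_tokenize(d['b'])
    e = d['e'].split(",")
    # Escape each word and join into one alternation, e.g. "loved|the"; ignore case as in the first version
    r = re.compile("|".join(re.escape(w) for w in e), re.IGNORECASE)
    ans.append([x for x in b if r.search(x)])  # search() matches anywhere; one sub-list per dictionary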
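
Note that both versions match substrings, so a word such as "the" would also hit sentences containing "other" or "then". If whole-word matches are wanted, a sketch along these lines may help. The function name `sentences_with_words` is hypothetical, and it expects sentences that have already been split (for instance the output of sent_tokenize).

```python
import re

def sentences_with_words(sentences, words):
    # Hypothetical helper: keep sentences containing any word as a whole word.
    cleaned = [w.strip() for w in words if w.strip()]
    if not cleaned:
        return []
    # \b anchors the alternation at word boundaries; re.escape guards against
    # regex metacharacters appearing in the words.
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(w) for w in cleaned) + r")\b",
        re.IGNORECASE,
    )
    return [s for s in sentences if pattern.search(s)]

sentences = ["Another house is great.", "I loved it.", "Then we left.", "The end."]
matches = sentences_with_words(sentences, "loved,the".split(","))
print(matches)
```

Here "Another house is great." and "Then we left." are skipped even though both contain the letters "the". The `\b` anchors assume each word begins and ends with a letter, digit or underscore; words with leading or trailing punctuation would need different handling.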
