In Python, how do I take in a string of text and return a list that contains lists of strings?


Problem description

This function takes in a string of text and returns a list of lists of strings, one inner list for each sentence in the text.

Sentences are separated by one of the strings ".", "?", or "!". We ignore the possibility of other punctuation separating sentences, so 'Mr.X' becomes two sentences and 'don't' becomes two words.

For example, given the text

Hello, Jack.  How is it going?  Not bad; pretty good, actually...  Very very
good, in fact.

the function returns:

[['hello', 'jack'],
 ['how', 'is', 'it', 'going'],
 ['not', 'bad', 'pretty', 'good', 'actually'],
 ['very', 'very', 'good', 'in', 'fact']]

The most confusing part is how to make the function detect the characters , . ! ? and how to build a list of lists containing the words in each sentence. Thank you.

Recommended answer

This sounds very much like a homework problem to me, so I'll provide general tips instead of exact code.

A string has a split(sep) method that splits it on a specific separator. Because split() takes only one separator at a time, you will have to use a loop and perform the split several times, once for each of ".", "?", and "!".
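As a rough sketch of the loop-and-split idea, something like the following could work. The function name split_sentences is made up for illustration, and dropping empty pieces (such as those produced by "...") is an assumption based on the expected output in the question, not something the answer spells out.

```python
def split_sentences(text):
    """Split text into sentence strings using only str.split().

    Hypothetical helper: the name and the empty-piece filtering are
    illustrative assumptions, not part of the original answer.
    """
    # Start with the whole text as one piece, then split every piece
    # on each separator in turn.
    pieces = [text]
    for sep in ".?!":
        next_pieces = []
        for piece in pieces:
            next_pieces.extend(piece.split(sep))
        pieces = next_pieces
    # Strip surrounding whitespace and drop pieces that end up empty.
    return [p.strip() for p in pieces if p.strip()]
```

Each returned sentence still contains commas, semicolons, and mixed case, so the words would need a further cleanup pass before they match the expected output.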

You could also use a regular expression to find matches (that would be a better solution). That would let you find all matches at once. Then you would iterate over the matches and split them on spaces, while stripping out punctuation.

Here's an example of a regular expression you could use to get sentence groups all at once:

\s*([^.?!]+)\s*

The \s* before the parentheses keeps leading whitespace out of each result (the character class is greedy, so any trailing whitespace is still captured and may need stripping), and the parentheses are a capture group. You can use re.findall() to get a list of all captured results, and then you can loop over these items and use re.split() and some conditional logic to append all the words to a new list.
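One possible way to put these pieces together is sketched below. The function name get_sentence_lists is hypothetical, and treating every non-letter character as a word separator (so that 'don't' becomes two words) and lowercasing the words are assumptions drawn from the example in the question, not part of the answer itself.

```python
import re

def get_sentence_lists(text):
    """Return one list of lowercase words per sentence in text.

    Hypothetical sketch: assumes words consist of ASCII letters only,
    so apostrophes, digits, and punctuation all act as separators.
    """
    # Grab every run of characters that is not ".", "?", or "!".
    sentences = re.findall(r"\s*([^.?!]+)\s*", text)
    result = []
    for sentence in sentences:
        # Split on runs of non-letters and discard empty strings.
        words = [w.lower() for w in re.split(r"[^A-Za-z]+", sentence) if w]
        # Conditional logic: skip matches that contain no words at all,
        # for example a whitespace-only match at the end of the text.
        if words:
            result.append(words)
    return result
```

Calling get_sentence_lists() on the sample text from the question yields the four word lists shown there.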

Let me know how you get along with that, and if you have any other questions, please post the code you have so far.
