Converting a String to a List of Words?


Question

I'm trying to convert a string to a list of words using Python. I want to take something like the following:

string = 'This is a string, with words!'

Then convert it to something like this:

list = ['This', 'is', 'a', 'string', 'with', 'words']

Notice the omission of punctuation and spaces. What would be the fastest way of going about this?

Recommended answer

Try this:

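import re

mystr = 'This is a string, with words!'
# Replace every non-word character with a space, then split on whitespace.
# The raw string (r"...") keeps \w from being read as a string escape.
wordList = re.sub(r"[^\w]", " ", mystr).split()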

How it works:

From the docs:

re.sub(pattern, repl, string, count=0, flags=0)

Return the string obtained by replacing the leftmost non-overlapping occurrences of pattern in string by the replacement repl. If the pattern isn't found, string is returned unchanged. repl can be a string or a function.

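So in our case:

pattern, [^\w], matches any character that is not alphanumeric or an underscore.

[\w] matches any word character. With ASCII-only matching (the re.ASCII flag), it is equal to the character set [a-zA-Z0-9_]: a to z, A to Z, 0 to 9, and underscore. For str patterns in Python 3, it also matches letters and digits from other scripts by default.

So we match any character that is not a word character and replace it with a space.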
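To make the scope of \w concrete, here is a minimal sketch comparing the default behaviour with ASCII-only matching. The sample string and the variable names are assumptions chosen for illustration; they are not part of the original answer.

```python
import re

# Illustrative input (an assumption): accented letters, underscores, digits.
text = "na\u00efve caf\u00e9_au_lait, 42!"

# Default for str patterns in Python 3: \w also matches non-ASCII letters,
# so the accented characters stay inside their words.
unicode_words = re.sub(r"[^\w]", " ", text).split()

# With re.ASCII, \w is limited to [a-zA-Z0-9_], so accented letters are
# treated like punctuation and the words get split around them.
ascii_words = re.sub(r"[^\w]", " ", text, flags=re.ASCII).split()

print(unicode_words)
print(ascii_words)
```

In both cases the underscore counts as a word character, so 'café_au_lait' is not split at the underscores.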

Then we call split() on the result, which splits the string on whitespace and returns a list.

So 'hello-world' becomes 'hello world' with re.sub, and then ['hello', 'world'] after split().
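The two steps can be run separately to see the intermediate string. This is only a sketch: to_words is a hypothetical helper name introduced here for illustration, not something from the original answer.

```python
import re


def to_words(text):
    """Hypothetical helper: replace non-word characters, then split."""
    spaced = re.sub(r"[^\w]", " ", text)  # punctuation becomes spaces
    return spaced.split()                 # split on runs of whitespace


# Step by step, using the 'hello-world' example:
step1 = re.sub(r"[^\w]", " ", "hello-world")
step2 = step1.split()
print(repr(step1))
print(step2)

# The string from the question:
words = to_words("This is a string, with words!")
print(words)
```

Because split() with no arguments collapses consecutive whitespace, the double space left behind by ", " and the trailing space left by "!" do not produce empty strings in the result.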

Let me know if any doubts come up.
