Converting a String to a List of Words?
Question
I'm trying to convert a string to a list of words using Python. I want to take something like the following:
string = 'This is a string, with words!'
Then convert it to something like this:
list = ['This', 'is', 'a', 'string', 'with', 'words']
Note that the punctuation and spaces are omitted. What would be the fastest way to do this?
Recommended answer
Try this:
import re
mystr = 'This is a string, with words!'
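# Replace every non-word character with a space, then split on whitespace.
# The raw string r"..." keeps \w from being treated as a (invalid) string escape.
wordList = re.sub(r"[^\w]", " ", mystr).split()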
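A minimal sketch wrapping the same re.sub + split() approach in a reusable function; the names `to_words` and `to_words_findall` are hypothetical. The second function swaps in `re.findall(r"\w+", ...)`, a commonly used alternative that collects the runs of word characters directly instead of substituting and splitting; for this input it yields the same list.

```python
import re


def to_words(text: str) -> list[str]:
    """Split text into words, dropping punctuation and whitespace.

    Every non-word character (anything other than a letter, digit or
    underscore) is replaced by a space; split() then breaks on whitespace.
    """
    return re.sub(r"[^\w]", " ", text).split()


def to_words_findall(text: str) -> list[str]:
    """Alternative: return every maximal run of word characters."""
    return re.findall(r"\w+", text)


sentence = 'This is a string, with words!'
print(to_words(sentence))          # ['This', 'is', 'a', 'string', 'with', 'words']
print(to_words_findall(sentence))  # ['This', 'is', 'a', 'string', 'with', 'words']
```

Which of the two is faster depends on the input and Python version, so it is worth timing both (for example with `timeit`) on your own data.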
How it works:
From the docs:
re.sub(pattern, repl, string, count=0, flags=0)
Return the string obtained by replacing the leftmost non-overlapping occurrences of pattern in string by the replacement repl. If the pattern isn’t found, string is returned unchanged. repl can be a string or a function.
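A short sketch of the documented re.sub behaviours, using made-up sample strings: all non-overlapping matches are replaced left to right, an input with no match comes back unchanged, `count` limits the number of replacements, and `repl` may be a function that receives each match object.

```python
import re

# Every non-overlapping match is replaced, scanning from left to right.
all_replaced = re.sub(r"[^\w]", " ", "a,b;c")

# If the pattern is not found, the input string is returned unchanged.
unchanged = re.sub(r"[^\w]", " ", "abc")

# count limits the number of replacements (0, the default, means "all").
first_only = re.sub(r"[^\w]", " ", "a,b;c", count=1)

# repl can be a function: it gets each match object and returns the replacement text.
doubled = re.sub(r"\d", lambda m: str(int(m.group()) * 2), "a1b2")

print(all_replaced)  # a b c
print(unchanged)     # abc
print(first_only)    # a b;c
print(doubled)       # a2b4
```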
So in our case:
pattern is [^\w], which matches any non-word character, i.e. anything that is not a letter, digit, or underscore.
[\w] matches any word character. For ASCII text it is equivalent to the character set [a-zA-Z0-9_]:
a to z, A to Z, 0 to 9, and underscore. In Python 3, \w in a str pattern also matches non-ASCII letters and digits (such as é) unless the re.ASCII flag is set.
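One caveat worth checking: in Python 3, str patterns are Unicode-aware by default, so \w is broader than [a-zA-Z0-9_]. This sketch, using a made-up sample string, compares the default behaviour with the re.ASCII flag and with an explicit ASCII character class.

```python
import re

# Sample text with accented letters: "café naïve über_2" (escaped to avoid encoding issues).
text = "caf\u00e9 na\u00efve \u00fcber_2"

# Default for str patterns: \w matches Unicode word characters, so accents are kept.
unicode_words = re.sub(r"[^\w]", " ", text).split()

# With re.ASCII, \w is limited to [a-zA-Z0-9_], so accented letters act as separators.
ascii_words = re.sub(r"[^\w]", " ", text, flags=re.ASCII).split()

# An explicit ASCII character class behaves like the re.ASCII version.
explicit_words = re.sub(r"[^a-zA-Z0-9_]", " ", text).split()

print(unicode_words)   # ['café', 'naïve', 'über_2']
print(ascii_words)     # ['caf', 'na', 've', 'ber_2']
print(explicit_words)  # ['caf', 'na', 've', 'ber_2']
```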
So we match every non-word character and replace it with a space.
Then we call split() on the result. With no arguments, split() splits the string on runs of whitespace, drops empty strings, and returns a list.
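Calling split() with no argument matters here, because substituting a space for "," next to an existing space leaves consecutive spaces. A small sketch on a made-up string contrasts it with split(" "):

```python
text = "hello  world "  # two spaces in the middle, one trailing

# No argument: split on runs of whitespace and discard empty strings.
whitespace_split = text.split()

# Explicit " " separator: every single space is a boundary, so empty strings appear.
space_split = text.split(" ")

print(whitespace_split)  # ['hello', 'world']
print(space_split)       # ['hello', '', 'world', '']
```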
So 'hello-world' becomes 'hello world' after re.sub, and then ['hello', 'world'] after split().
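To see both steps on the example above and on the string from the question, this sketch prints the intermediate string next to the final list; the helper name `trace` is hypothetical.

```python
import re


def trace(text: str) -> tuple[str, list[str]]:
    """Return the string after re.sub and the list after split()."""
    spaced = re.sub(r"[^\w]", " ", text)  # step 1: every non-word character becomes a space
    words = spaced.split()                # step 2: split on runs of whitespace
    return spaced, words


for sample in ['hello-world', 'This is a string, with words!']:
    spaced, words = trace(sample)
    print(repr(spaced), '->', words)

# 'hello world' -> ['hello', 'world']
# 'This is a string  with words ' -> ['This', 'is', 'a', 'string', 'with', 'words']
```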
Let me know if you have any questions.