Split strings into words with multiple word boundary delimiters

Question

I think what I want to do is a fairly common task but I've found no reference on the web. I have text with punctuation, and I want a list of the words.

"Hey, you - what are you doing here!?"

should be

['hey', 'you', 'what', 'are', 'you', 'doing', 'here']

But Python's str.split() only works with one argument, so I have all words with the punctuation after I split with whitespace. Any ideas?
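To make the problem concrete, here is a minimal sketch of what splitting on whitespace alone produces for the sample sentence. The variable names `text` and `words` are only illustrative.

```python
text = "Hey, you - what are you doing here!?"

# With no argument, str.split() splits on runs of whitespace only,
# so punctuation stays attached to the neighbouring words.
words = text.split()
print(words)
# Prints ['Hey,', 'you', '-', 'what', 'are', 'you', 'doing', 'here!?']
```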

Recommended answer

A case where regular expressions are justified:

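import re

DATA = "Hey, you - what are you doing here!?"
# [\w']+ matches runs of word characters and apostrophes,
# so spaces and punctuation all act as delimiters.
print(re.findall(r"[\w']+", DATA))
# Prints ['Hey', 'you', 'what', 'are', 'you', 'doing', 'here']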
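Note that the output above keeps the capital in 'Hey', while the list in the question is all lowercase. If the lowercase form is what you want, one possible sketch is to lowercase the string before matching. The `re.split` variant shown alongside it is an alternative not taken from the answer: it splits on the delimiters instead of matching the words, and needs a filter to drop the empty string left by trailing punctuation. The names `words` and `parts` are illustrative.

```python
import re

DATA = "Hey, you - what are you doing here!?"

# Lowercase first so the result matches the list asked for in the question.
words = re.findall(r"[\w']+", DATA.lower())
print(words)
# Prints ['hey', 'you', 'what', 'are', 'you', 'doing', 'here']

# Alternative: split on runs of anything that is not a word character
# or apostrophe, then discard empty strings from the edges.
parts = [w for w in re.split(r"[^\w']+", DATA.lower()) if w]
print(parts == words)
# Prints True
```

Both forms keep apostrophes inside words, so a contraction such as "don't" stays in one piece.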
