Split a string into a list, leaving accented chars and emoticons but removing punctuation

Problem description

If I have the string:

"O João foi almoçar :) ." 

How do I best split it into a list of words in Python, like so:

['O','João', 'foi', 'almoçar', ':)']

?

Thanks :)

Sofia

Recommended answer

If each punctuation mark is its own space-separated token, as in your example, then it's easy:

>>> import string  # Python 3
>>> [tok for tok in "O João foi almoçar :) .".split() if tok not in string.punctuation]
['O', 'João', 'foi', 'almoçar', ':)']
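To see where this token filter stops working, here is a small sketch (Python 3; the helper name `drop_punct_tokens` is made up for illustration). It only drops tokens that are themselves found in `string.punctuation`, so punctuation glued to another token survives:

```python
import string

def drop_punct_tokens(text):
    # Keep a whitespace-separated token unless the whole token
    # occurs inside string.punctuation (hypothetical helper).
    return [tok for tok in text.split() if tok not in string.punctuation]

# Punctuation is its own token: the "." is dropped.
print(drop_punct_tokens("O João foi almoçar :) ."))

# Punctuation attached to the smiley: the token ":)." is kept as-is.
print(drop_punct_tokens("O João foi almoçar :)."))
```

The second call is the case the rest of the answer deals with.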

If this is not the case, you can define a dictionary of smileys like this (you'll need to add more):

d = { ':)': '<HAPPY_SMILEY>', ':(': '<SAD_SMILEY>'}

and then replace each instance of a smiley with a placeholder that doesn't contain punctuation (we'll consider <> not to be punctuation):

for smiley, placeholder in d.items():  # use d.iteritems() on Python 2
    s = s.replace(smiley, placeholder)

Which gets us to "O João foi almoçar <HAPPY_SMILEY> .".

We then strip punctuation:

s = ''.join(c for c in s if c not in '.,!')

Which gives us "O João foi almoçar <HAPPY_SMILEY> " (the trailing space is harmless, because split() discards it).
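The stripping step above only removes '.', ',' and '!'. If you want to remove every ASCII punctuation character while still protecting the placeholders, one possible sketch (Python 3) uses `str.translate`. The names `KEEP`, `DELETE_TABLE` and `strip_punctuation` are illustrative, and keeping '<', '>' and '_' is an assumption that follows from the placeholder format used here:

```python
import string

# Assumption: '<', '>' and '_' must survive so that placeholders
# such as <HAPPY_SMILEY> stay intact.
KEEP = "<>_"

# Translation table that deletes every other ASCII punctuation character.
DELETE_TABLE = str.maketrans(
    "", "", "".join(c for c in string.punctuation if c not in KEEP)
)

def strip_punctuation(text):
    # Remove the characters listed in DELETE_TABLE; leave everything else alone.
    return text.translate(DELETE_TABLE)

print(strip_punctuation("O João foi almoçar <HAPPY_SMILEY> ."))
```

Accented letters are untouched, because `string.punctuation` only contains ASCII characters.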

We then restore the smileys:

for smiley, placeholder in d.items():  # use d.iteritems() on Python 2
    s = s.replace(placeholder, smiley)

Then we split:

s = s.split()

Giving us our final result: ['O', 'João', 'foi', 'almoçar', ':)'] (Python 2 displays the accented characters as escaped bytes, e.g. 'Jo\xc3\xa3o').

Putting it all together into a function:

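def split_special(s):
    """Split s into words, dropping '.', ',' and '!' but keeping known smileys."""
    d = {':)': '<HAPPY_SMILEY>', ':(': '<SAD_SMILEY>'}
    # Protect smileys with punctuation-free placeholders
    for smiley, placeholder in d.items():  # use d.iteritems() on Python 2
        s = s.replace(smiley, placeholder)
    # Strip punctuation
    s = ''.join(c for c in s if c not in '.,!')
    # Restore the smileys
    for smiley, placeholder in d.items():
        s = s.replace(placeholder, smiley)
    return s.split()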
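As an alternative to the placeholder round trip (this is a different technique, not the approach described above), a single regular expression can pick out emoticons and words directly. This is a rough sketch for Python 3, where `\w` matches accented letters in `str` input; the `EMOTICONS` list and the names `TOKEN_RE` and `tokenize` are assumptions made for illustration:

```python
import re

# Assumption: a small, hand-picked set of emoticons; extend as needed.
EMOTICONS = [":)", ":(", ";)", ":D"]

# Try the emoticons first (longest first), then fall back to runs of
# word characters. Anything else, such as '.', ',' or '!', is skipped.
TOKEN_RE = re.compile(
    "|".join(re.escape(e) for e in sorted(EMOTICONS, key=len, reverse=True))
    + r"|\w+"
)

def tokenize(text):
    # findall returns every non-overlapping match, left to right.
    return TOKEN_RE.findall(text)

print(tokenize("O João foi almoçar :) ."))
print(tokenize("almoçar:)."))
```

Unlike the placeholder version, this also separates a smiley that is glued to a word. The trade-off is that `\w+` breaks words at apostrophes and hyphens, which may or may not be what you want.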
