Python, remove all non-alphabet chars from string

Published: 2021/6/25 19:57:40. Tags: python, regex

Problem Description

I am writing a Python MapReduce word count program. The problem is that many non-alphabet characters are strewn throughout the data. I found the post "Stripping everything but alphanumeric chars from a string in Python", which shows a nice solution using regex, but I am not sure how to implement it:

import re

def mapfn(k, v):
    print(v)
    pattern = re.compile(r'[\W_]+')
    v = pattern.match(v)  # returns a Match object or None, not a cleaned string
    print(v)
    for w in v.split():  # so this line raises AttributeError
        yield w, 1

I'm afraid I don't know how to use the re library, or regex in general. I am not sure how to properly apply the regex pattern to the incoming string v (a line of a book) to get back a new line without any non-alphanumeric characters.

Any suggestions?

Recommended Answer

Use re.sub:

import re

regex = re.compile('[^a-zA-Z]')
#First parameter is the replacement, second parameter is your input string
regex.sub('', 'ab3d*E')
#Out: 'abdE'
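To plug this into the question's mapfn(k, v), one option (a sketch, not the only way) is to replace each run of non-letters with a single space instead of an empty string. The class [^a-zA-Z] also matches spaces, so substituting '' would glue neighboring words together ("ab cd" becomes "abcd") and split() would then see a single word. Note that the original attempt used pattern.match(), which only tries to match at the start of the string and returns a Match object or None; sub() is the call that returns the rewritten string. The constant name NON_ALPHA is an illustrative choice, not something from the original post.

```python
import re

# Illustrative name: any run of characters that are not ASCII letters.
NON_ALPHA = re.compile(r'[^a-zA-Z]+')

def mapfn(k, v):
    """Map step: emit (word, 1) for every alphabetic word in the line v."""
    # Replace non-letters with a space (not '') so that words separated
    # only by punctuation or digits stay separate after split().
    cleaned = NON_ALPHA.sub(' ', v)
    for w in cleaned.split():
        yield w, 1

if __name__ == '__main__':
    print(list(mapfn(0, "Hello, world! It's 2021.")))
```

Running this prints [('Hello', 1), ('world', 1), ('It', 1), ('s', 1)]. The apostrophe in "It's" is treated as a separator, which is exactly the case the alternative below addresses.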

Alternatively, if you only want to remove a specific set of characters (an apostrophe, for example, might be fine to keep in your input):

regex = re.compile(r'[,.!?]')  # add any other characters you want to strip
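A small sketch of that alternative, assuming you want to strip only commas, periods, exclamation marks and question marks while keeping apostrophes, digits and everything else. Inside a character class the period is already literal, so it needs no backslash; the raw string simply avoids escape-sequence surprises if you later add one. The helper name clean_line is hypothetical.

```python
import re

# Only these punctuation marks are removed; extend the class as needed.
PUNCT = re.compile(r'[,.!?]')

def clean_line(line):
    """Hypothetical helper: remove the listed punctuation from line."""
    return PUNCT.sub('', line)

print(clean_line("Hello, world! It's 2021."))
```

This prints Hello world It's 2021, so "It's" survives as one word when the result is split.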
