Replace spaces with underscores but not all of them


Question

I'm pretty new to Python, but we are working on cleaning up some text files, and, among others, I will need to do the following: Replace spaces with underscores, but only in some cases. The cases are such that the beginning is marked with /2, and the end is marked with /1.

For example:

Here is some text, /2This is an example/1 only.

I would like to turn it into:

Here is some text, This_is_an_example only.
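A minimal sketch of this exact transformation, assuming the /2 and /1 markers themselves should be dropped (the example output above contains neither). The helper name `underscore_marked` is hypothetical; the technique is `re.sub` with a function as the replacement, where a capture group holds only the text between the markers.

```python
import re

# Assumption: the /2 and /1 markers are removed, as in the example output.
MARKED_SPAN = re.compile(r'/2(.*?)/1')

def underscore_marked(text):
    """Replace spaces with underscores inside /2.../1 spans and drop the markers."""
    # group(1) is the text between the markers; text outside the spans is untouched.
    return MARKED_SPAN.sub(lambda m: m.group(1).replace(' ', '_'), text)

print(underscore_marked("Here is some text, /2This is an example/1 only."))
# prints: Here is some text, This_is_an_example only.
```

Because the function is called once per match, only the marked spans are rewritten and everything else is copied through unchanged.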

I know how to do a universal replace (either just with Python or with regex), and also know how to do a regex search that would match all the /2...../1 expressions. But I cannot figure out how to combine those: to replace ONLY when the expression is found, and leave the rest of the text alone. I would be very grateful for any suggestions!

People keep asking for the code I have and/or pointing me to the basic Python documentation. It is a relatively long program, since we have to do a lot of things with our input, and this is just one of them. It would be part of a series of find-and-replace steps; here are some others:

import re
# handle (input) and s (output) are file objects opened earlier in the program
for x in handle:
    # note: str.replace is literal, so "^009" removes the text "^009", not "009" at line start
    for r in (("^009", ""), ("/c", ""), ("#", ""), (r"\@", "")):
        x = x.replace(*r)
    # get rid of all remaining LaTeX commands
    x = re.sub(r"\\[a-z]+", "", x)
    x = re.sub(r"\.h/.*?//", "", x)
    # get rid of punctuation
    x = re.sub(r'\.', '', x)
    x = re.sub(',', '', x)
    x = re.sub(';', '', x)
    x = re.sub('\n', ' \n', x)
    x = re.sub(r'\|.*?\|', '', x)
    x = re.sub("'", '', x)
    x = re.sub('"', '', x)
    # Here's an initial attempt
    y = re.findall(r'/2.*?/1', x)
    for item in y:
        title = re.sub(r'\s', '_', item)
    # but the question is: how do I place these things back into x?
    s.write(x)
s.close()
handle.close()
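One way to finish the findall-based attempt is to substitute each rewritten span back into x with `str.replace`, assigning the result back to x because strings are immutable. This is a sketch; the helper name `underscore_found_spans` is hypothetical, and it keeps the /2 and /1 markers just as the attempt does. It needs a second pass over the text for every distinct span, which `re.sub` with a replacement function avoids.

```python
import re

def underscore_found_spans(x):
    # Hypothetical helper: find every /2.../1 span, rewrite it, and put it back into x.
    for item in set(re.findall(r'/2.*?/1', x)):
        title = re.sub(r'\s', '_', item)
        # str.replace returns a new string, so the result must be assigned back to x.
        x = x.replace(item, title)
    return x

print(underscore_found_spans("Here is some text, /2This is an example/1 only."))
# prints: Here is some text, /2This_is_an_example/1 only.
```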

Edit 2: Here is a(nother) thing that does NOT work:

for item in re.findall(r'/2.*?/1', x):
    item = re.sub(r'\s', '_', item)
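Why this loop has no effect: assigning to `item` only rebinds the loop variable to a new string. `re.findall` returns copies of the matched text, and Python strings are immutable, so x itself never changes. The short demonstration below uses the question's example sentence as input.

```python
import re

x = "Here is some text, /2This is an example/1 only."

for item in re.findall(r'/2.*?/1', x):
    # Rebinds the name item to a new string; x is not modified.
    item = re.sub(r'\s', '_', item)

print(item)  # /2This_is_an_example/1
print(x)     # Here is some text, /2This is an example/1 only.
```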

Recommended answer

Use re.sub with a lambda:

x = re.sub(r'/2.*?/1', lambda m: re.sub(r'\s+', '_', m.group()), x)
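The same answer written with a named replacement function instead of a lambda, which can be easier to read inside a long cleanup loop. `underscore_spaces` is a hypothetical name; `re.sub` calls it once per match with a match object and inserts whatever string it returns. The markers are kept, since `match.group()` is the whole matched span.

```python
import re

def underscore_spaces(match):
    # match.group() is the whole /2.../1 span, markers included.
    return re.sub(r'\s+', '_', match.group())

x = "Here is some text, /2This is an example/1 only."
x = re.sub(r'/2.*?/1', underscore_spaces, x)
print(x)  # Here is some text, /2This_is_an_example/1 only.
```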

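The outer re.sub matches every substring from /2 to /1 (non-greedily, so each span ends at the nearest /1), and the nested re.sub replaces each run of whitespace with an underscore only inside those matches. The rest of the text is left unchanged.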
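Note that the answer uses `\s+` while the question's attempt uses `\s`. The two give the same result for single spaces but differ when a span contains consecutive whitespace; the sample span below is made up for illustration.

```python
import re

span = "/2two  spaces\there/1"
# \s+ collapses each run of whitespace (spaces, tabs, ...) into one underscore.
print(re.sub(r'\s+', '_', span))  # /2two_spaces_here/1
# \s replaces every whitespace character individually.
print(re.sub(r'\s', '_', span))   # /2two__spaces_here/1
```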