Replace spaces with underscores, but not all of them
Question
I'm pretty new to Python. We are working on cleaning up some text files and, among other things, I need to replace spaces with underscores, but only in some cases. In those cases the beginning is marked with /2 and the end is marked with /1.
For example:
Here is some text, /2This is an example/1 only.
I would like to turn it into:
Here is some text, This_is_an_example only.
I know how to do a global replace (either with plain Python or with a regex), and I also know how to write a regex search that matches all the /2...../1 expressions. What I cannot figure out is how to combine the two: replace ONLY inside the matched expressions and leave the rest of the text alone. I would be very grateful for any suggestions!
People keep asking for the code I have and/or pointing me to the basic Python documentation. It is a relatively long program, since we have to do a lot of things with our input, and this is just one of them. It would be part of a series of find-and-replace steps; here are some of the others:
import re

# handle (input) and s (output) are file objects opened earlier in the program
for x in handle:
    for r in (("^009", ""), ("/c", ""), ("#", ""), ("\\@", "")):
        x = x.replace(*r)
    # get rid of all remaining latex commands
    x = re.sub(r"\\[a-z]+", "", x)
    x = re.sub(r"\.h/.*?//", "", x)
    # get rid of punctuation
    x = re.sub(r'\.', '', x)
    x = re.sub(r',', '', x)
    x = re.sub(r';', '', x)
    x = re.sub(r'\n', ' \n', x)
    x = re.sub(r'\|.*?\|', '', x)
    x = re.sub(r"'", '', x)
    x = re.sub(r'"', '', x)
    # Here's an initial attempt
    y = re.findall(r'/2.*?/1', x)
    for item in y:
        title = re.sub(r'\s', '_', item)
    # but the question is how do I place these things back into x?
    s.write(x)
s.close()
handle.close()
Edit 2: Here is another thing that does NOT work:
for item in re.findall(r'/2.*?/1', x):
    item = re.sub(r'\s', '_', item)
Recommended answer
Use re.sub with a lambda:
x = re.sub(r'/2.*?/1', lambda x: re.sub(r'\s+', '_', x.group()), x)
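The outer re.sub matches every string between /2 and /1, and the nested re.sub replaces each run of whitespace with a single underscore only inside those matches. The rest of the line is left untouched, and the /2 and /1 markers themselves stay in place.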
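As a minimal sketch of how this could be applied to the example line from the question: the first function below wraps the answer's expression as-is, and the second is a variant (an assumption, not part of the original answer) that captures the inner text so the /2 and /1 markers are dropped, which matches the output the question asks for. The function names are hypothetical, chosen only for illustration.

```python
import re

def underscore_marked(text):
    """Replace whitespace runs with '_' only inside /2.../1 spans (markers kept)."""
    return re.sub(r'/2.*?/1', lambda m: re.sub(r'\s+', '_', m.group()), text)

def underscore_marked_strip(text):
    """Hypothetical variant: capture the inner text so /2 and /1 are removed."""
    return re.sub(r'/2(.*?)/1', lambda m: re.sub(r'\s+', '_', m.group(1)), text)

line = "Here is some text, /2This is an example/1 only."
print(underscore_marked(line))        # Here is some text, /2This_is_an_example/1 only.
print(underscore_marked_strip(line))  # Here is some text, This_is_an_example only.
```

Since `.` does not match a newline by default, both versions assume that a /2.../1 span never crosses a line break, which fits the line-by-line loop in the question. If spans could cross lines, passing `flags=re.DOTALL` to the outer call on the whole text would likely be needed.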