去除换行符的正则表达式 [英] Regular expression to remove line breaks

查看:123
本文介绍了去除换行符的正则表达式的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我是 Python 的完全新手,我遇到了正则表达式问题.我试图删除文本文件中每一行末尾的换行符,但前提是它跟在小写字母之后,即 [a-z].如果行尾以小写字母结尾,我想用空格替换换行符/换行符.

I am a complete newbie to Python, and I'm stuck with a regex problem. I'm trying to remove the line break character at the end of each line in a text file, but only if it follows a lowercase letter, i.e. [a-z]. If the end of the line ends in a lower case letter, I want to replace the line break/newline character with a space.

这是我目前得到的:

import re
import sys

textout = open("output.txt","w")
textblock = open(sys.argv[1]).read()
textout.write(re.sub("[a-z]\z","[a-z] ", textblock, re.MULTILINE) )
textout.close()

推荐答案

尝试

re.sub(r"(?<=[a-z])\r?\n"," ", textblock)

\Z 只匹配字符串的末尾,在最后一个换行符之后,所以它绝对不是你在这里需要的.\z 不被 Python 正则表达式引擎识别.

\Z only matches at the end of the string, after the last linebreak, so it's definitely not what you need here. \z is not recognized by the Python regex engine.

(?<=[az]) 是一个 正向后视断言,用于检查当前位置之前的字符是否为小写 ASCII 字符.只有这样,正则表达式引擎才会尝试匹配换行符.

(?<=[a-z]) is a positive lookbehind assertion that checks if the character before the current position is a lowercase ASCII character. Only then the regex engine will try to match a line break.

此外,始终使用带有正则表达式的原始字符串.使反斜杠更容易处理.

Also, always use raw strings with regexes. Makes backslashes easier to handle.

这篇关于去除换行符的正则表达式的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆