使用 Python 正则表达式查找/替换文档中的 URL [英] Find/replace an URL in document using Python regex

查看：76 发布时间：2021/10/1 19:58:03 python xml regex

本文介绍了使用 Python 正则表达式查找/替换文档中的 URL的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

Python 正则表达式专家！我正在尝试更改 xml 文档中的一行.原行是:

Experts of Python regular expressions! I'm trying to change a line in a xml document. The original line is:

<Tag name="low"     Value="%hello%\dir"/>

我想看到的结果是:

<Tag name="low"     Value="C:\art"/>

我失败的直接尝试是:

lines = re.sub("%hello%\dir"", "C:\art"/>

这不起作用.不会更改文档中的任何内容.带有 % 的东西?

This doesn't work. Doesn't change anything in the doc. Something with %?

出于测试目的，我尝试了:

For testing purposes I tried:

lines = re.sub("dir", "C:\art", a)

我得到:

<Tag name="low"     Value="%hello%\C:BELrt"/>

问题在于 \a = BEL.

The problem is that \a = BEL.

我尝试了很多其他的东西，但都无济于事.我该如何解决这个问题?

I've tried a bunch of other things, but to no avail. How do I go about this problem?

推荐答案

这是个好问题.它同时显示了文本表示的三个问题:

It is a good question. It shows three issues with a text representation at once:

'\a' Python 字符串文字是单个 BELL 字符.

'\a' Python string literal is a single BELL character.

要在 Python 源代码中输入反斜杠后跟字母 'a'，您需要使用原始文字:r'\a' 或转义斜杠 '\\a'.


To input backslash followed by letter 'a' in Python source code you need either use raw-literals: r'\a' or escape the slash '\\a'.
r'\d'(两个字符)在解释为正则表达式时具有特殊含义(r'\d' 表示匹配一个数字正则表达式).
r'\d' (two characters) has special meaning when interpreted as a regular expression (r'\d' means match a digit in a regex).
除了 Python 字符串文字的规则外，您还需要转义可能的正则表达式元字符.您可以在一般情况下使用 re.escape(your_string) 或仅使用 r'\\d' 或 '\\\\d'.'\a' 在 repl 部分也应该被转义(在你的情况下两次:r'\\a' 或 '\\\\a'):
In addition to rules for Python string literals you also need to escape possible regex metacharacters. You could use re.escape(your_string) in general case or just r'\\d' or '\\\\d'. '\a' in the repl part should also be escaped (twice in your case: r'\\a' or '\\\\a'):
>>> old, new = r'%hello%\dir', r'C:\art'
>>> print re.sub(re.escape(old), new.encode('string-escape'), xml)
<Tag name="low"     Value="C:\art"/>

顺便说一句，在这种情况下你根本不需要正则表达式:
btw, you don't need regular expressions at all in this case:
>>> print xml.replace(old, new)
<Tag name="low"     Value="C:\art"/>

最后XML 属性值不能包含某些字符 也应该转义，例如 '&'、'"'、"<" 等

at last XML attribute value can't contain certain characters that are also should be escaped e.g., '&', '"', "<", etc.
一般来说，您不应该使用正则表达式来操作 XML.Python 的 stdlib 具有 XML 解析器.
In general you should not use regex to manipulate XML. Python's stdlib has XML parsers.
>>> import xml.etree.cElementTree as etree
>>> xml = r'<Tag name="low"     Value="%hello%\dir"/>'
>>> tag = etree.fromstring(xml)
>>> tag.set('Value', r"C:\art & design")
>>> etree.dump(tag)
<Tag Value="C:\art &amp; design" name="low" />


                        这篇关于使用 Python 正则表达式查找/替换文档中的 URL的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持IT屋！


                    
                        查看全文

使用 Python 正则表达式查找/替换文档中的 URL [英] Find/replace an URL in document using Python regex

问题描述

推荐答案

相关文章

Python最新文章

热门教程

热门工具

登录关闭

使用 Python 正则表达式查找/替换文档中的 URL [英] Find/replace an URL in document using Python regex

问题描述

推荐答案

相关文章

Python最新文章

热门教程

热门工具

登录 关闭

登录关闭