将LaTeX保留字符与正则表达式匹配 [英] Match LaTeX reserved characters with regex
问题描述
我有一个HTML到LaTeX解析器,该解析器专门针对其应做的事情(将HTML的代码片段转换为LaTeX的代码片段),但是填写变量存在一些问题.问题是,应该允许变量 包含LaTeX保留字符(即# $ % ^ & _ { } ~ \
).这些必须逃脱,以免杀死我们的LaTeX渲染器.
I have an HTML to LaTeX parser tailored to what it's supposed to do (convert snippets of HTML into snippets of LaTeX), but there is a little issue with filling in variables. The issue is that variables should be allowed to contain the LaTeX reserved characters (namely # $ % ^ & _ { } ~ \
). These need to be escaped so that they won't kill our LaTeX renderer.
处理转换的程序,所有内容都是用Python编写的,因此我试图找到一个不错的解决方案.我的第一个想法是简单地执行.replace()
,但是replace不允许您仅在第一个不是\
时进行匹配.我的第二次尝试是使用正则表达式,但失败了.
The program that handles the conversion and everything is written in Python, so I tried to find a nice solution. My first idea was to simply do a .replace()
, but replace doesn't allow you to match only if the first is not a \
. My second attempt was a regex, but I failed miserably at that.
我想出的正则表达式是([^\][#\$%\^&_\{\}~\\])
.我希望这将与任何保留字符匹配,但前提是它前面没有\
.不幸的是,这与我输入文本中的单个字符匹配.我还尝试过此正则表达式的其他变体,但无法使其正常工作.这些变化主要包括在正则表达式的第二部分中删除/添加斜杠.
The regex I came up with is ([^\][#\$%\^&_\{\}~\\])
. I hoped that this would match any of the reserved characters, but only if it didn't have a \
in front. Unfortunately, this matches ever single character in my input text. I've also tried different variations on this regex, but I can't get it to work. The variations mainly consisted of removing/adding slashes in the second part of the regex.
任何人都可以使用此正则表达式吗?
Can anyone help with this regex?
编辑糟糕,我似乎也包括了斜线.显示了我发布此内容时的状态:)在我看来,它们不应该逃脱,但是从答案中的正则表达式中删除它们相对容易.谢谢大家!
EDIT Whoops, I seem to have included the slashes as well. Shows how awake I was when I posted this :) They shouldn't be escaped in my case, but it's relatively easy to remove them from the regexes in the answers. Thanks all!
推荐答案
[^\]
是除\
之外的所有内容的字符类,这就是为什么它匹配所有内容的原因.您想要否定的后向断言:
The [^\]
is a character class for anything not a \
, that is why it is matching everything. You want a negative lookbehind assertion:
((?<!\)[#\$%\^&_\{\}~\\])
只要...
不在
(?<!...)
will match whatever follows it as long as ...
is not in front of it. You can check this out at the python docs
这篇关于将LaTeX保留字符与正则表达式匹配的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!