将LaTeX保留字符与正则表达式匹配 [英] Match LaTeX reserved characters with regex

查看:237
本文介绍了将LaTeX保留字符与正则表达式匹配的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个HTML到LaTeX解析器,该解析器专门针对其应做的事情(将HTML的代码片段转换为LaTeX的代码片段),但是填写变量存在一些问题.问题是,应该允许变量 包含LaTeX保留字符(即# $ % ^ & _ { } ~ \).这些必须逃脱,以免杀死我们的LaTeX渲染器.

I have an HTML to LaTeX parser tailored to what it's supposed to do (convert snippets of HTML into snippets of LaTeX), but there is a little issue with filling in variables. The issue is that variables should be allowed to contain the LaTeX reserved characters (namely # $ % ^ & _ { } ~ \). These need to be escaped so that they won't kill our LaTeX renderer.

处理转换的程序,所有内容都是用Python编写的,因此我试图找到一个不错的解决方案.我的第一个想法是简单地执行.replace(),但是replace不允许您仅在第一个不是\时进行匹配.我的第二次尝试是使用正则表达式,但失败了.

The program that handles the conversion and everything is written in Python, so I tried to find a nice solution. My first idea was to simply do a .replace(), but replace doesn't allow you to match only if the first is not a \. My second attempt was a regex, but I failed miserably at that.

我想出的正则表达式是([^\][#\$%\^&_\{\}~\\]).我希望这将与任何保留字符匹配,但前提是它前面没有\.不幸的是,这与我输入文本中的单个字符匹配.我还尝试过此正则表达式的其他变体,但无法使其正常工作.这些变化主要包括在正则表达式的第二部分中删除/添加斜杠.

The regex I came up with is ([^\][#\$%\^&_\{\}~\\]). I hoped that this would match any of the reserved characters, but only if it didn't have a \ in front. Unfortunately, this matches ever single character in my input text. I've also tried different variations on this regex, but I can't get it to work. The variations mainly consisted of removing/adding slashes in the second part of the regex.

任何人都可以使用此正则表达式吗?

Can anyone help with this regex?

编辑糟糕,我似乎也包括了斜线.显示了我发布此内容时的状态:)在我看来,它们不应该逃脱,但是从答案中的正则表达式中删除它们相对容易.谢谢大家!

EDIT Whoops, I seem to have included the slashes as well. Shows how awake I was when I posted this :) They shouldn't be escaped in my case, but it's relatively easy to remove them from the regexes in the answers. Thanks all!

推荐答案

[^\]是除\之外的所有内容的字符类,这就是为什么它匹配所有内容的原因.您想要否定的后向断言:

The [^\] is a character class for anything not a \, that is why it is matching everything. You want a negative lookbehind assertion:

((?<!\)[#\$%\^&_\{\}~\\])

只要...不在(?<!...)前面,它就会与之匹配.您可以在 python文档

(?<!...) will match whatever follows it as long as ... is not in front of it. You can check this out at the python docs

这篇关于将LaTeX保留字符与正则表达式匹配的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆