LaTeX command substitution using regexp in Python
Question
I wrote a very ugly script to parse some lines of LaTeX in Python and do string substitution. I'm here because I want to write something to be proud of, and learn :P
More specifically, I'd like to change:
- \ket{(.*)} into |(.*)\rangle
- \bra{(.*)} into \langle(.*)|
To this end, I wrote a very, very ugly script. The intended use is something like this:
cat file.tex | python script.py > new_file.tex
So what I did is the following. It works, but it is not nice at all, and I'm wondering if you could give me a suggestion; even a link to the right command to use is OK. Note that I use recursion because once I have found the first "\ket{", I know that I want to replace the first "}" that follows it (i.e. I'm sure there are no other subcommands within "\ket{"). But again, it's not the right way of parsing LaTeX.
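import re
import sys

def recursion_ket(string_input, string_output=""):
    # Stop recursing once no \ket{ is left on the line.
    match = re.search(r"\\ket\{", string_input)
    if not match:
        return string_input
    else:
        # Replace the first \ket{ with '|', then the first '}' that follows it.
        string_output = re.sub(r"\\ket\{", '|', string_input, 1)
        # "\rangle" is not a raw string, so "\r" is a carriage return here.
        string_output_second = re.sub(r"}", "\rangle", string_output.split('|', 1)[1], 1)
        # Put back the '|' that split() consumed.
        string_output = string_output.split('|', 1)[0] + '|' + string_output_second
        string_output = recursion_ket(string_output, string_output)
        return string_output

if __name__ == '__main__':
    with open(sys.argv[1]) as f:
        content = f.readlines()
    new = []
    for line in content:
        new.append(recursion_ket(line))
    z = open(sys.argv[2], 'w')
    for i in new:
        # Turn the carriage return / backspace characters back into "\r" and "\b".
        z.write(i.replace("\r", '\\r').replace("\b", '\\b'))
    z.write("")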
I know this is very ugly, and it's definitely not the right way of doing it. Probably it's because I come from Perl, and I'm not used to Python regexps.
First problem: is it possible to use a regexp to substitute just the "border" of a matching string and leave the inside as it is? I want to leave the content of \command{xxx} untouched.
Second problem: the \r. Apparently, when I print each string to the terminal or to a file, I need to make sure \r is not interpreted as a carriage return. I have tried to use the automatic escape, but it's not what I need: it escapes the \n with another \, and this is not what I want.
Recommended answer
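To answer your questions:
- First problem: you can use (named) groups.
- Second problem: in Python 3, you can use raw strings such as r"\btree" to handle backslashes gracefully.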
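As a rough illustration of both points, here is a minimal regex-only sketch. The names KET, BRA and replace_bra_ket are made up for this example, and it assumes, as the question states, that the argument of \ket{...} or \bra{...} contains no nested braces.

```python
import re

# Hypothetical names for illustration; only the standard `re` module is used.
# Assumption: the argument never contains nested braces.
KET = re.compile(r"\\ket\{(?P<body>[^{}]*)\}")
BRA = re.compile(r"\\bra\{(?P<body>[^{}]*)\}")

def replace_bra_ket(line):
    # \g<body> re-inserts the captured content unchanged, so only the
    # "border" of the match is rewritten. In the replacement template,
    # \\ stands for one literal backslash.
    line = KET.sub(r"|\g<body>\\rangle", line)
    line = BRA.sub(r"\\langle\g<body>|", line)
    return line

# Without the r prefix, "\r" is a carriage return; with it, the backslash survives.
assert "\rangle"[0] == "\r"
assert r"\rangle"[0] == "\\"

print(replace_bra_ket(r"\bra{\psi} and \ket{\phi}"))
```

To use this as a filter in the style of `cat file.tex | python script.py > new_file.tex`, one option is to apply replace_bra_ket to each line read from sys.stdin and write the result to sys.stdout.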
Using a LaTeX parser like github.com/alvinwan/TexSoup, we can simplify the code a bit. I know OP has asked for regex, but if OP is tool-agnostic, a parser would be more robust.
We can abstract this into a replace function:
def replaceTex(soup, command, replacement):
    for node in soup.find_all(command):
        node.replace(replacement.format(args=node.args))
Then, use this replaceTex function in the following way:
>>> soup = TexSoup(r"\section{hello} text \bra{(.)} haha \ket{(.)}lol")
>>> replaceTex(soup, 'bra', r"\langle{args[0]}|")
>>> replaceTex(soup, 'ket', r"|{args[0]}\rangle")
>>> soup
\section{hello} text \langle(.)| haha |(.)\ranglelol
Demo
Here's a self-contained demonstration, based on TexSoup:
>>> from TexSoup import TexSoup
>>> soup = TexSoup(r"\section{hello} text \bra{(.)} haha \ket{(.)}lol")
>>> soup
\section{hello} text \bra{(.)} haha \ket{(.)}lol
>>> soup.ket.replace(r"|{args[0]}\rangle".format(args=soup.ket.args))
>>> soup.bra.replace(r"\langle{args[0]}|".format(args=soup.bra.args))
>>> soup
\section{hello} text \langle(.)| haha |(.)\ranglelol