LaTeX command substitution using regexp in Python


Question

I wrote a very ugly script to parse some lines of LaTeX in Python and do string substitution. I'm here because I want to write something to be proud of, and learn :P

More specifically, I'd like to change:

  • \ket{(.*)} into |(.*)\rangle
  • \bra{(.*)} into \langle(.*)|

To this end, I wrote a very, very ugly script. The intended use is something like this:

cat file.tex | python script.py > new_file.tex

So what I did is the following. It works, but it is not nice at all, and I wonder if you could give me a suggestion; even a link to the right command to use would be fine. Note that I use recursion because once I have found the first "\ket{", I know that I want to replace the first "}" that follows it (i.e. I'm sure there are no other subcommands inside "\ket{"). But again, this is not the right way to parse LaTeX.

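import re
import sys

def recursion_ket(string_input, string_output=""):
    match = re.search(r"\\ket\{", string_input)
    if not match:
        return string_input
    else:
        # Replace the first "\ket{" with "|".
        string_output = re.sub(r"\\ket\{", '|', string_input, 1)
        # Replace the first "}" after that "|". "\rangle" is not a raw string,
        # so it starts with a carriage return (see the second problem below).
        string_output_second = re.sub(r"}", "\rangle", string_output.split('|', 1)[1], 1)
        # Put the "|" removed by split() back in front of the rewritten tail.
        string_output = string_output.split('|', 1)[0] + '|' + string_output_second
        string_output = recursion_ket(string_output, string_output)
    return string_output

if __name__ == '__main__':
    with open(sys.argv[1]) as f:
        content = f.readlines()
    new = []
    for line in content:
        new.append(recursion_ket(line))
    with open(sys.argv[2], 'w') as z:
        for i in new:
            # Workaround: turn carriage returns and backspaces back into the text "\r" and "\b".
            z.write(i.replace("\r", '\\r').replace("\b", '\\b'))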

I know this is very ugly, and it's definitely not the right way of doing it. Probably it's because I come from Perl and I'm not used to Python regexps.

  • First problem: is it possible to use a regexp to substitute just the "border" of a matching string and leave the inside as it is? I want to keep the content of \command{xxx} unchanged.

  • Second problem: the \r. Apparently, when I print each string to the terminal or to a file, I need to make sure \r is not interpreted as a carriage return. I have tried automatic escaping, but it's not what I need: it escapes the \n with another \, which is not what I want.

Recommended answer

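To answer your questions:

  • First problem: you can use (named) groups.
  • Second problem: in Python 3, you can use raw strings such as r"\btree" to deal with backslashes gracefully.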
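
As a rough illustration of both points, here is a minimal regex-only sketch. The names KET, BRA, convert_bra_ket and the group name "body" are made up for this example, and the patterns assume that the argument contains no nested braces, as stated in the question.

```python
import re

# Raw strings keep backslashes literal: r"\rangle" is 7 characters,
# while "\rangle" starts with a carriage return.
assert len(r"\rangle") == 7
assert "\rangle"[0] == "\r"

# The named group "body" captures everything up to the first closing
# brace (assumption: no nested braces inside the argument).
KET = re.compile(r"\\ket\{(?P<body>[^{}]*)\}")
BRA = re.compile(r"\\bra\{(?P<body>[^{}]*)\}")


def convert_bra_ket(text):
    r"""Rewrite \ket{x} as |x\rangle and \bra{x} as \langle x| (no space added)."""
    # In a replacement template, "\\" yields one backslash and
    # "\g<body>" inserts the captured group unchanged.
    text = KET.sub(r"|\g<body>\\rangle", text)
    text = BRA.sub(r"\\langle\g<body>|", text)
    return text


print(convert_bra_ket(r"\bra{\psi} and \ket{\phi_0}"))
# prints: \langle\psi| and |\phi_0\rangle
```

Only the "border" of each match changes; the captured body is copied through as it is. Since the templates place \langle and \rangle directly against the neighbouring text, inputs where a letter follows them would need an extra space or {} in the template.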

Using a LaTeX parser like github.com/alvinwan/TexSoup, we can simplify the code a bit. I know the OP asked for regex, but if the OP is tool-agnostic, a parser would be more robust.

We can abstract this into a replace function:

def replaceTex(soup, command, replacement):
    for node in soup.find_all(command):
        node.replace(replacement.format(args=node.args))

Then, use this replaceTex function in the following way:

>>> soup = TexSoup(r"\section{hello} text \bra{(.)} haha \ket{(.)}lol")
>>> replaceTex(soup, 'bra', r"\langle{args[0]}|")
>>> replaceTex(soup, 'ket', r"|{args[0]}\rangle")
>>> soup
\section{hello} text \langle(.)| haha |(.)\ranglelol

Demo

Here's a self-contained demonstration, based on TexSoup:

>>> from TexSoup import TexSoup
>>> soup = TexSoup(r"\section{hello} text \bra{(.)} haha \ket{(.)}lol")
>>> soup
\section{hello} text \bra{(.)} haha \ket{(.)}lol
>>> soup.ket.replace(r"|{args[0]}\rangle".format(args=soup.ket.args))
>>> soup.bra.replace(r"\langle{args[0]}|".format(args=soup.bra.args))
>>> soup
\section{hello} text \langle(.)| haha |(.)\ranglelol
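
If a standard-library-only route is preferred, the pipe usage from the question (cat file.tex | python script.py > new_file.tex) could be served by a small filter. This is only a sketch: TEMPLATES, COMMAND and filter_stream are hypothetical names, and it again assumes that the arguments contain no nested braces.

```python
import io
import re

# Hypothetical mapping from command name to a replacement template.
TEMPLATES = {
    "ket": r"|\g<body>\\rangle",
    "bra": r"\\langle\g<body>|",
}

# One pattern for both commands (assumption: no nested braces).
COMMAND = re.compile(r"\\(?P<name>ket|bra)\{(?P<body>[^{}]*)\}")


def filter_stream(src, dst):
    r"""Copy src to dst line by line, rewriting \ket{...} and \bra{...}."""
    for line in src:
        # Match.expand() fills in the template of whichever command matched.
        dst.write(COMMAND.sub(lambda m: m.expand(TEMPLATES[m.group("name")]), line))


# In script.py this would be: import sys; filter_stream(sys.stdin, sys.stdout)
demo_out = io.StringIO()
filter_stream(io.StringIO("\\ket{0} + \\bra{1}\n"), demo_out)
print(demo_out.getvalue(), end="")
# prints: |0\rangle + \langle1|
```

Taking file-like objects rather than file names keeps the function usable both with sys.stdin/sys.stdout and with in-memory strings for testing.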
