Replace all "\" characters which are *not* inside "<code>" tags

Views: 105 | Published: 2020/4/29 3:57:41 | Tags: php regex latex lookahead lookbehind

Problem description

First things first: Neither this, this, this nor this answered my question. So I'll open a new one.

Okay okay. I know that regexes are not the way to parse general HTML. Please take note that the created documents are written using a limited, controlled HTML subset. And people writing the docs know what they're doing. They are all IT professionals!

Given the controlled syntax it is possible to parse the documents I have here using regexes.

I am not trying to download arbitrary documents from the web and parse them!

And if the parsing does fail, the document is edited so that it will parse. The problem I am addressing here is more general than that (i.e. not replacing a pattern when it occurs between two other patterns).

In our office we are supposed to "pretty print" our documentation. Hence why some came up with putting it all into Word documents. So far we're thankfully not quite there yet. And, if I get this done, we might not need to.

The main part of the docs is stored in a TikiWiki database. I've created a daft PHP script which converts the documents from HTML (via LaTeX) to PDF. One of the must-have features of the selected wiki system was a WYSIWYG editor, which, as expected, leaves us with documents with a less than formal DOM.

Consequently, I am transliterating the document using "simple" regexes. It all works (mostly) fine so far, but I encountered one problem I haven't figured out on my own yet.

Some special characters need to be replaced by LaTeX markup. For example, the \ character should be replaced by $\backslash$ (unless someone knows another solution?).

Except!

I do replace <code> tags with verbatim sections. But if this code block contains backslashes (as is the case for Windows folder names), the script still replaces these backslashes.

I reckon I could solve this using negative LookBehinds and/or LookAheads. But my attempts did not work.

Granted, I would be better off with a real parser. In fact, it is something on my "in-brain-roadmap", but it is currently out of scope. The script works well enough for our limited knowledge domain. Creating a parser would require me to start pretty much from scratch.

Example input

The Hello \ World document is located in:
<code>C:\documents\hello_world.txt</code>

Expected output

The Hello $\backslash$ World document is located in:
\begin{verbatim}C:\documents\hello_world.txt\end{verbatim}

This is the best I could come up with so far:

<?php
$patterns = array(
    "special_chars2" => array( '/(?<!<code[^>]*>.*)\\\\[^$](?!.*<\/code>)/U', '$\\backslash$'),
);

foreach( $patterns as $name => $p ){
    $tex_input = preg_replace( $p[0], $p[1], $tex_input );
}
?>

Note that this is only an excerpt, and the [^$] is another LaTeX requirement.
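
One likely reason the first attempt does not work: as far as I know, PCRE only accepts lookbehind assertions whose length is bounded, and `[^>]*` / `.*` inside `(?<! ... )` are unbounded. The sketch below reduces the pattern to that lookbehind plus the backslash, and uses a made-up one-line input, to check whether `preg_replace` can compile it at all.

```php
<?php
// The lookbehind from the first attempt, reduced to its essentials.
$pattern = '/(?<!<code[^>]*>.*)\\\\/';

// "@" silences the compilation warning so the return value can be inspected.
// The input string is a made-up example, not taken from the real documents.
$result = @preg_replace($pattern, '$\\backslash$', 'a \\ b');

// preg_replace() returns NULL when an error occurs, e.g. when the pattern
// cannot be compiled.
var_dump($result);
```

If this prints NULL, the pattern never ran, and the whole input is lost rather than left unchanged.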

Another attempt which seemed to work:

<?php
$patterns = array(
    "special_chars2" => array( '/\\\\[^$](?!.*<\/code>)/U', '$\\backslash$'),
);

foreach( $patterns as $name => $p ){
    $tex_input = preg_replace( $p[0], $p[1], $tex_input );
}
?>

... in other words: leaving out the negative lookbehind.

But this looks more error-prone than with both lookbehind and lookahead.
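
To make "error-prone" concrete, here is a small sketch that runs the lookahead-only pattern on two made-up inputs (the sample strings and variable names are assumptions, not from the real documents). The lookahead only asks "is there a `</code>` later on this line?", so it can go wrong in both directions.

```php
<?php
// The pattern and replacement from the second attempt, unchanged.
$pattern     = '/\\\\[^$](?!.*<\/code>)/U';
$replacement = '$\\backslash$';

// Case 1: a backslash outside <code>, with a code block later on the same line.
// The lookahead still finds "</code>", so this backslash is left alone.
$sameLine    = 'see \\ here <code>C:\\temp</code>';
$sameLineOut = preg_replace($pattern, $replacement, $sameLine);
var_dump($sameLineOut === $sameLine);

// Case 2: a backslash inside <code>, with the closing tag on a later line.
// "." does not cross the newline, so the lookahead finds nothing and the
// backslash is replaced; [^$] also consumes the character after it.
$multiLine    = "<code>C:\\temp\nD:\\data</code>";
$multiLineOut = preg_replace($pattern, $replacement, $multiLine);
echo $multiLineOut, "\n";
```

In the first case the text comes back unchanged even though the backslash is outside any code block. In the second case the first path inside the code block is altered, and the "t" after the backslash disappears because `[^$]` is part of the match.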

As you may have noticed, the pattern is ungreedy (/.../U). So will this match only as little as possible inside a <code> block, taking the look-arounds into account?
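
For comparison, a minimal sketch of one way to avoid look-arounds entirely, assuming the `<code>` blocks are never nested: match whole `<code>...</code>` blocks as one alternative and single backslashes as the other, then decide in a callback what to return. The function name `escape_backslashes_outside_code` is hypothetical, and `(?!\$)` is my guess at what the `[^$]` requirement is meant to express (it checks the next character without consuming it).

```php
<?php
// Hypothetical helper (not part of the original script): replace backslashes
// only outside <code>...</code> blocks.
function escape_backslashes_outside_code(string $html): string
{
    // Alternative 1 captures a whole <code> block (the "s" flag lets "." span
    // newlines). Alternative 2 matches a single backslash not followed by "$".
    $pattern = '~(<code[^>]*>.*?</code>)|\\\\(?!\$)~s';

    return preg_replace_callback(
        $pattern,
        function (array $m): string {
            // Group 1 is set only when a code block matched: return it untouched.
            if (isset($m[1]) && $m[1] !== '') {
                return $m[1];
            }
            // Otherwise a lone backslash matched: emit the LaTeX markup.
            return '$\\backslash$';
        },
        $html
    );
}

// The example input from the question.
$input = "The Hello \\ World document is located in:\n"
       . "<code>C:\\documents\\hello_world.txt</code>";

$output = escape_backslashes_outside_code($input);
echo $output, "\n";
```

Because the regex engine scans left to right, it consumes each code block as a single match before it can see the backslashes inside it. The conversion of `<code>` to verbatim is left to the existing step in the script.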
