Regular expression to match ">", "<", "&" chars that appear inside XML nodes


Problem description

I'm trying to write a regular expression using the PCRE library in PHP.

I need a regex that matches only the &, > and < characters that appear in the text content of any XML node, not the ones that belong to the tag declarations themselves.

Input XML:

<pnode>
  <cnode>This string contains > and < and & chars.</cnode>
</pnode>

The idea is to search for these characters and replace them with their XML entity equivalents.

If I were to convert the entire XML to entities, it would look like this:

Entire XML converted to entities

&lt;pnode&gt;
  &lt;cnode&gt;This string contains &gt; and &lt; and &amp; chars.&lt;/cnode&gt;
&lt;/pnode&gt;

I need it to look like this:

Corrected XML

<pnode>
  <cnode>This string contains &gt; and &lt; and &amp; chars.</cnode>
</pnode>

I have tried to write a regular expression that matches these characters using a lookahead, but I don't know enough to get it to work. My attempt (currently only trying to match > symbols):

/>(?=[^<]*<)/g

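Just to be clear: the XML I'm trying to fix comes from a third party. They don't seem able to fix it at their end, so I'm attempting to fix it myself.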

Recommended answer

In the end I opted to use the Tidy library in PHP. The code I used is shown below:

  // Specify configuration
  $config = array(
    'input-xml'  => true,
    'show-warnings' => false,
    'numeric-entities' => true,
    'output-xml' => true);

  $tidy = new tidy();
  $tidy->parseFile('feed.xml', $config, 'latin1');
  $tidy->cleanRepair();

This works perfectly, correcting all the encoding errors and converting invalid characters to XML entities.
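For readers who still want the regex route from the question, here is a minimal sketch. It is not the approach the answer used. The lookahead attempt in the question also matches the > that closes each tag, since those are followed by a later < as well, and PHP's preg functions do not accept a /g modifier. The sketch instead splits the input on anything that looks like a tag and escapes only the pieces in between. The helper name escapeXmlText() and the tag pattern are assumptions for illustration. The pattern is a heuristic: text such as "a <b and c> d" would be mistaken for a tag, and DOCTYPE declarations are not handled.

```php
<?php
// Heuristic sketch: escape stray &, < and > in text content while leaving
// tag-like tokens untouched. escapeXmlText() is a hypothetical helper name.

function escapeXmlText(string $xml): string
{
    // One capture group covering: XML declarations / processing instructions,
    // comments, CDATA sections, and opening, closing or self-closing tags.
    $tagPattern = '~(<\?.*?\?>|<!--.*?-->|<!\[CDATA\[.*?\]\]>'
                . '|</?[A-Za-z_][\w:.-]*(?:\s[^<>]*)?/?>)~s';

    // With PREG_SPLIT_DELIM_CAPTURE the result alternates:
    // even indexes hold text, odd indexes hold the captured tag-like tokens.
    $parts = preg_split($tagPattern, $xml, -1, PREG_SPLIT_DELIM_CAPTURE);

    foreach ($parts as $i => $part) {
        if ($i % 2 === 1) {
            continue; // tag-like token: keep as-is
        }
        // Escape & unless it already starts an entity such as &amp; or &#38;
        $part = preg_replace(
            '~&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)~',
            '&amp;',
            $part
        );
        // Any < or > left in a text piece is not part of a recognised tag.
        $parts[$i] = str_replace(['<', '>'], ['&lt;', '&gt;'], $part);
    }

    return implode('', $parts);
}

// Usage with the input XML from the question.
$input = "<pnode>\n  <cnode>This string contains > and < and & chars.</cnode>\n</pnode>";
echo escapeXmlText($input), "\n";
```

With the question's input, the text inside cnode should come out as "This string contains &gt; and &lt; and &amp; chars." while the pnode and cnode tags stay unchanged. For input that is broken in less predictable ways, a repair library such as Tidy is likely the safer choice.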
