Regular expression to match ">", "<", "&" chars that appear inside XML nodes
Problem description
I'm trying to write a regular expression using the PCRE library in PHP.
I need a regex that matches only the &, > and < characters that appear in the text content of an XML node, not the ones that belong to the tag declarations themselves.
Input XML:
<pnode>
<cnode>This string contains > and < and & chars.</cnode>
</pnode>
The idea is to search for these characters and replace them with their XML entity equivalents.
If I were to convert the entire XML to entities, it would look like this:
Entire XML converted to entities
&lt;pnode&gt;
&lt;cnode&gt;This string contains &gt; and &lt; and &amp; chars.&lt;/cnode&gt;
&lt;/pnode&gt;
I need it to look like this:
Correct XML
<pnode>
<cnode>This string contains &gt; and &lt; and &amp; chars.</cnode>
</pnode>
I have tried to write a regular expression that matches these characters using look-ahead, but I don't know enough to get it to work. My attempt (currently only trying to match > symbols):
/>(?=[^<]*<)/g
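Just to be clear: the XML I'm trying to fix comes from a third party, and they seem unable to fix it on their end, which is why I'm attempting to fix it myself.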
Recommended answer
In the end I opted to use the Tidy library in PHP. The code I used is shown below:
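// Specify configuration
$config = array(
    'input-xml' => true,         // treat the input as XML rather than HTML
    'show-warnings' => false,
    'numeric-entities' => true,  // write entities in numeric rather than named form
    'output-xml' => true);
// Parse the feed as Latin-1, then clean and repair it
$tidy = new tidy();
$tidy->parseFile('feed.xml', $config, 'latin1');
$tidy->cleanRepair();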
This works perfectly, correcting all the encoding errors and converting invalid characters to XML entities.
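For readers who cannot install the Tidy extension, the regex idea from the question can be roughly sketched as below. This is not the approach used in the answer, only a heuristic under stated assumptions: the function name escapeXmlText and the tag pattern are hypothetical, and anything that does not look like `<name ...>`, `</name>` or `<name ... />` is assumed to be text. It does not handle CDATA sections, comments, or attribute values containing `>`, and text that happens to look like a tag (for example `a <b and c> d`) will be left unescaped.

```php
<?php
// Hypothetical helper (not from the original post): escapes &, < and >
// only in the text that sits between tag-like tokens.
function escapeXmlText(string $xml): string
{
    // Assumption: a tag starts with "<" or "</" followed by a letter or
    // underscore. A stray "<" followed by a space is therefore text.
    $tagPattern = '~(</?[A-Za-z_][\w:.-]*(?:\s[^<>]*)?/?>)~';

    // With PREG_SPLIT_DELIM_CAPTURE the result alternates:
    // text, tag, text, tag, ... (even indexes are always text).
    $parts = preg_split($tagPattern, $xml, -1, PREG_SPLIT_DELIM_CAPTURE);

    foreach ($parts as $i => $part) {
        if ($i % 2 !== 0) {
            continue; // odd indexes are tags; leave them untouched
        }
        // Escape "&" unless it already starts an entity such as &amp; or &#38;
        $part = preg_replace(
            '/&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)/',
            '&amp;',
            $part
        );
        $parts[$i] = str_replace(['<', '>'], ['&lt;', '&gt;'], $part);
    }

    return implode('', $parts);
}

$input = "<pnode>\n<cnode>This string contains > and < and & chars.</cnode>\n</pnode>";
echo escapeXmlText($input), "\n";
```

Splitting on tags and escaping only the pieces in between avoids the look-ahead problem from the question, where a single pattern has to decide whether each `>` closes a tag. For feeds that are broken in less predictable ways, a repair tool such as Tidy remains the more robust choice.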