Closing tags when extracting HTML from XML
I am transforming a mixed HTML and XML document with an XSLT stylesheet, extracting only the HTML elements.
Source file:
<?xml version="1.0" encoding="utf-8" ?>
<html>
  <head>
    <title>Simplified Example Form</title>
  </head>
  <body>
    <TLA:document xmlns:TLA="http://www.TLA.com">
      <TLA:contexts>
        <TLA:context id="id_1" value=""></TLA:context>
      </TLA:contexts>
      <table id="table_logo" style="display:inline">
        <tr>
          <td height="20" align="middle">Big Title Goes Here</td>
        </tr>
        <tr>
          <td align="center">
            <img src="logo.jpg" border="0"></img>
          </td>
        </tr>
      </table>
      <TLA:page>
        <TLA:question id="q_id_1">
          <table id="table_id_1">
            <tr>
              <td>Label text goes here</td>
              <td>
                <input id="input_id_1" type="text"></input>
              </td>
            </tr>
          </table>
        </TLA:question>
      </TLA:page>
      <!-- Repeat many times -->
    </TLA:document>
  </body>
</html>
Stylesheet:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:TLA="http://www.TLA.com" exclude-result-prefixes="TLA">
  <xsl:output method="html" indent="yes" version="4.0" />
  <xsl:strip-space elements="*" />
  <xsl:template match="@*|node()" priority="-2">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
  <!-- This element-only identity template prevents the
       TLA namespace declaration from being copied to the output -->
  <xsl:template match="*">
    <xsl:element name="{name()}">
      <xsl:apply-templates select="@* | node()" />
    </xsl:element>
  </xsl:template>
  <!-- Pass processing on to child elements of TLA elements -->
  <xsl:template match="TLA:*">
    <xsl:apply-templates select="*" />
  </xsl:template>
</xsl:stylesheet>
Output:
<html>
  <head>
    <META http-equiv="Content-Type" content="text/html; charset=utf-8">
    <title>Simplified Example Form</title>
  </head>
  <body>
    <table id="table_logo" style="display:inline">
      <tr>
        <td height="20" align="middle">Big Title Goes Here</td>
      </tr>
      <tr>
        <td align="center"><img src="logo.jpg" border="0"></td>
      </tr>
    </table>
    <table id="table_id_1">
      <tr>
        <td>Label text goes here</td>
        <td><input id="input_id_1" type="text"></td>
      </tr>
    </table>
  </body>
</html>
However, the meta, img, and input elements are not being closed. I've set the xsl:output method to html and the version to 4.0, so as far as I know the output should be correct HTML.
I'm guessing that the first xsl:template/xsl:copy instruction needs a subtle change, but my XSLT skills are very limited.
What change needs to be made to get the tags to close correctly?
P.S. I'm not sure whether different tools/parsers behave differently, but I'm using Visual Studio 2012 to debug the stylesheet so that I can see the immediate effect of any changes.
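The <meta>, <img> and <input> elements don't need to be closed; the output is still valid HTML. In HTML 4 these are empty elements whose end tags are forbidden, which is why the html output method writes them without one.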
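If the goal is simply valid HTML 4 output, one option is to keep the html output method and also emit a doctype, so that validators read the unclosed tags as HTML rather than XML. The following is a minimal sketch, not a definitive fix. The choice of the HTML 4.01 Transitional doctype is an assumption (the question does not name one); it is used here only because the source carries presentational attributes such as border and align. The templates are the ones from the question, unchanged.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:TLA="http://www.TLA.com" exclude-result-prefixes="TLA">
  <!-- Assumption: HTML 4.01 Transitional; swap in the doctype your pages target -->
  <xsl:output method="html" version="4.0" indent="yes"
      doctype-public="-//W3C//DTD HTML 4.01 Transitional//EN"
      doctype-system="http://www.w3.org/TR/html4/loose.dtd" />
  <xsl:strip-space elements="*" />
  <!-- Copy attributes, text and comments as they are -->
  <xsl:template match="@*|node()" priority="-2">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
  <!-- Rebuild elements by name so the TLA namespace declaration is not copied -->
  <xsl:template match="*">
    <xsl:element name="{name()}">
      <xsl:apply-templates select="@* | node()" />
    </xsl:element>
  </xsl:template>
  <!-- Drop TLA wrapper elements but keep processing their element children -->
  <xsl:template match="TLA:*">
    <xsl:apply-templates select="*" />
  </xsl:template>
</xsl:stylesheet>
```

With this variant the meta, img and input tags are still written without end tags, which is what HTML 4 requires.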
If you want them closed, you could use xml as the output method (with XSLT 2.0 you could use xhtml, too, as far as I know) and add the <meta> tag yourself if you need it, since the xml output method does not generate one. For example:
Stylesheet:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:TLA="http://www.TLA.com" exclude-result-prefixes="TLA">
  <xsl:output method="xml" indent="yes" omit-xml-declaration="yes"/>
  <xsl:strip-space elements="*" />
  <xsl:template match="@*|node()" priority="-2">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="head">
    <xsl:copy>
      <meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
  <!-- This element-only identity template prevents the
       TLA namespace declaration from being copied to the output -->
  <xsl:template match="*">
    <xsl:element name="{name()}">
      <xsl:apply-templates select="@* | node()" />
    </xsl:element>
  </xsl:template>
  <!-- Pass processing on to child elements of TLA elements -->
  <xsl:template match="TLA:*">
    <xsl:apply-templates select="*" />
  </xsl:template>
</xsl:stylesheet>
Output:
<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
    <title>Simplified Example Form</title>
  </head>
  <body>
    <table id="table_logo" style="display:inline">
      <tr>
        <td height="20" align="middle">Big Title Goes Here</td>
      </tr>
      <tr>
        <td align="center">
          <img src="logo.jpg" border="0"/>
        </td>
      </tr>
    </table>
    <table id="table_id_1">
      <tr>
        <td>Label text goes here</td>
        <td>
          <input id="input_id_1" type="text"/>
        </td>
      </tr>
    </table>
  </body>
</html>
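For the xhtml output method mentioned above, a rough sketch could look like the stylesheet below. It needs an XSLT 2.0 processor such as Saxon; the XSLT engine built into .NET, which Visual Studio 2012 uses, implements XSLT 1.0 only. Moving the result elements into the XHTML namespace is an assumption made here, because some serializers apply the XHTML rules for empty elements only to elements in that namespace. Treat this as untested against the asker's setup.

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:TLA="http://www.TLA.com" exclude-result-prefixes="TLA">
  <!-- xhtml is an XSLT 2.0 output method; an XSLT 1.0 processor will not accept it -->
  <xsl:output method="xhtml" indent="yes" omit-xml-declaration="yes"/>
  <xsl:strip-space elements="*" />
  <!-- Copy attributes, text and comments as they are -->
  <xsl:template match="@*|node()" priority="-2">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
  <!-- Assumption: recreate each element in the XHTML namespace so the
       serializer recognises img, input and similar elements as XHTML -->
  <xsl:template match="*">
    <xsl:element name="{local-name()}" namespace="http://www.w3.org/1999/xhtml">
      <xsl:apply-templates select="@* | node()" />
    </xsl:element>
  </xsl:template>
  <!-- Drop TLA wrapper elements but keep processing their element children -->
  <xsl:template match="TLA:*">
    <xsl:apply-templates select="*" />
  </xsl:template>
</xsl:stylesheet>
```

One reason to prefer xhtml over xml when it is available: the xml method may collapse any childless element into a self-closing tag, including ones such as textarea or script that browsers expect to have an explicit end tag, whereas the xhtml method is designed to minimize only elements whose content model is empty.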