Get a specific word in the text of an HTML page
Problem description
Suppose I have the following HTML page:
<div>
  <p>
    Hello world!
  </p>
  <p> <a href="example.com"> Hello and Hello again this is an example</a></p>
</div>
I want to find a specific word, for example 'hello', and change it to 'welcome' wherever it occurs in the document.
Do you have any suggestions? I would be happy to get answers whatever type of parser you use.
Recommended answer
This is easy to do with XSLT.
XSLT 1.0 solution:
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output omit-xml-declaration="yes" indent="yes"/>
  <xsl:strip-space elements="*"/>
  <!-- The word to find (given in lowercase) and its replacement -->
  <xsl:param name="pTarget" select="'hello'"/>
  <xsl:param name="pReplacement" select="'welcome'"/>
  <xsl:variable name="vtargetLength" select=
    "string-length($pTarget)"/>
  <!-- XSLT 1.0 has no lower-case(), so case folding uses translate() -->
  <xsl:variable name="vUpper" select=
    "'ABCDEFGHIJKLMNOPQRSTUVWXYZ'"/>
  <xsl:variable name="vLower" select=
    "'abcdefghijklmnopqrstuvwxyz'"/>
  <!-- Identity template: copy all nodes and attributes unchanged -->
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>
  <!-- Text nodes: replace the first match, then recurse on the rest -->
  <xsl:template match="text()" name="replace">
    <xsl:param name="pText" select="."/>
    <xsl:variable name="vLowerText" select=
      "translate($pText,$vUpper,$vLower)"/>
    <xsl:choose>
      <!-- Padding with spaces means only whole, space-delimited words match -->
      <xsl:when test=
        "not(contains(concat(' ', $vLowerText, ' '),
                      concat(' ',$pTarget,' ')
                      )
             )">
        <xsl:value-of select="$pText"/>
      </xsl:when>
      <xsl:otherwise>
        <!-- Thanks to the leading pad, $vOffset covers the text up to and including the space before the match -->
        <xsl:variable name="vOffset" select=
          "string-length(
             substring-before(concat(' ', $vLowerText, ' '),
                              concat(' ', $pTarget,' ')
                              )
                         )"/>
        <xsl:value-of select="substring($pText, 1, $vOffset)"/>
        <xsl:value-of select="$pReplacement"/>
        <xsl:call-template name="replace">
          <xsl:with-param name="pText" select=
            "substring($pText, $vOffset + $vtargetLength+1)"/>
        </xsl:call-template>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
When this transformation is applied to the provided XML document:
<div>
  <p>
    Hello world!
  </p>
  <p> <a href="example.com"> Hello and Hello again this is an example</a></p>
</div>
the wanted, correct result is produced:
<div>
  <p>
    welcome world!
  </p>
  <p>
    <a href="example.com"> welcome and welcome again this is an example</a>
  </p>
</div>
My assumption is that the matching and replacement are case-insensitive (i.e. "hello" and "heLlo" should both be replaced with "welcome"). Only whole words delimited by spaces (or by the start or end of a text node) are matched, so "Othello" and "hello," are left unchanged. If a case-sensitive match is required, the transformation can be simplified considerably, because the translate()-based lowercase copy of the text is no longer needed.
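Since the question accepts any parser, here is a rough Python analogue of the XSLT 1.0 stylesheet using only the standard library (`xml.etree.ElementTree`). The function names `replace_word` and `replace_in_tree` are illustrative and not part of the original answer. `replace_word` mirrors the named template `replace`: it folds ASCII letters with `str.translate` (like `translate()` with `$vUpper`/`$vLower`), pads with spaces, and splices in the replacement, with a loop standing in for the recursive call. `replace_in_tree` plays the role of the identity template: it rewrites element text and tails but never touches attributes such as `href`. It is a sketch that assumes well-formed, XHTML-like input; messy real-world HTML would need an HTML parser instead.

```python
import string
import xml.etree.ElementTree as ET

# ASCII-only case folding, like translate($pText, $vUpper, $vLower).
# It never changes the string length, so offsets found in the folded copy
# are valid in the original text.
_FOLD = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def replace_word(text: str, target: str, replacement: str) -> str:
    """Replace each space-delimited, case-insensitive occurrence of target.

    target is assumed to be lowercase, as $pTarget is in the stylesheet.
    """
    needle = " " + target + " "
    parts = []
    while True:
        padded = " " + text.translate(_FOLD) + " "
        # find() gives string-length(substring-before(padded, needle)), or -1
        offset = padded.find(needle)
        if offset == -1:
            parts.append(text)
            return "".join(parts)
        # Because of the leading pad, text[:offset] keeps the space before the match
        parts.append(text[:offset])
        parts.append(replacement)
        # Same as recursing on substring($pText, $vOffset + $vtargetLength + 1)
        text = text[offset + len(target):]


def replace_in_tree(xml_text: str, target: str, replacement: str) -> str:
    """Copy the tree, rewriting only text content (never attributes)."""
    root = ET.fromstring(xml_text)
    for el in root.iter():
        if el.text:
            el.text = replace_word(el.text, target, replacement)
        if el.tail:
            el.tail = replace_word(el.tail, target, replacement)
    return ET.tostring(root, encoding="unicode")


SAMPLE = """<div>
  <p>
    Hello world!
  </p>
  <p> <a href="example.com"> Hello and Hello again this is an example</a></p>
</div>"""

if __name__ == "__main__":
    print(replace_in_tree(SAMPLE, "hello", "welcome"))
```

ASCII-only folding is used instead of `str.lower()` on purpose: `lower()` can change the length of some non-ASCII strings (for example `"İ".lower()` has two code points), which would break the offset arithmetic.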
XSLT 2.0 solution:
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xsl:output omit-xml-declaration="yes" indent="yes"/>
  <xsl:param name="pTarget" select="'hello'"/>
  <xsl:param name="pReplacement" select="'welcome'"/>
  <!-- Identity template: copy all nodes and attributes unchanged -->
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>
  <!-- Only text nodes containing the target (case-insensitively) are rewritten -->
  <xsl:template match="text()[matches(.,$pTarget, 'i')]">
    <!-- Pad with spaces so that a word at the start or end of the text also matches -->
    <xsl:variable name="vEnlargedRep" select=
      "replace(concat(' ',.,' '),
               concat(' ',$pTarget,' '),
               concat(' ',$pReplacement,' '),
               'i')"/>
    <xsl:variable name="vLen" select="string-length($vEnlargedRep)"/>
    <!-- Drop the two padding spaces again -->
    <xsl:sequence select=
      "substring($vEnlargedRep,2, $vLen -2)"/>
  </xsl:template>
</xsl:stylesheet>
When this transformation is applied to the provided XML document (shown above), the wanted, correct result is again produced:
<div>
  <p>
    welcome world!
  </p>
  <p>
    <a href="example.com"> welcome and welcome again this is an example</a>
  </p>
</div>
Explanation: the transformation uses the standard XPath 2.0 functions matches() and replace(), passing "i" as the flags argument (the third argument of matches(), the fourth of replace()). This flag requests case-insensitive matching.
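One behavior of the replace()-based approach is worth knowing: each match consumes the padding space after the word, so in a text node such as "hello hello" the second occurrence has no leading space left and is not replaced. The Python sketch below reproduces this, with Python's `re` module standing in for XPath regular expressions; the function names are hypothetical. It also shows a lookaround variant that checks the delimiters without consuming them. XPath 2.0 regular expressions do not support lookaround, so that second function has no direct XSLT 2.0 counterpart, and it treats any whitespace (not only spaces) as a word delimiter.

```python
import re


def replace_padded(text: str, target: str, replacement: str) -> str:
    """Mirror of the XSLT 2.0 template: pad, replace ' target ' ignoring case, unpad."""
    enlarged = re.sub(" " + re.escape(target) + " ",
                      lambda m: " " + replacement + " ",  # literal replacement text
                      " " + text + " ",
                      flags=re.IGNORECASE)
    # Like substring($vEnlargedRep, 2, $vLen - 2): drop the two padding spaces
    return enlarged[1:-1]


def replace_lookaround(text: str, target: str, replacement: str) -> str:
    """Variant whose word delimiters are asserted by lookarounds, not consumed."""
    pattern = r"(?<!\S)" + re.escape(target) + r"(?!\S)"
    return re.sub(pattern, lambda m: replacement, text, flags=re.IGNORECASE)


if __name__ == "__main__":
    print(replace_padded("hello hello", "hello", "welcome"))      # welcome hello
    print(replace_lookaround("hello hello", "hello", "welcome"))  # welcome welcome
```

For the sample document both functions give the same result; they differ only when two target words are separated by a single space.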