Help With PHP and XPath

Views: 57 Published: 2021/5/15 18:39:37 Tags: php regex xslt xpath html-parsing


Question

I need help doing a few things with XPath in PHP.

With any given HTML, I need to:

• Remove all tables and their contents
• Remove everything after the first h1 tag
• Keep only paragraphs (including their inner HTML (links, lists, etc.))

With regex, I got everything working perfectly. When I encountered nested tables, however, I decided that it is indeed foolish to parse HTML with regex.

Thanks a lot!

Recommended answer

With any given HTML, I need to:

• Remove all tables and their contents

• Remove everything after the first h1 tag

• Keep only paragraphs (INCLUDING their inner HTML (links, lists, etc))
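Since the question asks about XPath in PHP specifically, here is a minimal sketch of the three requirements using PHP's built-in DOMDocument and DOMXPath classes. The function name `extractParagraphs` and the sample markup are hypothetical, and the order of the steps (tables are removed first, then paragraphs with no preceding h1 are collected) is an assumption about how the requirements combine, not something the question states.

```php
<?php
// Hypothetical helper, not part of the original question or answer.
function extractParagraphs(string $html): string
{
    $doc = new DOMDocument();
    // Real-world HTML is often invalid; keep libxml warnings quiet while parsing.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);

    // 1. Remove the outermost tables; nested tables are removed with them.
    $tables = iterator_to_array($xpath->query('//table[not(ancestor::table)]'));
    foreach ($tables as $table) {
        $table->parentNode->removeChild($table);
    }

    // 2 + 3. Keep only paragraphs that no h1 precedes in document order.
    $out = [];
    foreach ($xpath->query('//p[not(preceding::h1)]') as $p) {
        $out[] = $doc->saveHTML($p); // serializes the p together with its inner HTML
    }

    return implode("\n", $out);
}

// Sample input (an assumption for illustration): a nested table and a p after the h1.
$html = '<div><p>Intro <a href="#x">link</a></p>'
      . '<table><tr><td><p>cell</p><table><tr><td>inner</td></tr></table></td></tr></table>'
      . '<h1>Title</h1><p>After</p></div>';

echo extractParagraphs($html), "\n";
```

Nested tables need no special handling here because removing an element removes its whole subtree, which is exactly the case that is hard to get right with regex.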

These requirements can be met easily with XSLT:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:h="http://www.w3.org/1999/xhtml" >
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <!-- Copy every node except when overridden
      by another template -->
 <xsl:template match="node()|@*">
  <xsl:copy>
   <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
 </xsl:template>

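 <!-- Remove all tables and their contents.
      The explicit priority lets this template win over the
      "keep only paragraphs" template below, whose predicate
      pattern has a higher default priority (0.5 vs 0). -->
 <xsl:template match="h:table" priority="2"/>

 <!-- Remove everything after the first h1 -->
 <xsl:template match="node()[preceding::h:h1]" priority="1"/>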

 <!-- Keep only paragraphs (INCLUDING
      their inner HTML (links, lists, etc))
  -->
 <xsl:template match=
 "node()[not(self::h:p) and not(ancestor::h:p)]">
  <xsl:apply-templates/>
 </xsl:template>
</xsl:stylesheet>

In case your element names are not in the XHTML namespace, simply delete every occurrence of h: in the above code.
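To run a stylesheet like this from PHP, one option is the XSLTProcessor class, which requires the xsl extension to be enabled. The sketch below is not from the original answer and makes several assumptions: the input is parsed with `DOMDocument::loadHTML`, which produces elements in no namespace, so the `h:` prefix is dropped as the note above suggests; explicit `priority` attributes are set so that the two removal templates win over the paragraph filter; and the function name `transformHtml` is hypothetical.

```php
<?php
// Stylesheet adapted from the answer: h: prefix removed, explicit priorities added.
$stylesheet = <<<'XSL'
<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <xsl:template match="node()|@*">
  <xsl:copy>
   <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
 </xsl:template>

 <xsl:template match="table" priority="2"/>
 <xsl:template match="node()[preceding::h1]" priority="1"/>

 <xsl:template match="node()[not(self::p) and not(ancestor::p)]">
  <xsl:apply-templates/>
 </xsl:template>
</xsl:stylesheet>
XSL;

// Hypothetical helper: parse the HTML, apply the stylesheet, return the result as a string.
function transformHtml(string $html, string $stylesheet): string
{
    $source = new DOMDocument();
    // Keep libxml warnings about sloppy markup quiet while parsing.
    $previous = libxml_use_internal_errors(true);
    $source->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xsl = new DOMDocument();
    $xsl->loadXML($stylesheet);

    $processor = new XSLTProcessor();
    $processor->importStylesheet($xsl);

    return (string) $processor->transformToXml($source);
}

// Sample input (an assumption for illustration).
echo transformHtml(
    '<p>Intro <a href="#x">link</a></p><table><tr><td><p>cell</p></td></tr></table>'
    . '<h1>Title</h1><p>After</p>',
    $stylesheet
);
```

The result is a sequence of p elements without a single root element, so it is best treated as an HTML fragment rather than as a complete document.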
