Using xmllint and xpath with a less-than-perfect HTML document?

Views: 87 | Published: 2020/7/15 2:53:01 | Tags: html, xml, xpath, xmllint

Problem description

I have an HTML page that is generated by an existing tool - I cannot change the output of this tool.

However, I want to use xmllint with the --xpath option to pick out a few specific pieces of information from the downloaded webpage. The problem is that the page starts with:

<html lang=en><head>...

xmllint throws an error almost immediately:

html.out:2: parser error : AttValue: " or ' expected
<html lang=en><head>
           ^

The issue certainly seems to be the missing enclosing quotation marks around the value of the lang attribute. The entire page is full of this kind of issue. (Though only sporadically.)

Nearly every browser can parse this just fine - how can I convince xmllint to do so as well? I would like to avoid having to inject an intermediate step to "fix" the file. Instead, I would like to either:

1) Find a flag, validation option, etc. that helps the parser along, or:

2) Use some other tool. (But what? xmllint is always my go-to for command line XPath commands.)

Further, using just xpath results in:

> xpath html.out '//myquery...'

not well-formed (invalid token) at line 2, column 11, ...
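
For option 2, one possible direction is a lenient HTML parser feeding an ordinary element tree. The sketch below is only an illustration under stated assumptions, not a recommended or complete solution: it uses Python's standard `html.parser`, which accepts unquoted attribute values such as `lang=en`, and builds an `xml.etree.ElementTree` tree that can be queried. The names `LenientTreeBuilder`, `parse_html`, and the `SAMPLE` markup are hypothetical and invented for this example, and ElementTree supports only a limited subset of XPath.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# HTML elements that never take a closing tag.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class LenientTreeBuilder(HTMLParser):
    """Hypothetical helper: builds an ElementTree from loosely written HTML."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.root = None   # first top-level element seen
        self.stack = []    # currently open elements

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value); value is None for bare attributes.
        attrib = {name: (value if value is not None else "")
                  for name, value in attrs}
        if self.stack:
            elem = ET.SubElement(self.stack[-1], tag, attrib)
        else:
            elem = ET.Element(tag, attrib)
            if self.root is None:
                self.root = elem
        if tag not in VOID_TAGS:
            self.stack.append(elem)

    def handle_endtag(self, tag):
        # Close up to the nearest matching open element; ignore stray end tags.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        if not self.stack:
            return
        parent = self.stack[-1]
        if len(parent):
            # Text after a child element belongs to that child's tail.
            last = parent[-1]
            last.tail = (last.tail or "") + data
        else:
            parent.text = (parent.text or "") + data


def parse_html(text):
    """Parse an HTML string and return the root element (or None)."""
    builder = LenientTreeBuilder()
    builder.feed(text)
    builder.close()
    return builder.root


# Invented sample with unquoted attribute values, like the page in question.
SAMPLE = ("<html lang=en><head><title>Report</title></head>"
          "<body><p class=note>first<br>second</p></body></html>")

root = parse_html(SAMPLE)
print(root.get("lang"))
print(root.findtext(".//title"))
print([p.get("class") for p in root.findall(".//p[@class='note']")])
```

This avoids a separate "fix the file" step, at the cost of the restricted XPath support in ElementTree. On the xmllint side, its `--html` switch selects libxml2's HTML parser instead of the XML parser, so it may be worth trying together with `--xpath` before reaching for another tool.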
