libxml2 HTML chunk parsing

Views: 86 Published: 2020/4/30 10:51:45 Tags: html c html-parsing libxml2

Problem description

I'm downloading HTML from a website. The file can be quite large, so while it is still downloading I want to parse the chunks of HTML that are already available, so that the process appears faster to the end user of my program. I don't have control over how the chunks are generated, so a chunk can begin in the middle of a word, e.g. like so:

chunk 1 --->  <div class="storyti
chunk 2 --->  tle"><a href="htt
chunk 3 --->  p://www.xkcd.com/">XKCD</a>
...and so on.

I have seen an example where libxml2 was used to parse XML chunks exactly as I described. Can libxml2 also parse HTML chunks? I have run tidy on the HTML files I'm going to download; it reports warnings but no errors. Can libxml2 parse those HTML chunks as well?
