HTML Parser Library For C

Question

I just need a suggestion. I have a program that takes valid HTML and saves it to a file. I need a way to parse this HTML file and retrieve every image referenced in it (e.g. /foo/bar.jpg). Is there an HTML parsing library that I could use to achieve this?

Recommended answer

Half an answer: there's a Java parser called TagSoup which will "Just Keep On Truckin'", parsing anything with angle brackets and always producing a valid set of events for the application.

I mention this because I know that the idea and, crucially, the name have been adopted by libraries with the same intention in other languages. I can't find a C version right now, but you may have more luck if you try some inventive searches with that starting point. The point is that the application which sits atop the parser doesn't have to care about the horrors in the original source; it can pretend that the input was well-formed XML and do XMLish things to and with it.

Edit: oooh, and ... there we go: Taggle (C++, but possibly close enough, and that posting suggests that porting it from Java wasn't hard).
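
To make the "keep going on anything with angle brackets" idea concrete for the image-extraction task in the question, here is a minimal sketch in plain C. It is not TagSoup, Taggle, or any library's API. It is a hand-rolled, tolerant scanner that looks only at `<img ...>` tags and reports each `src` value through a callback, loosely mirroring the "events to the application" model. The names `scan_img_srcs`, `img_src_cb`, `struct img_list`, and `collect_img` are hypothetical, invented for this illustration. The scanner does not understand comments, scripts, or character entities, so treat it as a starting point rather than a real HTML parser.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical event callback: called once per <img src=...> value found.
 * `src` is NOT NUL-terminated; use `len`. */
typedef void (*img_src_cb)(const char *src, size_t len, void *user);

/* Case-insensitive match of a lowercase keyword at p.
 * Stops safely at the end of the string. */
static int match_ci(const char *p, const char *kw)
{
    while (*kw) {
        if (tolower((unsigned char)*p) != *kw)
            return 0;
        p++;
        kw++;
    }
    return 1;
}

/* Scan `html` for <img> tags and report every non-empty src attribute.
 * Tolerates unquoted values, mixed case, and an unterminated final tag.
 * Returns the number of src values reported. */
size_t scan_img_srcs(const char *html, img_src_cb cb, void *user)
{
    size_t count = 0;
    const char *p = html;

    assert(html != NULL && cb != NULL);

    while ((p = strchr(p, '<')) != NULL) {
        p++;                                    /* step past '<' */
        if (!match_ci(p, "img") || !isspace((unsigned char)p[3]))
            continue;                           /* not an <img ...> tag */
        p += 3;

        /* Walk the attributes until '>' or end of input. */
        while (*p && *p != '>') {
            while (isspace((unsigned char)*p) || *p == '/')
                p++;
            if (*p == '\0' || *p == '>')
                break;

            const char *name = p;               /* attribute name */
            while (*p && *p != '=' && *p != '>' && *p != '/' &&
                   !isspace((unsigned char)*p))
                p++;
            size_t name_len = (size_t)(p - name);

            while (isspace((unsigned char)*p))
                p++;
            if (*p != '=')
                continue;                       /* attribute without a value */
            p++;
            while (isspace((unsigned char)*p))
                p++;

            const char *val;                    /* attribute value */
            size_t val_len;
            if (*p == '"' || *p == '\'') {
                char quote = *p++;
                val = p;
                while (*p && *p != quote)
                    p++;
                val_len = (size_t)(p - val);
                if (*p == quote)
                    p++;
            } else {
                val = p;
                while (*p && *p != '>' && !isspace((unsigned char)*p))
                    p++;
                val_len = (size_t)(p - val);
            }

            if (name_len == 3 && match_ci(name, "src") && val_len > 0) {
                cb(val, val_len, user);
                count++;
            }
        }
    }
    return count;
}

/* Example callback (also hypothetical): collects URLs into a fixed buffer,
 * truncating long ones and ignoring any beyond MAX_IMGS. */
#define MAX_IMGS 16
#define MAX_URL  256

struct img_list {
    char urls[MAX_IMGS][MAX_URL];
    size_t n;
};

static void collect_img(const char *src, size_t len, void *user)
{
    struct img_list *list = user;
    if (list->n >= MAX_IMGS)
        return;
    if (len >= MAX_URL)
        len = MAX_URL - 1;
    memcpy(list->urls[list->n], src, len);
    list->urls[list->n][len] = '\0';
    list->n++;
}
```

A caller would read the saved file into a NUL-terminated buffer, zero-initialize a `struct img_list`, and call `scan_img_srcs(buffer, collect_img, &list)`. Using a callback rather than returning an allocated array keeps the scanner free of memory management and lets the application decide what to do with each URL, which is the same separation the TagSoup-style parsers offer.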
