Is there a library for extracting data from an HTML page?

Published: 2016/8/24 14:56:20 · Tags: c++ html objective-c c data-extraction

I would like to extract information from a web page. Unfortunately, the website (4chan) doesn't have a public API, as far as I know.

What is a good library for extracting specific data from an HTML document? I'd prefer a free-software library that works on UNIX systems.

Edit: Basically, I want to get posts and images from 4chan. The page isn't valid HTML (it doesn't even have a doctype), so the parser shouldn't be too strict.
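To illustrate what "not too strict" means in practice, here is a minimal hand-rolled sketch in C (one of the question's tags) that pulls the `src` values out of `<img>` tags while tolerating upper-case tag names, unquoted or single-quoted attribute values, and tags that are never closed. The function `next_img_src` and its helpers are hypothetical, not part of any library. This is a tolerant scanner, not a real HTML parser: it does not understand comments, `<script>` contents or character entities, so a proper lenient parser library remains the better choice for real pages.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* ASCII case-insensitive prefix test: does p start with the lower-case word w?
   Stops at p's terminator, so it never reads past the end of the string. */
static int starts_with_ci(const char *p, const char *w)
{
    for (; *w; p++, w++)
        if (tolower((unsigned char)*p) != *w)
            return 0;
    return 1;
}

/* Skip ASCII whitespace. */
static const char *skip_ws(const char *p)
{
    while (*p && isspace((unsigned char)*p))
        p++;
    return p;
}

/* Hypothetical helper: find the next <img ...> tag at or after p that has a
   src attribute, store the value's start in *val and its length in *len, and
   return a pointer just past the value (pass it back in to continue).
   Returns NULL when no such tag remains. */
const char *next_img_src(const char *p, const char **val, size_t *len)
{
    while ((p = strchr(p, '<')) != NULL) {
        p++;                                   /* step past '<' */
        if (!starts_with_ci(p, "img") ||
            !(isspace((unsigned char)p[3]) || p[3] == '/' || p[3] == '>'))
            continue;                          /* some other tag, e.g. <imgx> */
        p += 3;
        /* Walk the attributes until '>' or the end of input (unclosed tag). */
        while (*p && *p != '>') {
            p = skip_ws(p);
            const char *name = p;
            while (*p && !isspace((unsigned char)*p) && *p != '=' && *p != '>')
                p++;
            size_t name_len = (size_t)(p - name);
            p = skip_ws(p);
            if (*p != '=')
                continue;                      /* attribute without a value */
            p = skip_ws(p + 1);                /* skip '=' and any spaces */
            const char *v = p;
            size_t n;
            if (*p == '"' || *p == '\'') {     /* quoted value */
                char quote = *p++;
                v = p;
                while (*p && *p != quote)
                    p++;
                n = (size_t)(p - v);
                if (*p)
                    p++;                       /* closing quote, if present */
            } else {                           /* unquoted value */
                while (*p && !isspace((unsigned char)*p) && *p != '>')
                    p++;
                n = (size_t)(p - v);
            }
            if (name_len == 3 && starts_with_ci(name, "src")) {
                *val = v;
                *len = n;
                return p;
            }
        }
    }
    return NULL;
}
```

Calling it in a loop, passing back the returned pointer each time, walks every image in the page. Each result is a pointer into the original buffer plus a length, so nothing is allocated or copied; the caller decides whether to duplicate the value.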