Is there a library for extracting data from an HTML page?

Question

I would like to extract information from a web page. Unfortunately, the website (4chan) doesn't have a public API, as far as I know.

What is a good library for extracting specific data from an HTML document? I would prefer a free software library that works on UNIX systems.

Edit: basically, I want to get posts and images from 4chan. The web page isn't valid HTML (and doesn't have a doctype), so the parser shouldn't be too strict.

Recommended answer

What you are looking for is an HTML DOM parser.

This previous question should help you out. Also check out this question: http://stackoverflow.com/questions/773340/can-you-provide-an-example-of-parsing-html-with-your-favorite-parser
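As a rough illustration of what a lenient parser looks like in practice, here is a minimal sketch using Python's standard-library `html.parser`, which tolerates missing doctypes and unclosed tags. The class name `ImageAndLinkExtractor`, the helper `extract`, and the sample markup are assumptions made up for this example; they are not taken from the question or from 4chan's real page structure.

```python
from html.parser import HTMLParser


class ImageAndLinkExtractor(HTMLParser):
    """Collect image and link URLs from possibly malformed HTML (hypothetical helper)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.images = []  # values of <img src="...">
        self.links = []   # values of <a href="...">

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs; tag names are lowercased.
        attributes = dict(attrs)
        if tag == "img" and attributes.get("src"):
            self.images.append(attributes["src"])
        elif tag == "a" and attributes.get("href"):
            self.links.append(attributes["href"])


def extract(html):
    """Return (images, links) found in the given HTML string."""
    parser = ImageAndLinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.images, parser.links


# Made-up sample: no doctype, and the <p>, <a>, <body>, <html> tags are never closed.
sample = '<html><body><div class="post"><p>Hello<img src="/a.png"><a href="/thread/1">reply</div>'
images, links = extract(sample)
print(images, links)
```

This is an event-based parser rather than a full DOM, so it suits simple extraction. For tree navigation or CSS-style selection, a third-party library such as Beautiful Soup or lxml may be a better fit.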
