Some help scraping a page in Java
Question
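I need to scrape a web page using Java. I've read that regex is a pretty inefficient way of doing it, and that one should instead parse the page into a DOM document and navigate that.
I've tried reading the documentation, but it seems too extensive and I don't know where to begin.
Could you show me how to scrape this table (http://www.cs.grinnell.edu/~walker/fluency-book/labs/sample-table.html) into an array? I can try to figure out my way from there. A snippet or example would do just fine too.
Thanks.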
You can try jsoup: Java HTML Parser. It is an excellent library with good sample code.
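Since the question asks for a snippet, here is a minimal sketch of how the table could be read into a two-dimensional array with jsoup. It assumes the jsoup jar (Maven coordinates `org.jsoup:jsoup`) is on the classpath, that the table of interest is the first `<table>` on the page, and that it contains no nested tables. The class name `TableScraper` and the method `tableToArray` are made up for illustration and are not part of jsoup.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.io.IOException;
import java.util.Arrays;

class TableScraper {

    /**
     * Converts the first table in the document into a String[row][cell] array.
     * Assumption: the first table is the one we want and it has no nested tables.
     */
    static String[][] tableToArray(Document doc) {
        Element table = doc.select("table").first();
        if (table == null) {
            return new String[0][]; // no table on the page
        }

        Elements rows = table.select("tr");
        String[][] result = new String[rows.size()][];
        for (int i = 0; i < rows.size(); i++) {
            // Keep header cells (th) and data cells (td) alike.
            Elements cells = rows.get(i).select("th, td");
            result[i] = new String[cells.size()];
            for (int j = 0; j < cells.size(); j++) {
                result[i][j] = cells.get(j).text(); // visible text, tags stripped
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // URL taken from the question; fetching it requires network access.
        String url = "http://www.cs.grinnell.edu/~walker/fluency-book/labs/sample-table.html";
        Document doc = Jsoup.connect(url).get();

        for (String[] row : tableToArray(doc)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

To try it without a network connection, `Jsoup.parse(htmlString)` returns the same kind of `Document` from an in-memory string, so `tableToArray(Jsoup.parse(html))` works on any HTML you paste in. If the page has several tables, a more specific CSS selector in place of `"table"` should pick out the right one.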