Advice on crawling web site content

Views: 143 Posted: 2019/1/9 20:46:42 Tags: java web web-crawler web-scraping jsoup


Problem description

I am trying to crawl parts of a website's content using jsoup with Java, save the relevant details to my database, and repeat the same job daily.

But here is the deal: when I open the website in a browser, I get the fully rendered HTML (with all the element tags in place). When I test the JavaScript part (the one I'm supposed to use to extract the correct data), it works just fine.

But when I do a parse/get with jsoup (from a Java class), only the initial page is downloaded for parsing. In other words, the website has some dynamic parts whose data I want, but since they are rendered asynchronously after the initial GET, I am unable to capture them with jsoup.
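A minimal sketch of the symptom, assuming a made-up page: the markup, the `prices` id, and the `/api/prices` endpoint are hypothetical and not taken from the real site. It uses only the JDK and a hard-coded string instead of a live request, so it shows what any static fetch would receive. jsoup would be handed the same markup, because it parses HTML but does not execute JavaScript.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; page markup and the "prices" id are invented for illustration.
class StaticFetchDemo {

    // What a plain HTTP GET might return: an empty container plus a script
    // that would fill it later when run inside a real browser.
    static final String INITIAL_HTML = String.join("\n",
        "<html><body>",
        "<div id=\"prices\"></div>",
        "<script>",
        "  fetch('/api/prices').then(r => r.json())",
        "    .then(d => document.getElementById('prices').textContent = d.total);",
        "</script>",
        "</body></html>");

    // Returns the raw inner markup of the first <div id="..."> with the given id,
    // or null if there is no such div. A regex is enough for this fixed sample;
    // it is not a general-purpose HTML parser.
    static String innerHtmlOfDiv(String html, String id) {
        Pattern p = Pattern.compile(
            "<div id=\"" + Pattern.quote(id) + "\">(.*?)</div>", Pattern.DOTALL);
        Matcher m = p.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String inner = innerHtmlOfDiv(INITIAL_HTML, "prices");
        // The container exists in the initial HTML, but the script never ran,
        // so there is no data inside it to extract.
        System.out.println("container found: " + (inner != null));
        System.out.println("container empty: " + (inner != null && inner.isEmpty()));
    }
}
```

In this sample the container is present but empty, which matches what I see: the element tags I want exist only after the browser has run the page's scripts.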

Does anybody know a way around this? Am I using the right toolset? More experienced people, I would welcome your advice.
