Crawl a dynamic web page using HtmlUnit
Question
I am crawling data with HtmlUnit from a dynamic web page that uses infinite scrolling to fetch data, much like Facebook's news feed. I used the following statements to simulate the scroll-down event:
webclient.setJavaScriptEnabled(true);
webclient.setAjaxController(new NicelyResynchronizingAjaxController());
ScriptResult sr=myHtmlPage.executeJavaScript("window.scrollBy(0,600)");
webclient.waitForBackgroundJavaScript(10000);
myHtmlPage=(HtmlPage)sr.getNewPage();
But myHtmlPage seems to stay the same as before, i.e., the new data is not appended to myHtmlPage, so I can only crawl the first few items on the page. Thanks for your help!
Recommended answer
I had a similar problem where the content was loaded later, as the page was scrolled. I solved it using:
webClient.getCurrentWindow().setInnerHeight(Integer.MAX_VALUE);
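If one very tall window does not trigger every load on a given page, another option is to keep scrolling in steps until the number of crawled items stops growing. The following is only a minimal sketch of that loop, not part of the original answer. The class and method names are made up for illustration, and the page is replaced by a simple counter so the example runs without HtmlUnit. The comments show how the two callbacks might be wired to the HtmlUnit calls from the question; the XPath there is a hypothetical placeholder.

```java
import java.util.function.IntSupplier;

public class ScrollUntilStable {

    /**
     * Runs scrollStep repeatedly until countItems stops growing
     * or maxRounds is reached, and returns the last item count seen.
     *
     * With HtmlUnit the callbacks could look like this (assumed wiring):
     *   scrollStep: page.executeJavaScript("window.scrollBy(0,600)");
     *               webClient.waitForBackgroundJavaScript(10000);
     *   countItems: page.getByXPath("//div[@class='item']").size()
     *               (the XPath is hypothetical and depends on the site)
     */
    static int scrollUntilStable(Runnable scrollStep, IntSupplier countItems, int maxRounds) {
        int previous = countItems.getAsInt();
        for (int round = 0; round < maxRounds; round++) {
            scrollStep.run();
            int current = countItems.getAsInt();
            if (current <= previous) {
                // Nothing new was loaded by this scroll, so stop.
                return current;
            }
            previous = current;
        }
        // Gave up after maxRounds scrolls; more items may remain.
        return previous;
    }

    public static void main(String[] args) {
        // Stand-in for a page: each "scroll" reveals 5 more items, up to 23 in total.
        int[] loaded = {10};
        int total = 23;
        int result = scrollUntilStable(
                () -> loaded[0] = Math.min(total, loaded[0] + 5),
                () -> loaded[0],
                100);
        System.out.println(result); // prints 23
    }
}
```

The maxRounds cap is there because a truly endless feed would otherwise never terminate. Counting items on the same page object after each step, rather than relying on the page returned by the script result, is a design choice of this sketch.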