How do I force loading of dynamic content (that uses lazy loading) without scrolling down?

Question

I'm trying to create a little script that grabs photos from a specific site (ilike-photo.com) and saves them to Google Drive. But this site uses lazy loading: more images are loaded dynamically as the user scrolls down. One way to work around this is to call window.scroll() until no more content is loaded and only then grab the images, but this technique is slow and ugly because the user actually sees the page being scrolled. Is there a way to force the dynamic content to load while keeping the scrollbar (and the page) at the top?

All I can think of is faking a scroll event somehow, but I don't think that's possible...
Maybe there is a way to find the function that listens to the scroll event and call it manually?
The last option I can think of is a server that runs a headless browser and serves my requests, but the problem is that I don't have a server :)
Any other suggestions?

Recommended answer

The problem is not so simple, and I don't think it can be solved without looking at the site you are trying to scrape. I have no idea how exactly the lazy loading technique you described is implemented, but I'm sure it can be implemented in several different ways, and those differences would call for different scraping approaches. Only one aspect of the difference is important: in all cases, scrolling causes some additional HTTP requests, and the data related to the scrolling event (say, scrolling position, page number, or something like that) can be passed in the HTTP request in different ways: HTTP parameters, URL parameters, etc.
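
To illustrate the point about parameters (this sketch is not part of the original answer): the same page number could travel in the URL query string or in a form-encoded POST body. The parameter name `page` and the example.com address are assumptions made up for illustration; the real names have to be read from the requests the site actually sends.

```javascript
// Two hypothetical ways a lazy loader might ask the server for "page N".
// The parameter name "page" is an assumption, not taken from any real site.

// Variant 1: the page number goes into the URL query string (GET).
function pageAsQueryString(baseUrl, page) {
  const url = new URL(baseUrl);
  url.searchParams.set("page", String(page));
  return { url: url.toString(), options: { method: "GET" } };
}

// Variant 2: the page number goes into a form-encoded request body (POST).
function pageAsFormBody(baseUrl, page) {
  const body = new URLSearchParams({ page: String(page) });
  return {
    url: baseUrl,
    options: {
      method: "POST",
      headers: { "Content-Type": "application/x-www-form-urlencoded" },
      body: body.toString(),
    },
  };
}
```

Either result could be passed to `fetch(request.url, request.options)`; which variant applies, if any, depends entirely on how the site is built.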

So, you need to study this and act accordingly. How? Here is the approach I would use:

Use some existing HTTP spy software and then try to reach the full content manually, by loading the page and scrolling. Such HTTP spying tools are often available as plug-ins for Web browsers. I, for example, use HttpFox, a plug-in for Mozilla browsers. When tracking is turned on, it lists all the HTTP requests and HTTP responses passed through the browser, with all the detail needed to understand how to do the scraping.
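
Once the spy tool reveals which request the scroll triggers, that request can in principle be replayed in a loop, with no scrolling at all. The following is a minimal sketch, assuming each request returns an HTML fragment containing `<img>` tags and that pages are numbered from 1. The names `fetchPage`, `extractImageUrls`, and `collectImageUrls` are hypothetical and come neither from the site nor from the original answer.

```javascript
// Collect the src attribute of every <img> tag in an HTML fragment.
// A regex is enough for a sketch; real markup may need a proper parser.
function extractImageUrls(html) {
  const urls = [];
  const pattern = /<img\b[^>]*?\ssrc\s*=\s*["']([^"']+)["']/gi;
  let match;
  while ((match = pattern.exec(html)) !== null) {
    urls.push(match[1]);
  }
  return urls;
}

// Request page 1, 2, 3, ... and stop at the first page that adds no new image.
// fetchPage(page) is supplied by the caller and must return a Promise of HTML text.
async function collectImageUrls(fetchPage, maxPages = 50) {
  const seen = new Set();
  for (let page = 1; page <= maxPages; page++) {
    const html = await fetchPage(page);
    const fresh = extractImageUrls(html).filter((u) => !seen.has(u));
    if (fresh.length === 0) break; // nothing new: assume the end was reached
    fresh.forEach((u) => seen.add(u));
  }
  return [...seen];
}
```

In a script running on the site's own pages, `fetchPage` might look like `(p) => fetch("/photos?page=" + p).then((r) => r.text())`, where the path and parameter are again placeholders for whatever the captured requests show. The `maxPages` cap is only a safety limit so that a misread stop condition cannot loop forever.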

—SA

