scrapy-splash - IT屋 - Programmer Software Development Technology Sharing Community

Scrapy + Splash = Connection Refused

I installed Splash by following this link. I followed all the installation steps, but Splash does not work. My settings.py file: BOT_NAME = 'Teste' SPIDER_MODULES = ['Test.spiders'] NEWSPIDER_MODULE = 'Test.spiders' DOWNLOADER_MIDDLEWARES = { 'scrapy_splash ..

Published: 2022-08-02 15:18:51 scrapy web-crawler scrapy-splash splash-js-render Other Development
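For comparison, the scrapy-splash README configures the settings below; this sketch merges them with the project values from the question. It assumes Splash is listening on localhost:8050 (for example the official image started with docker run -p 8050:8050 scrapinghub/splash). If Splash runs inside a Docker virtual machine or on another host, SPLASH_URL has to point at that address instead.

```python
# settings.py (sketch): Splash-related values follow the scrapy-splash README.
# Assumption: Splash is reachable at localhost:8050.
BOT_NAME = 'Teste'
SPIDER_MODULES = ['Test.spiders']
NEWSPIDER_MODULE = 'Test.spiders'

# Address of the running Splash service. If nothing is listening here,
# every Splash request fails with "connection refused".
SPLASH_URL = 'http://localhost:8050'

DOWNLOADER_MIDDLEWARES = {
    'scrapy_splash.SplashCookiesMiddleware': 723,
    'scrapy_splash.SplashMiddleware': 725,
    'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
}

SPIDER_MIDDLEWARES = {
    'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,
}

# Splash-aware duplicate filter.
DUPEFILTER_CLASS = 'scrapy_splash.SplashAwareDupeFilter'
# Only relevant if Scrapy's HTTP cache is enabled.
HTTPCACHE_STORAGE = 'scrapy_splash.SplashAwareFSCacheStorage'
```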

Scrapy, Splash and "Connection was refused by other side: 10061"

I'm using Scrapy and Splash on a JavaScript-driven site. However, I keep getting a "Connection was refused by other side: 10061" error. The log I get is as follows: [scrapy.downloadermiddlewares.retry] DEBUG: Retrying ..

Published: 2022-07-17 20:40:59 python docker scrapy twisted scrapy-splash Python
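Error 10061 is the Windows socket code for a refused connection (WSAECONNREFUSED): nothing accepted the TCP connection at the target address. A quick way to rule out the Splash side is to check whether anything is listening at the configured SPLASH_URL before starting the crawl. The helpers below are hypothetical and stdlib-only; they are not part of scrapy-splash.

```python
import socket
from urllib.parse import urlsplit


def splash_address(splash_url):
    """Return (host, port) from a Splash URL such as 'http://localhost:8050'."""
    parts = urlsplit(splash_url)
    # 8050 is Splash's default HTTP port; use it when the URL has no port.
    return parts.hostname, parts.port or 8050


def is_splash_reachable(splash_url, timeout=3.0):
    """Hypothetical helper: True if a TCP connection to Splash succeeds."""
    host, port = splash_address(splash_url)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # ConnectionRefusedError (10061 on Windows), socket timeouts and
        # DNS failures are all subclasses of OSError.
        return False
```

Calling is_splash_reachable("http://localhost:8050") from the same machine (or container) that runs Scrapy shows whether the problem is the Splash address rather than the spider. When Scrapy runs inside Docker, "localhost" refers to the Scrapy container itself, so the Splash host name or IP is usually needed instead.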

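Splash does not render all of the page's content

I'm using Splash v2.3.2 and trying to render a page, but it doesn't render everything: images and dynamically loaded content are missing. I'm using http://localhost:8050/ with this script: function main(splash) local url = splash.args.url assert(splash:go(url)) assert(splash:wait(10 ..

Published: 2022-04-18 20:09:19 splash-screen scrapy-splash splash-js-render Other Development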
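When content arrives through later JavaScript requests, a fixed wait may end before it is inserted. One common pattern is to poll until an element created by the dynamic content exists. The sketch below builds such a Lua script and its arguments in Python; the helper name and the selector are hypothetical, and the script uses only splash:go, splash:wait, splash:evaljs and splash:html, reading its inputs from splash.args as the question's script does.

```python
import json

# Lua: load the page, then poll every 0.5 s until a CSS selector matches
# or max_wait seconds have passed.
WAIT_FOR_ELEMENT_LUA = """
function main(splash)
  assert(splash:go(splash.args.url))
  local js = "document.querySelector(" .. splash.args.selector_json .. ") !== null"
  local waited = 0
  while waited < splash.args.max_wait do
    if splash:evaljs(js) then
      break
    end
    assert(splash:wait(0.5))
    waited = waited + 0.5
  end
  return {html = splash:html(), found = splash:evaljs(js)}
end
"""


def execute_args(url, selector, max_wait=10):
    """Hypothetical helper: arguments for Splash's /execute endpoint."""
    return {
        "lua_source": WAIT_FOR_ELEMENT_LUA,
        "url": url,
        # JSON-encode the selector so quotes inside it cannot break the JS code.
        "selector_json": json.dumps(selector),
        "max_wait": max_wait,
    }
```

With scrapy-splash, such a dict would typically be passed as args= to a SplashRequest created with endpoint='execute'.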

Scrapy Splash crawler: ReactorNotRestartable

I developed a Scrapy Splash scraper with Visual Studio Code on Windows 10. When I run my scraper without the runner.py file, like this, it works and writes the scraped content into "out.json": scrapy crawl mytest -o out.json However, when I use the runner.py file in Visual Studio Code ..

Published: 2022-04-18 17:54:32 python scrapy twisted scrapy-splash Python
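ReactorNotRestartable is raised when something tries to start Twisted's reactor a second time in the same Python process, for example calling CrawlerProcess.start() twice, or re-running a runner script in an interpreter session that stays alive. One workaround, sketched below under the assumption that the scrapy command is on PATH and the script runs from the project root, is to have runner.py launch each crawl as a separate process, which gets a fresh reactor every time. The spider name and output file come from the question; the helper names are hypothetical.

```python
import subprocess


def crawl_command(spider_name, output_file):
    """Build the same command that already works from the terminal."""
    return ["scrapy", "crawl", spider_name, "-o", output_file]


def run_crawl(spider_name, output_file):
    """Run one crawl in a child process; the child starts its own reactor,
    so this can be called repeatedly from the same parent process."""
    result = subprocess.run(crawl_command(spider_name, output_file), check=False)
    return result.returncode
```

A runner.py would then call run_crawl("mytest", "out.json"). The trade-off is that the spider no longer shares memory with the runner; results have to be read back from out.json.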

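Following JavaScript pagination with Scrapy and Splash

I'm using Scrapy and Splash to extract data. I'm looking for a way to follow JavaScript-powered pagination. The URL does not change; it is always the same no matter which page you are on. The link: "Next". I have tried clicking the element with a Lua script and Splash, but it doesn't work: ..

Published: 2022-02-22 18:58:25 python scrapy scrapy-splash Python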
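Because the URL never changes, every page has to be reached inside one Splash session. A minimal sketch, assuming the "Next" control can be matched by a CSS selector (the selector, page limit and helper name are hypothetical), is a Lua script that saves splash:html() for the current page, clicks the control through splash:runjs, waits, and repeats.

```python
import json

PAGINATE_LUA = """
function main(splash)
  assert(splash:go(splash.args.url))
  assert(splash:wait(splash.args.wait))
  local sel = splash.args.next_json
  local exists_js = "document.querySelector(" .. sel .. ") !== null"
  local click_js = "document.querySelector(" .. sel .. ").click();"
  local pages = {}
  for i = 1, splash.args.max_pages do
    -- Save the HTML of the page currently displayed.
    pages["page_" .. i] = splash:html()
    -- Stop when there is no "Next" control left.
    if not splash:evaljs(exists_js) then
      break
    end
    splash:runjs(click_js)
    assert(splash:wait(splash.args.wait))
  end
  return pages
end
"""


def pagination_args(url, next_selector, max_pages=5, wait=2.0):
    """Hypothetical helper: /execute arguments for PAGINATE_LUA."""
    return {
        "lua_source": PAGINATE_LUA,
        "url": url,
        # JSON-encoded so the selector becomes a valid JS string literal.
        "next_json": json.dumps(next_selector),
        "max_pages": max_pages,
        "wait": wait,
    }
```

If the last page keeps a disabled "Next" element instead of removing it, the stop condition needs to test for that state; otherwise the loop simply ends at max_pages.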

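Scrapy-splash does not let infinite scrolling finish

I'm scraping a used-car dealer website that has some JavaScript on the car listing pages, so I'm using scrapy-splash. The dealer's web pages also scroll infinitely until all cars are listed. The problem is that in some cases the code below doesn't let the infinite scroll continue to the end (I don't know why), so I miss some cars. I reduced concurrent requests to 1 in the settings file, so I know I at least start scraping on the start_url page all ..

Published: 2021-07-17 18:36:58 python web-scraping scrapy scrapy-splash Python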
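A fixed number of scrolls can stop before the list is complete. One common approach, sketched here with hypothetical limits and helper names, is to keep scrolling until document.body.scrollHeight stops growing for several rounds in a row. The whole loop must also fit inside the request's Splash timeout, which in turn cannot exceed the server's --max-timeout setting; the helper checks the first condition.

```python
SCROLL_LUA = """
function main(splash)
  assert(splash:go(splash.args.url))
  assert(splash:wait(splash.args.wait))
  local get_height = "document.body.scrollHeight"
  local last_height = splash:evaljs(get_height)
  local stable_rounds = 0
  for i = 1, splash.args.max_scrolls do
    splash:runjs("window.scrollTo(0, document.body.scrollHeight);")
    assert(splash:wait(splash.args.wait))
    local height = splash:evaljs(get_height)
    if height == last_height then
      -- Nothing new was loaded; stop after several unchanged rounds.
      stable_rounds = stable_rounds + 1
      if stable_rounds >= splash.args.stable_needed then
        break
      end
    else
      stable_rounds = 0
      last_height = height
    end
  end
  return {html = splash:html()}
end
"""


def scroll_args(url, wait=1.5, max_scrolls=50, stable_needed=3, timeout=90):
    """Hypothetical helper: /execute arguments for SCROLL_LUA."""
    # Worst case, the waits alone take about (max_scrolls + 1) * wait seconds.
    if (max_scrolls + 1) * wait >= timeout:
        raise ValueError("timeout too small for max_scrolls * wait")
    return {
        "lua_source": SCROLL_LUA,
        "url": url,
        "wait": wait,
        "max_scrolls": max_scrolls,
        "stable_needed": stable_needed,
        "timeout": timeout,
    }
```

Requiring several unchanged rounds, rather than one, tolerates a slow batch that arrives just after a single wait.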

Scrapy Splash on an Ubuntu server: got an unexpected keyword argument "encoding"

The Scrapy Splash setup I use works fine on my local machine, but when I use it on my Ubuntu server it returns this error. Why is that? Is it caused by running out of memory? File "/usr/local/lib64/python2.7/site-packages/twisted/internet/defer.py", line 1299, in _inlineCallbacks result = g.send(result) ..

Published: 2021-07-17 18:35:52 python web-scraping scrapy scrapy-splash splash-js-render Python
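An "unexpected keyword argument" error that appears on one machine but not on another may point to different library versions rather than memory. As a first diagnostic step (a suggestion, not a confirmed cause), the installed versions of the relevant packages can be compared on both machines. This sketch uses importlib.metadata, which needs Python 3.8 or later; on the Python 2.7 server in the traceback, pip freeze gives the same information. The package list is an assumption.

```python
from importlib.metadata import PackageNotFoundError, version

# Assumed set of packages involved in the traceback.
PACKAGES = ("scrapy", "scrapy-splash", "twisted")


def installed_versions(names=PACKAGES):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name}: {ver or 'not installed'}")
```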

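A long chain of exceptions in a scrapy-splash application

My Scrapy application outputs this long chain of exceptions. I can't see what the problem is, and the last one confuses me in particular. Before I explain why, here is the chain: 2020-11-04 17:38:58,394:ERROR:Error while obtaining start requests Traceback (most recent call last): File "C:\Users\lguarro\Anaconda3\envs\virtual_workspace\lib\site-packages\urllib3\c ..

Published: 2021-07-16 22:25:02 python scrapy scrapy-splash Python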

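Scraping dynamic data with Scrapy

I want to use Scrapy to scrape stock option chains (among other data) from the Nasdaq website. Nasdaq recently updated their website. Here is the URL I'm talking about. The data doesn't load with a normal spider or in the scrapy shell. According to the Scrapy docs, I need to use scrapy-splash or scrapy-selenium. Before investing time, I'd like to know which tool is best suited for this job, or whether anything else is worth recommending. Thanks! Solution: For this ..

Published: 2021-07-16 22:24:53 python selenium web-scraping scrapy scrapy-splash Python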

Why doesn't scrapy-splash send the correct URL?

I'm using Splash to render JavaScript, but it sends an incorrect URL. To be precise, it sends the previous URL. Look at this code: def parse(self, response): splash_args = {'html': 1, 'png': 0} url = 'http://quotes.toscrape.com/js' yield Request(url, self.parse_result, meta={'splash': {'endpoint': 'render.html', ..

Published: 2021-07-16 22:24:10 python web-scraping scrapy scrapy-splash splash-js-render Python

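Python web scraping of pagination in a single-page application

I'm currently researching how to scrape web content with Python when the pagination is driven by JavaScript in a single-page application (SPA). For example: https://angular-8-pagination-example.stackblitz.io/ I searched on Google and found that JavaScript/SPA-driven content cannot be scraped with Scrapy alone; it requires Splash. I'm new to Scrapy and Splash ..

Published: 2021-07-16 22:23:18 javascript python scrapy scrapy-splash Front-end Development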

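Scraping a web page whose links are anchor tags <a href="#"> with Scrapy

I'm scraping manulife and want to go to the next page. When I inspect "Next page", I get: Next. What is the correct way to follow it? # -*- coding: utf-8 -*- import scrapy import json from scrapy_splash import SplashRequest class Manulife(scrapy.Spider): name = ' ..

Published: 2021-07-16 22:19:16 javascript python web-scraping scrapy scrapy-splash Front-end Development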

Problems with __VIEWSTATE, __EVENTVALIDATION, __EVENTTARGET and Scrapy & Splash

How do I handle __VIEWSTATE, __EVENTVALIDATION and __EVENTTARGET with Scrapy/Splash? I tried return FormRequest.from_response(response, [...] '__VIEWSTATE': response.css('input#__VIEWSTATE::attr(value)').extract_first(), ..

Published: 2021-07-16 22:19:13 python web-scraping scrapy scrapy-splash scrapy-shell Python
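FormRequest.from_response already pre-fills the form's existing fields, hidden inputs such as __VIEWSTATE and __EVENTVALIDATION included, so usually only __EVENTTARGET and __EVENTARGUMENT need to be supplied. To make the mechanics explicit, the stdlib-only sketch below collects the hidden inputs from a page and builds the postback form data. The event-target value used in the test is a hypothetical control name, and the HTML must be the page returned in the same session, since ASP.NET validates these tokens.

```python
from html.parser import HTMLParser


class HiddenInputParser(HTMLParser):
    """Collect name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attrs = dict(attrs)
        if (attrs.get("type") or "").lower() == "hidden" and attrs.get("name"):
            self.fields[attrs["name"]] = attrs.get("value") or ""


def postback_formdata(html, event_target, event_argument=""):
    """Form data for an ASP.NET postback triggered by event_target."""
    parser = HiddenInputParser()
    parser.feed(html)
    data = dict(parser.fields)  # __VIEWSTATE, __EVENTVALIDATION, ...
    data["__EVENTTARGET"] = event_target
    data["__EVENTARGUMENT"] = event_argument
    return data
```

In a spider, the resulting dict would be passed as formdata= to a FormRequest aimed at the form's action URL.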

scrapy-splash: selector for active content works in the shell but not in the spider

I just started using scrapy-splash to retrieve the number of bookings from opentable.com. The following works fine in the shell: $ scrapy shell 'http://localhost:8050/render.html?url=https://www.opentable.com/new-york-restaurant-listings&timeout=10&wait=0.5' ... In [1]: ..

Published: 2021-07-16 22:17:00 python web-scraping scrapy scrapy-splash splash-js-render Python
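One way to make the spider see exactly what the shell saw is to request the same render.html URL from the spider. The stdlib sketch below builds it with the url, timeout and wait arguments from the shell command, URL-encoding the target so that any ? or & inside it is not read as a Splash argument. The Splash address and the helper name are assumptions.

```python
from urllib.parse import urlencode

SPLASH_URL = "http://localhost:8050"  # assumption: local Splash instance


def render_html_url(target_url, wait=0.5, timeout=10, splash_url=SPLASH_URL):
    """Hypothetical helper: build a Splash render.html URL like the shell command."""
    query = urlencode({"url": target_url, "timeout": timeout, "wait": wait})
    return f"{splash_url.rstrip('/')}/render.html?{query}"
```

In the spider, yield scrapy.Request(render_html_url("https://www.opentable.com/new-york-restaurant-listings"), callback=self.parse) would then receive the rendered HTML, as in the shell session.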

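Scrapy-splash: does splash:go(url) in lua_script perform the GET request again?

I'm new to Scrapy-splash and I'm trying to scrape a lazy datatable, i.e. a table with AJAX pagination. So I need to load the website, wait until the JS has executed, get the table's HTML, and then click the "Next" button in the pagination. My approach works, but I'm afraid I'm requesting the website twice: first when I yield the SplashRequest, and then when the lua_script is executed. Is that true? If so, how can I make it perform the request only once ..

Published: 2021-07-16 22:13:01 javascript python scrapy splash-screen scrapy-splash Front-end Development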
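With scrapy-splash, Scrapy does not contact the target site directly: it sends a POST with JSON arguments to the Splash server, and the page is fetched when splash:go runs inside Splash. So the site itself is loaded once per SplashRequest (plus whatever requests the page's own JavaScript makes). The stdlib sketch below builds, without sending, a comparable POST to Splash's /execute endpoint to show that the target URL travels only as an argument; the Splash address and helper name are assumptions.

```python
import json
import urllib.request

SPLASH_EXECUTE = "http://localhost:8050/execute"  # assumption: local Splash


def build_execute_request(target_url, lua_source):
    """Hypothetical helper: build (but do not send) a POST to /execute."""
    body = json.dumps({"url": target_url, "lua_source": lua_source}).encode("utf-8")
    return urllib.request.Request(
        SPLASH_EXECUTE,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


LUA = """
function main(splash)
  -- Splash loads the target page here, inside the browser it controls.
  assert(splash:go(splash.args.url))
  assert(splash:wait(1))
  return {html = splash:html()}
end
"""
```

The only host this request addresses is the Splash server; the table's later pages can then be reached by clicking within the same script rather than by issuing new requests.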

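Trying to fake and rotate user agents

I'm trying to fake user agents and rotate them in Python. I found a tutorial online on how to do this in Scrapy using the scrapy-useragents package. I scrape the page https://www.whatsmyua.info/ to check whether my user agent differs from my real one and whether it rotates. It does differ from my actual user agent, but it doesn't rotate: it returns the same user agent every time, and I can't figure out what is wrong. s ..

Published: 2021-07-16 22:10:12 python scrapy user-agent scrapy-splash splash-js-render Python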
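A minimal rotation sketch, written as a Scrapy downloader middleware (Scrapy calls process_request(request, spider) on each enabled downloader middleware before a request is sent). The class name and the user-agent strings are hypothetical placeholders. In a Splash setup it is also worth checking that the header reaches the browser that loads the page; with the execute endpoint, for example, the Lua script may need to apply it, such as with splash:set_user_agent.

```python
import itertools

# Hypothetical pool; replace with real user-agent strings.
USER_AGENTS = [
    "Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/1.0",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) ExampleBrowser/2.0",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15) ExampleBrowser/3.0",
]


class RotateUserAgentMiddleware:
    """Downloader middleware that gives each request the next user agent."""

    def __init__(self, user_agents=USER_AGENTS):
        self._pool = itertools.cycle(user_agents)

    def process_request(self, request, spider):
        # Assign (not setdefault) so every request gets the next value.
        request.headers["User-Agent"] = next(self._pool)
        return None  # let Scrapy continue processing the request
```

It would be enabled by adding its (hypothetical) module path to DOWNLOADER_MIDDLEWARES in settings.py.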

Reading cookies from a Splash request

I'm trying to access the cookies after making a request with Splash. This is how I build the request: script = """function main(splash) splash:init_cookies(splash.args.cookies) assert(splash:go{splash.args.url, headers=splash.args.headers, http_method=splash.args.http_method, body=splash.args.body, } ..

Published: 2021-07-16 22:08:38 python scrapy scrapy-splash splash-js-render Python
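In the scrapy-splash README, the script this one resembles ends by returning cookies = splash:get_cookies(), and scrapy-splash exposes the decoded JSON result as response.data. Splash reports cookies as a list of objects carrying name and value keys (plus domain, path and so on), so a simple name-to-value mapping can be built as sketched below. The helper name is hypothetical, and it assumes the script returns that cookies key.

```python
def cookies_to_dict(splash_data):
    """Hypothetical helper: turn the 'cookies' list of a Splash JSON result
    into a {name: value} dict. Later entries win on duplicate names."""
    result = {}
    for cookie in splash_data.get("cookies", []):
        # Each entry is a HAR-style cookie: name, value, domain, path, ...
        result[cookie["name"]] = cookie["value"]
    return result
```

In a callback this would be called as cookies_to_dict(response.data). Because duplicate names from different domains collapse into one key, keep the original list when domains matter.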

Getting the content inside a script tag

Hi everyone, I'm trying to get the content inside a script tag. This is the website: http://www.teknosa.com/urunler/145051447/samsung-hm1500-bluetooth-kulaklik And this is the script tag whose content I want: $.Teknosa.ProductDetail = {"ProductComputedIndex":145051447,"ProductNam ..

Published: 2021-07-16 22:08:06 javascript scrapy web-crawler scrapy-splash splash-js-render Front-end Development
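Because the object is assigned in a plain JavaScript statement, it can often be read without any rendering: find the assignment in the script text and decode the object literal with json.JSONDecoder.raw_decode, which stops at the end of the first complete JSON value and ignores the trailing semicolon. This assumes the literal is valid JSON (quoted keys, no functions or comments), which the excerpt suggests but does not confirm; the helper name and the sample text in the test are hypothetical.

```python
import json

MARKER = "$.Teknosa.ProductDetail"


def extract_assigned_object(script_text, marker=MARKER):
    """Hypothetical helper: return the JSON object assigned after marker."""
    start = script_text.find(marker)
    if start == -1:
        return None
    brace = script_text.find("{", start + len(marker))
    if brace == -1:
        return None
    # raw_decode parses one JSON value beginning at index `brace` and
    # ignores everything after it (";" and any later statements).
    obj, _end = json.JSONDecoder().raw_decode(script_text, brace)
    return obj
```

With Scrapy, the script text could be selected with response.xpath('//script[contains(., "ProductDetail")]/text()').get() and passed to this function.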

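CrawlSpider with Splash gets stuck after the first URL

I'm writing a crawl spider in which I need to render some responses with Splash. My spider is based on CrawlSpider. I need to render my start_url response to feed my crawl spider. Unfortunately, my crawl spider stops after rendering the first response. Any idea what is going wrong? class VideoSpider(CrawlSpider): start_urls = ['https://juke.com/de/de/search?q=1+Mord+ ..

Published: 2021-07-16 22:08:00 scrapy scrapy-spider scrapy-splash Other Development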

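Scrapy Splash won't execute the Lua script

I'm having a problem where the Lua script refuses to execute. The response returned from the ScrapyRequest call seems to be an HTML body, whereas I'm expecting a document title. I assume the Lua script is never called, because it seems to have no visible effect on the response. I've dug through the documentation quite a bit and can't figure out what's missing here. Does anyone have any suggestions? from urlparse import urljoin import scrapy from scrapy_splash import SplashReq ..

Published: 2021-07-16 22:07:54 scrapy scrapy-splash splash-js-render Other Development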
