How to scrape content rendered in popup window with javascript: links using scrapy

Views: 56 Published: 2021/7/16 22:10:42 Tags: python ajax selenium web-scraping scrapy


Problem description

I'm trying to use scrapy to get content that is rendered only after a javascript: link is clicked. As the links don't appear to follow a systematic numbering scheme, I don't know how to:

1 - activate a javascript: link to expand a collapsed panel

2 - activate a (now visible) javascript: link to cause the popup to be rendered so that its content (the abstract) can be scraped

The site https://b-com.mci-group.com/EventProgramme/EHA19.aspx contains links to abstracts that will be presented at a conference I plan to attend. The site's export to PDF is buggy: it duplicates a lot of data at PDF generation time. Rather than dealing with the bug, I turned to scrapy, only to realize that I'm in over my head. I've read:

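Can scrapy be used to scrape dynamic content from websites that are using AJAX?

and

How to scrape coupon code of coupon site (coupon code comes on clicking button)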

But I don't think I'm able to connect the dots. I've also seen mentions of Selenium, but I'm not sure that I must resort to that.

I have made little progress, and wonder if I can get a push in the right direction, with the following information in hand:

To make the POST request that expands the collapsed panel (item 1 above), I have traced that the on-page JS javascript:ShowCollapsiblePanel(116114,1695,44,191); results in a POST request to TARGETURLOFWEBSITE/EventSessionAjaxService/GetSessionDetailsHtml with the payload:

{"eventSessionID":116114,"eventSessionWebSiteSetupViewID":191}

The values for eventSessionID and eventSessionWebSiteSetupViewID clearly appear in the javascript:ShowCollapsiblePanel text: in the example above they are the first and fourth arguments.
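The mapping from the link text to the payload can be sketched as follows. This is only a sketch: the function name `payload_from_href` is made up for illustration, and it assumes that the first and fourth arguments always map to eventSessionID and eventSessionWebSiteSetupViewID, which is what the single traced example suggests but has not been verified for every link.

```python
import re

# Matches the four numeric arguments of a ShowCollapsiblePanel(...) call.
PANEL_RE = re.compile(r"ShowCollapsiblePanel\((\d+),\s*(\d+),\s*(\d+),\s*(\d+)\)")


def payload_from_href(href):
    """Return the POST payload for a javascript:ShowCollapsiblePanel link.

    Hypothetical helper. Assumption: argument 1 is eventSessionID and
    argument 4 is eventSessionWebSiteSetupViewID, as in the traced example.
    Returns None when the href is not a ShowCollapsiblePanel link.
    """
    match = PANEL_RE.search(href)
    if match is None:
        return None
    first, _second, _third, fourth = (int(group) for group in match.groups())
    return {
        "eventSessionID": first,
        "eventSessionWebSiteSetupViewID": fourth,
    }


example = payload_from_href("javascript:ShowCollapsiblePanel(116114,1695,44,191);")
```

For the link traced above, `example` holds the same two fields as the observed payload.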

How do I use scrapy to iterate over all of the links of the form javascript:ShowCollapsiblePanel? I tried to use SgmlLinkExtractor, but that didn't return any of the javascript:ShowCollapsiblePanel() links. I suspect that they don't meet its criteria for "links".

Update

Making progress: I've found that SgmlLinkExtractor is not the right way to go, and that the much simpler

sel.xpath('//a[contains(@href, "javascript:ShowCollapsiblePanel")]').re(r'\((\d+),(\d+),(\d+),(\d+)\)')

in the scrapy console returns all of the numeric parameters for each javascript:ShowCollapsiblePanel() (of course, right now they all come back in one long flat sequence, but I'm just messing around in the console).
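Since the numbers come back flattened, they need to be regrouped four at a time before they can be used per link. A minimal sketch of that regrouping, assuming the result is a flat list of digit strings; the helper name `group_arguments` and the second set of sample values are hypothetical:

```python
def group_arguments(flat, size=4):
    """Split a flat list of digit strings into tuples of `size` integers.

    Hypothetical helper: each tuple corresponds to the arguments of one
    ShowCollapsiblePanel(...) call, in document order.
    """
    if len(flat) % size != 0:
        raise ValueError("expected a multiple of %d values, got %d" % (size, len(flat)))
    return [
        tuple(int(value) for value in flat[start:start + size])
        for start in range(0, len(flat), size)
    ]


# First four values are from the traced link; the last four are made up.
flat = ["116114", "1695", "44", "191", "116115", "1695", "44", "191"]
grouped = group_arguments(flat)
```

Each tuple in `grouped` then carries the arguments of one link, so the first and fourth items can be picked out per link.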

The next step will be to take the first javascript:ShowCollapsiblePanel(), generate the POST request, and check whether the response contains what I see when I click the link in the browser.
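That step could look roughly like the sketch below, which only builds the request and does not send it. It uses the standard library's urllib rather than scrapy so that it runs on its own. The `build_panel_request` name, the `https://example.invalid` base URL, and the `Content-Type: application/json` header are all assumptions; the header is a guess based on the payload looking like JSON.

```python
import json
from urllib.request import Request

# Path taken from the traced POST request; the host is left to the caller.
SERVICE_PATH = "/EventSessionAjaxService/GetSessionDetailsHtml"


def build_panel_request(base_url, event_session_id, view_id):
    """Build (but do not send) the POST request that expands one panel.

    Hypothetical helper. Assumption: the service accepts a JSON body with
    a JSON content type, as the traced payload suggests.
    """
    payload = {
        "eventSessionID": event_session_id,
        "eventSessionWebSiteSetupViewID": view_id,
    }
    return Request(
        base_url.rstrip("/") + SERVICE_PATH,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder host; substitute the real site root when trying this out.
request = build_panel_request("https://example.invalid", 116114, 191)
```

Inside a spider, the same URL, body and header could presumably be passed to a scrapy Request with method="POST", with the callback inspecting the returned HTML.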
