Scraping JavaScript-generated data using Python

Views: 33 Posted: 2021/12/17 14:02:38 Tags: javascript python screen-scraping web-scraping


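Question

I want to scrape some data from the following URL using Python: http://www.hankyung.com/stockplus/main.php?module=stock&mode=stock_analysis_infomation&itemcode=078340

The page is a summary of company information.

What I want to scrape is not shown on the first page. Clicking the tab named "재무제표" opens the financial statements, and then clicking the tab named "현금흐름표" opens the cash flow statement.

I want to scrape the "Cash Flow" data.

However, the cash flow data is generated by JavaScript from a different URL. The hidden URL is http://stock.kisline.com/compinfo/financial/main.action?vhead=N&vfoot=N&vstay=&omit=&vwidth=

The cash flow data is generated by submitting some option values and a cookie to this URL.

As you can see, itemcode=078340 in the first link is the stock code, and there are as many as 1680 stocks whose cash flow data I want to collect. I want to do this in a loop.

Is there a good way to scrape the cash flow data? I tried Scrapy, but it is hard to integrate with the other scraping code I am already using.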

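Answer

There's also dryscrape (a library written by me, so the recommendation is a bit biased, obviously :) which uses a fast WebKit-based in-memory browser to navigate around. It understands JavaScript, too, but is a lot more lightweight than Selenium.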
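As a rough illustration of how the loop over stock codes could look with this approach, here is a minimal sketch. It assumes dryscrape is installed and relies on its `Session`, `set_attribute`, `visit`, and `body` calls; the helper names `summary_url`, `fetch_rendered_html`, and `scrape_all` are hypothetical and not part of any library. It only loads the summary page for each code. Reaching the cash flow tab (by clicking the tabs or by posting to the hidden URL with the right option values and cookie) would need further steps that are not shown here.

```python
from urllib.parse import urlencode

# Summary page from the question; only the itemcode parameter changes per stock.
BASE_URL = "http://www.hankyung.com/stockplus/main.php"


def summary_url(item_code):
    """Build the company-summary URL for one stock code (hypothetical helper)."""
    query = urlencode({
        "module": "stock",
        "mode": "stock_analysis_infomation",
        "itemcode": item_code,
    })
    return BASE_URL + "?" + query


def fetch_rendered_html(url):
    """Load a page in dryscrape and return its HTML after scripts have run."""
    # Third-party import kept local so the URL helper works without dryscrape.
    import dryscrape

    session = dryscrape.Session()
    session.set_attribute("auto_load_images", False)  # skip images to save time
    session.visit(url)
    return session.body()


def scrape_all(item_codes, fetch=fetch_rendered_html):
    """Return {stock code: rendered HTML}; a failed stock is stored as None."""
    pages = {}
    for code in item_codes:
        try:
            pages[code] = fetch(summary_url(code))
        except Exception as exc:  # keep going if a single stock fails
            print("failed for", code, exc)
            pages[code] = None
    return pages
```

Calling `scrape_all(["078340"])` would fetch the page from the question. The `fetch` parameter is there so the browser step can be swapped out, for example for a function in the existing scraping code, without changing the loop.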

