Scraping JavaScript-generated data using Python


Question

I want to scrape some data from the following URL using Python:
http://www.hankyung.com/stockplus/main.php?module=stock&mode=stock_analysis_infomation&itemcode=078340

The page is a summary of company information.

What I want to scrape is not shown on the first page. Clicking the tab named "재무제표" opens the financial statements, and clicking the tab named "현금흐름표" opens the cash flow statement.

I want to scrape the cash flow data. However, the cash flow data is generated by JavaScript from a different URL. That hidden URL is:
http://stock.kisline.com/compinfo/financial/main.action?vhead=N&vfoot=N&vstay=&omit=&vwidth=

The cash flow data is generated by submitting some option values and a cookie to this URL.

As you may have noticed, itemcode=078340 in the first link is the stock code. There are as many as 1680 stocks whose cash flow data I want to gather, so I want to do this in a loop.
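A minimal sketch of such a loop, assuming every stock uses the same summary URL pattern as the one above and differs only in itemcode. The helper name build_summary_url and the one-element code list are placeholders for illustration; only the code 078340 comes from the question, and the full list of codes would have to be supplied separately.

```python
from urllib.parse import urlencode

# Base address taken from the summary URL quoted in the question.
BASE_URL = "http://www.hankyung.com/stockplus/main.php"


def build_summary_url(item_code):
    """Return the company summary URL for one stock code.

    The code is kept as a string so that leading zeros are preserved.
    """
    query = urlencode({
        "module": "stock",
        "mode": "stock_analysis_infomation",  # spelling as used by the site
        "itemcode": item_code,
    })
    return f"{BASE_URL}?{query}"


# Placeholder list: replace with the real stock codes to be collected.
item_codes = ["078340"]

# One URL per stock code, ready to be fetched in a loop.
urls = [build_summary_url(code) for code in item_codes]
```

Storing the codes as strings rather than integers matters here, because a code such as 078340 would otherwise lose its leading zero.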

Is there a good way to scrape the cash flow data? I tried Scrapy, but it is difficult to integrate with the other scraping code I am already using.

Answer

There's also dryscrape (a library written by me, so the recommendation is obviously a bit biased :) which uses a fast WebKit-based in-memory browser to navigate around. It understands JavaScript, too, but is a lot more lightweight than Selenium.
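As a rough sketch of how this might look, the code below loads a page in a dryscrape session, takes the rendered HTML, and pulls table rows out of it with the standard library. It assumes dryscrape is installed and that the data ends up in ordinary HTML table rows, which the original post does not confirm. The names fetch_rendered_html, extract_rows and TableRowParser are made up for this illustration, and activating the cash flow tab is left out.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None  # drop any cell left unclosed


def extract_rows(html):
    """Return all table rows found in html as lists of cell strings."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return parser.rows


def fetch_rendered_html(url):
    """Load url in a dryscrape session and return the page source
    after its JavaScript has run."""
    import dryscrape  # third-party; imported here so the parser works without it

    # On a headless Linux machine, dryscrape.start_xvfb() may be needed first.
    session = dryscrape.Session()
    session.visit(url)
    return session.body()
```

Calling extract_rows(fetch_rendered_html(url)) for each stock URL would then give one list of rows per page. Keeping the parsing separate from the fetching means the parser can be checked against saved HTML without starting a browser.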
