Scraping JavaScript-generated data using Python
Problem description
I want to scrape some data from the following URL using Python: http://www.hankyung.com/stockplus/main.php?module=stock&mode=stock_analysis_infomation&itemcode=078340
The page is a summary of company information.
The data I want is not shown on the first page. Clicking the tab named "재무제표" opens the financial statements, and then clicking the tab named "현금흐름표" opens the "Cash Flow" view.
I want to scrape the "Cash Flow" data.
However, the cash flow data is generated by JavaScript from a different URL. The hidden URL is http://stock.kisline.com/compinfo/financial/main.action?vhead=N&vfoot=N&vstay=&omit=&vwidth=
The cash flow data is generated by submitting some option values and a cookie to this URL.
As you can see, itemcode=078340 in the first link is the stock code, and there are as many as 1680 stocks whose cash flow data I want to gather, so I want to do this in a loop.
Is there a good way to scrape the cash flow data? I tried Scrapy, but it is difficult to integrate with the other scraping code I am already using.
Recommended answer
There's also dryscrape (a library written by me, so the recommendation is a bit biased, obviously :) which uses a fast WebKit-based in-memory browser to navigate around. It understands JavaScript too, but is a lot more lightweight than Selenium.
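Since the question says the cash flow data comes from submitting option values and a cookie to the hidden URL, a browser may not be strictly necessary if those request parameters can be identified (for example with the browser's network inspector). The following is only a minimal sketch of that idea using the Python standard library, not a tested scraper for this site. The form field names `stockcd` and `tab` are hypothetical placeholders, the helper names are made up for illustration, and whether the server accepts a plain POST like this is an assumption.

```python
import time
import urllib.parse
import urllib.request
from http.cookiejar import CookieJar

# URLs taken from the question.
SUMMARY_URL = "http://www.hankyung.com/stockplus/main.php"
DATA_URL = ("http://stock.kisline.com/compinfo/financial/main.action"
            "?vhead=N&vfoot=N&vstay=&omit=&vwidth=")


def summary_url(itemcode):
    """Build the company summary URL for one stock code."""
    query = urllib.parse.urlencode({
        "module": "stock",
        "mode": "stock_analysis_infomation",
        "itemcode": itemcode,
    })
    return SUMMARY_URL + "?" + query


def make_opener():
    """Create an opener that keeps cookies between requests."""
    jar = CookieJar()
    return urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))


def fetch_cash_flow(opener, itemcode):
    """Fetch the raw cash flow response for one stock code."""
    # Visit the summary page first so the server can set its cookies.
    opener.open(summary_url(itemcode), timeout=30).read()
    # HYPOTHETICAL form fields: replace them with the option values the
    # page's JavaScript really submits.
    form = urllib.parse.urlencode({"stockcd": itemcode, "tab": "cashflow"})
    with opener.open(DATA_URL, data=form.encode("ascii"), timeout=30) as resp:
        return resp.read()


def scrape_all(itemcodes, fetch=fetch_cash_flow, delay=1.0):
    """Loop over stock codes; map each code to its response or None."""
    opener = make_opener()
    results = {}
    for code in itemcodes:
        try:
            results[code] = fetch(opener, code)
        except OSError:  # urllib.error.URLError is a subclass of OSError
            results[code] = None
        time.sleep(delay)  # pause between stocks to go easy on the server
    return results
```

Calling `scrape_all(["078340"])` would return a dictionary keyed by stock code, and the same call works for the full list of 1680 codes. Because `fetch` is a parameter, the loop can be reused unchanged with a browser-based fetcher (dryscrape or Selenium) if the plain HTTP request turns out not to be enough.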