How to scrape charts from a website with Python?


Question

I have saved the script source below to a text file, but using re to extract the data still doesn't return anything. My code is:

file_object = open('source_test_script.txt', mode="r")
soup = BeautifulSoup(file_object, "html.parser")
pattern = re.compile(r"^var (chart[0-9]+) = new Highcharts.Chart\(({.*?})\);$", re.MULTILINE | re.DOTALL)
scripts = soup.find("script", text=pattern)
profile_text = pattern.search(scripts.text).group(1)
profile = json.loads(profile_text)

print profile["data"], profile["categories"]


I would like to extract the chart's data from a website. The following is the source code of the chart.

  <script type="text/javascript">
    jQuery(function() {

    var chart1 = new Highcharts.Chart({

          chart: {
             renderTo: 'chart1',
              defaultSeriesType: 'column',
            borderWidth: 2
          },
          title: {
             text: 'Productions'
          },
          legend: {
            enabled: false
          },
          xAxis: [{
             categories: [1999,2000,2001,2002,2003,2004,2005,2006,2007,2008,2009,2010,2011,2012,2013,2014,2015,2016],

          }],
          yAxis: {
             min: 0,
             title: {
             text: 'Productions'
          }
          },

          series: [{
               name: 'Productions',
               data: [1,1,0,1,6,4,9,15,15,19,24,18,53,42,54,53,61,36]
               }]
       });
    });

    </script>

There are several charts like this on the website, called "chart1", "chart2", etc. For each chart, I would like to extract the categories line and the data line:

categories: [1999,2000,2001,2002,2003,2004,2005,2006,2007,2008,2009,2010,2011,2012,2013,2014,2015,2016]

data: [1,1,0,1,6,4,9,15,15,19,24,18,53,42,54,53,61,36]

Recommended answer

Another way is to query the Highcharts JavaScript API, as one would in the browser console, and pull the result out with Selenium.

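import time
from selenium import webdriver

website = ""  # URL of the page that hosts the charts

driver = webdriver.Firefox()
driver.get(website)
time.sleep(5)  # crude wait so the page's scripts can build the charts

# Highcharts.charts lists the charts on the page; take the first chart's first series.
temp = driver.execute_script('return window.Highcharts.charts[0]'
                             '.series[0].options.data')
# Points may be [x, y] pairs or plain y values (as in the question's chart).
data = [item[1] if isinstance(item, list) else item for item in temp]
print(data)
driver.quit()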
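
The question asks for the categories as well as the data, for every chart on the page. A minimal sketch of how the same idea might be extended is below. The names `CHARTS_JS` and `collect_charts` are made up for illustration, and the JavaScript assumes the page's Highcharts version exposes `Highcharts.charts`, `chart.renderTo`, `chart.xAxis[0].categories` and `series.options.data`; older releases may not, so check in the browser console first.

```python
# JavaScript executed in the page. Assumption: window.Highcharts exists and
# its charts array may contain empty slots for destroyed charts.
CHARTS_JS = """
return (window.Highcharts.charts || []).filter(Boolean).map(function (chart) {
    return {
        container: chart.renderTo.id,
        categories: chart.xAxis[0].categories,
        series: chart.series.map(function (s) {
            return {name: s.name, data: s.options.data};
        })
    };
});
"""


def collect_charts(driver):
    """Return one dict per live chart on the page loaded in `driver`.

    `driver` is any Selenium WebDriver that has already opened the page
    and waited for the charts to render.
    """
    return driver.execute_script(CHARTS_JS)
```

With the `driver` from the snippet above, `for chart in collect_charts(driver): print(chart["container"], chart["categories"], chart["series"][0]["data"])` would print one line per chart.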

Depending on which chart and series you are trying to pull, your case might be slightly different.
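
If the page source is already saved to a file, as in the question, a browser is not strictly needed. The question's attempt returns nothing for a few reasons: `^var` requires `var` at the very start of a line, but the line is indented; `group(1)` captures the chart name rather than the options; and the options object is a JavaScript literal (unquoted keys, single-quoted strings, a trailing comma), which `json.loads` rejects. The sketch below is one possible workaround rather than a general parser. It assumes `categories` and `data` are flat arrays of numbers, as in the sample, and the helper name `extract_charts` is invented for illustration.

```python
import json
import re

# Trimmed copy of the script from the question; in practice this would be
# the text read from the saved file.
SAMPLE = """
    jQuery(function() {
    var chart1 = new Highcharts.Chart({
          chart: {
             renderTo: 'chart1',
            borderWidth: 2
          },
          xAxis: [{
             categories: [1999,2000,2001,2002,2003,2004,2005,2006,2007,2008,2009,2010,2011,2012,2013,2014,2015,2016],

          }],
          series: [{
               name: 'Productions',
               data: [1,1,0,1,6,4,9,15,15,19,24,18,53,42,54,53,61,36]
               }]
       });
    });
"""

# One match per chart: the variable name, then the body up to the first "});".
CHART_RE = re.compile(
    r"var\s+(chart\d+)\s*=\s*new\s+Highcharts\.Chart\(\{(.*?)\}\s*\)\s*;",
    re.DOTALL,
)
# Flat arrays only: the capture stops at the first closing bracket.
CATEGORIES_RE = re.compile(r"categories\s*:\s*(\[[^\]]*\])")
DATA_RE = re.compile(r"data\s*:\s*(\[[^\]]*\])")


def extract_charts(source):
    """Map each chart name to its categories and data lists."""
    charts = {}
    for name, body in CHART_RE.findall(source):
        categories = CATEGORIES_RE.search(body)
        data = DATA_RE.search(body)
        charts[name] = {
            # A flat numeric array is valid JSON even though the whole object is not.
            "categories": json.loads(categories.group(1)) if categories else None,
            "data": json.loads(data.group(1)) if data else None,
        }
    return charts


for name, chart in extract_charts(SAMPLE).items():
    print(name, chart["categories"], chart["data"])
```

Nested point arrays such as `[[x, y], ...]`, string categories in single quotes, or chart options containing functions would need a different approach, for example the Selenium route above.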
