Cheerio, axios, reactjs to web scrape a table off a webpage returning empty list

Question

I'm trying to scrape the table on this page: https://www.investing.com/commodities/real-time-futures

For some reason, I keep getting an empty list when I try to get the data.

This is how I fetch and parse the page:

componentDidMount() {
  axios.get('https://www.investing.com/commodities/real-time-futures')
    .then((response) => {
      if (response.status === 200) {
        const html = response.data;
        const $ = cheerio.load(html);
        const data = [];
        $('#cross_rate_1 tr').each((i, elem) => {
          data.push({
            Month: $(elem).find('td#left noWrap').text()
          });
        });
        console.log(data);
      }
    }, (error) => console.log('err'));
}

This is a screenshot of the particular part of the source code I'm trying to scrape.

Any help is much appreciated.

Solution

As already mentioned, the table in question is constantly updated via a websocket connection. You can try getting the data by either 1) connecting to the websocket or 2) scraping the dynamically generated HTML.
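Before choosing between the two, it can help to look at the raw HTML that axios actually receives. The sketch below is not part of the original answer: inspectStaticHtml is a hypothetical helper, the sample strings are made up, and it relies on plain string matching rather than a real parser, so treat the row count as a rough hint only. If it reports no table or zero rows for response.data, the rows are probably added in the browser after the page loads (or the server answered with a different page than the one you see in the browser).

```javascript
// Hypothetical diagnostic helper (an assumption, not from the original answer).
// It reports whether a raw HTML string contains a table with the given id and
// roughly how many <tr> elements follow it, without using any parser.
function inspectStaticHtml(html, tableId) {
  const marker = `id="${tableId}"`;
  const start = html.indexOf(marker);
  if (start === -1) {
    return { hasTable: false, rowCount: 0 };
  }
  // Only look at the text between the id attribute and the next </table>.
  // Note: nested tables or upper-case tags would make this count inaccurate.
  const end = html.indexOf('</table>', start);
  const tableHtml = end === -1 ? html.slice(start) : html.slice(start, end);
  // Match "<tr>" and "<tr ...>", but not other tags such as "<track>".
  const rowCount = (tableHtml.match(/<tr[\s>]/gi) || []).length;
  return { hasTable: true, rowCount };
}

// Made-up inputs that stand in for two possible server responses.
const withRows = '<table id="cross_rate_1"><tr><td>a</td></tr><tr><td>b</td></tr></table>';
const emptyShell = '<div id="root"></div>';

console.log(inspectStaticHtml(withRows, 'cross_rate_1'));
console.log(inspectStaticHtml(emptyShell, 'cross_rate_1'));
```

In the question's code, this could be called as inspectStaticHtml(response.data, 'cross_rate_1') inside the .then callback, before cheerio.load.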

If you only need a data snapshot rather than a continuous time series, you can use a browser scraping extension. That way you don't have to deal with the websocket implementation.

I've identified the price data CSS selectors for you and created a scraping configuration to be used with the open source browser extension https://github.com/get-set-fetch/extension.

"eLtI4gnapZTLDsIgEEV/hejGLrC+F25N3OrCpUlD6FhIWmiY0f6+1Hd9EJsuSEguGRg4h8fSlS0Km/r3ZesjHR0g2zrtKzL2IYg1wOqLZ2hEicrSwxhFVOIyjquqGmpzAiRtsqG0RSxv5TVg7EDkvC7AD9etmqJlQBz9ONRW8HvgJ06UwD2HpCV/gtpFylFnC39A/s51A3qphMlg94ruBbtNCe5iMr5/EP/S3ICZf4H5myP/0tv3rSIm/oiQjBmlS0OKS6XzdDCJ9iYQT8PxLBzPw/Ei6rWwpZ0dZ2cMF5M="

Inside the extension do: new project > config hash > paste the above hash (without the quotes) > save, scrape, view results > export as csv.

Disclaimer: I'm the extension author.
