Cheerio, axios, reactjs to web scrape a table off a webpage returning empty list
Problem description
Trying to scrape this table off this website: https://www.investing.com/commodities/real-time-futures
But for some reason when I try to get the data, I keep getting an empty list.
This is what I'm doing to get the data and parse it:
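import axios from 'axios';
import * as cheerio from 'cheerio';

componentDidMount() {
  axios.get('https://www.investing.com/commodities/real-time-futures')
    .then((response) => {
      if (response.status === 200) {
        const html = response.data;
        const $ = cheerio.load(html);
        const data = [];
        // One entry per row of the table with id "cross_rate_1".
        $('#cross_rate_1 tr').each((i, elem) => {
          data.push({
            // Class selector; assumes the cell is <td class="left noWrap">.
            // The original 'td#left noWrap' looked for a <noWrap> element inside <td id="left">.
            Month: $(elem).find('td.left.noWrap').text()
          });
        });
        console.log(data);
      }
    }, (error) => console.log('err', error));
}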
This is a screenshot of the particular part of the source code I'm trying to scrape.
Any help is much appreciated.
Answer
As already mentioned, the table in question is constantly updated via a WebSocket connection. You can try getting the data by either 1) connecting to the WebSocket or 2) scraping the dynamically generated HTML.
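Before picking either route, it may be worth confirming that the rows really are missing from the static HTML that axios receives, rather than being missed by the selector. The following is only a rough sketch: inspectStaticHtml is a hypothetical helper (not part of cheerio or axios), the sample HTML strings are made up, and plain string matching is used as a sanity check, not as a substitute for a real parser.

```javascript
// Hypothetical helper (not part of cheerio or axios): a rough string-level check
// of whether the raw HTML contains the table and how many <tr> tags sit inside it.
function inspectStaticHtml(html, tableId) {
  // Look for id="..." or id='...'; assumes tableId is a plain identifier.
  const positions = [`id="${tableId}"`, `id='${tableId}'`]
    .map((marker) => html.indexOf(marker))
    .filter((index) => index !== -1);
  if (positions.length === 0) {
    return { tablePresent: false, rowCount: 0 };
  }
  const start = Math.min(...positions);
  const end = html.indexOf('</table>', start);
  const tableHtml = end === -1 ? html.slice(start) : html.slice(start, end);
  // Count opening <tr> tags only; this is a sanity check, not an HTML parser.
  const rowCount = (tableHtml.match(/<tr[\s>]/gi) || []).length;
  return { tablePresent: true, rowCount };
}

// Made-up sample inputs for illustration.
const populated = '<table id="cross_rate_1"><tr><td>Gold</td></tr><tr><td>Silver</td></tr></table>';
const emptyShell = '<div id="app"></div>';

console.log(inspectStaticHtml(populated, 'cross_rate_1'));
console.log(inspectStaticHtml(emptyShell, 'cross_rate_1'));
```

In the component from the question, calling inspectStaticHtml(response.data, 'cross_rate_1') inside the then callback would indicate whether the table markup arrives with the initial response at all. If it reports no table or zero rows, the rows are most likely filled in later by the page's scripts, and no cheerio selector applied to that response will find them.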
If you only need a data snapshot rather than a continuous time series, you can use a browser scraping extension. That way you don't have to deal with the WebSocket implementation at all.
I've identified the CSS selectors for the price data and created a scraping configuration for use with the open-source browser extension https://github.com/get-set-fetch/extension.
"eLtI4gnapZTLDsIgEEV/hejGLrC+F25N3OrCpUlD6FhIWmiY0f6+1Hd9EJsuSEguGRg4h8fSlS0Km/r3ZesjHR0g2zrtKzL2IYg1wOqLZ2hEicrSwxhFVOIyjquqGmpzAiRtsqG0RSxv5TVg7EDkvC7AD9etmqJlQBz9ONRW8HvgJ06UwD2HpCV/gtpFylFnC39A/s51A3qphMlg94ruBbtNCe5iMr5/EP/S3ICZf4H5myP/0tv3rSIm/oiQjBmlS0OKS6XzdDCJ9iYQT8PxLBzPw/Ei6rWwpZ0dZ2cMF5M="
Inside the extension do: new project > config hash > paste the above hash (without the quotes) > save, scrape, view results > export as csv.
Disclaimer: I'm the extension author.