Callback on cheerio in Node.js


Question

I'm trying to write a scraper using 'request' and 'cheerio'. I have an array of 100 URLs. I loop over the array, call 'request' on each URL, and then do cheerio.load(body). If I let i go above 3 (for testing I cap the loop with i < 3), the scraper breaks because the expression for productNumber hits undefined, and I can't call split on undefined. I think the for loop is moving on before the webpage responds and cheerio has had time to load the body, and this question would seem to agree: nodeJS - Using a callback function with Cheerio.

My problem is that I don't understand how to make sure the webpage has 'loaded' or been parsed in each iteration of the loop, so that I don't get any undefined variables. According to the other answer I don't need a callback, but then how do I do it?

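var request = require('request');
var cheerio = require('cheerio');

// productLinks is the array of product URLs (defined elsewhere).
for (var i = 0; i < productLinks.length; i++) {
    var productUrl = productLinks[i];
    request(productUrl, function(err, resp, body) {
        if (err)
            throw err;
        var $ = cheerio.load(body);
        var imageUrl = $("#bigImage").attr('src'),
            // 4th class token, part after the underscore
            productNumber = $("#product").attr('class').split(/\s+/)[3].split("_")[1];
        console.log(productNumber);
    });
}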

Example output:

1461536
1499543

TypeError: Cannot call method 'split' of undefined


Recommended answer

You are scraping some external site(s). You can't be sure every page's HTML fits exactly the same structure, so you need to be defensive about how you traverse it.

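var product = $('#product');
// A cheerio selection is always truthy, so test its length instead.
if (!product.length) return console.log('Cannot find a product element');
var productClass = product.attr('class');
if (!productClass) return console.log('Product element does not have a class defined');
var classToken = productClass.split(/\s+/)[3];
if (!classToken) return console.log('Product element has fewer than four classes');
var productNumber = classToken.split("_")[1];
console.log(productNumber);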
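
The same checks could be folded into a small helper that works on the class string alone. This is only a sketch: extractProductNumber is a hypothetical name, and the assumption that the fourth class token looks like "prefix_1461536" is inferred from the split calls in the question, not from the real site.

```javascript
// Hypothetical helper: pull the product number out of a class attribute.
// Assumes the 4th whitespace-separated token has the form "prefix_NUMBER",
// as implied by the split() calls in the question.
function extractProductNumber(classAttr) {
    // attr('class') returns undefined when the element or attribute is missing
    if (typeof classAttr !== 'string') return null;
    var tokens = classAttr.trim().split(/\s+/);
    // Fewer classes than the question's code expects
    if (tokens.length < 4) return null;
    var parts = tokens[3].split('_');
    // No underscore, or nothing after it
    return parts.length > 1 && parts[1] ? parts[1] : null;
}
```

Inside the request callback it might be called as extractProductNumber($('#product').attr('class')); a null result then marks a page that does not match the expected structure instead of throwing.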

This will help you debug where things are going wrong, and may show that you can't scrape your dataset as easily as you'd hoped.
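
To see which pages fail, it can help to record the URL alongside each failure. In the question's loop the callbacks run after the for loop has finished, so a shared productUrl variable would hold the last URL by then; forEach gives each callback its own url. The sketch below uses hypothetical names (scrapeAll, fetchPage, parse): fetchPage stands in for request and is assumed to call back with (err, resp, body), and parse is assumed to return a value, or null when the page does not have the expected structure.

```javascript
// Hypothetical sketch: fetch every URL and report failures per URL.
// fetchPage(url, cb) stands in for request(); parse(body) returns a value or null.
function scrapeAll(urls, fetchPage, parse, done) {
    var results = [];
    var failures = [];
    var pending = urls.length;
    if (pending === 0) return done(results, failures);

    urls.forEach(function (url) {
        // Each iteration has its own `url`, so the callback reports the right page.
        fetchPage(url, function (err, resp, body) {
            var value = err ? null : parse(body);
            if (value === null) {
                failures.push({
                    url: url,
                    reason: err ? String(err) : 'unexpected page structure'
                });
            } else {
                results.push({ url: url, value: value });
            }
            pending -= 1;
            // Call done only once every response has arrived.
            if (pending === 0) done(results, failures);
        });
    });
}
```

With the question's setup this might be called as scrapeAll(productLinks, request, parse, done), where parse loads the body with cheerio.load and returns the product number or null. The failures list then shows exactly which pages deviate from the expected HTML.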
