Node.js + request + for loop: runs twice

Question

I built a simple scraper with cheerio and the request client, but it doesn't work the way I want.

On the terminal I first see all the "null returned, do nothing" messages and only afterwards the names, so I think it first processes all the URLs that return null and then the non-null ones.

I want it to run in the right order, from 1 to 100.

app.get('/back', function (req, res) {
  for (var y = 1; y < 100; y++) {
    (function () {
      var url = "example.com/person/" + y;
      var options2 = {
        url: url,
        headers: {
          'User-Agent': req.headers['user-agent'],
          'Content-Type': 'application/json; charset=utf-8'
        }
      };
      request(options2, function (err, resp, body) {
        if (err) {
          console.log(err);
        } else {
          if ($ = cheerio.load(body)) {
            var links = $('#container');
            var name = links.find('span[itemprop="name"]').html(); // name
            if (name == null) {
              console.log("null returned, do nothing");
            } else {
              name = entities.decodeHTML(name);
              console.log(name);
            }
          }
          else {
            console.log("can't open");
          }
        }
      });
    }());
  }
});


Recommended answer

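request() is asynchronous: the for loop starts all the requests immediately, and each callback fires whenever its response arrives, so the output follows response time rather than loop order. If you are not using promises and you want to run the requests sequentially, this is a common design pattern for a sequential async loop: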

app.get('/back', function (req, res) {
    var cntr = 1;

    function next() {
        if (cntr < 100) {
            var url = "example.com/person/" + cntr++;
            var options2 = {
                url: url,
                headers: {
                    'User-Agent': req.headers['user-agent'],
                    'Content-Type': 'application/json; charset=utf-8'
                }
            };
            request(options2, function (err, resp, body) {
                if (err) {
                    console.log(err);
                } else {
                    if ($ = cheerio.load(body)) {
                        var links = $('#container');
                        var name = links.find('span[itemprop="name"]').html(); // name
                        if (name == null) {
                            console.log("null returned, do nothing");
                        } else {
                            name = entities.decodeHTML(name);
                            console.log(name);
                        }
                    } else {
                        console.log("can't open");
                    }
                    // do the next iteration
                    next();
                }
            });
        }
    }
    // start the first iteration
    next();
});
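
For comparison, the same one-at-a-time pattern can be sketched with async/await instead of a recursive next() function. This is only an illustration under assumptions: it needs a Node.js version that supports async functions, and fakeFetch and fetchSequentially are hypothetical names. fakeFetch stands in for a promisified request() and resolves after a random delay, so the order would be scrambled if the calls ran in parallel.

```javascript
// Hypothetical stand-in for a promisified request(): resolves after a
// random delay, so completion order would vary if calls ran in parallel.
function fakeFetch(id) {
    return new Promise(function (resolve) {
        var delay = Math.floor(Math.random() * 20);
        setTimeout(function () {
            resolve("person " + id);
        }, delay);
    });
}

// Sequential async loop: "await" pauses the loop until the current
// request has finished, so the next one only starts afterwards.
async function fetchSequentially(first, last, fetchOne) {
    var results = [];
    for (var id = first; id <= last; id++) {
        results.push(await fetchOne(id));
    }
    return results;
}

// Usage: the names are logged in id order regardless of the delays.
fetchSequentially(1, 5, fakeFetch).then(function (names) {
    names.forEach(function (name) {
        console.log(name);
    });
});
```

The trade-off is the same as with next(): only one request is in flight at a time, so the total time is roughly the sum of all response times.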

If you want to make all the requests in parallel (multiple requests in flight at the same time), which finishes faster overall, and then process all the results in order at the end, you can do this:

// create promisified version of request()
function requestPromise(options) {
    return new Promise(function(resolve, reject) {
        request(options, function (err, resp, body) {
            if (err) return reject(err);
            resolve(body);
        });
    });
}

app.get('/back', function (req, res) {
    var promises = [];
    var headers = {
        'User-Agent': req.headers['user-agent'],
        'Content-Type': 'application/json; charset=utf-8'
    };
    for (var i = 1; i < 100; i++) {
        promises.push(requestPromise({url: "example.com/person/" + i, headers: headers}));
    }
    Promise.all(promises).then(function(data) {
        // iterate through all the data here
        for (var i = 0; i < data.length; i++) {
            if ($ = cheerio.load(data[i])) {
                var links = $('#container');
                var name = links.find('span[itemprop="name"]').html(); // name
                if (name == null) {
                    console.log("null returned, do nothing");
                } else {
                    name = entities.decodeHTML(name);
                    console.log(name);
                }
            } else {
                console.log("can't open");
            }
        }
    }, function(err) {
        // Promise.all() rejects as soon as any single request fails
        console.log(err);
    });

});
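
Two properties of Promise.all() matter for the code above: the results array follows the order of the input promises, not the order in which the requests finish, and a single rejection rejects the whole batch. The sketch below illustrates both, assuming you would rather record a failed request as null than lose every result. fakeRequest and fetchAllInOrder are hypothetical names; fakeRequest stands in for requestPromise(), makes lower ids finish last, and makes id 3 fail.

```javascript
// Hypothetical stand-in for requestPromise(): lower ids take LONGER to
// finish, and id 3 always fails.
function fakeRequest(id) {
    return new Promise(function (resolve, reject) {
        setTimeout(function () {
            if (id === 3) {
                return reject(new Error("request " + id + " failed"));
            }
            resolve("body " + id);
        }, (6 - id) * 5);
    });
}

// Start every request at once; turn each rejection into a null result
// so that Promise.all() still resolves with one entry per id.
function fetchAllInOrder(ids, fetchOne) {
    var promises = ids.map(function (id) {
        return fetchOne(id).catch(function () {
            return null;
        });
    });
    return Promise.all(promises);
}

// Usage: entries come back in input order even though id 5 finishes first.
fetchAllInOrder([1, 2, 3, 4, 5], fakeRequest).then(function (bodies) {
    bodies.forEach(function (body, index) {
        console.log(index + 1, body === null ? "failed" : body);
    });
});
```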
