Loop through an API GET request with variable URL

Question

I am trying to call the CompaniesHouse API and fetch companies registered between November and February. The approach I took is to pick a starting index (a company registered in November) and a stop index (a company registered in February), then loop through to get the companies registered between the start and stop indexes, like so:

var needle = require("needle");
var startIdx = 11059000;
var stopIdx  = 11211109;
// key holds my Companies House API key
for (var idx = startIdx; idx < stopIdx; idx++)
{
    // needle(method, url, data, options): the credentials belong in options
    needle('get', "https://api.companieshouse.gov.uk/company/" + idx, null, {
        username: key, password: ""
    })
    .then(function(response) {

    })
    .catch(function(err) {
        console.log('Call the locksmith! ' + err)
    })
}

But this doesn't work: it gives either a timeout or a socket hang up error.

The API is currently in beta and some features are yet to be implemented.

Recommended answer

Because the for loop runs synchronously and your calls to needle() are asynchronous (each call returns immediately without waiting for the response), the loop ends up trying to start more than 150,000 network requests at once (11211109 - 11059000 = 152,109). This overwhelms either your local computer or the target server, and you start getting socket errors.
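To see this happen without touching the network, here is a small sketch with a simulated request. fakeRequest is a hypothetical stand-in for needle() that just resolves after a timer; the loop has the same shape as the one in the question, and the code counts how many requests are in flight when the loop finishes:

```javascript
// Hypothetical stand-in for needle(): resolves after 10 ms, no network involved.
let inFlight = 0;
let maxInFlight = 0;

function fakeRequest(id) {
    inFlight++;
    maxInFlight = Math.max(maxInFlight, inFlight);
    return new Promise(resolve => {
        setTimeout(() => {
            inFlight--;
            resolve(id);
        }, 10);
    });
}

// Same shape as the loop in the question: nothing here waits for a reply,
// so every request is started before the first one can finish.
const promises = [];
for (let idx = 0; idx < 1000; idx++) {
    promises.push(fakeRequest(idx));
}

// When the loop exits, all 1000 simulated requests are in flight at once.
console.log(maxInFlight); // 1000
```

With real requests, the same pattern means all 152,109 sockets are being opened together.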

For this many requests, you need to run them X at a time, so that no more than X are in flight at once. To maximize performance you will have to experiment to find a good value of X, because it depends on the target server and how it handles many simultaneous requests. A value of 5 is generally a safe starting point; increase it from there and test higher values.
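As one way to keep no more than X requests in flight, here is a minimal sketch using async/await: it starts limit workers, and each worker claims the next index from a shared counter only after its previous request has settled. runIndexRange and fakeFetch are hypothetical names for illustration; in real code fetchOne would call needle(). As in the question's loop, stop is exclusive, and errors are stored in the results array instead of ending the run.

```javascript
// Hypothetical helper: calls fetchOne(idx) for every idx in [start, stop),
// never with more than `limit` calls pending at the same time.
async function runIndexRange(start, stop, limit, fetchOne) {
    const count = Math.max(0, stop - start);
    const results = new Array(count);
    let next = start;

    async function worker() {
        while (next < stop) {
            const idx = next++;                 // claim an index before awaiting
            try {
                results[idx - start] = await fetchOne(idx);
            } catch (err) {
                results[idx - start] = err;     // keep going on errors
            }
        }
    }

    const workers = [];
    for (let i = 0; i < Math.min(limit, count); i++) {
        workers.push(worker());
    }
    await Promise.all(workers);
    return results;
}

// Example with a fake request that tracks how many calls are pending.
let inFlight = 0;
let maxInFlight = 0;
function fakeFetch(idx) {
    inFlight++;
    maxInFlight = Math.max(maxInFlight, inFlight);
    return new Promise(resolve => setTimeout(() => {
        inFlight--;
        resolve(idx * 2);
    }, 5));
}

runIndexRange(100, 120, 5, fakeFetch).then(results => {
    console.log(maxInFlight);         // 5
    console.log(results.slice(0, 3)); // [ 200, 202, 204 ]
});
```

Claiming the index with next++ before the await is what stops two workers from fetching the same index: each worker's synchronous code runs to completion before another worker resumes, so no two workers can read the same value of next.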

If you were processing an array, there are a number of pre-built options for running X requests at once. The simplest is a pre-built concurrency-limited operation such as Bluebird's Promise.map() with its concurrency option. Or you can write your own. You can see examples of both here: "Make several requests to an API that can only handle 20 request a minute"
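A range of indexes can also be turned into an array first, after which array-based tools apply (with Bluebird that would be Promise.map(indexes, fn, { concurrency: 5 })). The dependency-free sketch below swaps in a simpler technique, fixed-size batches; indexRange and inBatches are hypothetical helper names, and a fake request stands in for needle(). Batching is coarser than a true concurrency limit, because each batch waits for its slowest request before the next batch starts.

```javascript
// Hypothetical helper: [start, start + 1, ..., stop - 1] (stop is exclusive).
function indexRange(start, stop) {
    return Array.from({ length: Math.max(0, stop - start) }, (_, i) => start + i);
}

// Hypothetical helper: runs fn over items, batchSize at a time.
async function inBatches(items, batchSize, fn) {
    const results = [];
    for (let i = 0; i < items.length; i += batchSize) {
        const batch = items.slice(i, i + batchSize);
        // Promise.allSettled never rejects, so one failed request
        // does not abort the rest of the batch.
        const settled = await Promise.allSettled(batch.map(item => fn(item)));
        for (const s of settled) {
            results.push(s.status === "fulfilled" ? s.value : s.reason);
        }
    }
    return results;
}

// Example with a fake request in place of needle().
const fakeFetch = idx =>
    new Promise(resolve => setTimeout(() => resolve(`company ${idx}`), 5));

inBatches(indexRange(11059000, 11059012), 5, fakeFetch).then(results => {
    console.log(results.length); // 12
    console.log(results[0]);     // company 11059000
});
```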

But since you are not processing an array, and are just incrementing a number for each successive request, I couldn't find a pre-built option that does that. So I wrote a general-purpose one where you supply the function that increments your index:

// fn gets called on each iteration - must return a promise
// limit is max number of requests to be in flight at once
// cnt is number of times to call fn
// options is optional and can be {continueOnError: true}
// runN returns a promise that resolves with results array.
// If continueOnError is set, then results array
// contains error values too (presumed to be instanceof Error so caller can
// discern them from regular values)
function runN(fn, limit, cnt, options = {}) {
    return new Promise((resolve, reject) => {
        let inFlightCntr = 0;
        let results = [];
        let cntr = 0;
        let doneCnt = 0;
        let aborted = false;    // set on rejection so no new requests are started

        function run() {
            while (!aborted && inFlightCntr < limit && cntr < cnt) {
                let resultIndex = cntr++;
                ++inFlightCntr;
                // wrapping fn() turns a synchronous throw into a rejection;
                // the two-argument .then() keeps an error thrown inside the
                // success handler from being counted a second time
                new Promise(res => res(fn())).then(result => {
                    --inFlightCntr;
                    ++doneCnt;
                    results[resultIndex] = result;
                    run();          // run any more that still need to be run
                }, err => {
                    --inFlightCntr;
                    ++doneCnt;
                    if (options.continueOnError) {
                        // assumes error is instanceof Error so caller can tell the
                        // difference between a genuine result and an error
                        results[resultIndex] = err;
                        run();      // run any more that still need to be run
                    } else {
                        aborted = true;
                        reject(err);
                    }
                });
            }
            if (doneCnt === cnt) {
                resolve(results);
            }
        }
        run();
    });
}

Then you could use it like this:

const needle = require("needle");
const startIdx = 11059000;
const stopIdx  = 11211109;
const numConcurrent = 5;
let idxCntr = startIdx;
// key holds your Companies House API key

runN(function() {
    let idx = idxCntr++;
    // needle(method, url, data, options): the credentials belong in options
    return needle('get', "https://api.companieshouse.gov.uk/company/" + idx, null, {
        username: key, password: ""
    });
}, numConcurrent, stopIdx - startIdx + 1, {continueOnError: true}).then(results => {
    // one entry per index from startIdx to stopIdx inclusive
    console.log(results);
}).catch(err => {
    console.log(err);
});


To minimize memory use, you can add a .then() handler to your needle() call and reduce each response to just what you need in the final array:

const needle = require("needle");
const startIdx = 11059000;
const stopIdx  = 11211109;
const numConcurrent = 5;
let idxCntr = startIdx;
// key holds your Companies House API key

runN(function() {
    let idx = idxCntr++;
    // needle(method, url, data, options): the credentials belong in options
    return needle('get', "https://api.companieshouse.gov.uk/company/" + idx, null, {
        username: key, password: ""
    }).then(response => {
        // construct the smallest possible response here and then return it
        // to minimize memory use for your 150,000+ requests
        // (someProperty is a placeholder for whatever you actually need)
        return response.someProperty;
    });
}, numConcurrent, stopIdx - startIdx + 1, {continueOnError: true}).then(results => {
    console.log(results);
}).catch(err => {
    console.log(err);
});
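Since the goal is companies registered between November and February, one option is to keep just a few fields per response and filter by creation date afterwards. The sketch below makes some assumptions that are not from the answer above: that each response has needle's statusCode and a parsed JSON body, that the body carries the Companies House fields company_number, company_name and date_of_creation, and that needle's promise resolves (rather than rejects) for HTTP errors such as a 404 for an unused company number. toRecord and registeredBetween are hypothetical helpers, and the sample objects are hand-made, not real API output.

```javascript
// Hypothetical helper: shrink one response to the fields we keep.
// Assumes needle-style { statusCode, body } with Companies House field names.
function toRecord(response) {
    // A non-200 status (e.g. 404 for an unused company number) is
    // assumed to arrive here rather than in the error handler.
    if (!response || response.statusCode !== 200 || !response.body) {
        return null;
    }
    const { company_number, company_name, date_of_creation } = response.body;
    return { company_number, company_name, date_of_creation };
}

// Hypothetical helper: keep records created within [from, to], both
// "YYYY-MM-DD" strings. Dates in this format compare correctly as strings.
// Nulls (missing companies) and Error values (from continueOnError) are dropped.
function registeredBetween(records, from, to) {
    return records.filter(r =>
        r !== null && !(r instanceof Error) &&
        typeof r.date_of_creation === "string" &&
        r.date_of_creation >= from && r.date_of_creation <= to);
}

// Example with hand-made response objects instead of real API calls.
const sample = [
    { statusCode: 200, body: { company_number: "00000001", company_name: "EXAMPLE A LTD", date_of_creation: "2017-11-10" } },
    { statusCode: 404, body: { errors: [] } },
    { statusCode: 200, body: { company_number: "00000002", company_name: "EXAMPLE B LTD", date_of_creation: "2018-02-20" } },
];
const records = sample.map(toRecord);
console.log(registeredBetween(records, "2017-11-01", "2018-02-28").length); // 2
```

In the runN call above, .then(toRecord) could take the place of the .then(response => response.someProperty) handler, with registeredBetween applied to the final results array.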
