Loop through an API GET request with a variable URL


Question

I am trying to call the Companies House API and fetch companies registered between November and February. My approach is to pick a start index (a company registered in November) and a stop index (a company registered in February), then loop over the company numbers in between to fetch each one. Like so:

var needle = require("needle");
var startIdx = 11059000;
var stopIdx  = 11211109;
for (let idx = startIdx; idx < stopIdx; idx++)
{
    needle('get', "https://api.companieshouse.gov.uk/company/"+idx, { 
       username: key,password:"" 
    })
   .then(function(data) {

   })
  .catch(function(err) {
    console.log('Call the locksmith!' + err)
  })
}

But this doesn't work: it fails with either a timeout or a socket hang up error.

The API is currently in beta and some features are yet to be implemented.

Recommended answer

Because the for loop runs synchronously and your calls to needle() are asynchronous (and therefore do not block), you end up trying to start more than 100,000 network requests at once. This overwhelms either your local computer or the target server, and you start getting socket errors.
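To see the effect without touching the network, here is a minimal sketch that replaces needle() with a hypothetical stand-in, fakeRequest(), built on setTimeout. The names fakeRequest, started and finished are illustrative only and are not part of the original code. By the time the loop exits, every request has been started and none has completed:

```javascript
// Hypothetical stand-in for needle(): resolves after a short delay.
let started = 0;
let finished = 0;

function fakeRequest(idx) {
    ++started;
    return new Promise(resolve => {
        setTimeout(() => {
            ++finished;
            resolve(idx);
        }, 10);
    });
}

const pending = [];
for (let idx = 0; idx < 1000; idx++) {
    // Does not wait: the loop moves straight on to the next iteration.
    pending.push(fakeRequest(idx));
}

// The loop has ended, yet no timer callback has had a chance to run.
const startedAtLoopEnd = started;
const finishedAtLoopEnd = finished;
console.log(`started: ${startedAtLoopEnd}, finished: ${finishedAtLoopEnd}`);
// prints "started: 1000, finished: 0"
```

With real HTTP calls, each of those pending promises would correspond to an open or queued socket, which is where the timeouts and hang ups come from.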

For this many requests, you need to run them X at a time so that no more than X are in flight at once. To maximize performance, you will have to work out what value of X to use, because it depends on the target server and how it handles many simultaneous requests. It is generally safe to start with a value of 5 and then increase it from there to test higher values.

If you were processing an array, there are a number of pre-built options for running X requests at once. The simplest is to use a pre-built concurrency management operation, such as Bluebird's Promise.map() with its concurrency option. Or you can write your own. You can see examples of both here: Make several requests to an API that can only handle 20 request a minute
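For the "write your own" route on an array, one possible shape is a small pool of workers that each claim the next unprocessed item. This is only a sketch and not the code from the linked answer: mapWithLimit and fakeFetch are hypothetical names, and fakeFetch stands in for a real network call so the example runs on its own.

```javascript
// items: array of inputs
// limit: max number of calls in flight at once (assumed to be >= 1)
// fn(item, index): must return a promise
// Resolves with results in the same order as items; rejects on the first error.
async function mapWithLimit(items, limit, fn) {
    const results = new Array(items.length);
    let next = 0;

    // Each worker repeatedly claims the next unprocessed index until none remain.
    async function worker() {
        while (next < items.length) {
            const i = next++;
            results[i] = await fn(items[i], i);
        }
    }

    const workerCount = Math.min(limit, items.length);
    const workers = [];
    for (let w = 0; w < workerCount; w++) {
        workers.push(worker());
    }
    await Promise.all(workers);
    return results;
}

// Usage with a hypothetical stand-in for a network call that tracks concurrency.
let inFlight = 0;
let peak = 0;
function fakeFetch(n) {
    peak = Math.max(peak, ++inFlight);
    return new Promise(resolve => setTimeout(() => {
        --inFlight;
        resolve(n * 2);
    }, 5));
}

const done = mapWithLimit([1, 2, 3, 4, 5, 6, 7], 3, fakeFetch).then(results => {
    console.log(results, "peak in flight:", peak);
    return results;
});
```

Note that in this sketch a rejection only rejects the returned promise. Workers that are still running keep claiming items, so a production version would probably want a stop flag.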

But since you are not processing an array and are just incrementing a number for each successive request, I couldn't find a pre-built option that does that. So I wrote a general-purpose one, where you fill in the function that increments your index:

// fn gets called on each iteration - must return a promise
// limit is max number of requests to be in flight at once
// cnt is number of times to call fn
// options is optional and can be {continueOnError: true}
// runN returns a promise that resolves with results array.  
// If continueOnError is set, then results array 
// contains error values too (presumed to be instanceof Error so caller can discern
// them from regular values)
function runN(fn, limit, cnt, options = {}) {
    return new Promise((resolve, reject) => {
        let inFlightCntr = 0;
        let results = [];
        let cntr = 0;
        let doneCnt = 0;

        function run() {
            while (inFlightCntr < limit && cntr < cnt) {
                let resultIndex = cntr++;
                ++inFlightCntr;
                fn().then(result => {
                    --inFlightCntr;
                    ++doneCnt;
                    results[resultIndex] = result;
                    run();          // run any more that still need to be run
                }).catch(err => {
                    --inFlightCntr;
                    ++doneCnt;
                    if (options.continueOnError) {
                        // assumes error is instanceof Error so caller can tell the
                        // difference between a genuine result and an error
                        results[resultIndex] = err;       
                        run();          // run any more that still need to be run
                    } else {
                        reject(err);
                    }
                });
            }
            if (doneCnt === cnt) {
                resolve(results);
            }
        }
        run();
    });
}

You could then use it like this:

const needle = require("needle");
// key is assumed to hold your Companies House API key (defined elsewhere)
const startIdx = 11059000;
const stopIdx  = 11211109;
const numConcurrent = 5;
let idxCntr = startIdx;

runN(function() {
    let idx = idxCntr++;
    return needle('get', "https://api.companieshouse.gov.uk/company/"+idx, { 
        username: key,password:"" 
    });
}, numConcurrent, stopIdx - startIdx + 1, {continueOnError: true}).then(results => {
    console.log(results);
}).catch(err => {
    console.log(err);
});


To minimize memory use, you can attach a .then() handler to your call to needle() and trim each response down to only what you need in the final array:

const needle = require("needle");
// key is assumed to hold your Companies House API key (defined elsewhere)
const startIdx = 11059000;
const stopIdx  = 11211109;
const numConcurrent = 5;
let idxCntr = startIdx;

runN(function() {
    let idx = idxCntr++;
    return needle('get', "https://api.companieshouse.gov.uk/company/"+idx, { 
        username: key,password:"" 
    }).then(response => {
        // construct the smallest possible response here and then return it
        // to minimize memory use for your 100,000+ requests
        return response.someProperty;
    });
}, numConcurrent, stopIdx - startIdx + 1, {continueOnError: true}).then(results => {
    console.log(results);
}).catch(err => {
    console.log(err);
});
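Because continueOnError: true puts failures into the same results array as successes, the caller has to separate the two afterwards. A minimal sketch of that step follows, relying on the same assumption stated in the runN comments (errors are instanceof Error). The helper name partitionResults and the sample data are made up for illustration:

```javascript
// Split a results array from runN (called with continueOnError: true) into
// successful values and errors, keeping the offset of each entry.
function partitionResults(results) {
    const ok = [];
    const failed = [];
    results.forEach((value, offset) => {
        if (value instanceof Error) {
            failed.push({ offset, error: value });
        } else {
            ok.push({ offset, value });
        }
    });
    return { ok, failed };
}

// Example with made-up data in the shape runN would produce.
const sample = ["Company A", new Error("socket hang up"), "Company C"];
const { ok, failed } = partitionResults(sample);
console.log(ok.length, failed.length);                    // 2 1
console.log(failed[0].offset, failed[0].error.message);   // 1 socket hang up
```

Since the results are stored by call order, startIdx + offset recovers the company number each entry belongs to, which makes it possible to retry only the failed ones.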
