Number of Web Workers Limit

PROBLEM

I've discovered that there is a limit on the number of Web Workers that can be spawned by a browser.

Example

main HTML / JavaScript

<script type="text/javascript">
$(document).ready(function(){
    var workers = new Array();
    var worker_index = 0;
    for (var i=0; i < 25; i++) {
        workers[worker_index] = new Worker('test.worker.js');
        workers[worker_index].onmessage = function(event) {
            $("#debug").append('worker.onmessage i = ' + event.data + "<br>");
        };
        workers[worker_index].postMessage(i); // start the worker.      

        worker_index++;
    }   
});
</script>
</head>
<body>
<div id="debug">
</div>

test.worker.js

self.onmessage = function(event) {
    var i = event.data; 

    self.postMessage(i);
};

This will generate only 20 output lines in the container when using Firefox (version 14.0.1, Windows 7).

QUESTION

Is there a way around this? The only two ideas I can think of are:

1) Daisy chaining the web workers, i.e., making each web worker spawn the next one

Example:

<script type="text/javascript">
$(document).ready(function(){
    createWorker(0);
});

function createWorker(i) {

    var worker = new Worker('test.worker.js');
    worker.onmessage = function(event) {
        var index = event.data;

        $("#debug").append('worker.onmessage i = ' + index + "<br>");

        if ( index < 25) {
            index++;
            createWorker(index);
        } 
    };
    worker.postMessage(i); // start the worker.
}
</script>
</head>
<body>
<div id="debug"></div>

2) Limit the number of web workers to a finite number and modify my code to work with that limit (i.e., share the work load across a finite number of web workers) - something like this: http://www.smartjava.org/content/html5-easily-parallelize-jobs-using-web-workers-and-threadpool

Unfortunately #1 doesn't seem to work (only a finite number of web workers will get spawned on a page load). Are there any other solutions I should consider?

SOLUTION

Old question, let's revive it! *readies epinephrine*

I've been looking into using Web Workers to isolate 3rd party plugins since web workers can't access the host page. I'll help you out with your methods which I'm sure you've solved by now, but this is for teh internetz. Then I'll give some relevant information from my research.

Disclaimer: In the examples where I used your code, I've modified and cleaned it up to provide full source code without jQuery so that you and others can run it easily. I've also added a timer that alerts how long the code took to execute, in ms.

In all examples, we reference the following genericWorker.js file.

genericWorker.js

self.onmessage = function(event) {
    self.postMessage(event.data);
};

Method 1 (Linear Execution)

Your first method is nearly working. The reason why it still fails is that you aren't deleting any workers once you finish with them. This means the same result (crashing) will happen, just slower. All you need to fix it is to add worker.terminate(); before creating a new worker to remove the old one from memory. Note that this will cause the application to run much slower as each worker must be created, run, and be destroyed before the next can run.

Linear.html

<!DOCTYPE html>
<html>
<head>
    <title>Linear</title>
</head>
<body>
    <pre id="debug"></pre>
    <script type="text/javascript">
        var debug = document.getElementById('debug');
        var totalWorkers = 250;
        var index = 0;
        var start = (new Date).getTime();

        function createWorker() {
            var worker = new Worker('genericWorker.js');
            worker.onmessage = function(event) {
                debug.appendChild(document.createTextNode('worker.onmessage i = ' + event.data + '\n'));
                worker.terminate();
                if (index < totalWorkers) createWorker();
                else alert((new Date).getTime() - start);
            };
            worker.postMessage(index++); // start the worker.
        }

        createWorker();
    </script>
</body>
</html>

Method 2 (Thread Pool)

Using a thread pool should greatly increase running speed. Instead of using some library with complex lingo, let's simplify it. All a thread pool means is having a set number of workers running simultaneously. We can actually just modify a few lines of code from the linear example to get a multi-threaded example. The code below will find how many cores you have (if your browser supports this), or default to 4. I found that this code ran about 6x faster than the linear version on my machine with 8 cores.

ThreadPool.html

<!DOCTYPE html>
<html>
<head>
    <title>Thread Pool</title>
</head>
<body>
    <pre id="debug"></pre>
    <script type="text/javascript">
        var debug = document.getElementById('debug');
        var maxWorkers = navigator.hardwareConcurrency || 4;
        var totalWorkers = 250;
        var index = 0;
        var start = (new Date).getTime();

        function createWorker() {
            var worker = new Worker('genericWorker.js');
            worker.onmessage = function(event) {
                debug.appendChild(document.createTextNode('worker.onmessage i = ' + event.data + '\n'));
                worker.terminate();
                if (index < totalWorkers) createWorker();
                else if(--maxWorkers === 0) alert((new Date).getTime() - start);
            };
            worker.postMessage(index++); // start the worker.
        }

        for(var i = 0; i < maxWorkers; i++) createWorker();
    </script>
</body>
</html>

Other Methods

Method 3 (Single worker, repeated task)

In your example, you're running the same worker script over and over again. I know you're simplifying a probably more complex use case, but some people will see this and apply the method above when they could be using just one worker for all the tasks.

Essentially, we'll instantiate a worker, send data, wait for data, then repeat the send/wait steps until all data has been processed.

On my computer, this runs at about twice the speed of the thread pool. That actually surprised me. I expected the thread pool's overhead to leave it at less than half the speed of this method.

RepeatedWorker.html

<!DOCTYPE html>
<html>
<head>
    <title>Repeated Worker</title>
</head>
<body>
    <pre id="debug"></pre>
    <script type="text/javascript">
        var debug = document.getElementById('debug');
        var totalWorkers = 250;
        var index = 0;
        var start = (new Date).getTime();
        var worker = new Worker('genericWorker.js');

        function runWorker() {
            worker.onmessage = function(event) {
                debug.appendChild(document.createTextNode('worker.onmessage i = ' + event.data + '\n'));
                if (index < totalWorkers) runWorker();
                else {
                    alert((new Date).getTime() - start);
                    worker.terminate();
                }
            };
            worker.postMessage(index++); // start the worker.
        }

        runWorker();
    </script>
</body>
</html>

Method 4 (Repeated Worker w/ Thread Pool)

Now, what if we combine the previous method with the thread pool method? Theoretically, it should run quicker than the previous. Interestingly, it runs at just about the same speed as the previous on my machine.

Maybe it's the extra overhead of passing the worker reference each time the function is called. Maybe it's the extra workers being terminated during execution (all but one worker will have been terminated before we get the time). Who knows. Finding this out is a job for another time.

RepeatedThreadPool.html

<!DOCTYPE html>
<html>
<head>
    <title>Repeated Thread Pool</title>
</head>
<body>
    <pre id="debug"></pre>
    <script type="text/javascript">
        var debug = document.getElementById('debug');
        var maxWorkers = navigator.hardwareConcurrency || 4;
        var totalWorkers = 250;
        var index = 0;
        var start = (new Date).getTime();

        function runWorker(worker) {
            worker.onmessage = function(event) {
                debug.appendChild(document.createTextNode('worker.onmessage i = ' + event.data + '\n'));
                if (index < totalWorkers) runWorker(worker);
                else {
                    if(--maxWorkers === 0) alert((new Date).getTime() - start);
                    worker.terminate();
                }
            };
            worker.postMessage(index++); // start the worker.
        }

        for(var i = 0; i < maxWorkers; i++) runWorker(new Worker('genericWorker.js'));
    </script>
</body>
</html>
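
The pool above hardcodes its work as an index counter. As a rough sketch of how the same idea might be generalized to a queue of arbitrary payloads, the helper below reuses a fixed number of workers until the queue is empty. The name runPool and its parameters are made up for illustration; createWorker is assumed to be any function that returns a Worker-like object (postMessage, onmessage, terminate), which also makes the logic easy to try without a browser.

```javascript
// Hypothetical helper, not part of the original answer.
// createWorker: function returning a Worker-like object
// size:         maximum number of workers alive at once
// tasks:        array of payloads, one postMessage per payload
// onResult:     called with each reply's data
// onDone:       called once, after every worker has been terminated
function runPool(createWorker, size, tasks, onResult, onDone) {
    var next = 0;                               // index of the next task to hand out
    var count = Math.min(size, tasks.length);   // never start more workers than tasks
    var active = count;                         // workers not yet terminated

    if (count === 0) {
        onDone();
        return;
    }

    // Give the worker another task, or retire it when the queue is empty.
    function feed(worker) {
        if (next < tasks.length) {
            worker.postMessage(tasks[next++]);
        } else {
            worker.terminate();
            if (--active === 0) onDone();
        }
    }

    for (var i = 0; i < count; i++) {
        (function(worker) {
            worker.onmessage = function(event) {
                onResult(event.data);
                feed(worker);                   // reuse the same worker
            };
            feed(worker);
        })(createWorker());
    }
}

// In a browser, this might be called like so (assuming genericWorker.js from above):
// runPool(function() { return new Worker('genericWorker.js'); },
//         navigator.hardwareConcurrency || 4,
//         [1, 2, 3, 4, 5],
//         function(data) { console.log(data); },
//         function() { console.log('all done'); });
```

Results arrive in completion order, not queue order, so a caller that needs ordering would have to tag each payload itself.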

Now for some real world shtuff

Remember how I said I was using workers to implement 3rd party plugins into my code? These plugins have a state to keep track of. I could start the plugins and hope the user doesn't load so many that the application crashes, or I could keep track of the plugin state within my main thread and send that state back to the plugin if the plugin needs to be reloaded. I like the second one better.

I had written out several more examples of stateful, stateless, and state-restore workers, but I'll spare you the agony and just do some brief explaining and some shorter snippets.

First-off, a simple stateful worker looks like this:

StatefulWorker.js

var i = 0;

self.onmessage = function(e) {
    switch(e.data) {
        case 'increment':
            self.postMessage(++i);
            break;
        case 'decrement':
            self.postMessage(--i);
            break;
    }
};

It does some action based on the message it receives and holds data internally. This is great. It allows for mah plugin devs to have full control over their plugins. The main app instantiates their plugin once, then will send messages for them to do some action.

The problem comes in when we want to load several plugins at once. We can't do that, so what can we do?

Let's think about a few solutions.

Solution 1 (Stateless)

Let's make these plugins stateless. Essentially, every time we want to have the plugin do something, our application should instantiate the plugin then send it data based on its old state.

data sent

{
    action: 'increment',
    value: 7
}

StatelessWorker.js

self.onmessage = function(e) {
    switch(e.data.action) {
        case 'increment':
            e.data.value++;
            break;
        case 'decrement':
            e.data.value--;
            break;
    }
    self.postMessage({
        value: e.data.value
    });
};

This could work, but if we're dealing with a good amount of data, copying it to and from the worker on every call will start to seem like a less-than-perfect solution. Another similar solution could be to have several smaller workers for each plugin and send only a small amount of data to and from each, but I'm uneasy with that too.
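
For completeness, here is a rough sketch of what the main-thread side of the stateless approach might look like. The helper name callStateless and its parameters are made up for illustration; createWorker is assumed to return a Worker-like object, and the reply is assumed to carry the new state in a value property, as StatelessWorker.js above does.

```javascript
// Hypothetical main-thread helper: the application owns the state and
// starts a fresh, short-lived worker for every action.
function callStateless(createWorker, action, value, callback) {
    var worker = createWorker();
    worker.onmessage = function(event) {
        worker.terminate();            // free the worker slot right away
        callback(event.data.value);    // the new state, as posted by the worker
    };
    worker.postMessage({ action: action, value: value });
}

// In a browser, this might be used like so:
// var state = 7;
// callStateless(function() { return new Worker('StatelessWorker.js'); },
//               'increment', state,
//               function(newValue) { state = newValue; });
```

Every call pays the cost of creating and destroying a worker, which is the same overhead the linear example above showed.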

Solution 2 (State Restore)

What if we try to keep the worker in memory as long as possible, but if we do lose it, we can restore its state? We can use some sort of scheduler to see what plugins the user has been using (and maybe some fancy algorithms to guess what the user will use in the future) and keep those in memory.

The cool part about this is that we aren't looking at one worker per core anymore. Since a worker spends most of its lifetime idle, we just need to worry about the memory it takes up. For a good number of workers (10 to 20 or so), this won't be substantial at all. We can keep the primary plugins loaded while the ones not used as often get switched out as needed. All the plugins will still need some sort of state restore.

Let's use the following worker and assume we either send 'increment', 'decrement', or an integer containing the state it's supposed to be at.

StateRestoreWorker.js

var i = 0;

self.onmessage = function(e) {
    switch(e.data) {
        case 'increment':
            self.postMessage(++i);
            break;
        case 'decrement':
            self.postMessage(--i);
            break;
        default:
            i = e.data;
    }
};
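
To make the scheduling idea a little more concrete, here is a minimal sketch of a main-thread host, assuming a simple least-recently-used eviction policy (the fancier prediction mentioned above is left out). PluginHost and everything in it are hypothetical names; createWorker is assumed to return a Worker-like object that follows the StateRestoreWorker.js protocol, where an integer message silently restores the state.

```javascript
// Hypothetical plugin host: keeps at most `limit` workers alive (limit >= 1),
// remembers the last value each plugin reported, and replays that value
// when an evicted plugin has to be started again.
function PluginHost(createWorker, limit) {
    this.createWorker = createWorker;  // function(name) -> Worker-like object
    this.limit = limit;
    this.live = Object.create(null);   // name -> running worker
    this.saved = Object.create(null);  // name -> last known state (an integer)
    this.order = [];                   // live names, least recently used first
}

PluginHost.prototype.send = function(name, message, callback) {
    var self = this;
    var worker = this.live[name];

    if (worker) {
        // Already loaded: just move it to the "recently used" end below.
        this.order.splice(this.order.indexOf(name), 1);
    } else {
        // Make room by terminating the least recently used plugin.
        if (this.order.length >= this.limit) {
            var evicted = this.order.shift();
            this.live[evicted].terminate();
            delete this.live[evicted];
        }
        worker = this.live[name] = this.createWorker(name);
        // Restore the state we saved before the plugin was evicted.
        if (name in this.saved) worker.postMessage(this.saved[name]);
    }
    this.order.push(name);

    worker.onmessage = function(event) {
        self.saved[name] = event.data; // remember the latest state
        callback(event.data);
    };
    worker.postMessage(message);
};

// In a browser, this might be used like so:
// var host = new PluginHost(function(name) { return new Worker(name + '.js'); }, 10);
// host.send('StateRestoreWorker', 'increment', function(i) { console.log(i); });
```

One caveat of this sketch: a plugin evicted while a reply is still in flight loses that reply, so a real scheduler would want to wait for pending messages before terminating.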

These are all pretty simple examples, but I hope they help you understand some ways of using multiple workers efficiently! I'll most likely be writing a scheduler and optimizer for this stuff, but who knows when I'll get to that point.

Good luck, and happy coding!
