Java ThreadPool usage


Problem description

I'm trying to write a multithreaded web crawler.

My main entry class has the following code:

ExecutorService exec = Executors.newFixedThreadPool(numberOfCrawlers);
while (true) {
    URL url = frontier.get();
    if (url == null)
        return;
    exec.execute(new URLCrawler(this, url));
}

The URLCrawler fetches the specified URL, parses the HTML, extracts links from it, and schedules unseen links back to the frontier.
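The "extract links, keep only unseen ones" step could look roughly like the sketch below. It is only an illustration: LinkExtractor, extractLinks and unseen are hypothetical names, URLs are plain strings, and the regular expression only handles double-quoted href attributes, so a real crawler would more likely use an HTML parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original question.
class LinkExtractor {
    // Matches href="..." with optional whitespace around '=', case-insensitively.
    private static final Pattern HREF =
            Pattern.compile("href\\s*=\\s*\"([^\"]+)\"", Pattern.CASE_INSENSITIVE);

    // Returns every double-quoted href value found in the HTML, in order.
    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(1));
        }
        return links;
    }

    // Returns the links not yet in 'seen', and records them as seen.
    // Pass a thread-safe set (e.g. ConcurrentHashMap.newKeySet()) when crawlers share it.
    static List<String> unseen(List<String> links, Set<String> seen) {
        List<String> fresh = new ArrayList<>();
        for (String link : links) {
            if (seen.add(link)) { // add() returns false for duplicates
                fresh.add(link);
            }
        }
        return fresh;
    }
}
```

Each link returned by unseen would then be handed back to the frontier.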

The frontier is a queue of uncrawled URLs. The problem is how to write the get() method. If the queue is empty, it should wait until any URLCrawler finishes and then try again. It should return null only when the queue is empty and there is no currently active URLCrawler.

My first idea was to use an AtomicInteger to count the URLCrawlers currently working, plus an auxiliary object for notifyAll()/wait() calls. Each crawler increments the count when it starts, decrements it when it exits, and notifies the object that it has completed.
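A minimal sketch of this counting idea, under some assumptions: Frontier, add, get and done are hypothetical names, URLs are plain strings rather than URL objects, and the counter is a plain int guarded by the frontier's own monitor instead of an AtomicInteger, since every access already happens inside synchronized methods. The count is incremented in get() rather than when the crawler starts running, so a task that has been handed out but not yet started still counts as active.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical frontier; not the asker's actual class.
class Frontier {
    private final Queue<String> queue = new ArrayDeque<>();
    private int active; // URLs handed out whose crawler has not finished yet

    // Called by crawlers to schedule a newly discovered URL.
    public synchronized void add(String url) {
        queue.add(url);
        notifyAll(); // wake a get() that is waiting for work
    }

    // Blocks while the queue is empty but some crawler is still working.
    // Returns null only when the queue is empty and no crawler is active.
    public synchronized String get() throws InterruptedException {
        while (queue.isEmpty() && active > 0) {
            wait(); // the loop re-checks the condition after spurious wake-ups
        }
        String url = queue.poll();
        if (url != null) {
            active++;
        }
        return url;
    }

    // Each crawler calls this exactly once, ideally from a finally block.
    public synchronized void done() {
        active--;
        notifyAll(); // the waiting get() may now be able to return null
    }
}
```

With this shape the main loop from the question stays unchanged; each URLCrawler would call add for unseen links and done when it exits.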

But I read that notify()/notifyAll() and wait() are a somewhat deprecated way to do thread communication.

What should I use for this work pattern? It is similar to M producers and N consumers; the question is how to deal with exhaustion of the producers.

Recommended answer

I think the use of wait/notify is justified in this case. I can't think of any straightforward way to do this using j.u.c.
In a class, let's call it Coordinator:

private final int numOfCrawlers;
private int waiting;

// Returns false once every crawler is waiting, i.e. no new URLs can appear.
public boolean shouldTryAgain() throws InterruptedException {
    synchronized (this) {
        waiting++;
        if (waiting >= numOfCrawlers) {
            // Everybody is waiting, terminate
            return false;
        } else {
            wait(); // spurious wake-up is okay
            // Woken up for whatever reason. Try again
            waiting--;
            return true;
        }
    }
}

// Call after adding a URL to the frontier.
public void hasEnqueued() {
    synchronized (this) {
        notifyAll();
    }
}

Then,

ExecutorService exec = Executors.newFixedThreadPool(numberOfCrawlers);
while(true){
    URL url = frontier.get();
    if(url == null){
        if(!coordinator.shouldTryAgain()){
            //all threads are waiting. No possibility of new jobs.
            return;
        }else{
            //Possible that there are other jobs. Try again
            continue;
        }
    }
    exec.execute(new URLCrawler(this, url));
}//while(true)
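For comparison, java.util.concurrent does include Phaser, which can count outstanding tasks and release a waiting thread when the last one finishes. The sketch below is a different technique from the Coordinator above, not the answer's method: each task submits newly found links straight to the executor instead of going through a frontier loop. PhaserCrawl, crawl, submit and fetchLinks are hypothetical names, URLs are plain strings, and fetchLinks stands in for fetching and parsing a page.

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Phaser;
import java.util.function.Function;

// Hypothetical sketch; names are illustrative only.
class PhaserCrawl {
    // Crawls from 'seed' and returns every URL seen once all tasks have finished.
    static Set<String> crawl(String seed, int threads,
                             Function<String, List<String>> fetchLinks) {
        ExecutorService exec = Executors.newFixedThreadPool(threads);
        Set<String> seen = ConcurrentHashMap.newKeySet();
        Phaser pending = new Phaser(1); // the one initial party is the calling thread
        seen.add(seed);
        submit(seed, exec, seen, pending, fetchLinks);
        // Advances only after every registered task has deregistered.
        pending.arriveAndAwaitAdvance();
        exec.shutdown();
        return seen;
    }

    private static void submit(String url, ExecutorService exec, Set<String> seen,
                               Phaser pending, Function<String, List<String>> fetchLinks) {
        pending.register(); // count the task before it is queued
        exec.execute(() -> {
            try {
                for (String link : fetchLinks.apply(url)) {
                    if (seen.add(link)) { // true only for unseen links
                        submit(link, exec, seen, pending, fetchLinks);
                    }
                }
            } finally {
                pending.arriveAndDeregister(); // this task is finished
            }
        });
    }
}
```

Because a child task is registered before its parent deregisters, the phaser cannot advance while any work remains. One limitation to keep in mind: a single Phaser supports at most 65535 registered parties, so this shape only fits crawls whose number of outstanding tasks stays below that bound.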
