Multithreaded search operation

Question

I have a method that takes an array of queries, and I need to run them against different search engine web APIs, such as Google's or Yahoo's. To parallelize the process, I spawn a thread for each query and join them all at the end, since my application can only continue once it has the results of every query. I currently have something along these lines:

import java.util.ArrayList;
import java.util.Arrays;

public abstract class Query extends Thread {
    private String query;

    public abstract Result[] querySearchEngine(String query);

    @Override
    public void run() {
        Result[] results = querySearchEngine(query);
        Querier.addResults(results);
    }
}

public class GoogleQuery extends Query {
    @Override
    public Result[] querySearchEngine(String query) {
        // access the Google REST API
    }
}

public class Querier {
    /* Every Query subclass adds its results to this list */
    private static final ArrayList<Result> aggregatedResults = new ArrayList<Result>();

    /* synchronized because every Query thread calls this concurrently */
    public static synchronized void addResults(Result[] results) {
        aggregatedResults.addAll(Arrays.asList(results));
    }

    public static Result[] queryAll(Query[] queries) throws InterruptedException {
        /* start every thread, then wait for all of them to finish */
        for (Query query : queries) {
            query.start();
        }
        for (Query query : queries) {
            query.join();
        }
        return aggregatedResults.toArray(new Result[0]);
    }
}

Recently, I found that Java has a newer API for running concurrent jobs, namely the Callable interface, FutureTask and ExecutorService. I was wondering whether this API is the one that should be used, and whether it is more efficient than the traditional Runnable and Thread.

After studying this new API, I came up with the following code (simplified version):

public abstract class Query implements Callable<Result[]> {
    private final String query; // gets set in the constructor

    public abstract Result[] querySearchEngine(String query);

    @Override
    public Result[] call() {
        return querySearchEngine(query);
    }
}

public class Querier {
    private ArrayList<Result> aggregatedResults;

    public Result[] queryAll(Query[] queries) {
        List<Future<Result[]>> futures = new ArrayList<Future<Result[]>>(queries.length);
        final ExecutorService service = Executors.newFixedThreadPool(queries.length);
        for (Query query : queries) {
            futures.add(service.submit(query));
        }
        for (Future<Result[]> future : futures) {
            aggregatedResults.add(future.get());  // get() is somewhat similar to join?
        }
        return aggregatedResults;
    }
}

I'm new to this concurrency API, and I'd like to know if there's something that can be improved in the above code, and if it's better than the first option (using Thread). There are some classes which I didn't explore, such as FutureTask, et cetera. I'd love to hear any advice on that as well.

Recommended answer

There are several problems with your code.


  1. You should probably be using the ExecutorService.invokeAll() method. The cost of creating new threads and a new thread pool can be significant (though maybe not compared to calling external search engines). invokeAll() can manage the threads for you.
  2. You probably don't want to mix arrays and generics.
  3. You are calling aggregatedResults.add() instead of addAll().
  4. You don't need to use member variables when they could be local to the queryAll() function call.

So, something like the following should work:

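import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public abstract class Query implements Callable<List<Result>> {
    private final String query;

    protected Query(String query) {
        this.query = query;
    }

    public abstract List<Result> querySearchEngine(String query);

    @Override
    public List<Result> call() {
        return querySearchEngine(query);
    }
}

public class Querier {
    // Shared pool: its threads are reused across queryAll() calls
    private static final ExecutorService executor = Executors.newCachedThreadPool();

    public List<Result> queryAll(List<Query> queries)
            throws InterruptedException, ExecutionException {
        // invokeAll() blocks until every query has completed
        List<Future<List<Result>>> futures = executor.invokeAll(queries);
        List<Result> aggregatedResults = new ArrayList<Result>();
        for (Future<List<Result>> future : futures) {
            aggregatedResults.addAll(future.get()); // task is already done, so get() does not block
        }
        return aggregatedResults;
    }
}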
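
To try the invokeAll() approach without a real search engine, a minimal self-contained sketch might look like the following. The InvokeAllDemo and StubQuery names are hypothetical, and the stub returns canned strings instead of calling any web API, so it only illustrates how the futures are collected and how the pool is shut down afterwards.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class InvokeAllDemo {

    // Hypothetical stand-in for a search-engine query: it returns canned
    // strings rather than contacting a real web API.
    static class StubQuery implements Callable<List<String>> {
        private final String engine;
        private final String query;

        StubQuery(String engine, String query) {
            this.engine = engine;
            this.query = query;
        }

        @Override
        public List<String> call() {
            return Arrays.asList(engine + ":" + query + "#1", engine + ":" + query + "#2");
        }
    }

    // Runs all queries on the given pool and merges their results.
    // invokeAll() blocks until every task has completed and returns the
    // futures in the same order as the input list.
    static List<String> queryAll(ExecutorService executor, List<StubQuery> queries)
            throws InterruptedException, ExecutionException {
        List<Future<List<String>>> futures = executor.invokeAll(queries);
        List<String> aggregated = new ArrayList<>();
        for (Future<List<String>> future : futures) {
            aggregated.addAll(future.get()); // task is already done, so get() does not block
        }
        return aggregated;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService executor = Executors.newCachedThreadPool();
        try {
            List<StubQuery> queries = Arrays.asList(
                    new StubQuery("google", "java"),
                    new StubQuery("yahoo", "java"));
            System.out.println(queryAll(executor, queries));
        } finally {
            executor.shutdown(); // stop accepting tasks so idle pool threads can exit
        }
    }
}
```

Because invokeAll() preserves the order of the input list, the merged results come back grouped per query in submission order, regardless of which task finishes first. If a task throws, the exception surfaces from future.get() wrapped in an ExecutionException.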
