Process Large File for HTTP Calls in Java

Question

I have a file with millions of lines in it that I need to process. Each line of the file will result in an HTTP call. I'm trying to figure out the best way to attack the problem.

I obviously could just read the file and make the calls sequentially, but it would be incredibly slow. I'd like to parallelize the calls, but I'm not sure if I should read the entire file into memory (something I'm not a huge fan of) or try to parallelize the reading of the file as well (which I'm not sure would make sense).

Just looking for some thoughts here on the best way to attack the problem. If there is an existing framework or library that does something similar I'm happy to use that as well.

Thanks.

Answer

I'd like to parallelize the calls, but I'm not sure if I should read the entire file into memory

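You should use an ExecutorService with a bounded BlockingQueue. As you read in your million lines, you submit jobs to the thread pool until the BlockingQueue is full, and then wait for space before submitting more. This way you can run 100 (or whatever number is optimal) HTTP requests simultaneously without having to read all of the lines of the file beforehand: the pool's thread count sets how many requests are in flight, and the queue's capacity caps how many lines sit in memory waiting for a thread.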
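
To see why a bounded queue on its own does not make the reader wait, here is a minimal sketch (the class and method names are hypothetical) of what a `ThreadPoolExecutor` does by default once its queue is full: the default `AbortPolicy` throws `RejectedExecutionException` at the caller. It uses one worker and a queue of capacity 1, and a `CountDownLatch` holds the first task so the result does not depend on timing.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class (not from the answer): default behaviour of a full bounded queue.
class DefaultRejectionDemo {

    /** Submits three tasks to a 1-thread pool with a 1-slot queue; returns how many were rejected. */
    static int countRejections() {
        CountDownLatch release = new CountDownLatch(1);
        // 1 worker thread, room for 1 waiting task, default AbortPolicy
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS, new ArrayBlockingQueue<Runnable>(1));
        int rejected = 0;
        try {
            pool.execute(() -> awaitQuietly(release)); // task 1 goes straight to the new worker and waits
            pool.execute(() -> { });                    // task 2 fills the queue
            try {
                pool.execute(() -> { });                // task 3: queue full and no spare thread
            } catch (RejectedExecutionException e) {
                rejected++;
            }
        } finally {
            release.countDown(); // let task 1 finish so the pool can drain
            pool.shutdown();
        }
        return rejected;
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("rejected submissions: " + countRejections());
    }
}
```

Running `main` prints `rejected submissions: 1`: the first task went straight to the worker, the second filled the queue, and the third had nowhere to go. In a file-reading loop, that exception would propagate out of the loop part-way through the file.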

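You'll need to set up a RejectedExecutionHandler that blocks while the queue is full, because by default a ThreadPoolExecutor rejects a task it cannot queue by throwing RejectedExecutionException. This is better than the built-in CallerRunsPolicy, which makes the reading thread perform the overflow download itself; while it does, no new lines are submitted and the worker threads can run out of work.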
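
The comparison with a caller-runs handler can be checked with a similar sketch (again with hypothetical names), this time passing the JDK's `ThreadPoolExecutor.CallerRunsPolicy` to the constructor and recording which thread executes the task that does not fit.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical demo class (not from the answer): where CallerRunsPolicy runs an overflow task.
class CallerRunsDemo {

    /** Returns true if the task that did not fit ran on the thread that submitted it. */
    static boolean overflowRunsOnCaller() {
        CountDownLatch release = new CountDownLatch(1);
        AtomicReference<Thread> overflowThread = new AtomicReference<>();
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS, new ArrayBlockingQueue<Runnable>(1),
                new ThreadPoolExecutor.CallerRunsPolicy());
        try {
            pool.execute(() -> awaitQuietly(release)); // occupies the only worker
            pool.execute(() -> { });                    // fills the queue
            // queue full: CallerRunsPolicy runs this task right here, inside execute()
            pool.execute(() -> overflowThread.set(Thread.currentThread()));
        } finally {
            release.countDown();
            pool.shutdown();
        }
        return overflowThread.get() == Thread.currentThread();
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

`overflowRunsOnCaller()` returns `true`: the overflow task ran inside `execute()` on the submitting thread. In a read-and-submit loop that thread is the one reading the file, so while it performs a download itself it reads and submits nothing, and the workers can drain the queue and sit idle. A blocking handler instead parks the reader until a slot frees up and leaves every download to the pool.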

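```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.RejectedExecutionHandler;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// The enclosing class and method are illustrative; urlReader and nThreads come from your own code.
public class UrlFileProcessor {

    public void processUrls(BufferedReader urlReader, int nThreads)
            throws IOException, InterruptedException {
        BlockingQueue<Runnable> queue = new ArrayBlockingQueue<Runnable>(100);
        // NOTE: you want the min and max thread numbers here to be the same value
        ThreadPoolExecutor threadPool =
            new ThreadPoolExecutor(nThreads, nThreads, 0L, TimeUnit.MILLISECONDS, queue);
        // we need our RejectedExecutionHandler to block if the queue is full
        threadPool.setRejectedExecutionHandler(new RejectedExecutionHandler() {
            @Override
            public void rejectedExecution(Runnable r, ThreadPoolExecutor executor) {
                // after shutdown() keep rejecting, or the task would sit in the queue and never run
                if (executor.isShutdown()) {
                    throw new RejectedExecutionException("Executor has been shut down");
                }
                try {
                    // this will block the producer until there's room in the queue
                    executor.getQueue().put(r);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the interrupt status
                    throw new RejectedExecutionException(
                        "Unexpected InterruptedException", e);
                }
            }
        });

        // now read in the urls (Java does not allow a declaration inside the while condition)
        String url;
        while ((url = urlReader.readLine()) != null) {
            // submit them to the thread pool; this may block.
            // execute() rather than submit(): submit() would capture any exception in a Future nobody reads
            threadPool.execute(new DownloadUrlRunnable(url));
        }
        // after we submit we have to shut down the pool
        threadPool.shutdown();
        // wait for them to complete
        threadPool.awaitTermination(Long.MAX_VALUE, TimeUnit.MILLISECONDS);
    }

    // ...
    private class DownloadUrlRunnable implements Runnable {
        private final String url;

        public DownloadUrlRunnable(String url) {
            this.url = url;
        }

        @Override
        public void run() {
            // download the URL
        }
    }
}
```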
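
The answer leaves `run()` as a stub. One way to fill it in, assuming Java 11 or later, is the JDK's `java.net.http.HttpClient`, which is safe to share between threads, so a single instance can serve every task. This is a sketch rather than the answer's code: the extra constructor parameter for the shared client, the `GET` method, the 30-second request timeout, discarding the response body, and logging failures to standard error are all assumptions to adapt.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Sketch (assumes Java 11+): one task per line; the HttpClient is created once and shared.
// The two-argument constructor is an assumption, not the answer's signature.
class DownloadUrlRunnable implements Runnable {
    private final HttpClient client;
    private final String url;
    private volatile int statusCode = -1; // stays -1 unless a response arrives

    DownloadUrlRunnable(HttpClient client, String url) {
        this.client = client;
        this.url = url;
    }

    @Override
    public void run() {
        try {
            // trim() tolerates stray whitespace or '\r' at the end of a line
            HttpRequest request = HttpRequest.newBuilder(URI.create(url.trim()))
                    .timeout(Duration.ofSeconds(30)) // assumed per-request timeout
                    .GET()
                    .build();
            // send() blocks this worker thread, so the pool size caps concurrent requests
            HttpResponse<Void> response =
                    client.send(request, HttpResponse.BodyHandlers.discarding());
            statusCode = response.statusCode();
        } catch (IOException | IllegalArgumentException e) {
            // a malformed line or a failed call is logged and skipped, not fatal to the run
            System.err.println("Failed: " + url + " (" + e + ")");
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // let the pool see the interrupt
        }
    }

    /** HTTP status of the completed call, or -1 if it failed or has not run. */
    int statusCode() {
        return statusCode;
    }
}
```

To use it, create the client once before the read loop, for example `HttpClient client = HttpClient.newBuilder().connectTimeout(Duration.ofSeconds(10)).build();`, and submit `new DownloadUrlRunnable(client, url)` for each line. Because `send()` blocks its worker thread, `nThreads` still determines how many requests are in flight at once.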