Why is it so slow with 100,000 records when using pipeline in Redis?


Question

It is said that pipelining is a better way when many set/get operations are required in Redis, so this is my test code:

import redis.clients.jedis.*;
import java.util.*;

public class TestPipeline {

    public static void main(String[] args) {
        JedisShardInfo si = new JedisShardInfo("127.0.0.1", 6379);
        List<JedisShardInfo> list = new ArrayList<JedisShardInfo>();
        list.add(si);
        ShardedJedis jedis = new ShardedJedis(list);

        long startTime = System.currentTimeMillis();
        ShardedJedisPipeline pipeline = jedis.pipelined();
        for (int i = 0; i < 100000; i++) {
            Map<String, String> map = new HashMap<String, String>();
            map.put("id", "" + i);
            map.put("name", "lyj" + i);
            pipeline.hmset("m" + i, map);
        }
        pipeline.sync();
        long endTime = System.currentTimeMillis();
        System.out.println(endTime - startTime);
    }
}

When I ran it, the program did not respond for a while, but when I don't use the pipeline it takes only 20073 ms, so I am confused why it is even better without the pipeline and why the gap is so wide!

Thanks for answering me. A few questions: how do you calculate the 6 MB of data? When I send 10K items, the pipeline is always faster than normal mode, but with 100K the pipeline does not respond. I think 100-1000 operations per pipeline is an advisable choice, as said below. Is there anything more to the JIT point, since I don't understand it?

Answer

There are a few points you need to consider before writing such a benchmark (and especially a benchmark using the JVM):

  • on most (physical) machines, Redis is able to process more than 100K ops/s when pipelining is used. Your benchmark only deals with 100K items, so it does not last long enough to produce meaningful results. Furthermore, there is no time for the successive stages of the JIT to kick in.

  • the absolute time is not a very relevant metric. Displaying the throughput (i.e. the number of operations per second) while keeping the benchmark running for at least 10 seconds would be a better and more stable metric.

  • your inner loop generates a lot of garbage. If you plan to benchmark Jedis+Redis, then you need to keep the overhead of your own program low.

  • because you have defined everything in the main function, your loop will not be compiled by the JIT (depending on the JVM you use). Only the inner method calls may be. If you want the JIT to be efficient, make sure to encapsulate your code into methods that the JIT can compile.

  • optionally, you may want to add a warm-up phase before performing the actual measurement, to avoid accounting for the overhead of running the first iterations with the bare-bones interpreter and the cost of the JIT itself (a rough harness illustrating these points is sketched right after this list).
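
As a rough illustration of the last three points (my own sketch, not part of the original answer), the skeleton below keeps the measured work in its own compilable method, runs an un-timed warm-up pass first, and then reports throughput over a fixed wall-clock window of at least 10 seconds. The batch() method and its constants are placeholders to be replaced with the actual Jedis calls.

import java.util.concurrent.TimeUnit;

public class BenchmarkSkeleton {

    // Placeholder for the real unit of work, e.g. one pipeline of 1000 HMSET calls.
    // Keeping it in its own method gives the JIT a unit it can compile.
    static long batch() {
        long ops = 0;
        for (int k = 0; k < 1000; k++) {
            ops++; // replace with pipeline.hmset(...), pipeline.sync(), etc.
        }
        return ops;
    }

    public static void main(String[] args) {
        // Warm-up: let the interpreter/JIT work through the first iterations
        // before anything is measured (50 batches is an arbitrary choice).
        for (int j = 0; j < 50; j++) {
            batch();
        }

        // Measure throughput over at least 10 seconds of wall-clock time.
        long start = System.currentTimeMillis();
        long deadline = start + TimeUnit.SECONDS.toMillis(10);
        long totalOps = 0;
        while (System.currentTimeMillis() < deadline) {
            totalOps += batch();
        }
        long elapsed = System.currentTimeMillis() - start;
        System.out.println("Throughput: " + (1000.0 * totalOps / elapsed) + " ops/s");
    }
}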

Now, regarding Redis pipelining, your pipeline is way too long. 100K commands in the pipeline means Jedis has to build a 6 MB buffer before sending anything to Redis. It means the socket buffers (on the client side, and perhaps on the server side) will be saturated, and that Redis will have to deal with 6 MB communication buffers as well.
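
One way to see where a figure of roughly 6 MB comes from (my own back-of-the-envelope estimate, not a calculation given in the answer): each HMSET m<i> id <i> name lyj<i> command is serialized in the Redis protocol (RESP) as an array of six length-prefixed bulk strings, which works out to about 60-70 bytes per command for keys and values of this size, so 100,000 buffered commands add up to several megabytes. A quick sketch that measures it directly:

public class PipelineSizeEstimate {

    // RESP encoding of: HMSET m<i> id <i> name lyj<i>
    // (an array of 6 bulk strings, each prefixed with its length)
    static String respHmset(int i) {
        String[] parts = { "HMSET", "m" + i, "id", "" + i, "name", "lyj" + i };
        StringBuilder sb = new StringBuilder("*" + parts.length + "\r\n");
        for (String p : parts) {
            sb.append("$").append(p.length()).append("\r\n").append(p).append("\r\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        long totalBytes = 0;
        for (int i = 0; i < 100000; i++) {
            totalBytes += respHmset(i).length(); // ASCII only, so chars == bytes
        }
        // Prints roughly 6-7 MB for 100,000 commands; the exact amount Jedis
        // buffers may differ slightly, but the order of magnitude is the same.
        System.out.printf("%.2f MB%n", totalBytes / (1024.0 * 1024.0));
    }
}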

Furthermore, your benchmark is still synchronous (using a pipeline does not magically make it asynchronous). In other words, Jedis will not start reading replies until the last query of your pipeline has been sent to Redis. When the pipeline is too long, it has the potential to block things.

Consider limiting the size of the pipeline to 100-1000 operations. Of course, it will generate more roundtrips, but the pressure on the communication stack will be reduced to an acceptable level. For instance, consider the following program:

import redis.clients.jedis.*;
import java.util.*;

public class TestPipeline {

    int i = 0;
    Map<String, String> map = new HashMap<String, String>();
    ShardedJedis jedis;

    // Number of iterations
    // Use 1000 to test with the pipeline, 100 otherwise
    static final int N = 1000;

    public TestPipeline() {
        JedisShardInfo si = new JedisShardInfo("127.0.0.1", 6379);
        List<JedisShardInfo> list = new ArrayList<JedisShardInfo>();
        list.add(si);
        jedis = new ShardedJedis(list);
    }

    // Sends n HMSET commands in a single pipeline
    public void push(int n) {
        ShardedJedisPipeline pipeline = jedis.pipelined();
        for (int k = 0; k < n; k++) {
            map.put("id", "" + i);
            map.put("name", "lyj" + i);
            pipeline.hmset("m" + i, map);
            ++i;
        }
        pipeline.sync();
    }

    // Sends n HMSET commands one by one, without pipelining
    public void push2(int n) {
        for (int k = 0; k < n; k++) {
            map.put("id", "" + i);
            map.put("name", "lyj" + i);
            jedis.hmset("m" + i, map);
            ++i;
        }
    }

    public static void main(String[] args) {
        TestPipeline obj = new TestPipeline();
        long startTime = System.currentTimeMillis();
        for (int j = 0; j < N; j++) {
            // Use push2 instead to test without pipeline
            obj.push(1000);
            // Uncomment to see the acceleration
            // System.out.println(obj.i);
        }
        long endTime = System.currentTimeMillis();
        double d = 1000.0 * obj.i;
        d /= (double) (endTime - startTime);
        System.out.println("Throughput: " + d);
    }
}

With this program, you can test with or without pipelining. Be sure to increase the number of iterations (the N parameter) when pipelining is used, so that it runs for at least 10 seconds. If you uncomment the println in the loop, you will see that the program is slow at the beginning and gets quicker as the JIT starts to optimize things (which is why the program should run for at least several seconds to give a meaningful result).

On my hardware (an old Athlon box), I can get 8-9 times more throughput when the pipeline is used. The program could be further improved by optimizing the key/value formatting in the inner loop and by adding a warm-up phase.
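
A minimal sketch of what that warm-up phase could look like, applied to the main() of the program above (the choice of 50 warm-up batches is my own assumption, not part of the answer): run a few un-timed batches first, then exclude them from the throughput calculation.

    public static void main(String[] args) {
        TestPipeline obj = new TestPipeline();

        // Warm-up: run a few un-timed batches so the JIT can compile push()
        // before the measurement starts (50 batches is an arbitrary choice).
        for (int j = 0; j < 50; j++) {
            obj.push(1000);
        }
        int warmedUp = obj.i; // operations done during warm-up, excluded below

        long startTime = System.currentTimeMillis();
        for (int j = 0; j < N; j++) {
            obj.push(1000);
        }
        long endTime = System.currentTimeMillis();

        double d = 1000.0 * (obj.i - warmedUp);
        d /= (double) (endTime - startTime);
        System.out.println("Throughput: " + d);
    }

Optimizing the key/value formatting (for example, avoiding the repeated string concatenation in the inner loop) is a smaller win and is left out of this sketch.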
