Why is it so slow with 100,000 records when using a pipeline in Redis?


Problem description

It is said that pipelining is the better approach when many set/get operations are required in Redis, so this is my test code:

import redis.clients.jedis.*;
import java.util.*;

public class TestPipeline {

    public static void main(String[] args) {
        JedisShardInfo si = new JedisShardInfo("127.0.0.1", 6379);
        List<JedisShardInfo> list = new ArrayList<JedisShardInfo>();
        list.add(si);
        ShardedJedis jedis = new ShardedJedis(list);
        long startTime = System.currentTimeMillis();
        // A single pipeline holding all 100,000 HMSET commands
        ShardedJedisPipeline pipeline = jedis.pipelined();
        for (int i = 0; i < 100000; i++) {
            Map<String, String> map = new HashMap<String, String>();
            map.put("id", "" + i);
            map.put("name", "lyj" + i);
            pipeline.hmset("m" + i, map);
        }
        pipeline.sync();
        long endTime = System.currentTimeMillis();
        // Elapsed time in milliseconds
        System.out.println(endTime - startTime);
    }
}

When I ran it, the program did not respond for a while, but without the pipeline it takes only 20073 ms. I am confused about why it is faster without the pipeline, and by such a wide margin!

Thanks for the answer. A few questions: how did you calculate the 6 MB of data? When I send 10K records, the pipeline is always faster than normal mode, but with 100K records the pipeline stops responding. I think 100-1000 operations is an advisable choice, as suggested below. Does this have anything to do with the JIT? I don't understand it.

Solution

There are a few points you need to consider before writing such a benchmark (and especially a benchmark using the JVM):

  • on most (physical) machines, Redis is able to process more than 100K ops/s when pipelining is used. Your benchmark only deals with 100K items, so it does not last long enough to produce meaningful results. Furthermore, there is no time for the successive stages of the JIT to kick in.

  • the absolute time is not a very relevant metric. Displaying the throughput (i.e. the number of operations per second) while keeping the benchmark running for at least 10 seconds would be a better and more stable metric.

  • your inner loop generates a lot of garbage. If you plan to benchmark Jedis+Redis, then you need to keep the overhead of your own program low.

  • because you have defined everything in the main function, your loop will not be compiled by the JIT (depending on the JVM you use). Only the inner method calls may be. If you want the JIT to be efficient, make sure to encapsulate your code into methods that can be compiled by the JIT.

  • optionally, you may want to add a warm-up phase before performing the actual measurement, to avoid accounting for the overhead of running the first iterations with the bare-bones interpreter, and for the cost of the JIT itself.
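The points above can be sketched as a small harness that puts the measured loop in its own method, runs a warm-up pass, and reports throughput instead of absolute time. This is only an illustration: `ThroughputHarness`, `run`, and the string-building workload are hypothetical stand-ins, and a real test would call Jedis inside the batch.

```java
import java.util.function.IntConsumer;

final class ThroughputHarness {

    /**
     * Calls batch.accept(batchSize) repeatedly until at least minMillis
     * have elapsed, then returns the observed operations per second.
     */
    static double run(IntConsumer batch, int batchSize, long minMillis) {
        long ops = 0;
        long start = System.nanoTime();
        long elapsed;
        do {
            batch.accept(batchSize);
            ops += batchSize;
            elapsed = System.nanoTime() - start;
        } while (elapsed < minMillis * 1_000_000L);
        return ops * 1e9 / elapsed;
    }

    public static void main(String[] args) {
        // Stand-in workload (an assumption): replace with real Jedis calls.
        StringBuilder sink = new StringBuilder();
        IntConsumer work = n -> {
            for (int k = 0; k < n; k++) {
                sink.setLength(0);
                sink.append('m').append(k);
            }
        };

        // Warm-up pass: the result is discarded on purpose.
        run(work, 1000, 500);

        // Short run for demonstration; use 10_000 ms or more for real measurements.
        double opsPerSecond = run(work, 1000, 2_000);
        System.out.printf("Throughput: %.0f ops/s%n", opsPerSecond);
    }
}
```

Because the loop lives in `run` rather than in `main`, the JIT has a method it can compile once it becomes hot.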

Now, regarding Redis pipelining, your pipeline is way too long. 100K commands in the pipeline means Jedis has to build a 6 MB buffer before sending anything to Redis. It means the socket buffers (on the client side, and perhaps on the server side) will be saturated, and that Redis will have to deal with 6 MB communication buffers as well.
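One way to arrive at a figure of this order is to add up the bytes each HMSET command occupies on the wire. The sketch below assumes the commands are sent in the standard Redis protocol (RESP) as arrays of bulk strings; `RespSizeEstimate` and its methods are hypothetical helpers written for this illustration, not part of Jedis, and replies are not counted.

```java
import java.nio.charset.StandardCharsets;

final class RespSizeEstimate {

    /** Bytes needed to encode one command as a RESP array of bulk strings. */
    static long commandSize(String... args) {
        // Array header: "*<argc>\r\n"
        long size = 1 + String.valueOf(args.length).length() + 2;
        for (String arg : args) {
            int len = arg.getBytes(StandardCharsets.UTF_8).length;
            // Bulk string: "$<len>\r\n<bytes>\r\n"
            size += 1 + String.valueOf(len).length() + 2 + len + 2;
        }
        return size;
    }

    /** Total request bytes for the HMSET commands issued by the question's loop. */
    static long totalSize(int commands) {
        long total = 0;
        for (int i = 0; i < commands; i++) {
            total += commandSize("HMSET", "m" + i, "id", "" + i, "name", "lyj" + i);
        }
        return total;
    }

    public static void main(String[] args) {
        long total = totalSize(100_000);
        System.out.printf("%d bytes (about %.1f MiB)%n", total, total / (1024.0 * 1024.0));
    }
}
```

Each command comes to roughly 60 to 70 bytes, so 100,000 of them add up to several megabytes of request data, in line with the estimate above.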

Furthermore, your benchmark is still synchronous (using a pipeline does not magically make it asynchronous). In other words, Jedis will not start reading replies until the last query of your pipeline has been sent to Redis. When the pipeline is too long, it has the potential to block things.

Consider limiting the size of the pipeline to 100-1000 operations. Of course, it will generate more roundtrips, but the pressure on the communication stack will be reduced to an acceptable level. For instance, consider the following program:

import redis.clients.jedis.*;
import java.util.*;

public class TestPipeline {

    // Running count of records written so far
    int i = 0;
    // Reused for every record to avoid allocating a new map per iteration
    Map<String, String> map = new HashMap<String, String>();
    ShardedJedis jedis;

    // Number of iterations
    // Use 1000 to test with the pipeline, 100 otherwise
    static final int N = 1000;

    public TestPipeline() {
        JedisShardInfo si = new JedisShardInfo("127.0.0.1", 6379);
        List<JedisShardInfo> list = new ArrayList<JedisShardInfo>();
        list.add(si);
        jedis = new ShardedJedis(list);
    }

    // Writes n records through one pipeline, then waits for all replies
    public void push(int n) {
        ShardedJedisPipeline pipeline = jedis.pipelined();
        for (int k = 0; k < n; k++) {
            map.put("id", "" + i);
            map.put("name", "lyj" + i);
            pipeline.hmset("m" + i, map);
            ++i;
        }
        pipeline.sync();
    }

    // Writes n records without a pipeline: one round trip per record
    public void push2(int n) {
        for (int k = 0; k < n; k++) {
            map.put("id", "" + i);
            map.put("name", "lyj" + i);
            jedis.hmset("m" + i, map);
            ++i;
        }
    }

    public static void main(String[] args) {
        TestPipeline obj = new TestPipeline();
        long startTime = System.currentTimeMillis();
        for (int j = 0; j < N; j++) {
            // Use push2 instead to test without pipeline
            obj.push(1000);
            // Uncomment to see the acceleration
            //System.out.println(obj.i);
        }
        long endTime = System.currentTimeMillis();
        // Throughput in operations per second
        double d = 1000.0 * obj.i;
        d /= (double) (endTime - startTime);
        System.out.println("Throughput: " + d);
    }
}

With this program, you can test with or without pipelining. Be sure to increase the number of iterations (N parameter) when pipelining is used, so that it runs for at least 10 seconds. If you uncomment the println in the loop, you will see that the program is slow at the beginning and gets quicker as the JIT starts to optimize things (that's why the program should run for at least several seconds to give a meaningful result).

On my hardware (an old Athlon box), I can get 8-9 times more throughput when the pipeline is used. The program could be further improved by optimizing key/value formatting in the inner loop and adding a warm-up phase.
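The same chunking idea can be written as a loop that flushes after a fixed number of queued commands, which also handles totals that are not a multiple of the chunk size. This is a minimal sketch: the `Batch` interface is a hypothetical stand-in for a pipeline (it is not a Jedis type), and the recording implementation in `main` exists only so the example runs without a Redis server.

```java
import java.util.ArrayList;
import java.util.List;

final class ChunkedPipeline {

    /** Hypothetical stand-in for a pipeline: queue commands, then flush and read replies. */
    interface Batch {
        void queue(int recordId);
        void sync();
    }

    /**
     * Queues `total` records and calls sync() whenever `chunkSize` commands
     * are pending, plus once at the end for any remainder.
     * Returns the number of sync() calls made.
     */
    static int pushAll(Batch batch, int total, int chunkSize) {
        int syncs = 0;
        int pending = 0;
        for (int i = 0; i < total; i++) {
            batch.queue(i);
            if (++pending == chunkSize) {
                batch.sync();
                syncs++;
                pending = 0;
            }
        }
        if (pending > 0) {
            batch.sync();
            syncs++;
        }
        return syncs;
    }

    public static void main(String[] args) {
        // In-memory implementation used only for this demonstration.
        List<Integer> queued = new ArrayList<>();
        Batch recording = new Batch() {
            public void queue(int recordId) { queued.add(recordId); }
            public void sync() { queued.clear(); }
        };
        int syncs = pushAll(recording, 100_000, 1000);
        System.out.println("sync() calls: " + syncs);
    }
}
```

With Jedis, an implementation of this interface would presumably open a fresh pipeline after each `sync()`, as the `push` method above does on every call.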
