Bursts of RedisTimeoutException using StackExchange.Redis


Question

I'm trying to track down intermittent "bursts" of timeouts using the StackExchange.Redis library. Here's a bit about our setup: our API is written in C# and runs on Windows 2008 and IIS. We have 4 API servers in production and 4 Redis machines (running the latest Linux LTS), each with 2 instances of Redis (one master on port 7000, one slave on port 7001). I've looked at pretty much every aspect of the Redis servers and they look healthy: no errors in the logs, CPU and network usage are fine, and nothing on the server side looks wrong. I can tail -f the Redis logs while this is happening and don't see anything out of the ordinary (such as AOF rewrites). I don't think the problem is with Redis.

Here's what I know so far:

  • We see these timeout exceptions several times an hour. Usually there are 40-50 timeouts in a minute, sometimes up to 80-90. Then they go away for several minutes. There were about 5,000 of these events in the past 24 hours, and each burst comes from a single API client.
  • These timeouts only happen against Redis master nodes, never against slave nodes. However, they happen with various Redis commands, such as GET and SET.
  • When a burst of these timeouts happens, the calls all come from a single API server but are addressed to various Redis nodes. For example, API3 might have a bunch of timeouts trying to call Cache1, Cache2 and Cache3. This is strong evidence that the issue is related to the API servers, not the Redis servers.
  • The Redis master nodes have 108 connected clients. I log current connections, and this number remains stable. There are no big spikes in connections, and it doesn't look like any bad code is creating too many connections or failing to share ConnectionMultiplexer instances (I have one and it's static).
  • The Redis slave nodes have 58 connected clients, and this number also looks completely stable.
  • We're using StackExchange.Redis version 1.2.6.
  • Redis is using AOF mode, and the size on disk is about 195MB.

Here's an example timeout exception. Most look pretty much the same as this:

Type=StackExchange.Redis.RedisTimeoutException,Message=Timeout performing GET limeade:allActivities, inst: 1, mgr: ExecuteSelect, err: never, queue: 0, qu: 0, qs: 0, qc: 0, wr: 0, wq: 0, in: 0, ar: 0, clientName: LIMEADEAPI4, serverEndpoint: 10.xx.xx.11:7000, keyHashSlot: 1295, IOCP: (Busy=0,Free=1000,Min=2,Max=1000), WORKER: (Busy=9,Free=32758,Min=2,Max=32767) (Please take a look at this article for some common client-side issues that can cause timeouts: http://stackexchange.github.io/StackExchange.Redis/Timeouts),StackTrace= at StackExchange.Redis.ConnectionMultiplexer.ExecuteSyncImpl[T](Message message, ResultProcessor`1 processor, ServerEndPoint server) at StackExchange.Redis.ConnectionMultiplexer.ExecuteSyncImpl[T](Message message, ResultProcessor`1 processor, ServerEndPoint server) at StackExchange.Redis.RedisBase.ExecuteSync[T](Message message, ResultProcessor`1 processor, ServerEndPoint server) at StackExchange.Redis.RedisDatabase.StringGet(RedisKey key, CommandFlags flags) at Limeade.Caching.Providers.RedisCacheProvider`1.Get[T](K cacheKey, CacheItemVersion& cacheItemVersion) in ...

I've done a bit of research on tracking down these timeout exceptions, but what's rather surprising is that the queue counters are all zeros. Nothing is in the queue, nothing is waiting to be processed, and I have tons of free threads that aren't doing anything. Everything looks great.
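When reading messages like the one above, it can help to pull out the IOCP and WORKER groups and compare Busy with Min in each. The guidance linked in the message suggests this comparison: when Busy exceeds Min, the .NET thread pool may delay creating additional threads, which can stall responses during a burst. Below is a minimal sketch of such a check. PoolStats and TimeoutMessageParser are hypothetical names, not part of StackExchange.Redis, and the regular expression assumes the message layout shown above.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper types (not part of StackExchange.Redis).
// Holds one thread-pool group such as "WORKER: (Busy=9,Free=32758,Min=2,Max=32767)".
public sealed class PoolStats
{
    public int Busy { get; }
    public int Free { get; }
    public int Min { get; }
    public int Max { get; }

    public PoolStats(int busy, int free, int min, int max)
    {
        Busy = busy; Free = free; Min = min; Max = max;
    }

    // True when more threads are in use than the configured minimum.
    public bool BusyExceedsMin => Busy > Min;
}

public static class TimeoutMessageParser
{
    // Extracts the named group ("IOCP" or "WORKER") from a timeout message.
    // Returns null when the group is not present in the expected layout.
    public static PoolStats Parse(string message, string group)
    {
        string pattern = Regex.Escape(group) +
            @":\s*\(Busy=(\d+),Free=(\d+),Min=(\d+),Max=(\d+)\)";
        Match m = Regex.Match(message, pattern);
        if (!m.Success) return null;

        return new PoolStats(
            int.Parse(m.Groups[1].Value),
            int.Parse(m.Groups[2].Value),
            int.Parse(m.Groups[3].Value),
            int.Parse(m.Groups[4].Value));
    }
}
```

For the sample message, Parse(message, "WORKER") yields Busy=9 and Min=2, so BusyExceedsMin is true, while the IOCP group (Busy=0, Min=2) is not over its minimum.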

Does anyone have any ideas on how to fix this? The problem is that these bursts of cache timeouts cause our database to be hit more often, which in certain circumstances is a bad thing. I'm happy to add any more info that anyone would find helpful.

Update: Connection code

The code to connect to Redis is part of a fairly complex system that supports various cache environments and configurations, but I can probably boil it down to the basics. First, there's a CacheFactory class:

public class CacheFactory : ICacheFactory
{
    private static readonly ILogger log = LoggerManager.GetLogger(typeof(CacheFactory));
    private static readonly ICacheProvider<CacheKey> cache;

    static CacheFactory()
    {
        ICacheFactory<CacheKey> configuredFactory = CacheFactorySection.Current?.CreateConfiguredFactory<CacheKey>();
        if (configuredFactory == null)
        {
           // Some error handling, not important
        }

        cache = configuredFactory.GetDefaultCache();
    }

    // ...
}

The ICacheProvider is the interface that implements communication with a particular cache system, which can be configured. In this case, the configuredFactory is a RedisCacheFactory, which looks like this:

public class RedisCacheFactory<T> : ICacheFactory<T> where T : CacheKey, ICacheKeyRepository
{
    private RedisCacheProvider<T> provider;
    private readonly RedisConfiguration configuration;

    public RedisCacheFactory(RedisConfiguration config)
    {
        this.configuration = config;
    }

    public ICacheProvider<T> GetDefaultCache()
    {
        return provider ?? (provider = new RedisCacheProvider<T>(configuration));
    }
}

The GetDefaultCache method is called once, in the static constructor, and returns a RedisCacheProvider. This class is what actually connects to Redis:

public class RedisCacheProvider<K> : ICacheProvider<K> where K : CacheKey, ICacheKeyRepository
{
    private readonly ConnectionMultiplexer redisConnection;
    private readonly IDatabase db;
    private readonly RedisCacheSerializer serializer;
    private static readonly ILog log = Logging.RedisCacheProviderLog<K>();
    private readonly CacheMonitor<K> cacheMonitor;
    private readonly TimeSpan defaultTTL;
    private int connectionErrors;

    public RedisCacheProvider(RedisConfiguration options)
    {
        redisConnection = ConnectionMultiplexer.Connect(options.EnvironmentOverride ?? options.Connection);
        db = redisConnection.GetDatabase();
        serializer = new RedisCacheSerializer(options.SerializationBinding);
        cacheMonitor = new CacheMonitor<K>();
        defaultTTL = options.DefaultTTL;

        IEnumerable<string> hosts = options.Connection.EndPoints.Select(e => (e as DnsEndPoint)?.Host);
        log.InfoFormat("Created Redis ConnectionMultiplexer connection.  Hosts=({0})", String.Join(",", hosts));
    }

    // ...
}

The constructor creates a ConnectionMultiplexer based on the configured Redis endpoints (which are in a config file). I also log every time I create a connection. We don't see an excessive number of these log statements, and the number of connections to Redis remains stable.
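Since the whole design relies on the connection being created exactly once, here is a minimal sketch of one common way to guarantee that even under concurrent first use, using Lazy<T>. FakeConnection is a hypothetical stand-in so the example runs without a Redis server; in the real provider the factory delegate would presumably call ConnectionMultiplexer.Connect as in the constructor above.

```csharp
using System;
using System.Threading;

// Hypothetical stand-in for an expensive shared connection object.
public sealed class FakeConnection
{
    private static int created;

    // Number of instances constructed so far (for illustration only).
    public static int CreatedCount => created;

    public FakeConnection()
    {
        Interlocked.Increment(ref created);
    }
}

public static class SharedConnection
{
    // Lazy<T> constructed with a factory delegate defaults to
    // LazyThreadSafetyMode.ExecutionAndPublication, so the factory runs
    // at most once even if several threads read Instance concurrently.
    private static readonly Lazy<FakeConnection> lazy =
        new Lazy<FakeConnection>(() => new FakeConnection());

    public static FakeConnection Instance => lazy.Value;
}
```

Every read of SharedConnection.Instance returns the same object, and FakeConnection.CreatedCount stays at 1 however many callers there are.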

Recommended answer

In global.asax, try adding:

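protected void Application_Start(object sender, EventArgs e)
{
    // Requires "using System.Threading;".
    // Arguments: minimum worker threads, minimum I/O completion (IOCP) threads.
    ThreadPool.SetMinThreads(200, 200);
}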

For us, this reduced errors from ~50-100 daily to zero. I believe there is no general rule for what numbers to set, as it's system dependent (200 works for us), so it might require some experimenting on your end.
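When experimenting with different values, it may be safer to raise the minimums only when they are currently lower and to check whether the runtime accepted the change, since ThreadPool.SetMinThreads returns false on failure. A minimal sketch, where ThreadPoolTuning and EnsureMinThreads are hypothetical names:

```csharp
using System;
using System.Threading;

public static class ThreadPoolTuning
{
    // Raises the thread-pool minimums to at least the requested values and
    // never lowers them. Returns false if the runtime rejects the change
    // (for example, when a requested minimum exceeds the pool maximum).
    public static bool EnsureMinThreads(int minWorker, int minIocp)
    {
        ThreadPool.GetMinThreads(out int currentWorker, out int currentIocp);

        int worker = Math.Max(currentWorker, minWorker);
        int iocp = Math.Max(currentIocp, minIocp);

        // Nothing to do if both minimums are already high enough.
        if (worker == currentWorker && iocp == currentIocp) return true;

        return ThreadPool.SetMinThreads(worker, iocp);
    }
}
```

Calling ThreadPoolTuning.EnsureMinThreads(200, 200) from Application_Start would match the values in the answer; logging the return value makes a rejected setting visible instead of silently ignored.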

I also believe this has improved the performance of the site.
