Parallel execution with StackExchange.Redis?


Problem Description



I have 1M items stored in a List<Person> which I'm serializing in order to insert into Redis (2.8).

I divide the work among 10 Task<>s, where each takes its own section (List<> is thread-safe for read-only use: it is safe to perform multiple read operations on a List).
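(The Person class isn't shown in the post; judging from the code in the answer below, it looks roughly like this:)

    // Inferred shape of Person - not the original class definition.
    public class Person
    {
        public int Id { get; set; }
        public string Name { get; set; }
        public int Age { get; set; }
        public int[] Friends { get; set; }
    }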

Simplification:

For ITEMS=100 and THREADS=10, each Task will capture its own PAGE and deal with the relevant range.

For example:

void Main()
{
    var ITEMS = 100;
    var THREADS = 10;
    var PAGE = 4;   // which slice this "thread" handles

    List<int> lst = Enumerable.Range(0, ITEMS).ToList();

    // Each task handles ITEMS/THREADS consecutive items starting at PAGE * (ITEMS/THREADS).
    for (int i = 0; i < ITEMS / THREADS; i++)
    {
      lst[PAGE * (ITEMS / THREADS) + i].Dump();   // .Dump() is LINQPad's output helper
    }
}

  • PAGE=0 will deal with : 0,1,2,3,4,5,6,7,8,9
  • PAGE=4 will deal with : 40,41,42,43,44,45,46,47,48,49

All OK.

Now back to SE.Redis.

I wanted to implement this pattern, and so I did (with ITEMS=1,000,000):
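(The snippet from the original post isn't reproduced here; the following is only a minimal sketch of what such a loading loop might look like. The pre-built List<Person> named people, the urn:user> key format taken from the exception text, and the use of Newtonsoft.Json are all assumptions.)

    // Sketch, NOT the original code. Needs: StackExchange.Redis, Newtonsoft.Json,
    // System.Linq, System.Threading.Tasks.
    var muxer = ConnectionMultiplexer.Connect("localhost");
    var db = muxer.GetDatabase();

    int ITEMS = 1000000, THREADS = 10;
    int perThread = ITEMS / THREADS;

    // Each task writes its own contiguous page of items with the synchronous API.
    var tasks = Enumerable.Range(0, THREADS).Select(page => Task.Run(() =>
    {
        for (int i = 0; i < perThread; i++)
        {
            var person = people[page * perThread + i];           // 'people' is the pre-built List<Person>
            db.StringSet("urn:user>" + person.Id,                 // key format assumed from the exception text
                         JsonConvert.SerializeObject(person));    // Newtonsoft.Json assumed here
        }
    })).ToArray();

    Task.WaitAll(tasks);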

My testing:

(Here is dbsize, checked each second):

As you can see, 1M records were added via 10 threads.

Now, I don't know if it's fast, but when I change ITEMS from 1M to 10M, things get really slow and I get an exception:

The exception is thrown inside the for loop.

    Unhandled Exception: System.AggregateException: One or more errors occurred. --->
    System.TimeoutException: Timeout performing SET urn:user>288257, inst: 1, queue: 11, qu=0, qs=11, qc=0, wr=0/0, in=0/0
       at StackExchange.Redis.ConnectionMultiplexer.ExecuteSyncImpl[T](Message message, ResultProcessor`1 processor, ServerEndPoint server) in c:\TeamCity\buildAgent\work\58bc9a6df18a3782\StackExchange.Redis\StackExchange\Redis\ConnectionMultiplexer.cs:line 1722
       at StackExchange.Redis.RedisBase.ExecuteSync[T](Message message, ResultProcessor`1 processor, ServerEndPoint server) in c:\TeamCity\buildAgent\work\58bc9a6df18a3782\StackExchange.Redis\StackExchange\Redis\RedisBase.cs:line 79
       ...
    Press any key to continue . . .

Question:

  • Is my way of dividing the work the RIGHT (fastest) way?
  • How can I make things faster? (Sample code would be much appreciated.)
  • How can I resolve this exception?

Related info :

<gcAllowVeryLargeObjects enabled="true" /> is present in App.config (otherwise I get an OutOfMemoryException); the build also targets x64. I have 16 GB of RAM, an SSD drive, and an i7 CPU.
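(For reference, that element sits under <runtime> in App.config:)

    <configuration>
      <runtime>
        <!-- allow objects larger than 2 GB in total size on 64-bit builds -->
        <gcAllowVeryLargeObjects enabled="true" />
      </runtime>
    </configuration>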

Solution

Currently, your code is using the synchronous API (StringSet), and is being loaded by 10 threads concurrently. This will present no appreciable challenge to SE.Redis - it works just fine here. I suspect that it genuinely is a timeout where the server has taken longer than you would like to process some of the data, most likely also related to the server's allocator. One option, then, is to simply increase the timeout a bit. Not a lot... try 5 seconds instead of the default 1 second. Likely, most of the operations are working very fast anyway.
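For reference, a sketch of raising that timeout ("localhost" is a placeholder; SyncTimeout is in milliseconds):

    // Raise the synchronous-operation timeout from the default 1 second to 5 seconds,
    // either directly in the configuration string...
    var muxer = ConnectionMultiplexer.Connect("localhost,syncTimeout=5000");

    // ...or via ConfigurationOptions.
    var config = ConfigurationOptions.Parse("localhost");
    config.SyncTimeout = 5000;
    var muxer2 = ConnectionMultiplexer.Connect(config);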

With regards to speeding it up: one option here is to not wait - i.e. keep pipelining data. If you are content not to check every single message for an error state, then one simple way to do this is to add , flags: CommandFlags.FireAndForget to the end of your StringSet call. In my local testing, this sped up the 1M example by 25% (and I suspect a lot of the rest of the time is actually spent in string serialization).
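In code, that is just an extra argument on the existing call (db, key, and json here are placeholder names):

    // Pipelined write: don't wait for, or check, the reply of each SET.
    db.StringSet(key, json, flags: CommandFlags.FireAndForget);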

The biggest problem I had with the 10M example was simply the overhead of working with the 10M example - especially since this takes huge amounts of memory for both the redis-server and the application, which (to emulate your setup) are on the same machine. This creates competing memory pressure, with GC pauses etc in the managed code. But perhaps more importantly: it simply takes forever to start doing anything. Consequently, I refactored the code to use parallel yield return generators rather than a single list. For example:

    // Note: appRandom is not shown in the post; presumably something like a ThreadLocal<Random>.
    static IEnumerable<Person> InventPeople(int seed, int count)
    {
        for(int i = 0; i < count; i++)
        {
            int f = 1 + seed + i;
            var item = new Person
            {
                Id = f,
                Name = Path.GetRandomFileName().Replace(".", "").Substring(0, appRandom.Value.Next(3, 6)) + " " + Path.GetRandomFileName().Replace(".", "").Substring(0, new Random(Guid.NewGuid().GetHashCode()).Next(3, 6)),
                Age = f % 90,
                Friends = ParallelEnumerable.Range(0, 100).Select(n => appRandom.Value.Next(1, f)).ToArray()
            };
            yield return item;
        }
    }

    // Buffers items into chunks of 'count' so each batch is released to the consumer
    // all at once rather than trickling out one item at a time.
    static IEnumerable<T> Batchify<T>(this IEnumerable<T> source, int count)
    {
        var list = new List<T>(count);
        foreach(var item in source)
        {
            list.Add(item);
            if(list.Count == count)
            {
                foreach (var x in list) yield return x;
                list.Clear();
            }
        }
        foreach (var item in list) yield return item;
    }

with:

foreach (var element in InventPeople(PER_THREAD * counter1, PER_THREAD).Batchify(1000))

Here, the purpose of Batchify is to ensure that we aren't helping the server too much by taking appreciable time between each operation - the data is invented in batches of 1000 and each batch is made available very quickly.

I was also concerned about JSON performance, so I switched to JIL:

    public static string ToJSON<T>(this T obj)
    {
        return Jil.JSON.Serialize<T>(obj);
    }

and then, just for fun, I moved the JSON work into the batching, so that the actual processing loop becomes:

 foreach (var element in InventPeople(PER_THREAD * counter1, PER_THREAD)
     .Select(x => new { x.Id, Json = x.ToJSON() }).Batchify(1000))
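(The loop body is elided in the post; a possible body, with the db handle and the urn:user> key format assumed, might be:)

    // Hypothetical complete loop - the post only shows the header above.
    foreach (var element in InventPeople(PER_THREAD * counter1, PER_THREAD)
        .Select(x => new { x.Id, Json = x.ToJSON() }).Batchify(1000))
    {
        db.StringSet("urn:user>" + element.Id, element.Json, flags: CommandFlags.FireAndForget);
    }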

This got the times down a bit more, so I can load 10M in 3 minutes 57 seconds, a rate of 42,194 rops. Most of this time is actually local processing inside the application. If I change it so that each thread loads the same item ITEMS / THREADS times, then this changes to 1 minute 48 seconds - a rate of 92,592 rops.

I'm not sure if I've really answered anything, but the short version might simply be: "try a longer timeout; consider using fire-and-forget".
