RavenDb 中的 Map Reduce,更新 1 [英] Map reduce in RavenDb, update 1

查看:53
本文介绍了RavenDb 中的 Map Reduce,更新 1的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

更新1 ,遵循Ayende的回答

这是我第一次进入RavenDb,并尝试使用它编写了一个较小的map/reduce,但不幸的是结果还是空的?

我在 RavenDb 中加载了大约 160 万个文档

文档:

 公共类打勾{公共DateTime时间;公众小数公开十进制出价;公共双AskVolume;公开双BidVolume;} 

,并希望在特定时间段内获得最低和最高要求.

按时间收集的定义为:

  var ticks = session.Query< Tick>().Where(x => x.Time> new DateTime(2012,4,23)&& x.Time< new DateTime(2012,4,24,00,0,0)).ToList(); 

到目前为止,这给了我90280个文档.

但随后地图/缩小:

  Map =行=>从行到行选择新的{Max = row.Bid,最小值=行.出价,时间=行.时间,计数= 1};减少=结果=>从结果到结果将结果按新的{result.MaxBid,result.Count}分组为g选择新的{最大值= g.Key.MaxBid,Min = g.Min(x => x.MaxBid),时间= g.Key.Time,计数= g.Sum(x => x.Count)}; 

...

 私有类TickAggregationResult{公开的十进制MaxBid {get;放;}公共十进制MinBid {get;放;}public int Count {get;放;}} 

然后我创建索引并尝试查询它:

  Raven.Client.Indexes.IndexCreation.CreateIndexes(typeof(TickAggregation).Assembly,documentStore);var session = documentStore.OpenSession();var g1 = session.Query< TickAggregationResult>(typeof(TickAggregation).Name);var group = session.Query< Tick,TickAggregation>()其中(x => x.Time> new DateTime(2012,4,23)&&时间x新的DateTime(2012,4,24,00,0,0)).Customize(x => x.WaitForNonStaleResults()).AsProjection< TickAggregationResult>();. 

但是该组只是空的:(

如您所见,我已经尝试了两种不同的查询,但是我不确定是否有区别,有人可以解释吗?

现在我得到一个错误:

该组仍然为空:(

让我解释一下我要在纯sql中完成的工作:

 从Ticks中选择min(Ask),count(*)作为TickCount在"2012-04-23"和"2012-04-24"之间的时间) 

解决方案

不幸的是,Map/Reduce无法那样工作.好吧,至少其中的减少部分没有.为了减少您的设置,您必须预先定义特定的时间范围以进行分组,例如,每天,每周,每月等.如果每天减少,则每天可以获取最小/最大/计数.

有一种获取所需内容的方法,但是它有一些性能方面的考虑.基本上,您根本不减少数据,但可以按时间编制索引,然后在转换结果时进行汇总.这类似于您运行第一个查询进行过滤,然后在客户端代码中进行汇总的情况.唯一的好处是聚合是在服务器端完成的,因此您不必将所有数据都传输到客户端.

这里的性能问题是您要过滤的时间范围有多大,或更准确地说,您的过滤范围内有多少个项目?如果相对较小,则可以使用此方法.如果太大,服务器将在等待结果集时等待.

以下是说明此技术的示例程序:

 使用系统;使用System.Linq;使用Raven.Client.Document;使用Raven.Client.Indexes;使用Raven.Client.Linq;命名空间ConsoleApplication1{公开课 打勾{公共字符串ID {get;放;}公共DateTime时间{get;放;}公开十进制出价{放;}}///< summary>///该索引是一个真实的map/reduce,但其总和为所有时间.///您不能按时间范围对其进行过滤.///</summary>class Ticks_Aggregate:AbstractIndexCreationTask< Tick,Ticks_Aggregate.Result>{公开课结果{公用十进制Min {get;放;}公开的十进制最大值{放;}public int Count {get;放;}}公共Ticks_Aggregate(){地图=刻度=>从滴答滴答选择新的{最小=勾号.最高=报价.计数= 1};减少=结果=>从结果到结果将结果分组为0入g选择新的{Min = g.Min(x => x.Min),最大值= g.Max(x => x.Max),计数= g.Sum(x => x.Count)};}}///< summary>///可以按时间范围过滤此索引,但不会减少任何内容///因此,如果过滤器中有很多项目,它就不会起作用.///</summary>class Ticks_ByTime:AbstractIndexCreationTask< Tick>{公开课结果{公用十进制Min {get;放;}公开的十进制最大值{放;}public int Count {get;放;}}公共Ticks_ByTime(){地图=刻度=>从滴答滴答选择新的{tick.Time};TransformResults =(数据库,滴答)=>从滴答滴答按0分组入g选择新的{Min = g.Min(x => x.Bid),最大值= g.最大值(x => x.Bid),计数= g.Count()};}}班级计划{私有静态void Main(){var documentStore =新DocumentStore {Url ="http://localhost:8080"};documentStore.Initialize();IndexCreation.CreateIndexes(typeof(Program).Assembly,documentStore);var today = DateTime.Today;var rnd = new Random();使用(var session = documentStore.OpenSession()){//产生100个随机滴答声对于(var i = 0; i< 100; i ++){var tick = new Tick {时间=今天.AddMinutes(i),出价= rnd.Next(100,1000)/100m};session.Store(tick);}session.SaveChanges();}使用(var session = documentStore.OpenSession()){//使用过滤器查询项目.这将创建一个动态索引.var fromTime = today.AddMinutes(20);var toTime = today.AddMinutes(80);var ticks = session.Query< Tick>()在哪里(x => x.Time> = fromTime&& x.Time< = toTime).OrderBy(x => x.Time);//输出上述查询的结果foreach(以刻度为单位的可变刻度)Console.WriteLine("{0} {1}",tick.Time,tick.Bid);//获取所有时间的聚合var total = session.Query< Tick,Ticks_Aggregate>().As< Ticks_Aggregate.Result>().单身的();Console.WriteLine();Console.WriteLine("Totals");Console.WriteLine("Min:{0}",total.Min);Console.WriteLine("Max: {0}", total.Max);Console.WriteLine("Count:{0}",total.Count);//使用过滤器获取聚合var 过滤 = session.Query()在哪里(x => x.Time> = fromTime&& x.Time< = toTime).As< Ticks_ByTime.Result>().Take(1024)//一次最多可以拍摄一次.ToList()//必须!.单身的();Console.WriteLine();Console.WriteLine("Filtered");Console.WriteLine("Min:{0}",filtered.Min);Console.WriteLine("Max:{0}",filtered.Max);Console.WriteLine("Count:{0}",filtered.Count);}Console.ReadLine();}}} 

我可以设想一个解决方案,以解决可能具有较大范围的时间过滤器上的聚合问题.减少将不得不将事物分解成在不同级别上逐渐减小的时间单位.用于此的代码有点复杂,但是我正在为自己的目的而工作.完成后,我将在知识库中发布www.ravendb.net.


更新

我正在玩这个游戏,并且在最后一个查询中注意到了两件事.

  1. 您必须在调用single之前执行 ToList()才能获得完整的结果集.
  2. 即使它在服务器上运行,结果范围内的最大值也可以是1024,并且您必须指定 Take(1024),否则将获得默认的最大值128.由于这是在服务器上运行的,所以我没想到这一点.但是我想是因为您通常不在TransformResults部分进行聚合.

我已经为此更新了代码.但是,除非您可以保证范围足够小,否则我将等待更好的完整地图/缩小.我在做这个工作.:)

Update 1 , following Ayende's answer

This is my first journey into RavenDb and to experiment with it I wrote a small map/ reduce, but unfortunately the result is empty?

I have around 1.6 million documents loaded into RavenDb

A document:

public class Tick
{
    public DateTime Time;
    public decimal Ask;
    public decimal Bid;
    public double AskVolume;
    public double BidVolume;
}

and wanted to get Min and Max of Ask over a specific period of Time.

The collection by Time is defined as:

var ticks = session.Query<Tick>().Where(x => x.Time > new DateTime(2012, 4, 23) && x.Time < new DateTime(2012, 4, 24, 00, 0, 0)).ToList();

Which gives me 90280 documents, so far so good.

But then the map/ reduce:

Map = rows => from row in rows 
                          select new
                          {
                              Max = row.Bid,
                              Min = row.Bid, 
                              Time = row.Time,
                              Count = 1
                          };

Reduce = results => from result in results
                                group result by new{ result.MaxBid, result.Count} into g
                                select new
                                {
                                    Max = g.Key.MaxBid,
                                    Min = g.Min(x => x.MaxBid),
                                    Time = g.Key.Time,
                                    Count = g.Sum(x => x.Count)

                                };

...

private class TickAggregationResult
{
    public decimal MaxBid { get; set; }
        public decimal MinBid { get; set; }
        public int Count { get; set; }

    }

I then create the index and try to Query it:

Raven.Client.Indexes.IndexCreation.CreateIndexes(typeof(TickAggregation).Assembly, documentStore);


        var session = documentStore.OpenSession();

        var g1 = session.Query<TickAggregationResult>(typeof(TickAggregation).Name);


        var group = session.Query<Tick, TickAggregation>()
                         .Where(x => x.Time > new DateTime(2012, 4, 23) && 
                                     x.Time < new DateTime(2012, 4, 24, 00, 0, 0)
                                  )
            .Customize(x => x.WaitForNonStaleResults())
                                           .AsProjection<TickAggregationResult>();

But the group is just empty :(

As you can see I've tried two different Queries, I'm not sure about the difference, can someone explain?

Now I get an error:

The group are still empty :(

Let me explain what I'm trying to accomplish in pure sql:

select min(Ask), count(*) as TickCount from Ticks 
where Time between '2012-04-23' and '2012-04-24)

解决方案

Unfortunately, Map/Reduce doesn't work that way. Well, at least the Reduce part of it doesn't. In order to reduce your set, you would have to predefine specific time ranges to group by, for example - daily, weekly, monthly, etc. You could then get min/max/count per day if you reduced daily.

There is a way to get what you want, but it has some performance considerations. Basically, you don't reduce at all, but you index by time and then do the aggregation when transforming results. This is similar to if you ran your first query to filter and then aggregated in your client code. The only benefit is that the aggregation is done server-side, so you don't have to transmit all of that data to the client.

The performance concern here is how big of a time range are you filtering to, or more precisely, how many items will there be inside your filter range? If it's relatively small, you can use this approach. If it's too large, you will be waiting while the server goes through the result set.

Here is a sample program that illustrates this technique:

using System;
using System.Linq;
using Raven.Client.Document;
using Raven.Client.Indexes;
using Raven.Client.Linq;

namespace ConsoleApplication1
{
  public class Tick
  {
    public string Id { get; set; }
    public DateTime Time { get; set; }
    public decimal Bid { get; set; }
  }

  /// <summary>
  /// This index is a true map/reduce, but its totals are for all time.
  /// You can't filter it by time range.
  /// </summary>
  class Ticks_Aggregate : AbstractIndexCreationTask<Tick, Ticks_Aggregate.Result>
  {
    public class Result
    {
      public decimal Min { get; set; }
      public decimal Max { get; set; }
      public int Count { get; set; }
    }

    public Ticks_Aggregate()
    {
      Map = ticks => from tick in ticks
               select new
                    {
                      Min = tick.Bid,
                      Max = tick.Bid,
                      Count = 1
                    };

      Reduce = results => from result in results
                group result by 0
                  into g
                  select new
                         {
                           Min = g.Min(x => x.Min),
                           Max = g.Max(x => x.Max),
                           Count = g.Sum(x => x.Count)
                         };
    }
  }

  /// <summary>
  /// This index can be filtered by time range, but it does not reduce anything
  /// so it will not be performant if there are many items inside the filter.
  /// </summary>
  class Ticks_ByTime : AbstractIndexCreationTask<Tick>
  {
    public class Result
    {
      public decimal Min { get; set; }
      public decimal Max { get; set; }
      public int Count { get; set; }
    }

    public Ticks_ByTime()
    {
      Map = ticks => from tick in ticks
               select new {tick.Time};

      TransformResults = (database, ticks) =>
                 from tick in ticks
                 group tick by 0
                 into g
                 select new
                      {
                        Min = g.Min(x => x.Bid),
                        Max = g.Max(x => x.Bid),
                        Count = g.Count()
                      };
    }
  }

  class Program
  {
    private static void Main()
    {
      var documentStore = new DocumentStore { Url = "http://localhost:8080" };
      documentStore.Initialize();
      IndexCreation.CreateIndexes(typeof(Program).Assembly, documentStore);


      var today = DateTime.Today;
      var rnd = new Random();

      using (var session = documentStore.OpenSession())
      {
        // Generate 100 random ticks
        for (var i = 0; i < 100; i++)
        {
          var tick = new Tick { Time = today.AddMinutes(i), Bid = rnd.Next(100, 1000) / 100m };
          session.Store(tick);
        }

        session.SaveChanges();
      }


      using (var session = documentStore.OpenSession())
      {
        // Query items with a filter.  This will create a dynamic index.
        var fromTime = today.AddMinutes(20);
        var toTime = today.AddMinutes(80);
        var ticks = session.Query<Tick>()
          .Where(x => x.Time >= fromTime && x.Time <= toTime)
          .OrderBy(x => x.Time);

        // Ouput the results of the above query
        foreach (var tick in ticks)
          Console.WriteLine("{0} {1}", tick.Time, tick.Bid);

        // Get the aggregates for all time
        var total = session.Query<Tick, Ticks_Aggregate>()
          .As<Ticks_Aggregate.Result>()
          .Single();
        Console.WriteLine();
        Console.WriteLine("Totals");
        Console.WriteLine("Min: {0}", total.Min);
        Console.WriteLine("Max: {0}", total.Max);
        Console.WriteLine("Count: {0}", total.Count);

        // Get the aggregates with a filter
        var filtered = session.Query<Tick, Ticks_ByTime>()
          .Where(x => x.Time >= fromTime && x.Time <= toTime)
          .As<Ticks_ByTime.Result>()
          .Take(1024)  // max you can take at once
          .ToList()    // required!
          .Single();
        Console.WriteLine();
        Console.WriteLine("Filtered");
        Console.WriteLine("Min: {0}", filtered.Min);
        Console.WriteLine("Max: {0}", filtered.Max);
        Console.WriteLine("Count: {0}", filtered.Count);
      }

      Console.ReadLine();
    }
  }
}

I can envision a solution to the problem of aggregating over a time filter with a potentially large scope. The reduce would have to break things down into decreasingly smaller units of time at different levels. The code for this is a bit complex, but I am working on it for my own purposes. When complete, I will post over in the knowledge base at www.ravendb.net.


UPDATE

I was playing with this a bit more, and noticed two things in that last query.

  1. You MUST do a ToList() before calling single in order to get the full result set.
  2. Even though this runs on the server, the max you can have in the result range is 1024, and you have to specify a Take(1024) or you get the default of 128 max. Since this runs on the server, I didn't expect this. But I guess its because you don't normally do aggregations in the TransformResults section.

I've updated the code for this. However, unless you can guarantee that the range is small enough for this to work, I would wait for the better full map/reduce that I spoke of. I'm working on it. :)

这篇关于RavenDb 中的 Map Reduce,更新 1的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆