RavenDB:如何在 map-reduce 中正确索引笛卡尔积? [英] RavenDB: How can I properly index a cartesian product in a map-reduce?

查看:65
本文介绍了RavenDB:如何在 map-reduce 中正确索引笛卡尔积?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

这个问题是 RavenDB:为什么我在这个多映射/减少索引中得到空值?,但我意识到,问题是另一个.

This question is a spin-off of RavenDB: Why do I get null-values for fields in this multi-map/reduce index?, but I realized, the problem was another.

考虑我极其简化的域,重写为电影租赁商店场景以进行抽象:

Consider my extremely simplified domain, rewritten to a movie rental store scenario for abstraction:

public class User
{
    public string Id { get; set; }
}

public class Movie
{
    public string Id { get; set; }
}

public class MovieRental
{
    public string Id { get; set; }
    public string MovieId { get; set; }
    public string UserId { get; set; }
}

这是一个教科书多对多的例子.

It's a text-book many-to-many example.

我要创建的索引是这样的:

The index I want to create is this:

对于给定的用户,给我一个数据库中每部电影的列表(过滤/搜索暂时忽略),以及一个描述用户租借这部电影的次数(或零次)的整数.

For a given user, give me a list of every movie in the database (filtering/search left out for the moment) along with an integer describing how many times (or zero) the user has rented this movie.

基本上是这样的:

用户:

| Id     |
|--------|
| John   |
| Lizzie |
| Albert |

电影:

| Id           |
|--------------|
| Robocop      |
| Notting Hill |
| Inception    |

电影租赁:

| Id        | UserId | MovieId      |
|-----------|--------|--------------|
| rental-00 | John   | Robocop      |
| rental-01 | John   | Notting Hill |
| rental-02 | John   | Notting Hill |
| rental-03 | Lizzie | Robocop      |
| rental-04 | Lizzie | Robocop      |
| rental-05 | Lizzie | Inception    |

理想情况下,我想要一个索引来查询,看起来像这样:

Ideally, I want an index to query, that would look like this:

| UserId | MovieId      | RentalCount |
|--------|--------------|-------------|
| John   | Robocop      | 1           |
| John   | Notting Hill | 2           |
| John   | Inception    | 0           |
| Lizzie | Robocop      | 2           |
| Lizzie | Notting Hill | 0           |
| Lizzie | Inception    | 1           |
| Albert | Robocop      | 0           |
| Albert | Notting Hill | 0           |
| Albert | Inception    | 0           |

或声明式:

  • 我总是想要所有电影的完整列表(最终我会添加过滤/搜索) - 即使向从未租借过一部电影的用户提供
  • 我想要每个用户的租金计数,只是整数
  • 我希望能够按租借次数排序 - 即在列表顶部显示给定用户租借次数最多的电影

然而,我找不到一种方法来制作上面的交叉连接"并将其保存在索引中.相反,我最初认为我在下面的这个操作中做得对,但它不允许我进行排序(请参阅失败的测试):

However, I can't find a way to make the "cross-join" above and save it in the index. Instead, I initially thought I got it right with this maneuver below, but it does not allow me to sort (see failing test):

{"不支持计算:x.UserRentalCounts.SingleOrDefault(rentalCount => (rentalCount.UserId == value(UnitTestProject2.MovieRentalTests+<>c__DisplayClass0_0).user_john.Id)).Count.你不能在 RavenDB 中使用计算查询(只允许简单的成员表达式)."}

{"Not supported computation: x.UserRentalCounts.SingleOrDefault(rentalCount => (rentalCount.UserId == value(UnitTestProject2.MovieRentalTests+<>c__DisplayClass0_0).user_john.Id)).Count. You cannot use computation in RavenDB queries (only simple member expressions are allowed)."}

我的问题基本上是:我如何 - 或者我完全可以 - 索引以便满足我的要求?

下面是我提到的例子,它不能满足我的要求,但这就是我现在的情况.它使用以下包(VS2015):

Below is my mentioned example, that does not fulfill my requirements, but that's where I am right now. It uses the following packages (VS2015):

packages.config

<?xml version="1.0" encoding="utf-8"?>
<packages>
  <package id="Microsoft.Owin.Host.HttpListener" version="3.0.1" targetFramework="net461" />
  <package id="NUnit" version="3.5.0" targetFramework="net461" />
  <package id="RavenDB.Client" version="3.5.2" targetFramework="net461" />
  <package id="RavenDB.Database" version="3.5.2" targetFramework="net461" />
  <package id="RavenDB.Tests.Helpers" version="3.5.2" targetFramework="net461" />
</packages>

MovieRentalTests.cs

using System.Collections.Generic;
using System.Linq;
using NUnit.Framework;
using Raven.Client.Indexes;
using Raven.Client.Linq;
using Raven.Tests.Helpers;

namespace UnitTestProject2
{
    [TestFixture]
    public class MovieRentalTests : RavenTestBase
    {
        [Test]
        public void DoSomeTests()
        {
            using (var server = GetNewServer())
            using (var store = NewRemoteDocumentStore(ravenDbServer: server))
            {
                //Test-data
                var user_john = new User { Id = "John" };
                var user_lizzie = new User { Id = "Lizzie" };
                var user_albert = new User { Id = "Albert" };


                var movie_robocop = new Movie { Id = "Robocop" };
                var movie_nottingHill = new Movie { Id = "Notting Hill" };
                var movie_inception = new Movie { Id = "Inception" };

                var rentals = new List<MovieRental>
                {
                    new MovieRental {Id = "rental-00", UserId = user_john.Id, MovieId = movie_robocop.Id},
                    new MovieRental {Id = "rental-01", UserId = user_john.Id, MovieId = movie_nottingHill.Id},
                    new MovieRental {Id = "rental-02", UserId = user_john.Id, MovieId = movie_nottingHill.Id},
                    new MovieRental {Id = "rental-03", UserId = user_lizzie.Id, MovieId = movie_robocop.Id},
                    new MovieRental {Id = "rental-04", UserId = user_lizzie.Id, MovieId = movie_robocop.Id},
                    new MovieRental {Id = "rental-05", UserId = user_lizzie.Id, MovieId = movie_inception.Id}
                };

                //Init index
                new Movies_WithRentalsByUsersCount().Execute(store);

                //Insert test-data in db
                using (var session = store.OpenSession())
                {
                    session.Store(user_john);
                    session.Store(user_lizzie);
                    session.Store(user_albert);

                    session.Store(movie_robocop);
                    session.Store(movie_nottingHill);
                    session.Store(movie_inception);

                    foreach (var rental in rentals)
                    {
                        session.Store(rental);
                    }

                    session.SaveChanges();

                    WaitForAllRequestsToComplete(server);
                    WaitForIndexing(store);
                }

                //Test of correct rental-counts for users
                using (var session = store.OpenSession())
                {
                    var allMoviesWithRentalCounts =
                        session.Query<Movies_WithRentalsByUsersCount.ReducedResult, Movies_WithRentalsByUsersCount>()
                            .ToList();

                    var robocopWithRentalsCounts = allMoviesWithRentalCounts.Single(m => m.MovieId == movie_robocop.Id);
                    Assert.AreEqual(1, robocopWithRentalsCounts.UserRentalCounts.FirstOrDefault(x => x.UserId == user_john.Id)?.Count ?? 0);
                    Assert.AreEqual(2, robocopWithRentalsCounts.UserRentalCounts.FirstOrDefault(x => x.UserId == user_lizzie.Id)?.Count ?? 0);
                    Assert.AreEqual(0, robocopWithRentalsCounts.UserRentalCounts.FirstOrDefault(x => x.UserId == user_albert.Id)?.Count ?? 0);

                    var nottingHillWithRentalsCounts = allMoviesWithRentalCounts.Single(m => m.MovieId == movie_nottingHill.Id);
                    Assert.AreEqual(2, nottingHillWithRentalsCounts.UserRentalCounts.FirstOrDefault(x => x.UserId == user_john.Id)?.Count ?? 0);
                    Assert.AreEqual(0, nottingHillWithRentalsCounts.UserRentalCounts.FirstOrDefault(x => x.UserId == user_lizzie.Id)?.Count ?? 0);
                    Assert.AreEqual(0, nottingHillWithRentalsCounts.UserRentalCounts.FirstOrDefault(x => x.UserId == user_albert.Id)?.Count ?? 0);
                }

                // Test that you for a given user can sort the movies by view-count
                using (var session = store.OpenSession())
                {
                    var allMoviesWithRentalCounts =
                        session.Query<Movies_WithRentalsByUsersCount.ReducedResult, Movies_WithRentalsByUsersCount>()
                            .OrderByDescending(x => x.UserRentalCounts.SingleOrDefault(rentalCount => rentalCount.UserId == user_john.Id).Count)
                            .ToList();

                    Assert.AreEqual(movie_nottingHill.Id, allMoviesWithRentalCounts[0].MovieId);
                    Assert.AreEqual(movie_robocop.Id, allMoviesWithRentalCounts[1].MovieId);
                    Assert.AreEqual(movie_inception.Id, allMoviesWithRentalCounts[2].MovieId);
                }
            }
        }

        public class Movies_WithRentalsByUsersCount :
            AbstractMultiMapIndexCreationTask<Movies_WithRentalsByUsersCount.ReducedResult>
        {
            public Movies_WithRentalsByUsersCount()
            {
                AddMap<MovieRental>(rentals =>
                    from r in rentals
                    select new ReducedResult
                    {
                        MovieId = r.MovieId,
                        UserRentalCounts = new[] { new UserRentalCount { UserId = r.UserId, Count = 1 } }
                    });

                AddMap<Movie>(movies =>
                    from m in movies
                    select new ReducedResult
                    {
                        MovieId = m.Id,
                        UserRentalCounts = new[] { new UserRentalCount { UserId = null, Count = 0 } }
                    });

                Reduce = results =>
                    from result in results
                    group result by result.MovieId
                    into g
                    select new
                    {
                        MovieId = g.Key,
                        UserRentalCounts = (
                                from userRentalCount in g.SelectMany(x => x.UserRentalCounts)
                                group userRentalCount by userRentalCount.UserId
                                into subGroup
                                select new UserRentalCount { UserId = subGroup.Key, Count = subGroup.Sum(b => b.Count) })
                            .ToArray()
                    };
            }

            public class ReducedResult
            {
                public string MovieId { get; set; }
                public UserRentalCount[] UserRentalCounts { get; set; }
            }

            public class UserRentalCount
            {
                public string UserId { get; set; }
                public int Count { get; set; }
            }
        }

        public class User
        {
            public string Id { get; set; }
        }

        public class Movie
        {
            public string Id { get; set; }
        }

        public class MovieRental
        {
            public string Id { get; set; }
            public string MovieId { get; set; }
            public string UserId { get; set; }
        }
    }
}

推荐答案

由于您的要求是针对给定用户",如果您确实只查找单个用户,则可以使用 Multi-Map 索引来实现.使用 Movies 表本身生成基线零计数记录,然后在此基础上映射用户的实际 MovieRentals 记录.

Since your requirement says "for a given user", if you really are looking only for a single user, you can do this with a Multi-Map index. Use the Movies table itself to produce the baseline zero-count records and then map in the actual MovieRentals records for the user on top of that.

如果你真的需要所有电影的所有用户都需要它,我不相信有一种方法可以用 RavenDB 干净地做到这一点,因为这会被认为是 报告被认为是 RavenDB 的问题之一.

If you really need it for all users crossed with all movies, I don't believe there is a way to do this cleanly with RavenDB as this would be considered reporting which is noted as one of the sour spots for RavenDB.

如果您真的想尝试使用 RavenDB 执行此操作,这里有一些选项:

Here are some options if you really want to try to do this with RavenDB:

1) 在数据库中为每个用户和每部电影创建虚拟记录,并在索引中使用 0 计数.每当添加/更新/删除电影或用户时,相应地更新虚拟记录.

1) Create dummy records in the DB for every user and every movie and use those in your index with a 0 count. Whenever a movie or user is added/updated/deleted, update the dummy records accordingly.

2) 根据请求自己在内存中生成零计数记录,并将该数据与 RavenDB 为您提供的非零计数数据合并.查询所有用户,查询所有电影,创建基线零计数记录,然后对非零计数进行实际查询并在其上分层.最后,应用分页/过滤/排序逻辑.

2) Generate the zero-count records yourself in memory on request and merge that data with the data that RavenDB gives you back for the non-zero counts. Query for all users, query for all movies, create the baseline zero-count records, then do the actual query for non-zero counts and layer that on top. Finally, apply paging/filtering/sorting logic.

3) 使用 SQL 复制包将用户、电影和 MovieRental 表复制到 SQL 并使用 SQL 进行此报告"查询.

3) Use the SQL replication bundle to replicate the Users, Movies, and MovieRental tables out to SQL and use SQL for this "reporting" query.

这篇关于RavenDB:如何在 map-reduce 中正确索引笛卡尔积?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆