Should I denormalize or run multiple queries in DocumentDb?


Problem Description

I'm learning about data modeling in DocumentDb, and this is where I need some advice.

My documents are shown further down.

I can take two approaches here, each with pros and cons.

Scenario 1:

If I keep the data denormalized (see my documents below) by storing project team member information, e.g. first name, last name, email, etc., in the same document as the project, I can get the information I need in one query. BUT when Jane Doe gets married and her last name changes, I'd have to update a lot of documents in the Projects collection. I'd also have to be extremely careful to make sure that every collection whose documents contain employee information gets updated as well. If, for example, I update Jane Doe's name in the Projects collection but forget to update the TimeSheets collection, I'd be in trouble!
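
To make the cost of Scenario 1 concrete, here is a minimal sketch of the fan-out an application would have to perform when an employee's last name changes. It assumes project documents shaped like the ones in this question and already loaded in memory; the helper name `renameInProjects`, the second project (id 790) and the new last name "Roe" are hypothetical. Writing the changed documents back, and repeating the same work for other collections such as TimeSheets, is not shown.

```javascript
// Hypothetical helper: update the embedded copy of one employee's last name
// in every project document that lists that employee on its projectTeam.
// Returns only the projects that changed, i.e. the documents that would
// have to be written back to the Projects collection.
function renameInProjects(projects, employeeId, newLastName) {
  const changed = [];
  for (const project of projects) {
    let touched = false;
    for (const member of project.projectTeam || []) {
      if (member.id === employeeId && member.lastName !== newLastName) {
        member.lastName = newLastName; // mutates the in-memory document
        touched = true;
      }
    }
    if (touched) changed.push(project);
  }
  return changed;
}

// Usage: two projects (the second one is hypothetical) that both include Jane Doe (id 2222)
const sampleProjects = [
  { id: 789, projectName: "My first project",
    projectTeam: [{ id: 1111, firstName: "John", lastName: "Smith" },
                  { id: 2222, firstName: "Jane", lastName: "Doe" }] },
  { id: 790, projectName: "Another project",
    projectTeam: [{ id: 2222, firstName: "Jane", lastName: "Doe" }] },
];
const toWrite = renameInProjects(sampleProjects, 2222, "Roe");
console.log(toWrite.map(p => p.id)); // [ 789, 790 ]
```

Every project Jane works on has to be rewritten, which is exactly the write amplification described above.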

Scenario 2:

If I keep data somewhat normalized and keep only EmployeeId in the project documents, I can then run three queries whenever I want to get a projects list:

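  • Query 1 returns a list of projects
  • Query 2 gives me the EmployeeIds of all project team members that appear in the results of the first query
  • Query 3 retrieves employee information such as first name, last name, email, etc. I'd run it using the results of Query 2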

I can then combine all the data in my application.
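
The application-side merge in Scenario 2 could look like the following sketch. It assumes the project documents (storing only employee ids plus the role on the project) and the employee documents have already been fetched into two in-memory arrays. `joinProjectTeams` is a hypothetical name, and the fields copied over are simply the ones shown in this question.

```javascript
// Hypothetical helper: attach employee details to each project's team,
// given projects that store only employee ids and a separate employee list.
function joinProjectTeams(projects, employees) {
  // Index employees by id so each lookup is a Map hit instead of a scan
  const byId = new Map(employees.map(e => [e.id, e]));
  return projects.map(project => ({
    ...project,
    projectTeam: (project.projectTeam || []).map(member => {
      const e = byId.get(member.id);
      // Keep the bare member entry if that employee was not returned
      if (!e) return member;
      // isPrimary is stored as the string "true" in the question's documents
      const primary = (e.emailAddresses || []).find(a => a.isPrimary === "true");
      return { ...member, firstName: e.firstName, lastName: e.lastName,
               email: primary ? primary.email : undefined };
    }),
  }));
}

// Usage: a normalized project that keeps only ids and positions
const normalizedProjects = [{ id: 789, projectName: "My first project",
  projectTeam: [{ id: 1111, position: "Sr. Engineer" },
                { id: 2222, position: "Project Manager" }] }];
const employees = [
  { id: 1111, firstName: "John", lastName: "Smith",
    emailAddresses: [{ email: "jsmith@domain1.com", isPrimary: "true" }] },
  { id: 2222, firstName: "Jane", lastName: "Doe",
    emailAddresses: [{ email: "jane@domain1.com", isPrimary: "true" }] },
];
const joined = joinProjectTeams(normalizedProjects, employees);
console.log(joined[0].projectTeam[1].lastName); // prints Doe
```

The helper builds new objects rather than mutating the fetched documents, so the same employee list can be reused across several project lists.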

The problem here is that DocumentDb seems to have a lot of limitations right now. I may be reading hundreds of projects with hundreds of employees in their project teams, and there seems to be no efficient way to get the information for all employees whose Ids appear in the results of my second query. Again, keep in mind that I may need to pull information for hundreds of employees here. If the following SQL query is what I'd use for employee data, I may have to run the same query several times to get all the information I need, because I don't think I can have hundreds of OR clauses:

SELECT e.id, e.firstName, e.lastName, e.emailAddresses
FROM Employees e
WHERE e.id = 1111 OR e.id = 2222
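
One way to work within a cap on OR clauses is to split the id list into fixed-size batches and issue one query per batch. The sketch below only builds the query specifications. It assumes DocumentDB's parameterized query format (a `query` string plus a `parameters` array of `{ name, value }` pairs) and treats ids as strings, since DocumentDB document ids are strings. `buildEmployeeQueries`, the extra id 3333 and the batch size of 2 are hypothetical and chosen for illustration, not the actual preview limit.

```javascript
// Hypothetical helper: split employee ids into batches and build one
// parameterized query per batch, each with at most `batchSize` OR clauses.
function buildEmployeeQueries(ids, batchSize) {
  if (!(batchSize > 0)) throw new Error("batchSize must be positive");
  // Deduplicate and convert to strings (DocumentDB ids are strings)
  const unique = [...new Set(ids.map(String))];
  const queries = [];
  for (let start = 0; start < unique.length; start += batchSize) {
    const batch = unique.slice(start, start + batchSize);
    const parameters = batch.map((id, i) => ({ name: "@id" + i, value: id }));
    const where = parameters.map(p => "e.id = " + p.name).join(" OR ");
    queries.push({
      query: "SELECT e.id, e.firstName, e.lastName, e.emailAddresses " +
             "FROM Employees e WHERE " + where,
      parameters,
    });
  }
  return queries;
}

const specs = buildEmployeeQueries([1111, 2222, 3333, 2222], 2);
console.log(specs.length); // 2
console.log(specs[1].query);
// SELECT e.id, e.firstName, e.lastName, e.emailAddresses FROM Employees e WHERE e.id = @id0
```

Using parameters instead of concatenating ids into the query text also avoids quoting mistakes and query injection.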

I understand that DocumentDb is still in preview and some of these limitations will be fixed. With that said, how should I approach this problem? How can I efficiently store, manage, and retrieve all the project data I need, including project team information? Is Scenario 1 the better solution, Scenario 2, or is there a better third option?

Here's what my documents look like. First, the project document:

{
   id: 789,
   projectName: "My first project",
   startDate: "9/6/2014",
   projectTeam: [
      { id: 1111, firstName: "John", lastName: "Smith", position: "Sr. Engineer" },
      { id: 2222, firstName: "Jane", lastName: "Doe", position: "Project Manager" }
   ]
}

And here are two employee documents which reside in the Employees collection:

{
   id: 1111,
   firstName: "John",
   lastName: "Smith",
   dateOfBirth: "1/1/1967",
   emailAddresses: [
      { email: "jsmith@domain1.com", isPrimary: "true" },
      { email: "john.smith@domain2.com", isPrimary: "false" }
   ]
},
{
   id: 2222,
   firstName: "Jane",
   lastName: "Doe",
   dateOfBirth: "3/8/1975",
   emailAddresses: [
      { email: "jane@domain1.com", isPrimary: "true" }
   ]
}

Recommended Answer

I believe you're on the right track in weighing the trade-offs between normalizing and de-normalizing your project and employee data. As you've mentioned:

Scenario 1) If you de-normalize your data model (couple projects and employee data together) - you may find yourself having to update many projects when you update an employee.

Scenario 2) If you normalize your data model (decouple projects and employee data) - you would have to query for projects to retrieve employeeIds and then query for the employees if you wanted to get the list of employees belonging to a project.

I would pick the trade-off that fits your application's use case. In general, I prefer de-normalizing for read-heavy applications and normalizing for write-heavy applications.

Note that you can avoid multiple round trips between your application and the database by using DocumentDB stored procedures (the queries are then executed on the DocumentDB server side).

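Here's an example stored procedure that retrieves the employees belonging to a specific projectId. Because a stored procedure can only query the collection it runs in, it assumes project and employee documents are stored in the same collection: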

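function(projectId) {
  /* getContext() is available inside stored procedures and triggers */
  var context = getContext();
  /* The collection this procedure runs against. Stored procedures are scoped to a
     single collection, so projects and employees must live in the same collection. */
  var collection = context.getCollection();
  /* Used to set the HTTP response body returned by the procedure */
  var response = context.getResponse();

  var employees = [];  // one employee document per project team member
  var pending = 0;     // number of employee lookups still outstanding

  /* Callback for processing the query on projectId */
  var projectHandler = function(documents) {
    var team = documents[0].projectTeam || [];
    pending = team.length;
    if (pending === 0) {
      response.setBody(employees);
      return;
    }
    for (var i = 0; i < team.length; i++) {
      // Query for each team member's employee document
      queryOnId(team[i].id, employeeHandler);
    }
  };

  /* Callback for processing the query on an employeeId */
  var employeeHandler = function(documents) {
    employees.push(documents[0]);
    pending--;
    if (pending === 0) {
      // All lookups have completed: return the employees as a JSON array
      response.setBody(employees);
    }
  };

  /* Query on a single id and pass the matching documents to callbackHandler */
  var queryOnId = function(id, callbackHandler) {
    var accepted = collection.queryDocuments(collection.getSelfLink(),
      {
        // Parameterized query; document ids are strings, so convert the id
        query: 'SELECT * FROM c WHERE c.id = @id',
        parameters: [{ name: '@id', value: String(id) }]
      },
      {},
      function(err, documents) {
        if (err) {
          throw new Error('Error: ' + err.message);
        }
        if (documents.length < 1) {
          throw new Error('Unable to find id ' + id);
        }
        callbackHandler(documents);
      }
    );
    if (!accepted) {
      throw new Error('Query for id ' + id + ' was not accepted by the server');
    }
  };

  // Start by querying for the project itself
  queryOnId(projectId, projectHandler);
}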

Even though DocumentDB supports only a limited number of OR clauses during the preview, you can still get relatively good performance by splitting the employeeId lookups into a number of asynchronous server-side queries.
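
The same idea of splitting the lookups into concurrent queries also applies when they are issued from the application instead of a stored procedure. The sketch below is a hypothetical client-side version: `fetchEmployeesInBatches` and the injected `runQuery` function are assumptions (in practice `runQuery` would wrap whatever SDK call executes one batched query), and the usage runs against an in-memory stand-in rather than a real database.

```javascript
// Hypothetical helper: run one lookup per batch of ids concurrently and
// flatten the results. `runQuery(batch)` is an assumed async function that
// resolves to the employee documents whose ids are in `batch`.
async function fetchEmployeesInBatches(ids, batchSize, runQuery) {
  const batches = [];
  for (let i = 0; i < ids.length; i += batchSize) {
    batches.push(ids.slice(i, i + batchSize));
  }
  // Start every batch at once; Promise.all keeps the batch order and
  // rejects as soon as any single batch fails.
  const results = await Promise.all(batches.map(batch => runQuery(batch)));
  return results.flat();
}

// Usage with an in-memory stand-in for the database
const employeeStore = new Map([
  ["1111", { id: "1111", lastName: "Smith" }],
  ["2222", { id: "2222", lastName: "Doe" }],
]);
const fakeRunQuery = async batch =>
  batch.filter(id => employeeStore.has(id)).map(id => employeeStore.get(id));

fetchEmployeesInBatches(["1111", "2222"], 1, fakeRunQuery)
  .then(found => console.log(found.map(e => e.lastName))); // [ 'Smith', 'Doe' ]
```

Total latency then tracks the slowest batch rather than the sum of all batches, at the cost of more concurrent requests against the database's throughput.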
