Spring Data Mongo - apply unique combination fields in embedded document


Problem description

I'm working with Spring Boot v2.1.3.RELEASE and Spring Data Mongo. In this example, I want to enforce uniqueness on email and deptName: the combination of email and deptName must be unique. Also, is there any way to move email out, since it is repeated in each array object?

I tried the following, but it does not work:

@CompoundIndexes({
    @CompoundIndex(name = "email_deptName_idx", def = "{'email' : 1, 'technologyEmployeeRef.technologyCd' : 1}")
})

Sample Data

{
    "_id" : ObjectId("5ec507c72d8c2136245d35ce"),
    ....
    ....
    "firstName" : "John",
    "lastName" : "Doe",
    "email" : "john.doe@gmail.com",
    .....
    .....
    .....
    "technologyEmployeeRef" : [ 
        {
            "technologyCd" : "john.doe@gmail.com",
            "technologyName" : "Advisory",
            ....
            .....
            "Status" : "A"
        }, 
        {
            "technologyCd" : "john.doe@gmail.com",
            "technologyName" : "Tax",
            .....
            .....
            "Status" : "A"
        }
    ],
    "phoneCodes" : [ 
        "+352"
    ],
    ....
    ....
}

Technology.java

@Data
@Builder
@AllArgsConstructor
@NoArgsConstructor
@Document
public class Technology {
    @Indexed(name = "technologyCd", unique = true, sparse = true)
    private String technologyCd;

    @Indexed(name = "technologyName", unique = true, sparse = true)
    private String technologyName;
    private String status;
}

EmployeeTechnologyRef.java

@Data
@Builder
@AllArgsConstructor
@NoArgsConstructor
public class EmployeeTechnologyRef {
    private String technologyCd;
    private String primaryTechnology;
    private String status;
}

Employee.java

@Data
@Builder
@AllArgsConstructor
@NoArgsConstructor
@Document
@CompoundIndexes({
    @CompoundIndex(name="emp_tech_indx", def = "{'employeeTechnologyRefs.primaryTechnology' : 1, 'employeeTechnologyRefs.technologyCd' : 1}" ,unique = true, sparse = true)
})
public class Employee {
    private String firstName;
    private String lastName;
    private String email;
    private List<EmployeeTechnologyRef> employeeTechnologyRefs;
}

I used the code below, but it does not raise any duplicate error. How can we do this?

Technology java8 = Technology.builder().technologyCd("Java").technologyName("Java8").status("A").build();
Technology spring = Technology.builder().technologyCd("Spring").technologyName("Spring Boot2").status("A").build();
List<Technology> technologies = new ArrayList<>();
technologies.add(java8);
technologies.add(spring);

technologyRepository.saveAll(technologies);

EmployeeTechnologyRef t1 = EmployeeTechnologyRef.builder().technologyCd("Java").primaryTechnology("Y")
        .status("A")
        .build();
EmployeeTechnologyRef t2 = EmployeeTechnologyRef.builder().technologyCd("Spring").primaryTechnology("Y")
        .status("A")
        .build();
List<EmployeeTechnologyRef> employeeTechnologyRefs = new ArrayList<>();
employeeTechnologyRefs.add(t1);
employeeTechnologyRefs.add(t2);
employeeTechnologyRefs.add(t1);

Employee employee = Employee.builder().firstName("John").lastName("Kerr").email("john.kerr@gmail.com")
        .employeeTechnologyRefs(employeeTechnologyRefs).build();
employeeRepository.save(employee);

Solution

In MongoDB, a unique index ensures that a particular value in a field is not present in more than one document. It does not guarantee that a value is unique across an array within a single document. The MongoDB Manual explains this in its discussion of unique multikey indexes.

Thus, a unique index will not satisfy your requirement. It will prevent separate documents from containing duplicate combinations, but it will still allow a single document to contain duplicate values across an array.
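To make that behavior concrete, here is a toy in-memory model of a unique multikey index. It is only a sketch of the rule described above, not how MongoDB implements indexes; the class name `UniqueMultikeyIndexModel` and the `"primaryTechnology|technologyCd"` key format are made up for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model of a unique multikey index (hypothetical, for illustration only). */
class UniqueMultikeyIndexModel {
    // Maps each index key to the id of the single document that owns it.
    private final Map<String, String> ownerByKey = new HashMap<>();

    /**
     * Tries to index a document. Returns false, and indexes nothing,
     * if any key is already owned by a different document.
     */
    boolean insert(String documentId, List<String> arrayKeys) {
        // Repeated keys inside one document collapse to a single entry,
        // so duplicates within the same array are never detected.
        Set<String> distinctKeys = new LinkedHashSet<>(arrayKeys);
        for (String key : distinctKeys) {
            String owner = ownerByKey.get(key);
            if (owner != null && !owner.equals(documentId)) {
                return false; // duplicate across documents: rejected
            }
        }
        for (String key : distinctKeys) {
            ownerByKey.put(key, documentId);
        }
        return true;
    }

    public static void main(String[] args) {
        UniqueMultikeyIndexModel index = new UniqueMultikeyIndexModel();
        // One document whose array repeats the same pair (like t1, t2, t1 in the question).
        System.out.println(index.insert("john", Arrays.asList("Y|Java", "Y|Spring", "Y|Java"))); // true
        // A different document reusing a pair that is already indexed.
        System.out.println(index.insert("jane", Arrays.asList("Y|Java"))); // false
    }
}
```

In this model, the list with the repeated `Y|Java` entry is accepted, which matches what you observed when saving `t1`, `t2`, `t1` on one employee.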

The best option you have is to change your data model so as to split the array of technologyEmployeeRef objects into separate documents. Splitting it up into separate documents will allow you to use a unique index to enforce uniqueness.

How best to implement this data model change depends on your access pattern (which is outside the scope of this question).


One way to do this is to create an EmployeeTechnology collection that has all of the fields that currently exist in the technologyEmployeeRef array. Additionally, this EmployeeTechnology collection would have a field, such as email, which would allow you to associate each document with a document in the Employee collection.

Sample Employee Document

{
  ....
  ....
  "firstName" : "John",
  "lastName" : "Doe",
  "email" : "john.doe@gmail.com",
  .....
  .....
  .....
}

Sample EmployeeTechnology Document

{
  "email" : "john.doe@gmail.com",
  "technologyCd" : "Java",
  "technologyName" : "Java8",
  ....
  .....
  "status" : "A"
}

Index in EmployeeTechnology collection

{'email' : 1, 'technologyCd' : 1}, {unique: true}
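In Spring Data Mongo, that index could be declared roughly as follows. This is a sketch in the style of the classes from the question: the class name, the collection name `employeeTechnology`, the index name and the `id` field are assumptions, and it needs spring-data-mongodb and Lombok on the classpath.

```java
import lombok.AllArgsConstructor;
import lombok.Builder;
import lombok.Data;
import lombok.NoArgsConstructor;
import org.springframework.data.annotation.Id;
import org.springframework.data.mongodb.core.index.CompoundIndex;
import org.springframework.data.mongodb.core.index.CompoundIndexes;
import org.springframework.data.mongodb.core.mapping.Document;

@Data
@Builder
@AllArgsConstructor
@NoArgsConstructor
@Document(collection = "employeeTechnology") // collection name is an assumption
@CompoundIndexes({
    // Each document holds exactly one (email, technologyCd) pair,
    // so the unique index applies to every pair in the collection.
    @CompoundIndex(name = "email_technologyCd_idx",
            def = "{'email' : 1, 'technologyCd' : 1}", unique = true)
})
public class EmployeeTechnology {
    @Id
    private String id;             // assumed identifier field
    private String email;          // links back to the Employee document
    private String technologyCd;
    private String technologyName;
    private String status;
}
```

Assuming the index is actually created from the annotation, saving a second document with the same email and technologyCd should then be rejected by MongoDB with a duplicate key error.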

The disadvantage of this approach is that you would need to read from two collections to have all of the data. This drawback may not be a big deal if you rarely need to retrieve the data from both collections at the same time. If you do need all the data, the reads can be sped up through the use of indexes, and further sped up through the use of covered queries.


Another option is to denormalize the data. You would do this by duplicating the Employee data that you need to access at the same time as the Technology data.

Sample Documents

[
  {
    ....
    "firstName" : "John",
    "lastName" : "Doe",
    "email" : "john.doe@gmail.com",
    .....
    "technologyCd" : "Java",
    "technologyName" : "Java8",
    ....
    "status" : "A"
  },
  {
    ....
    "firstName" : "John",
    "lastName" : "Doe",
    "email" : "john.doe@gmail.com",
    .....
    "technologyCd" : "Spring",
    "technologyName" : "Spring Boot2",
    ....
    "status" : "A"
  }
]
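As a rough sketch of how such documents could be produced from the current embedded model, the plain Java below copies the employee fields next to each technology entry. The class and method names (`Denormalizer`, `TechnologyRef`, `EmployeeTechnologyRow`, `flatten`) are hypothetical, and only a few of the fields from the sample data are included.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper that turns one employee plus N technologies into N flat rows. */
class Denormalizer {
    /** One embedded array element, reduced to two fields from the sample data. */
    static final class TechnologyRef {
        final String technologyCd;
        final String technologyName;

        TechnologyRef(String technologyCd, String technologyName) {
            this.technologyCd = technologyCd;
            this.technologyName = technologyName;
        }
    }

    /** Denormalized document: employee fields duplicated next to one technology. */
    static final class EmployeeTechnologyRow {
        final String firstName;
        final String lastName;
        final String email;
        final String technologyCd;
        final String technologyName;

        EmployeeTechnologyRow(String firstName, String lastName, String email,
                              String technologyCd, String technologyName) {
            this.firstName = firstName;
            this.lastName = lastName;
            this.email = email;
            this.technologyCd = technologyCd;
            this.technologyName = technologyName;
        }
    }

    /** Produces one row per technology, repeating the employee fields in each row. */
    static List<EmployeeTechnologyRow> flatten(String firstName, String lastName, String email,
                                               List<TechnologyRef> refs) {
        List<EmployeeTechnologyRow> rows = new ArrayList<>();
        for (TechnologyRef ref : refs) {
            rows.add(new EmployeeTechnologyRow(firstName, lastName, email,
                    ref.technologyCd, ref.technologyName));
        }
        return rows;
    }

    public static void main(String[] args) {
        List<EmployeeTechnologyRow> rows = flatten("John", "Doe", "john.doe@gmail.com",
                Arrays.asList(new TechnologyRef("Java", "Java8"),
                        new TechnologyRef("Spring", "Spring Boot2")));
        System.out.println(rows.size()); // 2
    }
}
```

With one document per row, a unique index on email and technologyCd can enforce the combination, at the cost of repeating the employee fields in every row.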

A MongoDB blog post says of this approach:

You’d do this only for fields that are frequently read, get read much more often than they get updated, and where you don’t require strong consistency, since updating a denormalized value is slower, more expensive, and is not atomic.


Or as you've already mentioned, it may make sense to leave the data model as it is and to perform the check for uniqueness on the application side. This could likely give you the best read performance, but it does come with some disadvantages. First, it will slow down write operations because the application will need to run some checks before it can update the database.

It may be unlikely, but there is also a possibility that you could still end up with duplicates. If there are two back-to-back requests to insert the same EmployeeTechnology object into the array, then the validation of the second request may finish (and pass) before the first request has written to the database. I have seen a similar scenario myself with an application I worked on. Even though the application was checking for uniqueness, if a user double-clicked a submit button there would end up being duplicate entries in the database. In this case, disabling the button on the first click drastically reduced the risk. This small risk may be tolerable, depending on your requirements and the impact of having duplicate entries.
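A minimal sketch of such an application-side check, in plain Java, might look like this. It assumes that uniqueness within one employee is decided by technologyCd alone and works on the code strings directly; `TechnologyRefValidator` and `findDuplicateCodes` are hypothetical names. In your code you would first collect the values with `getTechnologyCd()` from each `EmployeeTechnologyRef`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical pre-save check for repeated technology codes within one employee. */
class TechnologyRefValidator {
    /** Returns each technologyCd that occurs more than once, in order of first repeat. */
    static List<String> findDuplicateCodes(List<String> technologyCds) {
        Set<String> seen = new HashSet<>();
        List<String> duplicates = new ArrayList<>();
        for (String code : technologyCds) {
            // Set.add returns false when the code has been seen before.
            if (!seen.add(code) && !duplicates.contains(code)) {
                duplicates.add(code);
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        // Mirrors the question: t1 (Java), t2 (Spring), t1 (Java) again.
        System.out.println(findDuplicateCodes(Arrays.asList("Java", "Spring", "Java"))); // [Java]
    }
}
```

You would call this before `employeeRepository.save(employee)` and reject the request if the returned list is not empty. Note that it only validates the list held in memory by one request, so it does not close the race between two concurrent requests described above.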


Which approach makes the most sense largely depends on your access pattern and requirements. Hope this helps.
