How do I normalize a large, user-generated data set of company names?

Views: 136 | Posted: 2017/3/22 1:37:06 | Tags: database-design, normalization

Problem description

Use case: User 1 uploads 100 company names (e.g. Microsoft, Bank of Sierra).

User 2 uploads 100 company names (e.g. The Gap, Uservoice, Microsoft, Inc.).

I want User 1's notion of Microsoft and User 2's notion of Microsoft to map to a single, centrally maintained entity with a unique index for Microsoft.
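To make the desired mapping concrete, here is a minimal sketch of one way it could look, assuming that lowercasing, stripping punctuation, and dropping legal suffixes is enough to treat "Microsoft" and "Microsoft, Inc." as the same company. The names `normalize_key`, `CompanyRegistry`, and `LEGAL_SUFFIXES` are hypothetical and used only for illustration; the suffix list is an assumption, not an exhaustive one.

```python
import re

# Assumed, non-exhaustive list of legal suffixes to ignore when comparing names.
LEGAL_SUFFIXES = {"inc", "incorporated", "corp", "corporation", "llc", "ltd", "co"}


def normalize_key(name: str) -> str:
    """Reduce a raw company name to a comparison key (ASCII names only)."""
    # Lowercase, then turn every run of punctuation into a space.
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", name.lower())
    tokens = cleaned.split()
    # Drop a leading article, but never empty the name entirely.
    if len(tokens) > 1 and tokens[0] == "the":
        tokens = tokens[1:]
    # Drop trailing legal suffixes such as "inc".
    while len(tokens) > 1 and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


class CompanyRegistry:
    """In-memory stand-in for the central repository (hypothetical)."""

    def __init__(self):
        self._id_by_key = {}    # comparison key -> central id
        self._name_by_id = {}   # central id -> name as first uploaded
        self._next_id = 1

    def resolve(self, raw_name: str) -> int:
        """Return the central id for a name, creating an entry if it is new."""
        key = normalize_key(raw_name)
        if key not in self._id_by_key:
            # Unknown names are entered as-is, as the question suggests.
            self._id_by_key[key] = self._next_id
            self._name_by_id[self._next_id] = raw_name
            self._next_id += 1
        return self._id_by_key[key]


registry = CompanyRegistry()
user1_id = registry.resolve("Microsoft")        # uploaded by User 1
user2_id = registry.resolve("Microsoft, Inc.")  # uploaded by User 2
```

With this sketch both uploads resolve to the same central id, while "The Gap" and "Bank of Sierra" each get their own. Exact-key matching like this does not catch misspellings.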

If someone uploads a name that isn't in the central repository, I guess I'd like it to be entered as is. But what happens if that first entry is misspelled (e.g. Vergin Mobile instead of Virgin Mobile)? How can we best correct it and correlate new uploads with that same index?
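The misspelling scenario can be illustrated with a small sketch that uses approximate string matching from Python's standard `difflib` module. It assumes a similarity cutoff of 0.85 (an arbitrary choice) and assumes a human curator decides which spelling is correct; `closest_known_name`, `canonical`, and `aliases` are hypothetical names for illustration only.

```python
import difflib


def closest_known_name(name, known_names, cutoff=0.85):
    """Return the most similar known name, or None if nothing is close enough."""
    # Compare case-insensitively, but return the name as stored.
    by_lower = {known.lower(): known for known in known_names}
    matches = difflib.get_close_matches(
        name.lower(), list(by_lower), n=1, cutoff=cutoff
    )
    return by_lower[matches[0]] if matches else None


# Central repository: the first upload happened to be misspelled.
canonical = {1: "Vergin Mobile"}
aliases = {}  # lowercased alternate spelling -> central id

incoming = "Virgin Mobile"
match = closest_known_name(incoming, canonical.values())
if match is not None:
    # Assumption: a curator has confirmed that the incoming spelling is right.
    company_id = next(cid for cid, name in canonical.items() if name == match)
    # Keep the old spelling as an alias so earlier uploads still resolve.
    aliases[canonical[company_id].lower()] = company_id
    # Correct the displayed name without changing the index.
    canonical[company_id] = incoming
```

The index stays the same, so records already linked to it are unaffected; only the displayed name changes, and the misspelling survives as an alias.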

Technically, should the central repository be a separate database altogether? Should the user-generated information also live in a separate database from the business transactions that will run against it?

I'm starting out with a broad definition of the problem and hoping to break it into chunks with your input. Thanks.
