Dealing with "hypernormalized" data


Problem description


My employer, a small office supply company, is switching suppliers, and I am looking through the new supplier's electronic content to come up with a robust database schema. Our previous schema was thrown together without much thought, and it has led to an unbearable data model with corrupt, inconsistent information.

The new supplier's data is much better than the old one's, but their data is what I would call hypernormalized. For example, their product category structure has 5 levels: Master Department, Department, Class, Subclass, Product Block. In addition the product block content has the long description, search terms and image names for products (the idea is that a product block contains a product and all variations - e.g. a particular pen might come in black, blue or red ink; all of these items are essentially the same thing, so they apply to a single product block). In the data I've been given, this is expressed as the products table (I say "table" but it's a flat file with the data) having a reference to the product block's unique ID.

I am trying to come up with a robust schema to accommodate the data I'm provided with, since I'll need to load it relatively soon, and the data they've given me doesn't seem to match the type of data they provide for demonstration on their sample website (http://www.iteminfo.com). In any event, I'm not looking to reuse their presentation structure so it's a moot point, but I was browsing the site to get some ideas of how to structure things.

What I'm unsure of is whether or not I should keep the data in this format, or for example consolidate Master/Department/Class/Subclass into a single "Categories" table, using a self-referencing relationship, and link that to a product block (product block should be kept separate as it's not a "category" as such, but a group of related products for a given category). Currently, the product blocks table references the subclass table, so this would change to "category_id" if I consolidate them together.

I am probably going to be creating an e-commerce storefront making use of this data with Ruby on Rails (or that's my plan, at any rate) so I'm trying to avoid getting snagged later on or having a bloated application - maybe I'm giving it too much thought but I'd rather be safe than sorry; our previous data was a real mess and cost the company tens of thousands of dollars in lost sales due to inconsistent and inaccurate data. Also I am going to break from the Rails conventions a little by making sure that my database is robust and enforces constraints (I plan on doing it at the application level, too), so that's something I need to consider as well.
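As a rough illustration of what enforcing constraints in the database itself can look like, here is a minimal sketch using Python's built-in `sqlite3` module as a stand-in for whatever database the Rails application would actually use. The table and column names (`product_blocks`, `products`, `sku`, `product_block_id`) and the helper functions are assumptions made for this example; they are not taken from the supplier's files.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with a foreign key from products to product_blocks."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is enabled on the connection.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE product_blocks (
            id INTEGER PRIMARY KEY,
            long_description TEXT NOT NULL
        );
        CREATE TABLE products (
            id INTEGER PRIMARY KEY,
            sku TEXT NOT NULL UNIQUE,  -- hypothetical column
            product_block_id INTEGER NOT NULL REFERENCES product_blocks(id)
        );
        """
    )
    return conn


def try_insert_product(conn: sqlite3.Connection, sku: str, block_id: int) -> bool:
    """Insert a product; return False if the database rejects the row."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO products (sku, product_block_id) VALUES (?, ?)",
                (sku, block_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

With these constraints in place, a product that points at a nonexistent product block, or that repeats an existing SKU, is rejected by the database regardless of what the application layer validates.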

How would you tackle a situation like this? Keep in mind that I have the data to be loaded already in flat files that mimic a table structure (I have documentation saying which columns are which and what references are set up). I'm trying to decide if I should keep them as normalized as they currently are, or if I should look to consolidate. I need to be aware of how each method will affect the way I program the site using Rails: if I do consolidate, there will be essentially 4 "levels" of categories in a single table, but that definitely seems more manageable than separate tables for each level, since apart from Subclass (which directly links to product blocks) they don't do anything except show the next level of category under them. I'm always at a loss for the "best" way to handle data like this. I know of the saying "Normalize until it hurts, then denormalize until it works", but I've never really had to implement it until now.

Solution

I would prefer the "hypernormalized" approach over a denormalized data model. The self-referencing table you mentioned might reduce the number of tables and simplify life in some ways, but in general this type of relationship can be tricky to deal with. Hierarchical queries become a pain, as does mapping an object model to this structure (if you decide to go that route).
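To make the point about hierarchical queries concrete, here is a minimal sketch of a self-referencing `categories` table and the recursive query needed just to recover the path to one category. It uses Python's `sqlite3` module purely for illustration; the table layout, the sample category names, and the helper functions are all assumptions for this example.

```python
import sqlite3


def build_categories() -> sqlite3.Connection:
    """Create a self-referencing categories table with four hypothetical levels."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE categories (
            id INTEGER PRIMARY KEY,
            name TEXT NOT NULL,
            parent_id INTEGER REFERENCES categories(id)  -- NULL for the top level
        );
        INSERT INTO categories (id, name, parent_id) VALUES
            (1, 'Office Supplies', NULL),
            (2, 'Writing', 1),
            (3, 'Pens', 2),
            (4, 'Ballpoint Pens', 3);
        """
    )
    return conn


def category_path(conn: sqlite3.Connection, category_id: int) -> list:
    """Return category names from the top level down to the given category."""
    rows = conn.execute(
        """
        WITH RECURSIVE ancestors(id, name, parent_id, depth) AS (
            -- start at the requested category
            SELECT id, name, parent_id, 0 FROM categories WHERE id = ?
            UNION ALL
            -- then repeatedly step up to the parent row
            SELECT c.id, c.name, c.parent_id, a.depth + 1
            FROM categories AS c
            JOIN ancestors AS a ON c.id = a.parent_id
        )
        SELECT name FROM ancestors ORDER BY depth DESC
        """,
        (category_id,),
    ).fetchall()
    return [name for (name,) in rows]
```

Nothing in this layout records which level a row belongs to, so rules such as "a product block may only hang off a Subclass" would need extra columns or application logic to enforce.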

A couple of extra joins are not going to hurt and will keep the application more maintainable. Unless performance degrades due to an excessive number of joins, I would opt to leave things as they are. As an added bonus, if any of these levels later needs additional functionality, you will not run into the problems you would face had you merged them all into one self-referencing table.
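For comparison, here is a minimal sketch of the one-table-per-level layout and the fixed chain of joins it implies, again using Python's `sqlite3` module only as an illustration. The table names, column names, and sample rows are assumptions modelled on the levels named in the question, not the supplier's actual file layout.

```python
import sqlite3

# Hypothetical schema: one table per category level, each pointing at its parent level.
SCHEMA = """
CREATE TABLE master_departments (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE departments (id INTEGER PRIMARY KEY, name TEXT NOT NULL,
    master_department_id INTEGER NOT NULL REFERENCES master_departments(id));
CREATE TABLE classes (id INTEGER PRIMARY KEY, name TEXT NOT NULL,
    department_id INTEGER NOT NULL REFERENCES departments(id));
CREATE TABLE subclasses (id INTEGER PRIMARY KEY, name TEXT NOT NULL,
    class_id INTEGER NOT NULL REFERENCES classes(id));
CREATE TABLE product_blocks (id INTEGER PRIMARY KEY, long_description TEXT NOT NULL,
    subclass_id INTEGER NOT NULL REFERENCES subclasses(id));
"""

# Made-up sample rows for illustration only.
SAMPLE = """
INSERT INTO master_departments VALUES (1, 'Office Supplies');
INSERT INTO departments VALUES (1, 'Writing', 1);
INSERT INTO classes VALUES (1, 'Pens', 1);
INSERT INTO subclasses VALUES (1, 'Ballpoint Pens', 1);
INSERT INTO product_blocks VALUES (1, 'Retractable ballpoint pen', 1);
"""


def build_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # enforce the references above
    conn.executescript(SCHEMA + SAMPLE)
    return conn


def breadcrumb(conn: sqlite3.Connection, product_block_id: int):
    """Return (master department, department, class, subclass) names, or None."""
    return conn.execute(
        """
        SELECT md.name, d.name, c.name, s.name
        FROM product_blocks AS pb
        JOIN subclasses AS s ON s.id = pb.subclass_id
        JOIN classes AS c ON c.id = s.class_id
        JOIN departments AS d ON d.id = c.department_id
        JOIN master_departments AS md ON md.id = d.master_department_id
        WHERE pb.id = ?
        """,
        (product_block_id,),
    ).fetchone()
```

The query is longer than a single-table lookup, but it is a plain, fixed chain of joins with no recursion, and each level keeps its own table, so level-specific columns can be added later without touching the others.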
