动态规范化表格是否可行? [英] Is it practical to dynamically normalize a table?

查看:101
本文介绍了动态规范化表格是否可行?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

让我们来看看我的数据库跟踪鸟类的情况(注意:我真的在桶的底部刮了个例子)。



字段为:

  sighting_id | common_name | park_name |位置|时间|等等.... 

虽然我假设一个公园将永远在同一个位置,该网站就像一个电子表格。用户为每个条目输入 park_name 位置。还请注意,我的实际模式还有其他依赖类似公园名称的字段(例如状态)。



我没有办法预先定义公园,所以我不能提前知道他们。我是否应该尝试动态规范化这些数据?例如,如果我的程序自动填充一个 parks 表,将鸟瞰图中的park_name和location列替换为 park_id



我担心的是性能,主要是。列出每一个目标都需要加入来填充公园和位置。此外,动态管理这一点几乎可以确定需要比保存更多的资源。我可能需要一个Cron工作来消除孤儿的公园,因为他们可能会被多次发现。

解决方案

这取决于一点点你的使用。标准化的方法(公园是一张表)将使以下查询变得更容易:




  • 每个公园有多少鸟的目光? li>
  • 在哪个公园你最有可能看到鸟XYZ

  • 可能还有更多的这样的查询



但是,你确实遇到了一些粘性问题。模式如果公园XYZ不存在,那么将其插入公园表遭受必须处理的竞争条件。



现在,如何关于反对正常化的一些论据在这里...大多数客户数据库可能将我的街道地址存储为123 Foo Street,而不动态地规范街道名称(我们可以有一个街道表,并将Foo Street放在那里,然后从其他地方引用表格,为什么我把它拿出来,很好地表明,即使那些讨厌任何重复的数据的人也许会承认你有一些线路不一定要交叉。



另一个愚蠢的例子是我们可能会分享姓氏,我们真的需要一个独一无二的名字的表,然后是其他表的外键吗?可能有一些应用程序是有帮助的,但99%的应用程序在这里,这太过分了,只是更多的工作,更少的表现,一点也没有收获。



所以我会考虑我如何能够查询数据退出桌面。老实说,在这种情况下,我可能会为公园做一个单独的表。但是在其他情况下,我选择的不是。



这是我的两美分,税后一分钱。


Let's say my database tracks bird sightings (Note: I'm really scraping the bottom of the barrel for examples).

The fields are:

sighting_id | common_name | park_name | location | time | etc....

Although I'm assuming that a park will always be in the same location, the website is like a spreadsheet. The user enters park_name and location for every entry. Also please note that my actual schema has other fields that are dependent on the analogous "park name" as well (e.g. state).

I do not have a way for the user to predefine parks, so I can't know them ahead of time. Should I even attempt to dynamically normalize this data? For example, should my program automatically populate a parks table, replacing the park_name and location column in the bird sighting table with a park_id?

I'm worried about performance, mostly. Listing every sighting would require a join to populate park and location. Also, dynamically managing this would almost certainty require more resources than it would save. I would probably need a Cron job to eliminate orphaned Parks, since they may be referenced in multiple sightings.

解决方案

It depends on a bit on your usage. The normalized approach (park is a table) will make the following queries easier:

  • How many bird sightings have there been for each park
  • At which park are you most likely to see bird XYZ
  • There are probably quite a few more queries like this

But yes, you do run into some sticky issues. The pattern "if park XYZ doesn't exist then insert it into the parks table" suffers from a race condition that you'll have to deal with.

Now, how about some arguments against normalization here... Most customer databases probably store my street address as "123 Foo Street", without dynamically normalizing the street name (we could have a street table and put "Foo Street" there, then reference it from other tables. Why do I bring this up, well to show that even the guys who hate any repeated data will probably acknowledge that there is some line you don't necessarily have to cross.

Another silly example would be that we might share last names. Do we really need a table for unique last names and then foreign key to it from other tables? There might be some applications where this is helpful but for 99% of application out there, this goes too far. It's just more work and less performant for little to no gain.

So I'd consider how I want to be able to query data back out of the table. Honestly in this case I'd probably do a separate table for parks. But in other cases I've chosen not to.

That's my two cents, one cent after taxes.

这篇关于动态规范化表格是否可行?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆