从大表中获取不同值的最佳方法 [英] Best way to get distinct values from large table

查看:38
本文介绍了从大表中获取不同值的最佳方法的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个大约有 10 列左右的 db 表,其中两列是月份和年份.该表现在有大约 250k 行,我们预计它每月会增加大约 100-150k 条记录.许多查询涉及月份和年份列(例如,2010 年 3 月的所有记录),因此我们经常需要获取可用的月份和年份组合(即我们是否有 2010 年 4 月的记录?).

I have a db table with about 10 or so columns, two of which are month and year. The table has about 250k rows now, and we expect it to grow by about 100-150k records a month. A lot of queries involve the month and year column (ex, all records from march 2010), and so we frequently need to get the available month and year combinations (ie do we have records for april 2010?).

一位同事认为我们应该有一个与主表不同的表,其中只包含我们拥有数据的月份和年份.我们每个月只向主表添加一次记录,因此将新条目添加到第二个表中只是脚本末尾的一个小更新.每当我们需要在第一个表中找到可用的月/年条目时,就会查询第二个表.这个解决方案对我来说很笨拙,并且违反了 DRY.

A coworker thinks that we should have a separate table from our main one that only contains the months and years we have data for. We only add records to our main table once a month, so it would just be a small update on the end of our scripts to add the new entry to this second table. This second table would be queried whenever we need to find the available month/year entries on the first table. This solution feels kludgy to me and a violation of DRY.

您认为解决这个问题的正确方法是什么?还有比拥有两张桌子更好的方法吗?

What do you think is the correct way of solving this problem? Is there a better way than having two tables?

推荐答案

确保在这些列上有聚集索引.并在这些日期列上对表进行分区,将数据文件放在不同的磁盘驱动器上我相信保持较低的索引碎片是最好的选择.

Make sure to have an Clustered Index on those columns. and partition your table on these date columns an place the datafiles on different disk drives I Believe keeping your index fragmentation low is your best shot.

我也相信使用所需选择的物理视图不是一个好主意,因为它增加了插入/更新开销.平均每分钟有 3.5 次插入.或每次插入之间大约 17 秒(平均而言,如果我错了,请纠正我)

I also Believe having a physical view with the desired select is not a good idea, because it adds Insert/Update overhead. on average there's 3,5 insert's per minute. or about 17 seconds between each insert (on average please correct me if I'm wrong)

问题是您选择的频率是否超过每 17 秒一次?这就是关键思想.希望有帮助.

The question is are you selecting more often than every 17 seconds? That's the key thought. Hope it helped.

这篇关于从大表中获取不同值的最佳方法的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆