Moving records to another table vs. "flag" column SQL performance


Problem description


Suppose I have a table that represents a "task queue" (tens of millions of records).

Tasks can be "queued" or "done".

Which performs better if we want to grab 10 tasks to process?

  • A single table with a "flag" column such as ENUM / BIT / TINYINT marking each task as done or not (possibly with an index on that column)
  • Separate tables for queued and completed tasks, where each completed task is deleted from the queued table and inserted into the completed table

Note that at the beginning we have few or no completed tasks, but as processing goes on, there will be millions of already-done tasks.

Solution

It probably doesn't matter, but if it were me, I would use the one table. Here's my reasoning:

First and foremost, we must assume good indexes on this table, which is what's going to make the lookup fast. With appropriate indexes, if you want to query for queued tasks, it won't matter whether the number of "done" tasks is 10 or 10 billion; the DBMS will only look at the queued ones.
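To make the indexed lookup concrete, here is a minimal sketch of the one-table layout. The answer doesn't name a DBMS, so SQLite (through Python's standard `sqlite3` module) is an assumption made only so the example runs anywhere; the table name `tasks`, its columns, the index name `idx_tasks_status`, and the helper functions are all hypothetical.

```python
import sqlite3


def make_queue(n_done: int, n_queued: int) -> sqlite3.Connection:
    """Build an in-memory task table with a status flag and an index on it."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tasks ("
        " id INTEGER PRIMARY KEY,"
        " payload TEXT NOT NULL,"
        " status TEXT NOT NULL DEFAULT 'queued')"
    )
    # Index on the flag column, so queued rows can be located
    # without reading the (eventually much larger) set of done rows.
    conn.execute("CREATE INDEX idx_tasks_status ON tasks (status, id)")
    rows = [(f"job-{i}", "done") for i in range(n_done)]
    rows += [(f"job-{n_done + i}", "queued") for i in range(n_queued)]
    conn.executemany("INSERT INTO tasks (payload, status) VALUES (?, ?)", rows)
    conn.commit()
    return conn


def grab_queued(conn: sqlite3.Connection, limit: int = 10) -> list:
    """Fetch the oldest queued tasks, up to `limit` of them."""
    return conn.execute(
        "SELECT id, payload FROM tasks"
        " WHERE status = 'queued' ORDER BY id LIMIT ?",
        (limit,),
    ).fetchall()


conn = make_queue(n_done=5000, n_queued=50)
batch = grab_queued(conn)
```

The query in `grab_queued` stays the same no matter how many done rows pile up; whether it stays fast depends on the DBMS actually choosing the index, which is worth checking with your own engine's query-plan tool.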

Second, as a task moves from "queued" to "done", you're going to update its status. And this requires a bit of reorganization of the index by the DBMS, but that's OK, they've been doing that with high efficiency for something like 30 years now.
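In the one-table layout, finishing a task is a single UPDATE, and the index maintenance happens inside the DBMS. A minimal sketch, again assuming SQLite via Python's `sqlite3` purely for illustration, with a hypothetical `tasks` table and `mark_done` helper:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tasks ("
    " id INTEGER PRIMARY KEY,"
    " status TEXT NOT NULL DEFAULT 'queued')"
)
conn.execute("CREATE INDEX idx_tasks_status ON tasks (status)")
conn.executemany("INSERT INTO tasks (id) VALUES (?)", [(i,) for i in range(1, 21)])
conn.commit()


def mark_done(conn: sqlite3.Connection, task_ids: list) -> None:
    """Flip the flag on the given tasks; the DBMS updates the index itself."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.executemany(
            "UPDATE tasks SET status = 'done'"
            " WHERE id = ? AND status = 'queued'",
            [(task_id,) for task_id in task_ids],
        )


mark_done(conn, [1, 2, 3])
counts = dict(conn.execute("SELECT status, COUNT(*) FROM tasks GROUP BY status"))
```

The extra `AND status = 'queued'` condition is a design choice of this sketch, not something the answer prescribes: it makes the call harmless if a task was already completed.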

If you were to split them out into separate tables, essentially the maintenance of moving a record from one place to another would be put on your code instead of in the DBMS index reorganization code. Which of those code bases is better tested and more performant? :)
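For comparison, this is roughly what the "moving" code looks like once it lives in your application. It is only a sketch under the same assumptions as before (SQLite via Python's `sqlite3`); the table names `queued_tasks` and `done_tasks` and the function `move_to_done` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE queued_tasks (id INTEGER PRIMARY KEY, payload TEXT NOT NULL)")
conn.execute("CREATE TABLE done_tasks (id INTEGER PRIMARY KEY, payload TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO queued_tasks (id, payload) VALUES (?, ?)",
    [(i, f"job-{i}") for i in range(1, 6)],
)
conn.commit()


def move_to_done(conn: sqlite3.Connection, task_id: int) -> bool:
    """Move one task between tables; returns False if it was not queued."""
    with conn:  # the INSERT and DELETE commit together or not at all
        row = conn.execute(
            "SELECT id, payload FROM queued_tasks WHERE id = ?", (task_id,)
        ).fetchone()
        if row is None:
            return False
        conn.execute("INSERT INTO done_tasks (id, payload) VALUES (?, ?)", row)
        conn.execute("DELETE FROM queued_tasks WHERE id = ?", (task_id,))
    return True


moved = move_to_done(conn, 2)
moved_again = move_to_done(conn, 2)  # already moved, so nothing happens
```

Even in this small form the code has to handle the missing-row case and keep the two statements in one transaction, and both tables' indexes still get updated by the DBMS on every move.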

One final argument - if you put it all in one big table, further performance tweaking of the management of these tasks becomes a DBMS configuration issue, as opposed to a software development issue. That's a big win in my book. There's all sorts of crazy configuration stuff you can do to improve performance in any DBMS, including things like vertical and horizontal partitioning. Those things won't be options if the way you've distributed your data is via some scheme that's embedded in your software.
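As one illustration of tuning that stays on the DBMS side, consider a partial index. This particular technique is not mentioned in the answer (which names partitioning); it is swapped in here because SQLite, assumed for this sketch, supports partial indexes but not table partitioning. The names `tasks`, `idx_tasks_queued`, and `QUERY` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tasks (id INTEGER PRIMARY KEY, status TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO tasks (status) VALUES (?)",
    [("done",)] * 1000 + [("queued",)] * 20,
)
conn.commit()

# The application's query; it is not touched by the tuning step below.
QUERY = "SELECT id FROM tasks WHERE status = 'queued' ORDER BY id LIMIT 10"

before = conn.execute(QUERY).fetchall()

# DBMS-side change only: an index that contains just the queued rows,
# so its size tracks the backlog rather than the whole history.
conn.execute("CREATE INDEX idx_tasks_queued ON tasks (id) WHERE status = 'queued'")

after = conn.execute(QUERY).fetchall()
```

The results are identical before and after; only the access path available to the planner changes. With a two-table scheme baked into the application, an equivalent change would mean editing and redeploying code.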

So, bottom line: if you take the two-table approach, I think it will perform very similarly to the one-table approach once you account for the extra work your code has to do to move records around. If you delete an "open" task from one table and stick it into a "done" table, keep in mind that the DBMS still has to update the "open" index on the source table. Because there's likely not going to be a big performance difference, you should use the one-table approach: it's less work for you, and it gives you more flexibility later (speed improvements via configuration rather than software changes).
