Using datetime float representation as primary key


Problem description


From my experience I have learned that using a surrogate INT column as the primary key, especially an IDENTITY key column, offers better performance than using a GUID or char/varchar column as the primary key. I try to use an IDENTITY key as the primary key wherever possible.

But recently I came across a schema where the tables were horizontally partitioned and managed via a partitioned view. The tables could not have an IDENTITY column, since that would make the partitioned view non-updatable. One workaround was to create a dummy 'keygenerator' table with an identity column to generate IDs for the primary key, but this would mean having a 'keygenerator' table for each partitioned view.

My next thought was to use a float as the primary key. The reason is the following key algorithm that I devised:

    DECLARE @KEY FLOAT
    
    SET @KEY = CONVERT(FLOAT,GETDATE())/100000.0 
    
    SET @KEY = @EMP_ID + @KEY
    
Here's how it works.
    
    CONVERT(FLOAT,GETDATE()) 
    

gives the float representation of the current datetime: the number of days since 1900-01-01, with the time of day as the fractional part.

    CONVERT(FLOAT,GETDATE())/100000.0 
    

converts that value into a purely fractional one, i.e. all digits are pushed to the right of the decimal point.

    @KEY = @EMP_ID + @KEY
    

adds the employee ID, which is an integer, to this fractional value.

The logic is that the employee ID is guaranteed to be unique across sessions, since an employee cannot connect to the application more than once at the same time. And for the same employee, the current datetime will be different each time a key is generated.

In all, this gives a unique key across all employee sessions and across time.

So for employee IDs 11 and 12, I get key values like 12.40046693321566357 and 11.40046693542361111.
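For readers without a SQL Server instance at hand, the construction can be mimicked in Python. This is only a minimal sketch, not the asker's code: it assumes that `CONVERT(FLOAT, GETDATE())` yields the number of days since 1900-01-01 with the time of day as the fraction, and the names `datetime_to_float_days` and `make_key` are hypothetical.

```python
from datetime import datetime

# Assumption: day 0 of the float representation is 1900-01-01 00:00.
SQL_EPOCH = datetime(1900, 1, 1)


def datetime_to_float_days(dt):
    """Days since SQL_EPOCH, with the time of day as the fractional part."""
    delta = dt - SQL_EPOCH
    return delta.days + (delta.seconds + delta.microseconds / 1e6) / 86400.0


def make_key(emp_id, dt):
    """Mimic the scheme: integer employee ID plus (days / 100000) as the fraction."""
    return emp_id + datetime_to_float_days(dt) / 100000.0


# A timestamp in August 2009 for employee 12 gives a key of the same shape
# as the sample values in the question.
print(make_key(12, datetime(2009, 8, 23, 16, 38, 23)))
```

The fraction stays below 1 only while the day count is under 100000, so the integer part of the key can be read back as the employee ID for dates within that range.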

My concern is whether a float primary key offers any benefit compared to choosing a GUID or char/varchar primary key. Also important: because of partitioning, the float column is going to be part of a composite key.

Solution

"Also important: because of partitioning, the float column is going to be part of a composite key."

What? Why? You've gone to great pains in an attempt to make this employee/time-based value unique, so what else would you need in the primary key? And on the other side of that question, are the other components of your key unique already? If so, why not just use them?

Your scheme leaves a bad taste in my mouth. I'm not quite sure why, though, because the more I think about it, the more solid it seems.

• At first I worried about performance. But a float is just 8 bytes (assuming your DBMS uses IEEE 754 double), which just isn't all that big. That's no worse than having a 64-bit integer as a key, or two 32-bit ints. Your key generation process is the only thing that might be slowed down, and even that not by much.
• I then worried about uniqueness. This scheme doesn't guarantee that you won't generate the same key twice. But given your assertion that the combination of user and datetime will be unique, it might actually work:
  • An IEEE 754 double has 53 bits of precision.
  • The datetime will use 42 bits. Assumptions:
    • Resolution of datetime is 1/300 second (3.33... ms). This is true for MS SQL Server, at least.
    • ceiling(log2(86400 * 300 * 100000)) = 42, i.e. seconds per day, times ticks per second, times the 100000 days covered by the divisor.
  • This leaves 11 bits (53 - 42) for your employee ID. If the employee ID is greater than 2047, you will lose part of the datetime, but the loss will be on the order of milliseconds. Your employee ID can reach 524287 before the loss of accuracy exceeds one second.
• I then worried about the difficulty of looking up a key value later. Given the 0.1 + 0.2 != 0.3 problem, concerns about floating-point equality always come to mind. But there's no reason you would be performing any calculations on this key value, and presumably it would stay in IEEE 754 double format at all times (be it in the table, in stored proc variables, or in variables in your executable). In that case it should never change and can be treated as a unique 64-bit value.
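The bit budget above can be checked numerically. The following is a minimal Python sketch (Python 3.9+ for `math.ulp`), assuming the key is held in an IEEE 754 double and that datetime ticks are 1/300 s. The function names are illustrative only. It also prints the usual equality pitfall with computed values, and checks that a stored key survives a round trip through its 8-byte form unchanged.

```python
import math
import struct

TICKS_PER_DAY = 86400 * 300   # assumed datetime resolution: 1/300 s
DAY_SCALE = 100000            # divisor used by the key scheme


def datetime_bits():
    """Bits needed to tell apart every tick within DAY_SCALE days."""
    return math.ceil(math.log2(TICKS_PER_DAY * DAY_SCALE))


def spacing_seconds(emp_id):
    """Gap between adjacent doubles at emp_id, converted to seconds of datetime."""
    return math.ulp(float(emp_id)) * DAY_SCALE * 86400


def max_id_with_spacing_below(limit_seconds):
    """Largest ID (of the form 2**k - 1) whose key spacing stays below the limit."""
    k = 0
    while spacing_seconds(2 ** k) < limit_seconds:
        k += 1
    return 2 ** k - 1


def survives_round_trip(key):
    """True if packing the key into 8 bytes and back yields the same double."""
    return struct.unpack("<d", struct.pack("<d", key))[0] == key


print(datetime_bits())                           # bits used by the datetime part
print(53 - datetime_bits())                      # bits left for the employee ID
print(max_id_with_spacing_below(1 / 300))        # largest ID keeping full tick resolution
print(max_id_with_spacing_below(1.0))            # largest ID keeping sub-second spacing
print(0.1 + 0.2 == 0.3)                          # computed values may not compare equal
print(survives_round_trip(12.400466933215664))   # a stored value compares equal to itself
```

The spacing between adjacent doubles doubles at every power of two, which is why the limits fall on values of the form 2**k - 1.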

After considering all this, your scheme does appear relatively safe. Edoode's suggestion about not clustering the index is a good one. With that in mind, as well as my caveats above about the size of your employee ID, you can use this scheme to generate primary keys just about as well as any other method.

I still question whether it's the best method, though, or whether it's even necessary.

• Can the other components of the composite key not be used by themselves (i.e., as a natural key)?

• You could, as you suggest, keep a sequential key seed in another table. You would need only one table, not one table per partition as you assume. The table would simply need two columns: one for the partition number, and one for the current identity value of that partition.

• Using a GUID or varchar primary key isn't out of the question. Many people do this on many different tables. It won't kill your performance. And it might be more straightforward, or at least more easily understood, than this scheme.

• If your composite key already includes the employee ID, you could just add a datetime column to the key and call it a day. If not, you could add both columns. There's no reason you have to mash the two together.
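The single key-seed table from the list above could look roughly like the sketch below. It uses Python's built-in `sqlite3` purely as a stand-in for the real database. The table name `key_seed`, its columns, and the function `next_key` are hypothetical. On SQL Server the increment-and-read step would have to be made atomic in whatever way that DBMS supports.

```python
import sqlite3


def create_seed_table(conn):
    """One table for all partitions: partition number plus its current value."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS key_seed ("
        " partition_no INTEGER PRIMARY KEY,"
        " current_value INTEGER NOT NULL)"
    )


def next_key(conn, partition_no):
    """Increment and return the seed for one partition inside a transaction."""
    with conn:  # commits on success, rolls back on error
        # Create the row for this partition on first use.
        conn.execute(
            "INSERT OR IGNORE INTO key_seed (partition_no, current_value)"
            " VALUES (?, 0)",
            (partition_no,),
        )
        conn.execute(
            "UPDATE key_seed SET current_value = current_value + 1"
            " WHERE partition_no = ?",
            (partition_no,),
        )
        row = conn.execute(
            "SELECT current_value FROM key_seed WHERE partition_no = ?",
            (partition_no,),
        ).fetchone()
    return row[0]


conn = sqlite3.connect(":memory:")
create_seed_table(conn)
print(next_key(conn, 1), next_key(conn, 1), next_key(conn, 2))
```

Each partition advances its own counter independently, so one table serves every partition of the view.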

HTH
