在固定间隔内生成不同的随机时间 [英] Distinct random time generation in the fixed interval

查看:42
本文介绍了在固定间隔内生成不同的随机时间的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我试图为从数据集中选择的每一行生成上午 8:00 到晚上 8:00 之间的随机时间,但是,我总是得到 每行相同的随机值——我希望每一行都不同.

I'm trying to generate a random time between 8:00 AM and 8:00 PM for each row that is selected from a data set, however, I always get the same random value for each row – I want it to be different for each row.

表模式 &数据:

╔══════╦════════════════╗
║  ID  ║  CREATED_DATE  ║
╠══════╬════════════════╣
║ ID/1 ║   26/04/2014   ║
║ ID/2 ║   26/04/2014   ║
║ ID/3 ║   26/04/2014   ║
║ ID/4 ║   26/04/2014   ║
║ ID/5 ║   26/04/2014   ║
╚══════╩════════════════╝

当前的SQL语句:

SELECT [ID]
     , MyFunction.dbo.AddWorkDays(14, [CREATED_DATE]) AS [New Date]
     , CONVERT(VARCHAR, DATEADD(MILLISECOND, CAST(43200000 * RAND() AS INT), CONVERT(TIME, '08:00')), 114) AS [New Time]
FROM [RandomTable]

当前结果([New Time] 列中每一行的相同时间):

Current results (same time for each row in the [New Time] column):

╔══════╦════════════════╦════════════════╗
║  ID  ║    New Date    ║    New Time    ║
╠══════╬════════════════╬════════════════╣
║ ID/1 ║   10/05/2014   ║    09:41:43    ║
║ ID/2 ║   10/05/2014   ║    09:41:43    ║
║ ID/3 ║   10/05/2014   ║    09:41:43    ║
║ ID/4 ║   10/05/2014   ║    09:41:43    ║
║ ID/5 ║   10/05/2014   ║    09:41:43    ║
╚══════╩════════════════╩════════════════╝

预期结果([New Time] 列中每一行的不同时间):

Desired results (different time for each row in the [New Time] column):

╔══════╦════════════════╦════════════════╗
║  ID  ║    New Date    ║    New Time    ║
╠══════╬════════════════╬════════════════╣
║ ID/1 ║   10/05/2014   ║    09:41:43    ║
║ ID/2 ║   10/05/2014   ║    15:05:23    ║
║ ID/3 ║   10/05/2014   ║    10:01:05    ║
║ ID/4 ║   10/05/2014   ║    19:32:45    ║
║ ID/5 ║   10/05/2014   ║    08:43:15    ║
╚══════╩════════════════╩════════════════╝

关于如何解决这个问题的任何想法?以上所有只是示例数据——我的真实表有大约 2800 条记录(不确定这是否会对任何人的建议产生影响).

Any ideas on how to fix this? All of the above is just sample data – my real table has around 2800 records (not sure if that will make a difference to anyone's suggestions).

推荐答案

原题解读:

问题说明:

  • 在上午 8:00 到晚上 8:00(即 12 小时窗口)之间生成随机时间
  • 它应该是不同的每一行(即在所有行中都是唯一的)
  • 真正的表有大约 2800 条记录
  • Generate a random time between 8:00 AM and 8:00 PM (i.e. a 12-hour window)
  • It should be different for each row (i.e. unique across all rows)
  • The real table has around 2800 records

现在考虑以下几点:

  • 示例数据仅显示一个日期
  • 24 小时有 86,400 秒,因此 12 小时有 43,200 秒

以下方面存在一些歧义:

There is some ambiguity in the following areas:

  • 在每行不同的"上下文中究竟什么是随机的,因为不能保证每一行的真正随机值都不同.事实上,真正随机数可以理论上对于每一行都是相同.那么是强调随机"还是不同"?或者我们真的在谈论不同但没有按顺序排列的(为了呈现随机性而不是实际上是随机的)?
  • 如果行数超过 2800 行怎么办?如果有 100 万行会怎样?
  • 如果可以有超过 43,200 行,如何处理每行不同的"(因为不可能在所有行中都具有唯一性)?
  • 日期会有所不同吗?如果是这样,我们是否真的在谈论每个日期的每一行都不同"?
  • 如果每行每个日期不同":
    • 每个日期的时间是否可以遵循相同的非顺序模式?还是每个日期的模式都需要不同?
    • 任何特定日期都会有超过 43,200 行吗?如果是这样,则时间只能是唯一的每组 43,200 行.
    • What exactly is random within the context of "different for every row", given that truly random values cannot be guaranteed to be different for every row. In fact, truly random numbers could theoretically be the same for every row. So is the emphasis on "random" or "different"? Or are we really talking about different but not sequentially ordered (to give the appearance of randomness without actually being random)?
    • What if there are ever more than 2800 rows? What if there are 1 million rows?
    • If there can be more than 43,200 rows, how to handle "different for each row" (since it is not possible to have unique across all rows)?
    • Will the date ever vary? If so, are we really talking about "different for each row per date"?
    • If "different for each row per date":
      • Can the times for each date follow the same, non-sequential pattern? Or does the pattern need to differ per each date?
      • Will there ever be more than 43,200 rows for any particular date? If so, the times can only be unique per each set of 43,200 rows.

      根据上述信息,有几种方法可以解释请求:

      Given the information above, there are a few ways to interpret the request:

      1. 强调随机":日期和行数无关紧要.使用其他答案中显示的三种方法之一生成很可能但不保证唯一的真正随机时间:
        • @notulysses: RAND(CAST(NEWID() AS VARBINARY)) * 43200
        • @Steve Ford: ABS(CHECKSUM(NewId()) % 43201)
        • @Vladimir Baranov : CAST(43200000 * (CAST(CRYPT_GEN_RANDOM(4) as int)/4294967295.0 + 0.5) as int)
      1. Emphasis on "random": Dates and number of rows don't matter. Generate truly random times that are highly likely, but not guaranteed, to be unique using one of the three methods shown in the other answers:
        • @notulysses: RAND(CAST(NEWID() AS VARBINARY)) * 43200
        • @Steve Ford: ABS(CHECKSUM(NewId()) % 43201)
        • @Vladimir Baranov : CAST(43200000 * (CAST(CRYPT_GEN_RANDOM(4) as int) / 4294967295.0 + 0.5) as int)

      因此,我的回答基于以下想法:

      Hence, I based my answer on the idea that:

      • 即使 OP 的行数从不超过 2800,大多数其他遇到类似随机性需求的人也更有可能拥有更大的数据集(即很容易有 100 万行,对于任意数量的日期:1、5000 等)
      • 样本数据在对所有 5 行使用相同日期时过于简单,或者即使在这种特殊情况下所有行的日期都相同,在大多数其他情况下这种情况不太可能发生
      • 独特性优于随机性
      • 如果每个日期的秒数的看似随机"排序存在某种模式,那么至少应该有一个不同的偏移量(当日期按顺序排列时)到任何一小组日期之间的随机性.

      如果情况需要唯一的时间,那么任何生成真正随机值的方法都不能保证这一点.我真的很喜欢@Vladimir Baranov 对 CRYPT_GEN_RANDOM 的使用,但几乎不可能生成一组唯一的值:

      If the situation requires unique times, that cannot be guaranteed with any method of generating truly random values. I really like the use of CRYPT_GEN_RANDOM by @Vladimir Baranov, but it is nearly impossible to get a unique set of values generated:

      DECLARE @Table TABLE (Col1 BIGINT NOT NULL UNIQUE);
      
      INSERT INTO @Table (Col1)
          SELECT CONVERT(BIGINT, CRYPT_GEN_RANDOM(4))
          FROM [master].sys.objects so
          CROSS JOIN [master].sys.objects so2
          CROSS JOIN [master].sys.objects so3;
          -- 753,571 rows
      

      将随机值增加到 8 个字节似乎确实有效:

      Increasing the random value to 8 bytes does seem to work:

      DECLARE @Table TABLE (Col1 BIGINT NOT NULL UNIQUE);
      
      INSERT INTO @Table (Col1)
          SELECT CONVERT(BIGINT, CRYPT_GEN_RANDOM(8))
          FROM [master].sys.objects so
          CROSS JOIN [master].sys.objects so2
          CROSS JOIN [master].sys.objects so3;
          -- 753,571 rows
      

      当然,如果我们生成到第二个,那么只有 86,400 个.缩小范围似乎有所帮助,因为以下方法偶尔会奏效:

      Of course, if we are generating down to the second, then there are only 86,400 of those. Reducing the scope seems to help as the following does occasionally work:

      DECLARE @Table TABLE (Col1 BIGINT NOT NULL UNIQUE);
      
      INSERT INTO @Table (Col1)
          SELECT TOP (86400) CONVERT(BIGINT, CRYPT_GEN_RANDOM(4))
          FROM [master].sys.objects so
          CROSS JOIN [master].sys.objects so2
          CROSS JOIN [master].sys.objects so3;
      

      但是,如果独特性需要每天,事情会变得有点棘手(这似乎是此类项目的合理要求,而不是所有日子都是独特的).但是随机数生成器不会知道在每个新的一天重置.

      However, things get a bit trickier if the uniqueness needs per each day (which seems like a reasonable requirement of this type of project, as opposed to unique across all days). But a random number generator isn't going to know to reset at each new day.

      如果仅仅具有随机的外观是可以接受的,那么我们可以保证每个日期的唯一性,而无需:

      If it is acceptable to merely have the appearance of being random, then we can guarantee uniqueness per each date without:

      • 循环/游标结构
      • 将已使用的值保存在表格中
      • 使用 RAND()NEWID()CRYPT_GEN_RANDOM()

      以下解决方案使用了的概念我在这个答案中了解到的模块化乘法逆(MMI):在 SQL Server 中生成看似随机的唯一数字 ID .当然,这个问题并没有像我们这里每天只有 86,400 个这样的严格定义的值范围.因此,我使用了 86400 的范围(作为模数")并在 在线计算器获得他们的MMI:

      The following solution uses the concept of Modular Multiplicative Inverses (MMI) which I learned about in this answer: generate seemingly random unique numeric ID in SQL Server . Of course, that question did not have a tightly-defined range of values like we have here with only 86,400 of them per day. So, I used a range of 86400 (as "Modulo") and tried a few "coprime" values (as "Integer") in an online calculator to get their MMIs:

      • 13 (MMI = 39877)
      • 37 (MMI = 51373)
      • 59 (MMI = 39539)

      我在 CTE 中使用 ROW_NUMBER(),按 CREATED_DATE 分区(即分组)作为为一天中的每一秒分配一个值的方法.

      I use ROW_NUMBER() in a CTE, partitioned (i.e. grouped) by CREATED_DATE as a means of assigning each second of the day a value.

      但是,虽然为 0、1、2、... 等秒生成的值会按顺序出现随机,但在不同的日子里,该特定秒将映射到相同的值.因此,第二个 CTE(名为WhichSecond")通过将日期转换为 INT(将日期转换为 1900-01-01 的顺序偏移量)然后乘以 101 来移动每个日期的起点.

      But, while the values generated for seconds 0, 1, 2, ... and so on sequentially will appear random, across different days that particular second will map to the same value. So, the second CTE (named "WhichSecond") shifts the starting point for each date by converting the date to an INT (which converts dates to a sequential offset from 1900-01-01) and then multiply by 101.

      DECLARE @Data TABLE
      (
        ID INT NOT NULL IDENTITY(1, 1),
        CREATED_DATE DATE NOT NULL
      );
      
      INSERT INTO @Data (CREATED_DATE) VALUES ('2014-10-05');
      INSERT INTO @Data (CREATED_DATE) VALUES ('2014-10-05');
      INSERT INTO @Data (CREATED_DATE) VALUES ('2014-10-05');
      INSERT INTO @Data (CREATED_DATE) VALUES ('2014-10-05');
      INSERT INTO @Data (CREATED_DATE) VALUES ('2014-10-05');
      INSERT INTO @Data (CREATED_DATE) VALUES ('2015-03-15');
      INSERT INTO @Data (CREATED_DATE) VALUES ('2016-10-22');
      INSERT INTO @Data (CREATED_DATE) VALUES ('2015-03-15');
      
      ;WITH cte AS
      (
        SELECT tmp.ID,
               CONVERT(DATETIME, tmp.CREATED_DATE) AS [CREATED_DATE],
               ROW_NUMBER() OVER (PARTITION BY tmp.CREATED_DATE ORDER BY (SELECT NULL))
                            AS [RowNum]
        FROM   @Data tmp
      ), WhichSecond AS
      (
        SELECT cte.ID,
               cte.CREATED_DATE,
               ((CONVERT(INT, cte.[CREATED_DATE]) - 29219) * 101) + cte.[RowNum]
                            AS [ThisSecond]
        FROM   cte
      )
      SELECT parts.*,
             (parts.ThisSecond % 86400) AS [NormalizedSecond], -- wrap around to 0 when
                                                               -- value goes above 86,400
             ((parts.ThisSecond % 86400) * 39539) % 86400 AS [ActualSecond],
             DATEADD(
                       SECOND,
                       (((parts.ThisSecond % 86400) * 39539) % 86400),
                       parts.CREATED_DATE
                    ) AS [DateWithUniqueTime]
      FROM WhichSecond parts
      ORDER BY parts.ID;
      

      返回:

      ID  CREATED_DATE  ThisSecond  NormalizedSecond  ActualSecond  DateWithUniqueTime
      1   2014-10-05    1282297     72697             11483         2014-10-05 03:11:23.000
      2   2014-10-05    1282298     72698             51022         2014-10-05 14:10:22.000
      3   2014-10-05    1282299     72699              4161         2014-10-05 01:09:21.000
      4   2014-10-05    1282300     72700             43700         2014-10-05 12:08:20.000
      5   2014-10-05    1282301     72701             83239         2014-10-05 23:07:19.000
      6   2015-03-15    1298558      2558             52762         2015-03-15 14:39:22.000
      7   2016-10-22    1357845     61845             83055         2016-10-22 23:04:15.000
      8   2015-03-15    1298559      2559              5901         2015-03-15 01:38:21.000
      

      <小时>

      如果我们只想生成上午 8:00 到晚上 8:00 之间的时间,我们只需要做一些小调整:


      If we want to only generate times between 8:00 AM and 8:00 PM, we only need to make a few minor adjustments:

      1. 将范围(如模数")从 86400 更改为其一半:43200
      2. 重新计算 MMI(可以使用与整数"相同的互质"值):39539(与之前相同)
      3. 28800 添加到 DATEADD 的第二个参数中,作为 8 小时的偏移量
      1. Change the range (as "Modulo") from 86400 to half of it: 43200
      2. Recalculate the MMI (can use the same "coprime" values as "Integer"): 39539 (same as before)
      3. Add 28800 to the second parameter of the DATEADD as an 8 hour offset

      结果将只更改一行(因为其他行是诊断性的):

      The result will be a change to just one line (since the others are diagnostic):

      -- second parameter of the DATEADD() call
      28800 + (((parts.ThisSecond % 43200) * 39539) % 43200)
      

      <小时>

      另一种以不太可预测的方式改变每一天的方法是利用 RAND() 通过在WhichSecond"中传入 CREATED_DATE 的 INT 形式CTE.这将为每个日期提供一个稳定的偏移量,因为 RAND(x) 将为传入的 x 的相同值返回相同的值 y,但对于传入的 x 的不同值,将返回不同的值 y. 含义:


      Another means of shifting each day in a less predictable fashion would be to make use of RAND() by passing in the INT form of CREATED_DATE in the "WhichSecond" CTE. This would give a stable offset per each date since RAND(x) will return the same value y for the same value of x passed in, but will return a different value y for a different value of x passed in. Meaning:

      兰德(1) = y1
      兰德(2) = y2
      兰德(3) = y3
      兰德(2) = y2

      RAND(1) = y1
      RAND(2) = y2
      RAND(3) = y3
      RAND(2) = y2

      第二次调用 RAND(2) 时,它仍然返回与第一次调用时相同的 y2 值.

      The second time RAND(2) was called, it still returned the same value of y2 that it returned the first time it was called.

      因此,WhichSecond"CTE 可能是:

      Hence, the "WhichSecond" CTE could be:

      (
        SELECT cte.ID,
               cte.CREATED_DATE,
               (RAND(CONVERT(INT, cte.[CREATED_DATE])) * {some number}) + cte.[RowNum]
                            AS [ThisSecond]
        FROM   cte
      )
      

      这篇关于在固定间隔内生成不同的随机时间的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆