TSQL: equally divide a result set into groups and update them

Question

I have my database with 3 tables like so:

The Orders table has data like this:

OrderID    OperatorID    GroupID        OrderDesc    Status    Cash    ...
--------------------------------------------------------------------------
      1             1          1      small order         1     100 
      2             1          1    another order         2       0 
      3             1          2      xxxxxxxxxxx         2    1000 
      5             2          2      yyyyyyyyyyy         2     150 
      9             5          1      xxxxxxxxxxx         1       0 
     10          NULL          2      xxxxxxxxxxx         1      10 
     11          NULL          3      xxxxxxxxxxx         1     120 

The Operators table:

OperatorID    Name    GroupID    Active
---------------------------------------
      1       John          1         1
      2       Kate          1         1
      4       Jack          2         1
      5       Will          1         0
      6        Sam          3         1

The Groups table:

GroupID    Name
---------------
      1      G1
      2      G2
      3      X1

As you can see, John has 3 orders, Kate has 1, Will has 1, and Jack and Sam have none.

Now I would like to assign operators to orders based on some conditions:

  • the order must have cash > 0
  • the order must have status = 1
  • the order must be in group 1 or 2
  • the operator must be active (active = 1)
  • the operator must be in group 1 or 2

This is the result that I would like to get:

OrderID    OperatorID    GroupID        OrderDesc    Status    Cash    ...
--------------------------------------------------------------------------
      1             1          1      small order         1     100       < change
      2             1          1    another order         2       0 
      3             2          2      xxxxxxxxxxx         2    1000       < change
      5             4          2      yyyyyyyyyyy         2     150       < change
      9             5          1      xxxxxxxxxxx         1       0 
     10             4          2      xxxxxxxxxxx         1      10       < change
     11          NULL          3      xxxxxxxxxxx         1     120 

I would like to shuffle the orders and update operatorID so that every time I call this script I get a randomly assigned operatorID, but every operator ends up with an equal number of orders (close to equal: if I have 7 orders and 3 operators, one operator will have 3 and the rest 2).

I can use NTILE to distribute orders into groups, but I need to assign operatorID to that group.

I think that I need to do something like this:

SELECT NTILE(2) OVER( order by orderID desc) as newID,* 
FROM
    orders(NOLOCK)

This will give me my orders table split into equal parts. What I need to know is the number of rows in the operators table (to pass it as the parameter to NTILE); after that I could join my results with the operators (using row_number()).

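Is there a better solution?

My question again: how can I equally divide a result set into groups and update that record set using data from another table?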

This is my code so far: http://sqlfiddle.com/#!3/39849/25

EDIT 2: I've updated my question and added more conditions.

I would like to assign operators to orders based on some conditions:

  • the order must have cash > 0
  • the order must have status = 1
  • the order must be in group 1 or 2
  • the operator must be active (active = 1)
  • the operator must be in group 1 or 2

I'm building this query as a stored procedure.
So the first step will be to generate the data with the new assignments into a temporary table and, after final approval, the second step will update the main table based on that temp table.

I have two more questions:

  1. Will it be better to first select all orders and all operators that meet the conditions into a temporary table and then do the shuffling, or to do it all in one big query?

  2. I would like to pass an array of groups as a parameter to my procedure. Which option would be best for passing an array to a stored procedure (SQL Server 2005)?

I know this has been asked many times, but I would like to know whether it is better to create a separate function that splits a comma-separated string into a table (http://www.sommarskog.se/arrays-in-sql-2005.html) or to put everything inside one big fat procedure. :)

Final answer, available at http://sqlfiddle.com/#!3/afb48/2:

SELECT o.*, op.operatorName AS NewOperator, op.operatorID AS NewOperatorId
FROM (SELECT o.*, (ROW_NUMBER() over (ORDER BY newid()) % numoperators) + 1 AS randseqnum
      FROM Orders o CROSS JOIN
     (SELECT COUNT(*) AS numoperators FROM operators WHERE operators.active=1) op
      WHERE o.cash>0 and o.status in (1,3)
     ) o JOIN
     (SELECT op.*, ROW_NUMBER() over (ORDER BY newid()) AS seqnum
      FROM Operators op WHERE op.active=1
     ) op
     ON o.randseqnum = op.seqnum ORDER BY o.orderID

Answer based on Gordon Linoff's answer. Thanks!

Recommended answer

I wasn't sure if you really wanted an update query or a select query. The following query returns a new operator for each order, subject to your conditions:

/*
with orders as (select 1 as orderId, 'order1' as orderDesc, 1 as OperatorId),
     operators as (select 1 as operatorID, 'John' as name)
 */
select o.*, op.name as NewOperator, op.operatorID as NewOperatorId
from (select o.*, (ROW_NUMBER() over (order by newid()) % numoperators) + 1 as randseqnum
      from Orders o cross join
     (select COUNT(*) as numoperators from operators) op
     ) o join
     (select op.*, ROW_NUMBER() over (order by newid()) as seqnum
      from Operators op
     ) op
     on o.randseqnum = op.seqnum order by orderid 

It basically assigns a new id to the rows for the join. Each row in the orders table gets a value between 1 and the number of operators, randomly assigned. This is then joined to a sequence number on the operators.
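To see why the modulo step keeps the split close to equal, here is a minimal sketch with made-up data: 7 orders and 3 operators held in table variables. The names `@Orders`, `@Operators`, `@Shuffled` and `@numoperators` are hypothetical, and storing the operator numbering in `@Shuffled` first is my own choice so that it is fixed before the join; it is not part of the query above.

```sql
-- Hypothetical sample data (SQL Server 2005-compatible syntax).
DECLARE @Orders TABLE (OrderID int PRIMARY KEY);
DECLARE @Operators TABLE (OperatorID int PRIMARY KEY, Name varchar(20));
DECLARE @Shuffled TABLE (seqnum int PRIMARY KEY, OperatorID int, Name varchar(20));
DECLARE @numoperators int;

INSERT INTO @Orders (OrderID)
SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4
UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7;

INSERT INTO @Operators (OperatorID, Name)
SELECT 1, 'John' UNION ALL SELECT 2, 'Kate' UNION ALL SELECT 4, 'Jack';

-- Number the operators 1..n in random order and store that numbering.
INSERT INTO @Shuffled (seqnum, OperatorID, Name)
SELECT ROW_NUMBER() OVER (ORDER BY NEWID()), OperatorID, Name
FROM @Operators;

SELECT @numoperators = COUNT(*) FROM @Shuffled;

-- Row numbers 1..7 modulo 3, plus 1, give bucket numbers 1..3.
-- Count how many orders land in each operator's bucket.
SELECT s.Name, COUNT(*) AS OrdersAssigned
FROM (SELECT OrderID,
             (ROW_NUMBER() OVER (ORDER BY NEWID()) % @numoperators) + 1 AS randseqnum
      FROM @Orders
     ) o
JOIN @Shuffled s ON s.seqnum = o.randseqnum
GROUP BY s.Name;
```

With 7 orders and 3 operators, the row numbers 1 to 7 fall into buckets of sizes 2, 3 and 2, so one operator gets 3 orders and the other two get 2 each. Which operator gets the extra order varies from run to run.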

If you need to update, then you can do something like:

with toupdate as (<above query, without its final ORDER BY>)
update orders
    set operatorid = toupdate.newoperatorid
    from toupdate
    where toupdate.orderid = orders.orderid

On your two questions:

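Will it be better to first select all orders and all operators that meet the conditions into a temporary table and then do the shuffling, or to do it all in one big query?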

The use of temporary tables is a matter of performance and of the requirements of the application. If the data is being rapidly updated, then yes, using a temporary table is a big win. If you are running the randomization many, many times on the same data, then it can be a win, particularly if the tables are too big to fit in memory. Otherwise, there is not likely to be a big performance gain on a one-time run, assuming you put the conditions within the innermost subqueries. However, if performance is an issue, you can test the two approaches.
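For the two-step flow described in the question (generate the assignments, review them, then apply them), a temp-table version could look roughly like the sketch below. It assumes the Orders and Operators columns shown in the question. The temp table names `#ShuffledOperators` and `#ShuffledOrders` and the variable `@numoperators` are made up for illustration. Treat it as a starting point rather than a tested implementation.

```sql
-- Step 1: number the eligible operators once, in random order.
SELECT ROW_NUMBER() OVER (ORDER BY NEWID()) AS seqnum, op.OperatorID
INTO #ShuffledOperators
FROM Operators op
WHERE op.Active = 1 AND op.GroupID IN (1, 2);

DECLARE @numoperators int;
SELECT @numoperators = COUNT(*) FROM #ShuffledOperators;

-- Step 2: give every eligible order a bucket number between 1 and
-- @numoperators. NULLIF avoids a divide-by-zero error when no operator
-- qualifies; the bucket is then NULL and nothing is assigned.
SELECT o.OrderID,
       (ROW_NUMBER() OVER (ORDER BY NEWID()) % NULLIF(@numoperators, 0)) + 1 AS randseqnum
INTO #ShuffledOrders
FROM Orders o
WHERE o.Cash > 0 AND o.Status = 1 AND o.GroupID IN (1, 2);

-- Review the proposed assignments before applying them.
SELECT so.OrderID, s.OperatorID AS NewOperatorID
FROM #ShuffledOrders so
JOIN #ShuffledOperators s ON s.seqnum = so.randseqnum
ORDER BY so.OrderID;

-- Step 3 (after approval): apply the stored assignments to the main table.
UPDATE o
SET o.OperatorID = s.OperatorID
FROM Orders o
JOIN #ShuffledOrders so ON so.OrderID = o.OrderID
JOIN #ShuffledOperators s ON s.seqnum = so.randseqnum;

DROP TABLE #ShuffledOrders;
DROP TABLE #ShuffledOperators;
```

Because the random numbering is stored before the update runs, the assignments you review are exactly the ones that get applied. The random functions are not evaluated again between the two steps.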

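I would like to pass an array of groups as a parameter to my procedure. Which option would be best for passing an array to a stored procedure (SQL Server 2005)?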

Hmmm, switch to 2008, which has table-valued parameters. Here is a highly referenced article on the subject by Erland Sommarskog: http://www.sommarskog.se/arrays-in-sql-2005.html.
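If upgrading is an option, a table-valued parameter could be used roughly as sketched below. This needs SQL Server 2008 or later and will not run on 2005. The type name `dbo.GroupIdList` and the procedure name `dbo.GetEligibleOrders` are hypothetical, and the procedure body only filters orders to show how the parameter is consumed.

```sql
-- Requires SQL Server 2008 or later. GO is the batch separator used by
-- SSMS and sqlcmd, not a T-SQL statement.
CREATE TYPE dbo.GroupIdList AS TABLE (GroupID int NOT NULL PRIMARY KEY);
GO

-- Hypothetical procedure: returns the eligible orders in the given groups.
CREATE PROCEDURE dbo.GetEligibleOrders
    @Groups dbo.GroupIdList READONLY   -- table-valued parameters must be READONLY
AS
BEGIN
    SET NOCOUNT ON;

    SELECT o.OrderID, o.GroupID, o.Cash
    FROM Orders o
    JOIN @Groups g ON g.GroupID = o.GroupID
    WHERE o.Cash > 0 AND o.Status = 1;
END
GO

-- Usage: fill a variable of the table type and pass it to the procedure.
DECLARE @Groups dbo.GroupIdList;
INSERT INTO @Groups (GroupID) VALUES (1), (2);
EXEC dbo.GetEligibleOrders @Groups = @Groups;
```

On SQL Server 2005 itself, the comma-separated string approaches covered in the linked article remain the way to go.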
