Most Efficient (Fast) T-SQL DELETE For Many Rows?

Problem Description

Our server application receives information about rows to add to the database at a rate of 1000-2000 rows per second, all day long. There are two mutually exclusive columns in the table that uniquely identify a row: one is a numeric identifier called 'tag' and the other is a 50-character string called 'longTag'. A row can have either a tag or a longTag; not both.
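
A hypothetical sketch of what such a table might look like: 'tag' and 'longTag' come from the question, the table name matches the DELETE example further down, and 'tradeData' plus all the data types are assumptions, not the real schema.

CREATE TABLE MyRecords
(
    tag       INT          NULL,    -- numeric identifier (absent for some rows)
    longTag   VARCHAR(50)  NULL,    -- 50-character string identifier
    tradeData VARCHAR(200) NULL     -- stand-in for the rest of the row's columns
);

-- indexes on both identifiers so single-row lookups and deletes can seek
CREATE INDEX IX_MyRecords_tag     ON MyRecords (tag);
CREATE INDEX IX_MyRecords_longTag ON MyRecords (longTag);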

Each row that comes in off the socket may or may not already exist in the table. If it exists, that row must be updated with the new information. If it doesn't exist, it must be added. We are using SQL 2005 and in a few cases even SQL 2000, so we cannot use the new MERGE keyword.

The way I am doing this now is to build a giant DELETE statement that looks like this:

DELETE from MyRecords
WHERE tag = 1
OR tag = 2
OR longTag = 'LongTag1'
OR tag = 555

...where each incoming row has its own 'OR tag = n' or 'OR longTag = 'x'' clause.

Then I perform an XML Bulk Load using ISQLXMLBulkLoad to load all the new records at once.

The giant DELETE statement sometimes times out, taking 30 seconds or longer. I'm not sure why.

As records come in off the socket they must either be inserted or they must replace existing rows. Is the way I'm doing it the best way to do it?

EDIT: The ratio of new rows to replacement rows is going to be very heavily slanted toward new rows. In data I have seen coming from production, there will typically be 100-1000 new rows for each correction.

EDIT 2: Both the inserts and the deletes must be handled as a single transaction. If either the insert or the delete fails, they must both be rolled back, leaving the table in the same state it was in before the inserts & deletes began.
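
As a rough illustration of that atomicity requirement (not the production code), the generated DELETE and the bulk load would both have to run inside one transaction, for example:

SET XACT_ABORT ON;   -- a run-time error aborts and rolls back the open transaction
BEGIN TRANSACTION;

    -- the generated delete for this batch (same shape as the example above)
    DELETE FROM MyRecords
    WHERE tag = 1 OR tag = 2 OR longTag = 'LongTag1' OR tag = 555;

    -- ... the bulk load of the new/replacement rows must run inside this
    -- same transaction for the rollback guarantee to hold ...

COMMIT TRANSACTION;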

EDIT 3: Regarding NULL tags. I need to first briefly describe the system a little more. This is a database for a trading system. MyTable is a trades table containing two kinds of trades: so-called "day trades" and so-called "opening positions." Day trades are simply trades -- if you were an options trader and you did a trade, that trade would be a day trade in this system. Opening positions are basically a summary of your portfolio up until today. Both opening positions and day trades are stored in the same table. Day trades have tags (either longTags or numeric tags), and opening positions do not. There can be duplicate rows for opening positions -- that is fine & normal. But there cannot be duplicate rows for day trades. If a day trade comes in with the same tag as some record already in the database, then the data in the table is replaced with the new data.

So there are 4 possibilities for the values in tag & longTag:

1) tag is non-zero and longTag is empty: this is a day trade with a numeric identifier.
2) tag is zero and longTag has a non-empty character value: this is a day trade with an alphanumeric identifier.
3) tag is zero and longTag is empty: this is an opening position.
4) tag is non-zero and longTag has a non-empty character value: this is prevented from ever happening by our server software, but if it were to happen the longTag would be ignored and the row would be treated the same as case #1. Again, this does not happen.
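
Purely to illustrate the four cases (using this edit's convention that a missing tag is zero and a missing longTag is empty; note that the answer below instead relies on missing values being NULL), a classification expression might look like:

SELECT tag, longTag,
       CASE
           WHEN tag <> 0 AND ISNULL(longTag, '') = ''  THEN 'day trade (numeric id)'       -- case 1
           WHEN tag = 0  AND ISNULL(longTag, '') <> '' THEN 'day trade (alphanumeric id)'  -- case 2
           WHEN tag = 0  AND ISNULL(longTag, '') = ''  THEN 'opening position'             -- case 3
           ELSE 'both identifiers set (should not occur)'                                  -- case 4
       END AS rowKind
FROM MyRecords;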

Recommended Answer

An OR (or an IN) almost works as if each OR operand is a different query. That is, it turns into a table scan, and for each row, the database has to test each OR operand as a predicate, until it finds a match or runs out of operands.

The only reason to package this up is to make it one logical unit of work. You could also wrap a bunch of deletes in a transaction, and only commit when all finish successfully.
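
For example, the OR chain from the question could be replaced by one single-key DELETE per incoming row, all committed together; a sketch using the question's sample values:

begin transaction;
    -- one indexed, single-key delete per incoming row
    delete from MyRecords where tag = 1;
    delete from MyRecords where tag = 2;
    delete from MyRecords where longTag = 'LongTag1';
    delete from MyRecords where tag = 555;
commit transaction;   -- commit only when every delete has succeeded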

Quassnoi makes an interesting suggestion (to use a table), but since he then uses INs and ORs, it comes out the same.

But try this.

Create a new table that mirrors your real table. Call it u_real_table. Index it on tag and longTag.

Put all your incoming data into u_real_table.
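
A minimal sketch of that staging table, assuming it mirrors the real table's columns (the payload column tradeData is a stand-in for whatever columns the real table actually has):

-- staging table mirroring the real table; only tag and longTag matter for the joins
create table u_real_table
(
    tag       int          null,
    longTag   varchar(50)  null,
    tradeData varchar(200) null   -- stand-in for the remaining columns
);

create index IX_u_real_table_tag     on u_real_table (tag);
create index IX_u_real_table_longTag on u_real_table (longTag);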

Now, when you're ready to do your bulk thing, instead join the mirror table to the real table on tag. From the real table, delete all the tagged rows that also appear in u_real_table:

-- remove the rows that are about to be replaced, matching on tag
delete a from real_table a 
   join u_real_table b on (a.tag = b.tag);
-- then insert the new versions (plus any brand-new tagged rows)
insert into real_table select * 
   from u_real_table where tag is not null;

See what we did here? Since we're joining only on tag, there's a greater chance the tag index can be used.

First we deleted everything new, then we inserted the new replacements. We could also do an update here. Which is faster depends on your table structure and its indices.
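
If an update turns out to be faster than delete-plus-insert, the same join can drive it; a sketch assuming the hypothetical payload column tradeData from the earlier sketches:

update a
set    a.tradeData = b.tradeData          -- copy whichever payload columns need refreshing
from   real_table a
       join u_real_table b on a.tag = b.tag;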

We didn't have to generate a script to do it; we just had to have the records already inserted into u_real_table.

Now we do the same thing for longTags:

delete a from real_table a 
   join u_real_table b on (a.longTag = b.longTag);
insert into real_table select * 
   from u_real_table where longTag is not null;

Finally, we clear out u_real_table:

delete from u_real_table;

Obviously, we wrap each delete/insert pair in a transaction, so that the delete only becomes real when the subsequent insert succeeds, and then we wrap the whole thing in another transaction, because it is a logical unit of work.
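
Putting the pieces together, the whole batch might be wrapped like this (a sketch with minimal error handling; SET XACT_ABORT ON makes a run-time error roll back the entire transaction):

set xact_abort on;                         -- any run-time error rolls back everything below
begin transaction;

    -- tag pass: drop the rows being replaced, then insert their new versions
    delete a from real_table a join u_real_table b on (a.tag = b.tag);
    insert into real_table select * from u_real_table where tag is not null;

    -- longTag pass
    delete a from real_table a join u_real_table b on (a.longTag = b.longTag);
    insert into real_table select * from u_real_table where longTag is not null;

    -- clear the staging table for the next batch
    delete from u_real_table;

commit transaction;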

This method reduces your manual work, reduces the possibility of a manual error, and has some chance of speeding up the deletes.

Note that this relies on missing tags and longTags correctly being null, not zero or the empty string.
