两个sql查询的性能之间的区别? [英] Difference between performance of the two sql queries?

查看:52
本文介绍了两个sql查询的性能之间的区别?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我的表格中有一个文本数据类型的字段.

I have a field in my table having text data type.

以下两个sql查询的性能是否有所不同:

Is there a difference in performance for the following two sql queries:

 select * from tablename where fieldname="xyz%";
 select * from tablename where fieldname="%zyx";

如果我们要实现这些查询的执行,这是我认为我们需要做的:

If we were to implement the execution of these queries, this is what I think we would need to do:

我们必须匹配两个正则表达式(xyz *和* zyx).

We have to match the two regexes (xyz* and *zyx).

我们必须从头开始检查字符串chars.

We will have to check the string chars one by starting from the beginning.

对于第一个查询,我们将必须读取前三个字符以查看是否存在匹配项,但是对于第二个查询,我们将必须读取直至获取字符串的末尾以确定是否已发生匹配.但是,如果我们将字符串的长度存储在某个位置,则可以直接读取后三个字符,从而获得与第一种情况相似的性能.

For the first query we will have to read the first three characters to see if there is a match but for the second one we will have to read till the we get the end of the string to determine if the match has occurred. But if we have the length of the string stored somewhere we can directly read the last three characters giving similar performance as the first case.

我的问题是,诸如mysql和oracle之类的商业数据库在执行查询方面是否表现出任何差异.

My question is whether commercial databases like mysql and oracle show any difference in the performance in the execution of the queries.

推荐答案

从您的评论中摘录:我只想知道以match开头的内容是否与以match结尾的内容相区别".

Picking up from your comment : " I just want to know if a starts with match is diff from an ends with match".

首先-请记住,我们并不是在寻找匹配字符串的最佳算法.我们正在寻找最佳算法,以找到一组N行中的所有匹配字符串.我们想要做得比做X次N次算法"要好.

Firstly - remember that we are not looking for the best algorithm to match a string. We are looking for the best algorithm to find all matching strings in a set of N rows. We want to do better than 'Do algorithm X, N times'.

如果未对fieldname进行索引,则两个查询之间的性能差异将很小-SQL引擎将只对字符串的前3个字节或后3个字节进行匹配.偏移到正确的内存位置.

If fieldname is NOT indexed, then there will be very little difference in performance between the two queries - the SQL engine is just going to do a match on the first 3 or last 3 bytes of the string, which is simply a matter of offsetting to the right memory location.

如果对字段名进行索引,则两次搜索之间的性能会有很大差异,因为我们可以丢弃大部分数据,而不是检查所有N行.

If the fieldname IS indexed, there will be a huge difference in performance between the two searches, because rather than examining all N rows, we can discard most of the data.

即对于"xyz%"版本,我们可以使用二进制搜索.

i.e. for the "xyz%" version, we can use a binary search.

我们从中间元素开始,恰好是"peter".我们可以立即丢弃"peter"之前的所有内容,并获取其余部分的中间元素-"samantha",依此类推,直到找到以"xyz"开头的条目.

We start at the middle element, which happens to be 'peter'. We can immediately discard everything before 'peter' and get the middle element on the remainder - 'samantha', and so on, until we find the entries starting 'xyz'.

对于%xyz"版本,我们无法执行此操作,因为任何字符串都可能在末尾匹配,因此我们需要查看每个字符串.

With the "%xyz" version, we cannot do this, as ANY string could potentially match at the end, we need to look at every string.

随着表格规模的扩大,这两种方法之间的差异变得很大.

As the size of our table expands, the difference between these two approaches becomes large.

为字段名反向创建字段/索引的解决方案使我们可以再次使用二进制搜索技术. (在某些数据库中,实际上可以在不创建额外字段的情况下执行此操作,而是通过使用特定的索引类型,虚拟列等).

The solution of creating a field/index for the reverse of fieldname allows us to use the binary search technique again. (In some databases it is actual possible to do this without creating an extra field, but through using particular index types, virtual columns, etc).

这大大简化了-有关数据库索引的实际实现的详细信息,请查看B-Tree和B * Tree索引.

This is simplified a lot - for detail on the actual implementation of database indexes, look into B-Tree and B*Tree indexes.

这篇关于两个sql查询的性能之间的区别?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆