键集分页 - 按跨多列的搜索词过滤 [英] Keyset Pagination - Filter By Search Term across Multiple Columns
问题描述
我正在尝试从 OFFSET/FETCH
分页转移到 Keyset
分页(也称为 Seek 方法).因为我刚刚开始,所以我有很多问题,但这是我尝试与过滤器一起正确分页的众多问题之一.
所以我有两张桌子
aspnet_users
有列
PK
UserId 唯一标识符
字段
用户名 NVARCHAR(256) 非空,AffiliateTag varchar(50) NULL.....其他领域
aspnet_membership
有列
PK+FK
UserId 唯一标识符
字段
电子邮件 NVARCHAR(256) NOT NULL.....其他领域
索引
Non Clustered
表索引aspnet_users
(UserName)Non Clustered
表索引aspnet_users
(AffiliateTag)Non Clustered
表索引aspnet_membership
(Email)
我有一个页面将列出用户(基于 search term
),页面大小设置为 20.我想跨多个列进行搜索,而不是执行 OR
我发现每个查询都有一个单独的查询,然后 Union
它们将使索引正确使用.
所以有存储过程,它将采用 search term
和可选的 UserName
和 UserId
下一页的最后一条记录.
创建过程 [dbo].[sp_searchuser]@take int,@searchTerm nvarchar(max) NULL,@lastUserName nvarchar(256)=NULL,@lastUserId nvarchar(256)=NULL作为IF(@lastUserName 不为空且@lastUserId 不为空)开始选择顶部(@take)*从(选择 u.UserId、u.UserName、u.AffiliateTag、m.Email来自 aspnet_Users as u内连接 aspnet_Membership as m在 u.UserId=m.UserId 上其中 u.UserName 像@searchTerm联盟选择 u.UserId、u.UserName、u.AffiliateTag、m.Email来自 aspnet_Users as u内连接 aspnet_Membership as m在 u.UserId=m.UserId其中 u.AffiliateTag 像 convert(varchar(50), @searchTerm)) 作为 u1其中 u1.UserName >@lastUserNameOR (u1.UserName=@lastUserName And u1.UserId > convert(uniqueidentifier, @lastUserId))按 u1.UserName 排序结尾别的开始选择顶部(@take)*从(选择 u.UserId、u.UserName、u.AffiliateTag、m.Email来自 aspnet_Users as u内连接 aspnet_Membership as m在 u.UserId=m.UserId 上其中 u.UserName 像@searchTerm联盟选择 u.UserId、u.UserName、u.AffiliateTag、m.Email来自 aspnet_Users as u内连接 aspnet_Membership as m在 u.UserId=m.UserId 上其中 u.AffiliateTag 像 convert(varchar(50), @searchTerm)) 作为 u1按 u1.UserName 排序结尾
现在使用搜索词mua
exec [sp_searchuser] 20, 'mua%'
它使用为 UserName 列创建的索引和为 AffiliateTag 列创建的另一个索引,这很好
但问题是我发现内部联合查询返回所有匹配的行
就像在这种情况下,执行计划显示
UserName Like SubQuery
读取的行数= 5实际行数= 4
AffiliateTag Like SubQuery
读取的行数= 465实际行数 = 465
因此总共内部查询返回 469
个匹配行
然后外部查询取出 20 用于最终结果重置.所以真的要读取比需要更多的数据.
什么时候去下一页
exec [sp_searchuser] 20, 'mua%', 'lastUserName', 'lastUserId'
执行计划显示
UserName Like SubQuery
读取的行数= 5实际行数= 4
AffiliateTag Like SubQuery
读取的行数= 465实际行数 = 445
总共内部查询返回 449
个匹配行
所以无论有没有分页,它读取的数据都比需要的多.
我的期望是以某种方式限制内部查询,使其不返回所有匹配的行.
您可能对
好处是索引是覆盖的,我们没有执行查找.根据搜索条件,它的效果可能比您的方法更好.
如果性能不好,我们可以使用触发器UNPIVOT
将原始数据记录在单独的表中.它可能看起来像这样(最好使用attribute_id而不是像我这样的文本):
DROP TABLE IF EXISTS [dbo].[StackOverflowAttributes];创建表 [dbo].[StackOverflowAttributes]([用户ID] INT,[AttributeName] NVARCHAR(128),[AttributeValue] NVARCHAR(128),PRIMARY KEY([UserID], [AttributeName], [AttributeValue]));去创建索引 IX_StackOverflowAttributes_AttributeValue ON [dbo].[StackOverflowAttributes]([属性值])INSERT INTO [dbo].[StackOverflowAttributes] ([UserID], [AttributeName], [AttributeValue])选择 [用户 ID],'名称',[用户名]从 [dbo].[StackOverflow]联盟选择 [用户 ID],'附属标签',[附属标签]从 [dbo].[StackOverflow];
之前的查询将如下所示:
SELECT TOP 20 U.[UserID],U.[用户名],U.[附属标签],U.[用户邮箱]FROM [dbo].[StackOverflowAttributes] A内连接 [dbo].[StackOverflow] UON A.[用户ID] = U.[用户ID]WHERE A.[AttributeValue] LIKE 'UserName200%'由您订购.[用户名];
现在,我们只读取索引行的一部分,然后执行查找.
为了比较性能,最好使用:
SET STATISTICS IO, TIME ON;
因为它会告诉你如何从索引中读取页面.结果可以在此处可视化.
I'm trying to move away from OFFSET/FETCH
pagination to Keyset
Pagination (also known as Seek Method). Since I'm just started, there are many questions I have in my mind but this is one of many where I try to get the pagination right along with Filter.
So I have 2 tables
aspnet_users
having columns
PK
UserId uniquidentifier
Fields
UserName NVARCHAR(256) NOT NULL,
AffiliateTag varchar(50) NULL
.....other fields
aspnet_membership
having columns
PK+FK
UserId uniquidentifier
Fields
Email NVARCHAR(256) NOT NULL
.....other fields
Indexes
Non Clustered
Index on Tableaspnet_users
(UserName)Non Clustered
Index on Tableaspnet_users
(AffiliateTag)Non Clustered
Index on Tableaspnet_membership
(Email)
I have a page that will list the users (based on search term
) with page size set to 20. And I want to search across multiple columns so instead of doing OR
I find out having a separate query for each and then Union
them will make the index use correctly.
so have the stored proc that will take search term
and optionally UserName
and UserId
of last record for next page.
Create proc [dbo].[sp_searchuser]
@take int,
@searchTerm nvarchar(max) NULL,
@lastUserName nvarchar(256)=NULL,
@lastUserId nvarchar(256)=NULL
AS
IF(@lastUserName IS NOT NULL AND @lastUserId IS NOT NULL)
Begin
select top (@take) *
from
(
select u.UserId, u.UserName, u.AffiliateTag, m.Email
from aspnet_Users as u
inner join aspnet_Membership as m
on u.UserId=m.UserId
where u.UserName like @searchTerm
UNION
select u.UserId, u.UserName, u.AffiliateTag, m.Email
from aspnet_Users as u
inner join aspnet_Membership as m
on u.UserId=m.UserId
where u.AffiliateTag like convert(varchar(50), @searchTerm)
) as u1
where u1.UserName > @lastUserName
OR (u1.UserName=@lastUserName And u1.UserId > convert(uniqueidentifier, @lastUserId))
order by u1.UserName
End
Else
Begin
select top (@take) *
from
(
select u.UserId, u.UserName, u.AffiliateTag, m.Email
from aspnet_Users as u
inner join aspnet_Membership as m
on u.UserId=m.UserId
where u.UserName like @searchTerm
UNION
select u.UserId, u.UserName, u.AffiliateTag, m.Email
from aspnet_Users as u
inner join aspnet_Membership as m
on u.UserId=m.UserId
where u.AffiliateTag like convert(varchar(50), @searchTerm)
) as u1
order by u1.UserName
End
Now to get the result for first page with search term mua
exec [sp_searchuser] 20, 'mua%'
it uses both indexes created one for UserName column and another for AffiliateTag column which is good
But the problem is I find the inner union queries return all the matching rows
like in this case, the execution plan shows
UserName Like SubQuery
Number of Rows Read= 5
Actual Number of Rows= 4
AffiliateTag Like SubQuery
Number of Rows Read= 465
Actual Number of Rows= 465
so in total inner queries return 469
matching rows
and then outer query take out 20 for final result reset. So really reading more data than needed.
And when go to next page
exec [sp_searchuser] 20, 'mua%', 'lastUserName', 'lastUserId'
the execution plan shows
UserName Like SubQuery
Number of Rows Read= 5
Actual Number of Rows= 4
AffiliateTag Like SubQuery
Number of Rows Read= 465
Actual Number of Rows= 445
in total inner queries return 449
matching rows
so either with or without pagination, it reads more data than needed.
My expectation is to somehow limit the inner queries so it does not return all matching rows.
You might be interested in the Logical Processing Order, which determines when the objects defined in one step are made available to the clauses in subsequent steps. The Logical Processing Order
steps are:
- FROM
- ON
- JOIN
- WHERE
- GROUP BY
- WITH CUBE or WITH ROLLUP
- HAVING
- SELECT
- DISTINCT
- ORDER BY
- TOP
Of course, as noted the docs:
The actual physical execution of the statement is determined by the query processor and the order may vary from this list.
meaning that sometimes some statements can start before previous complete.
In your case, you query looks like:
- some data extraction
- sort by
user_name
- get TOP records
There is no way to reduce the rows in the data extraction part as to have a deterministic result (we actually may need to order by user_name, user_id
to have such) we need to get all matching rows, sort them and then get the desired rows.
For example, image the first query returning 20 names starting with 'Z'. And the second query to returned only one name starting with 'A'. If you stop somehow the execution and skip the second query, you will get wrong results - 20 names starting with 'Z' instead one starting with 'A' and 19 with 'Z'.
In such cases, I prefer to use dynamic T-SQL statements in order to get better execution times and reduce the code length. You are saying:
And I want to search across multiple columns so instead of doing OR I find out having a separate query for each and then Union them will make the index use correctly.
When you are using UNION
you are performing double reads to your tables. In your cases, you are reading the aspnet_Membership
table twice and the aspnet_Users
twice (yes, here you are using two different indexes but I believe they are not covering and you end up performing look ups to extract the users name
and email
.
I guess you have started with covering indexed like in the example below:
DROP TABLE IF EXISTS [dbo].[StackOverflow];
CREATE TABLE [dbo].[StackOverflow]
(
[UserID] INT PRIMARY KEY
,[UserName] NVARCHAR(128)
,[AffiliateTag] NVARCHAR(128)
,[UserEmail] NVARCHAR(128)
,[a] INT
,[b] INT
,[c] INT
,[z] INT
);
CREATE INDEX IX_StackOverflow_UserID_UserName_AffiliateTag_I_UserEmail ON [dbo].[StackOverflow]
(
[UserID]
,[UserName]
,[AffiliateTag]
)
INCLUDE ([UserEmail]);
GO
INSERT INTO [dbo].[StackOverflow] ([UserID], [UserName], [AffiliateTag], [UserEmail])
SELECT TOP (1000000) ROW_NUMBER() OVER(ORDER BY t1.number)
,CONCAT('UserName',ROW_NUMBER() OVER(ORDER BY t1.number))
,CONCAT('AffiliateTag', ROW_NUMBER() OVER(ORDER BY t1.number))
,CONCAT('UserEmail', ROW_NUMBER() OVER(ORDER BY t1.number))
FROM master..spt_values t1
CROSS JOIN master..spt_values t2;
GO
So, for the following query:
SELECT TOP 20 [UserID]
,[UserName]
,[AffiliateTag]
,[UserEmail]
FROM [dbo].[StackOverflow]
WHERE [UserName] LIKE 'UserName200%'
OR [AffiliateTag] LIKE 'UserName200%'
ORDER BY [UserName];
GO
The issue here is we are reading all the rows even we are using the index.
What's good is that the index is covering and we are not performing look ups. Depending on the search criteria it may perform better than your approach.
If the performance is bad, we can use a trigger to UNPIVOT
the original data and record in a separate table. It may look like this (it will be better to use attribute_id rather than the text like me):
DROP TABLE IF EXISTS [dbo].[StackOverflowAttributes];
CREATE TABLE [dbo].[StackOverflowAttributes]
(
[UserID] INT
,[AttributeName] NVARCHAR(128)
,[AttributeValue] NVARCHAR(128)
,PRIMARY KEY([UserID], [AttributeName], [AttributeValue])
);
GO
CREATE INDEX IX_StackOverflowAttributes_AttributeValue ON [dbo].[StackOverflowAttributes]
(
[AttributeValue]
)
INSERT INTO [dbo].[StackOverflowAttributes] ([UserID], [AttributeName], [AttributeValue])
SELECT [UserID]
,'Name'
,[UserName]
FROM [dbo].[StackOverflow]
UNION
SELECT [UserID]
,'AffiliateTag'
,[AffiliateTag]
FROM [dbo].[StackOverflow];
and the query before will looks like:
SELECT TOP 20 U.[UserID]
,U.[UserName]
,U.[AffiliateTag]
,U.[UserEmail]
FROM [dbo].[StackOverflowAttributes] A
INNER JOIN [dbo].[StackOverflow] U
ON A.[UserID] = U.[UserID]
WHERE A.[AttributeValue] LIKE 'UserName200%'
ORDER BY U.[UserName];
Now, we are reading only a part of the the index rows and after that performing a lookup.
In order to compare performance it will be better to use:
SET STATISTICS IO, TIME ON;
as it will give you how pages are read from the indexes. The result can be visualized here.
这篇关于键集分页 - 按跨多列的搜索词过滤的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!