Oracle:带条件的全文搜索 [英] Oracle: Full text search with condition
问题描述
我已经创建了如下所示的Oracle Text索引:
I've created an Oracle Text index like the following:
create index my_idx on my_table (text) indextype is ctxsys.context;
然后我可以执行以下操作:
And I can then do the following:
select * from my_table where contains(text, '%blah%') > 0;
但是我们假设我们在此表中有另一列,例如 group_id
,我想改为执行以下查询:
But lets say we have a have another column in this table, say group_id
, and I wanted to do the following query instead:
select * from my_table where contains(text, '%blah%') > 0 and group_id = 43;
使用上述索引,Oracle必须搜索包含的所有项目'blah'
,然后检查所有 group_id
s。
With the above index, Oracle will have to search for all items that contain 'blah'
, and then check all of their group_id
s.
理想情况下,我更愿意只搜索 group_id = 43
的项目,所以我想要一个像这样的索引:
Ideally, I'd prefer to only search the items with group_id = 43
, so I'd want an index like this:
create index my_idx on my_table (group_id, text) indextype is ctxsys.context;
有点像普通索引,因此可以为每个<$ c $进行单独的文本搜索c> group_id 。
Kind of like a normal index, so a separate text search can be done for each group_id
.
有没有办法在Oracle中做这样的事情(如果重要的话,我使用10g)?
Is there a way to do something like this in Oracle (I'm using 10g if that is important)?
编辑(澄清)
考虑一个包含一百万行及以下内容的表其中两列, A
和 B
,均为数字。假设有500个不同的 A
值和2000个不同的 B
值,每行都是唯一的。
Consider a table with one million rows and the following two columns among others, A
and B
, both numeric. Lets say there are 500 different values of A
and 2000 different values of B
, and each row is unique.
现在让我们考虑选择...其中A = x且B = y
一个索引在 A
和 B
分别据我所知,做一个索引搜索 B
,它将返回500个不同的行,然后对这些行进行连接/扫描。在任何情况下,至少需要查看500行(除了数据库是幸运的,并提前找到所需的行。
An index on A
and B
separately as far as I can tell do an index search on B
, which will return 500 different rows, and then do a join/scan on these rows. In any case, at least 500 rows have to be looked at (aside from the database being lucky and finding the required row early.
而<$ c $的索引c>(A,B)更有效,它在一个索引搜索中找到一行。
Whereas an index on (A,B)
is much more effective, it finds the one row in one index search.
在<$上放置单独的索引c $ c> group_id ,我认为文本只留下查询生成器有两个选项。
Putting separate indexes on group_id
and the text I feel only leaves the query generator with two options.
(1)使用 group_id
index,并扫描文本的所有结果行。
(2)使用文本索引,并扫描 group_id
。
(3)使用两个索引,然后进行连接。
(1) Use the group_id
index, and scan all the resulting rows for the text.
(2) Use the text index, and scan all the resulting rows for the group_id
.
(3) Use both indexes, and do a join.
我想要:
(4)使用(group_id,text)
索引查找特定<$ c下的文本索引$ c> group_id 并扫描我需要的特定行/行的文本索引。不需要扫描和检查或加入,就像在上使用索引时一样(A,B )
。
(4) Use the (group_id, "text")
index to find the text index under the particular group_id
and scan that text index for the particular row/rows I need. No scanning and checking or joining required, much like when using an index on (A,B)
.
推荐答案
Oracle Text
1 - 您可以通过使用 FILTER BY :
1 - You can improve performance by creating the CONTEXT index with FILTER BY:
create index my_idx on my_table(text) indextype is ctxsys.context filter by group_id;
在我的测试中,过滤器
肯定有所改善性能,但在group_id上使用btree索引仍然稍微快一些。
In my tests the filter by
definitely improved the performance, but it was still slightly faster to just use a btree index on group_id.
2 - CTXCAT索引使用子索引,似乎工作类似于多列索引。这似乎是您正在寻找的选项(4):
2 - CTXCAT indexes use "sub-indexes", and seem to work similar to a multi-column index. This seems to be the option (4) you're looking for:
begin
ctx_ddl.create_index_set('my_table_index_set');
ctx_ddl.add_index('my_table_index_set', 'group_id');
end;
/
create index my_idx2 on my_table(text) indextype is ctxsys.ctxcat
parameters('index set my_table_index_set');
select * from my_table where catsearch(text, 'blah', 'group_id = 43') > 0
这可能是最快的方法。对120MB的随机文本使用上述查询,类似于你的A和B场景,只需要18个一致的获取。但在不利方面,创建CTXCAT索引需要大约11分钟,并使用1.8GB的空间。
This is likely the fastest approach. Using the above query against 120MB of random text similar to your A and B scenario required only 18 consistent gets. But on the downside, creating the CTXCAT index took almost 11 minutes and used 1.8GB of space.
(注意:Oracle Text似乎在这里正常工作,但我是不熟悉Text而且我不能保证这不是对@NullUserException这些索引的不当使用。)
(Note: Oracle Text seems to work correctly here, but I'm not familiar with Text and I can't gaurentee this isn't an inappropriate use of these indexes like @NullUserException said.)
多列索引与index joins
对于您在编辑中描述的情况,通常 使用索引之间没有显着差异(A,B)并加入A和B上的单独索引。我使用与您描述的数据类似的数据构建了一些测试,并且索引连接只需要7个一致获取,而多列索引则需要2个一致获取。
For the situation you describe in your edit, normally there would not be a significant difference between using an index on (A,B) and joining separate indexes on A and B. I built some tests with data similar to what you described and an index join required only 7 consistent gets versus 2 consistent gets for the multi-column index.
原因是因为Oracle以块的形式检索数据。块通常为8K,并且索引块已经排序,因此您可以在几个块中拟合500到2000个值。如果您担心性能,通常读取和写入块的IO是唯一重要的。 Oracle是否必须将几千行连接在一起是一个无关紧要的CPU时间。
The reason for this is because Oracle retrieves data in blocks. A block is usually 8K, and an index block is already sorted, so you can probably fit the 500 to 2000 values in a few blocks. If you're worried about performance, usually the IO to read and write blocks is the only thing that matters. Whether or not Oracle has to join together a few thousand rows is an inconsequential amount of CPU time.
但是,这不适用于Oracle Text索引。您可以使用btree索引(位图和?)加入CONTEXT索引,但性能很差。
However, this doesn't apply to Oracle Text indexes. You can join a CONTEXT index with a btree index (a "bitmap and"?), but the performance is poor.
这篇关于Oracle:带条件的全文搜索的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!