Oracle:带条件的全文搜索 [英] Oracle: Full text search with condition

查看:140
本文介绍了Oracle:带条件的全文搜索的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我已经创建了如下所示的Oracle Text索引:

I've created an Oracle Text index like the following:

create index my_idx on my_table (text) indextype is ctxsys.context; 

然后我可以执行以下操作:

And I can then do the following:

select * from my_table where contains(text, '%blah%') > 0;

但是我们假设我们在此表中有另一列,例如 group_id ,我想改为执行以下查询:

But lets say we have a have another column in this table, say group_id, and I wanted to do the following query instead:

select * from my_table where contains(text, '%blah%') > 0 and group_id = 43;

使用上述索引,Oracle必须搜索包含的所有项目'blah',然后检查所有 group_id s。

With the above index, Oracle will have to search for all items that contain 'blah', and then check all of their group_ids.

理想情况下,我更愿意只搜索 group_id = 43 的项目,所以我想要一个像这样的索引:

Ideally, I'd prefer to only search the items with group_id = 43, so I'd want an index like this:

create index my_idx on my_table (group_id, text) indextype is ctxsys.context; 

有点像普通索引,因此可以为每个<$ c $进行单独的文本搜索c> group_id 。

Kind of like a normal index, so a separate text search can be done for each group_id.

有没有办法在Oracle中做这样的事情(如果重要的话,我使用10g)?

Is there a way to do something like this in Oracle (I'm using 10g if that is important)?

编辑(澄清)

考虑一个包含一百万行及以下内容的表其中两列, A B ,均为数字。假设有500个不同的 A 值和2000个不同的 B 值,每行都是唯一的。

Consider a table with one million rows and the following two columns among others, A and B, both numeric. Lets say there are 500 different values of A and 2000 different values of B, and each row is unique.

现在让我们考虑选择...其中A = x且B = y

一个索引在 A B 分别据我所知,做一个索引搜索 B ,它将返回500个不同的行,然后对这些行进行连接/扫描。在任何情况下,至少需要查看500行(除了数据库是幸运的,并提前找到所需的行。

An index on A and B separately as far as I can tell do an index search on B, which will return 500 different rows, and then do a join/scan on these rows. In any case, at least 500 rows have to be looked at (aside from the database being lucky and finding the required row early.

而<$ c $的索引c>(A,B)更有效,它在一个索引搜索中找到一行。

Whereas an index on (A,B) is much more effective, it finds the one row in one index search.

在<$上放置单独的索引c $ c> group_id ,我认为文本只留下查询生成器有两个选项。

Putting separate indexes on group_id and the text I feel only leaves the query generator with two options.

(1)使用 group_id index,并扫描文本的所有结果行。

(2)使用文本索引,并扫描 group_id 。

(3)使用两个索引,然后进行连接。

(1) Use the group_id index, and scan all the resulting rows for the text.
(2) Use the text index, and scan all the resulting rows for the group_id.
(3) Use both indexes, and do a join.

我想要:

(4)使用(group_id,text)索引查找特定<$ c下的文本索引$ c> group_id 并扫描我需要的特定行/行的文本索引。不需要扫描和检查或加入,就像在上使用索引时一样(A,B )

(4) Use the (group_id, "text") index to find the text index under the particular group_id and scan that text index for the particular row/rows I need. No scanning and checking or joining required, much like when using an index on (A,B).

推荐答案

Oracle Text

1 - 您可以通过使用 FILTER BY

1 - You can improve performance by creating the CONTEXT index with FILTER BY:

create index my_idx on my_table(text) indextype is ctxsys.context filter by group_id;

在我的测试中,过滤器肯定有所改善性能,但在group_id上使用btree索引仍然稍微快一些。

In my tests the filter by definitely improved the performance, but it was still slightly faster to just use a btree index on group_id.

2 - CTXCAT索引使用子索引,似乎工作类似于多列索引。这似乎是您正在寻找的选项(4):

2 - CTXCAT indexes use "sub-indexes", and seem to work similar to a multi-column index. This seems to be the option (4) you're looking for:

begin
  ctx_ddl.create_index_set('my_table_index_set');
  ctx_ddl.add_index('my_table_index_set', 'group_id');
end;
/

create index my_idx2 on my_table(text) indextype is ctxsys.ctxcat
    parameters('index set my_table_index_set');

select * from my_table where catsearch(text, 'blah', 'group_id = 43') > 0

这可能是最快的方法。对120MB的随机文本使用上述查询,类似于你的A和B场景,只需要18个一致的获取。但在不利方面,创建CTXCAT索引需要大约11分钟,并使用1.8GB的空间。

This is likely the fastest approach. Using the above query against 120MB of random text similar to your A and B scenario required only 18 consistent gets. But on the downside, creating the CTXCAT index took almost 11 minutes and used 1.8GB of space.

(注意:Oracle Text似乎在这里正常工作,但我是不熟悉Text而且我不能保证这不是对@NullUserException这些索引的不当使用。)

(Note: Oracle Text seems to work correctly here, but I'm not familiar with Text and I can't gaurentee this isn't an inappropriate use of these indexes like @NullUserException said.)

多列索引与index joins

对于您在编辑中描述的情况,通常 使用索引之间没有显着差异(A,B)并加入A和B上的单独索引。我使用与您描述的数据类似的数据构建了一些测试,并且索引连接只需要7个一致获取,而多列索引则需要2个一致获取。

For the situation you describe in your edit, normally there would not be a significant difference between using an index on (A,B) and joining separate indexes on A and B. I built some tests with data similar to what you described and an index join required only 7 consistent gets versus 2 consistent gets for the multi-column index.

原因是因为Oracle以块的形式检索数据。块通常为8K,并且索引块已经排序,因此您可以在几个块中拟合500到2000个值。如果您担心性能,通常读取和写入块的IO是唯一重要的。 Oracle是否必须将几千行连接在一起是一个无关紧要的CPU时间。

The reason for this is because Oracle retrieves data in blocks. A block is usually 8K, and an index block is already sorted, so you can probably fit the 500 to 2000 values in a few blocks. If you're worried about performance, usually the IO to read and write blocks is the only thing that matters. Whether or not Oracle has to join together a few thousand rows is an inconsequential amount of CPU time.

但是,这不适用于Oracle Text索引。您可以使用btree索引(位图和?)加入CONTEXT索引,但性能很差。

However, this doesn't apply to Oracle Text indexes. You can join a CONTEXT index with a btree index (a "bitmap and"?), but the performance is poor.

这篇关于Oracle:带条件的全文搜索的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆