Oracle：带条件的全文搜索 [英] Oracle: Full text search with condition

查看：140 发布时间：2018/8/2 13:10:47 sql oracle indexing full-text-indexing oracle-text

本文介绍了Oracle：带条件的全文搜索的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我已经创建了如下所示的Oracle Text索引：

I've created an Oracle Text index like the following:

create index my_idx on my_table (text) indextype is ctxsys.context;

然后我可以执行以下操作：

And I can then do the following:

select * from my_table where contains(text, '%blah%') > 0;

但是我们假设我们在此表中有另一列，例如 group_id ，我想改为执行以下查询：

But lets say we have a have another column in this table, say group_id, and I wanted to do the following query instead:

select * from my_table where contains(text, '%blah%') > 0 and group_id = 43;

使用上述索引，Oracle必须搜索包含的所有项目'blah'，然后检查所有 group_id s。

With the above index, Oracle will have to search for all items that contain 'blah', and then check all of their group_ids.

理想情况下，我更愿意只搜索 group_id = 43 的项目，所以我想要一个像这样的索引：

Ideally, I'd prefer to only search the items with group_id = 43, so I'd want an index like this:

create index my_idx on my_table (group_id, text) indextype is ctxsys.context;

有点像普通索引，因此可以为每个<$ c $进行单独的文本搜索c> group_id 。

Kind of like a normal index, so a separate text search can be done for each group_id.

有没有办法在Oracle中做这样的事情（如果重要的话，我使用10g）？

Is there a way to do something like this in Oracle (I'm using 10g if that is important)?

编辑（澄清）

考虑一个包含一百万行及以下内容的表其中两列， A 和 B ，均为数字。假设有500个不同的 A 值和2000个不同的 B 值，每行都是唯一的。

Consider a table with one million rows and the following two columns among others, A and B, both numeric. Lets say there are 500 different values of A and 2000 different values of B, and each row is unique.

现在让我们考虑选择...其中A = x且B = y

一个索引在 A 和 B 分别据我所知，做一个索引搜索 B ，它将返回500个不同的行，然后对这些行进行连接/扫描。在任何情况下，至少需要查看500行（除了数据库是幸运的，并提前找到所需的行。

An index on A and B separately as far as I can tell do an index search on B, which will return 500 different rows, and then do a join/scan on these rows. In any case, at least 500 rows have to be looked at (aside from the database being lucky and finding the required row early.

而<$ c $的索引c>（A，B）更有效，它在一个索引搜索中找到一行。

Whereas an index on (A,B) is much more effective, it finds the one row in one index search.

在<$上放置单独的索引c $ c> group_id ，我认为文本只留下查询生成器有两个选项。

Putting separate indexes on group_id and the text I feel only leaves the query generator with two options.

（1）使用 group_id index，并扫描文本的所有结果行。

（2）使用文本索引，并扫描 group_id 。

（3）使用两个索引，然后进行连接。

(1) Use the group_id index, and scan all the resulting rows for the text.
(2) Use the text index, and scan all the resulting rows for the group_id.
(3) Use both indexes, and do a join.

我想要：

（4）使用（group_id，text）索引查找特定<$ c下的文本索引$ c> group_id 并扫描我需要的特定行/行的文本索引。不需要扫描和检查或加入，就像在上使用索引时一样（A，B ）。

(4) Use the (group_id, "text") index to find the text index under the particular group_id and scan that text index for the particular row/rows I need. No scanning and checking or joining required, much like when using an index on (A,B).

推荐答案

Oracle Text

1 - 您可以通过使用 FILTER BY ：

1 - You can improve performance by creating the CONTEXT index with FILTER BY:

create index my_idx on my_table(text) indextype is ctxsys.context filter by group_id;

在我的测试中，过滤器肯定有所改善性能，但在group_id上使用btree索引仍然稍微快一些。

In my tests the filter by definitely improved the performance, but it was still slightly faster to just use a btree index on group_id.

2 - CTXCAT索引使用子索引，似乎工作类似于多列索引。这似乎是您正在寻找的选项（4）：

2 - CTXCAT indexes use "sub-indexes", and seem to work similar to a multi-column index. This seems to be the option (4) you're looking for:

begin
  ctx_ddl.create_index_set('my_table_index_set');
  ctx_ddl.add_index('my_table_index_set', 'group_id');
end;
/

create index my_idx2 on my_table(text) indextype is ctxsys.ctxcat
    parameters('index set my_table_index_set');

select * from my_table where catsearch(text, 'blah', 'group_id = 43') > 0

这可能是最快的方法。对120MB的随机文本使用上述查询，类似于你的A和B场景，只需要18个一致的获取。但在不利方面，创建CTXCAT索引需要大约11分钟，并使用1.8GB的空间。

This is likely the fastest approach. Using the above query against 120MB of random text similar to your A and B scenario required only 18 consistent gets. But on the downside, creating the CTXCAT index took almost 11 minutes and used 1.8GB of space.

（注意：Oracle Text似乎在这里正常工作，但我是不熟悉Text而且我不能保证这不是对@NullUserException这些索引的不当使用。）

(Note: Oracle Text seems to work correctly here, but I'm not familiar with Text and I can't gaurentee this isn't an inappropriate use of these indexes like @NullUserException said.)

多列索引与index joins

对于您在编辑中描述的情况，通常使用索引之间没有显着差异（A，B）并加入A和B上的单独索引。我使用与您描述的数据类似的数据构建了一些测试，并且索引连接只需要7个一致获取，而多列索引则需要2个一致获取。

For the situation you describe in your edit, normally there would not be a significant difference between using an index on (A,B) and joining separate indexes on A and B. I built some tests with data similar to what you described and an index join required only 7 consistent gets versus 2 consistent gets for the multi-column index.

原因是因为Oracle以块的形式检索数据。块通常为8K，并且索引块已经排序，因此您可以在几个块中拟合500到2000个值。如果您担心性能，通常读取和写入块的IO是唯一重要的。 Oracle是否必须将几千行连接在一起是一个无关紧要的CPU时间。

The reason for this is because Oracle retrieves data in blocks. A block is usually 8K, and an index block is already sorted, so you can probably fit the 500 to 2000 values in a few blocks. If you're worried about performance, usually the IO to read and write blocks is the only thing that matters. Whether or not Oracle has to join together a few thousand rows is an inconsequential amount of CPU time.

但是，这不适用于Oracle Text索引。您可以使用btree索引（位图和？）加入CONTEXT索引，但性能很差。

However, this doesn't apply to Oracle Text indexes. You can join a CONTEXT index with a btree index (a "bitmap and"?), but the performance is poor.

这篇关于Oracle：带条件的全文搜索的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持IT屋！

查看全文

Oracle：带条件的全文搜索 [英] Oracle: Full text search with condition

问题描述

推荐答案

相关文章

数据库最新文章

热门教程

热门工具

登录关闭

Oracle：带条件的全文搜索 [英] Oracle: Full text search with condition

问题描述

推荐答案

相关文章

数据库最新文章

热门教程

热门工具

登录 关闭

登录关闭