Search for similar words using an index
Question
I need to search a database table using some kind of fuzzy search, like the one Oracle provides, and the search must use an index because I want to avoid a table scan (there is a lot of data).
I want to ignore case, language-specific characters (ñ, ß, ...), and special characters such as _, (), -, etc.
A search for "maria (cool)" should return "maria- COOL" and "María_Cool" as matches.
Is that possible in Oracle in some way?
As for case, I think it can be solved by creating the index directly in lower case and always searching in lower case. But I do not know how to handle the special characters.
I thought about storing the data without special characters in a separate column and searching on that column while returning the real value, but I am not 100% sure whether that is the best solution.
Any ideas?
Recommended answer
You can also create a function-based index on, say, an expression like this:
regexp_replace(your_column, '[^0-9a-zA-Z]+', ' ')
and try to match like this:
...
WHERE regexp_replace(your_column, '[^0-9a-zA-Z]+', ' ') =
regexp_replace('maria (cool)' , '[^0-9a-zA-Z]+', ' ')
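A minimal sketch of how the index and the query might fit together, assuming a hypothetical table people with a name column (the index name and the :search_term bind variable are also made up for illustration). It departs from the expression above in two ways: it deletes the special characters instead of replacing them with a space, so a trailing ")" does not leave a stray blank, and it wraps the result in LOWER to ignore case. Accented letters such as í are simply dropped by this pattern, so "María_Cool" would still not match, and some Oracle releases may refuse REGEXP_REPLACE in an index expression, so treat it as a starting point only.

```sql
-- Hypothetical table, used only for illustration.
CREATE TABLE people (
  name VARCHAR2(100)
);

-- Function-based index on a normalized form of the column:
-- every run of characters outside 0-9, a-z, A-Z is deleted,
-- then the remainder is lower-cased.
CREATE INDEX people_name_norm_idx
  ON people (LOWER(REGEXP_REPLACE(name, '[^0-9a-zA-Z]+', '')));

-- The predicate repeats the indexed expression exactly,
-- which is what allows the optimizer to consider the index.
SELECT name
  FROM people
 WHERE LOWER(REGEXP_REPLACE(name, '[^0-9a-zA-Z]+', '')) =
       LOWER(REGEXP_REPLACE(:search_term, '[^0-9a-zA-Z]+', ''));
```

With :search_term set to 'maria (cool)', both sides reduce to 'mariacool' for a row containing 'maria- COOL'.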
Here is a sqlfiddle demo. It's not complete, but it can be a start.
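One piece that is still open is accented letters. A possible direction, sketched here on the assumption that Oracle's accent- and case-insensitive BINARY_AI sort suits the data, is to strip punctuation and whitespace first and then index the NLSSORT key of what remains. The table, index, and bind-variable names are hypothetical, and how characters such as ß are treated should be checked against real data before relying on this.

```sql
-- Assumes a hypothetical table people(name VARCHAR2(100)).

-- Index the accent- and case-insensitive sort key of the name
-- after punctuation and whitespace have been removed.
CREATE INDEX people_name_ai_idx
  ON people (
    NLSSORT(REGEXP_REPLACE(name, '[[:punct:][:space:]]+', ''),
            'NLS_SORT=BINARY_AI')
  );

-- Apply the same normalization to the search term and compare the keys.
SELECT name
  FROM people
 WHERE NLSSORT(REGEXP_REPLACE(name, '[[:punct:][:space:]]+', ''),
               'NLS_SORT=BINARY_AI') =
       NLSSORT(REGEXP_REPLACE(:search_term, '[[:punct:][:space:]]+', ''),
               'NLS_SORT=BINARY_AI');
```

The design choice here is to keep the original column untouched and put all normalization in the indexed expression, which avoids maintaining a separate cleaned-up column by hand.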