如何Firefox的“真棒”栏匹配字符串? [英] How does Firefox's 'awesome' bar match strings?

查看:104
本文介绍了如何Firefox的“真棒”栏匹配字符串?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

现在的问题是字符串匹配是如何完成的找到了火狐3的 URL栏。在每个条目子串匹配可能很慢。什么算法可以用来在任何位置匹配的快捷方式?

解决方案

正常的方式来做到快速串匹配是创建包含所有你想要搜索字符串的所有后缀的数据结构。根据不同的组织,这可以被称为后缀树或后缀数组。例如,如果你有1000个字符串,每一个是50个字符长,你有1.000×50平凡的后缀,即后缀数组将有50.000项。

然后做匹配的,你做的二进制搜索(如果数组)或树搜索(如果树)来查找数据结构,它的所有后缀的开始的匹配写在搜索框中输入字符串。因为它是你匹配后缀的开始,你可以使用标准的搜索程序(二进制搜索,树后裔)去的结果快。每一个后缀链接到它出现的字符串。

举例:你有两个字符串猫和点。后缀数组看起来像这样(注意lexiographic =字母排序):

 #1  - >猫
#2 CAT  - >猫
#3 DOT  - >点
#4 OT  - >点
#5笔 - > DOT,CAT
 

请注意,有6后缀但二者的是相同的(最后的T,在CAT和点)和均重新由同一条目(#5)。

现在,当在搜索的用户类型,例如OT(这应该与点),你可以做简单的lexiographic顺序查找日志时,你正在寻找一个的开始的匹配后缀数组中开始。

这是快速文本搜索的基本原则,当搜索模式不包含通配符。

The question is how the string matching is done to find matching entries by the firefox 3 url bar. Substring matching on every entry might be slow. What algorithm can be used to match at any location in a fast way?

解决方案

The normal way to do fast substring matching is to create a data structure which contain all the suffixes of all the strings you want search. Depending on the organization, this can be called the "suffix tree" or "suffix array". For example, if you have 1000 strings and every one is 50 characters long, you have 1.000 x 50 nontrivial suffixes, i.e. your suffix array would have 50.000 entries.

Then to do the matching, you do binary search (if array) or tree search (if tree) to find all those suffixes in the data structure whose beginning matches the string written in the search box. Because it is the beginning of the suffix you are matching, you can use standard search procedures (binary search, tree descent) to get to the result fast. Every suffix is linked to those strings in which it appears.

Example: you have two strings "CAT" and "DOT". Your suffix array looks like this (note lexiographic = alphabetic ordering):

#1 AT  --> CAT
#2 CAT --> CAT
#3 DOT --> DOT
#4 OT  --> DOT
#5 T   --> DOT, CAT

Note that there are six suffixes but two of the are the same (last "T" in "CAT" and "DOT") and the are both represented by the same entry (#5).

Now when the user types in search, e.g. "OT" (which should match "DOT"), you can do simple lexiographic ordering lookup in log-time as you are now searching for a beginning match in the suffix array.

This is the basic principle of fast text searching when the search pattern does not contain wildcards.

这篇关于如何Firefox的“真棒”栏匹配字符串?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆