PHP / SQL - Improving search feature / fuzzy search

Problem description

I am trying to create a product search for my site, where a user can search for products in multiple languages and (hopefully) get fuzzy search results if there is no exact match.

  • I have a pro_search table that has columns id, pro_id, en, de, es, fr, it.
  • The pro_id column refers to the id of the products in their own table.
  • The en, de, es, fr, it columns hold the translated meta of each product in the various languages.
  • The meta is just keywords separated by spaces.
  • $term is the search term.
  • $lang refers to the user's chosen language.

So first I do a basic 'LIKE' SQL query to see if there are matches. If there are no results from this, I query all the products and create an array sorted by their similarity using the similar_text() function.

For example, I search for 'shirt'. This is fine if the meta for the product just includes the word 'shirt'. But if the meta includes 'blue branded tshirt', which is more descriptive and gives the user a chance to search by brand, the search will more than likely go fuzzy rather than be found with a LIKE SQL query.

This is kind of working, but I was wondering how it could be improved. Is there a better way of searching, or how do people normally do it? Should I split the meta into individual keywords and see how many words match, rather than matching the term against the whole meta?

    // $lang is interpolated into the SQL as a column name, so only accept known columns
    if (!in_array($lang, ['en', 'de', 'es', 'fr', 'it'], true)) {
        throw new InvalidArgumentException('Unsupported language');
    }

    // product search
    $params = ['%'.$term.'%'];
    $sql = "SELECT pro_id FROM pro_search WHERE `$lang` LIKE ?";
    $stmt = DB::run($sql,$params);

    // fetch the ids directly; rowCount() is not reliable for SELECT with every PDO driver
    $ids = $stmt->fetchAll(PDO::FETCH_COLUMN);

    if(!$ids){

        // product fuzzy search
        $sql = "SELECT pro_id, `$lang` AS meta FROM pro_search";
        $stmt = DB::run($sql);

        // pro_id => similarity percentage; the percentage is a float, so it is
        // kept as a value (as an array key it would be truncated to an integer)
        $scores = [];
        while ($row = $stmt->fetch(PDO::FETCH_ASSOC)){
            similar_text($term,(string)$row["meta"],$similarity);
            $scores[$row["pro_id"]] = $similarity;
        }

        arsort($scores); // most similar first
        $ids = array_keys($scores);
    }

    show_products($ids);

I have asked similar questions before, and people have pointed me to different ways of comparing the term against the meta (such as levenshtein). But everything I've seen compares two simple words (like apples and oranges), and that just isn't good enough for a real-life application with thousands of products where a user could search for literally anything (as in $term='literally anything';).

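Key questions:

  • Should my meta have just the product name, or multiple relevant keywords (too many keywords means an individual word is less similar to the whole)?
  • If I have multiple keywords in the meta, should I take each individual keyword and compare it against the search term?
  • Could individual products also have negative keywords?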

Recommended answer

You are looking for full-text search with query expansion.

MySQL supports text searching with the LIKE operator and regular expressions. However, when the text column is large and the number of rows in the table grows, these methods have some limitations:

  • Performance: MySQL has to scan the whole table to find the exact text based on a pattern in the LIKE statement or a pattern in the regular expression.
  • Flexible search: with the LIKE operator and regular expression searches, it is difficult to write a flexible search query, e.g., to find products whose description contains car but not classic.
  • Relevance ranking: there is no way to specify which row in the result set is more relevant to the search terms.

Because of these limitations, MySQL provides a very nice feature called full-text search. Technically, MySQL creates an index from the words of the columns enabled for full-text search and performs searches on this index. MySQL uses a sophisticated algorithm to determine the rows matched against the search query.

To do that, the columns that will be used for the search must be of type CHAR, VARCHAR, or TEXT and must have an index of type FULLTEXT. The index can be added using ALTER TABLE or CREATE INDEX. If you are using phpMyAdmin to manage your databases, you can do that by going to the Structure of that table, then clicking More under Action for that column and choosing Fulltext.
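As a minimal sketch of the index step, assuming the pro_search table from the question with one keyword column per language (the index names ft_en and ft_de are made up for illustration):

```sql
-- Add a FULLTEXT index to the English keyword column with ALTER TABLE.
ALTER TABLE pro_search ADD FULLTEXT INDEX ft_en (en);

-- The same for another language column, this time with CREATE INDEX.
CREATE FULLTEXT INDEX ft_de ON pro_search (de);
```

One index per language column keeps each search tied to a single language. In general, the column list passed to MATCH() has to correspond to the column list of an existing FULLTEXT index.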

After that you can perform a search using the MATCH AGAINST syntax. MATCH() takes the columns to be searched. AGAINST takes a string to search for, and an optional modifier that indicates what type of search to perform.
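To illustrate the modifiers, here is a hedged sketch of two search types run against the question's pro_search table. It assumes a FULLTEXT index already exists on the en column, and the search strings are only examples:

```sql
-- Natural language mode (the default): MATCH() also yields a relevance
-- score, so the best matches can be listed first.
SELECT pro_id,
       MATCH(en) AGAINST('blue shirt' IN NATURAL LANGUAGE MODE) AS score
FROM pro_search
WHERE MATCH(en) AGAINST('blue shirt' IN NATURAL LANGUAGE MODE)
ORDER BY score DESC;

-- Boolean mode: + requires a word and - excludes it, so this finds rows
-- that contain "car" but not "classic".
SELECT pro_id
FROM pro_search
WHERE MATCH(en) AGAINST('+car -classic' IN BOOLEAN MODE);
```

Note that full-text search matches whole words, so 'shirt' does not match 'tshirt'. In boolean mode, 'shirt*' matches words that start with shirt, but there is no equivalent for word endings. Very short words may also be skipped: the default minimum word length is 3 for InnoDB and 4 for MyISAM, so on MyISAM a word like car is ignored unless ft_min_word_len is lowered.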

In some cases, users want to search for information based on the knowledge that they have. Users use their experience to define keywords to search for information, and typically those keywords are too short.

To help users find information based on keywords that are too short, the MySQL full-text search engine introduces a concept called query expansion.

Query expansion is used to widen the search result of a full-text search based on automatic relevance feedback (or blind query expansion). Technically, the MySQL full-text search engine performs the following steps when query expansion is used:

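  • First, the MySQL full-text search engine looks for all rows that match the search query.
  • Second, it checks all rows in the search result and finds the relevant words.
  • Third, it performs the search again based on the relevant words instead of the original keywords provided by the user.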

The following example shows you how to search for a product whose product name or meta contains at least one of the words shirt or tshirt.

SELECT * FROM products WHERE MATCH(product_name,product_meta) AGAINST('shirt tshirt' WITH QUERY EXPANSION)
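Adapted to the table in the question, the same idea might look like the sketch below. It assumes a FULLTEXT index on the en column of pro_search and a prepared statement that binds the user's search term to the ? placeholder:

```sql
-- Query expansion against the English keyword column; the ? placeholder
-- is bound to the search term when the statement is executed.
SELECT pro_id
FROM pro_search
WHERE MATCH(en) AGAINST(? WITH QUERY EXPANSION);
```

Query expansion can return a fair number of unrelated rows, so it is probably best kept for short search terms or as a fallback when a plain full-text search finds nothing.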

You can read more in the MySQL documentation (the link at the beginning of the answer) and here

Also don't miss How to fine-tune MySQL full-text search
