Design Pattern for Drilldown / Filtered Search

Question

I'm looking to build a powerful search feature for a site, similar to NewEgg's drilldown search, e.g.,

http://www.newegg.com/Product/ProductList.aspx?Submit=ENE&N=2010150014%201035507776&name=7200%20RPM

I'm working with a variety of objects similar to products which have different criteria. Can anyone recommend a good design for building a search engine like NewEgg's?

Recommended answer

Storing the data "vertically", i.e. in an Entity-Attribute-Value (EAV) format, along with the [meta]data-driven schema management implicit to EAV, provides a framework where each product's attributes are "independent" from one another. This, in turn, facilitates the implementation of drill-down (i.e. guided refinement of the query, where at each step the end-user is supplied with the list of possible attributes still applicable and, for each such attribute, the list of possible values).

A small caveat is that this approach is better suited to smaller catalogs (say fewer than 1 million products), for the EAV model can introduce some performance and/or scaling issues with bigger databases. The actual size at which performance becomes a concern varies with the specifics of the catalog (average number of attributes per product, commonality of attributes between products of different types, general complexity of the "ontology" etc.), but EAV is quite the way to go for smaller catalogs. In addition to its support for the "drill-down" filtering described above, it introduces a flexible data schema (the ability to add/remove attributes and/or product types etc. without requiring a change to the physical (database) schema; only the logical schema is altered).

Edit: more detail/resources on EAV
Admittedly, the Wikipedia article about it is somewhat abstract...
In a nutshell, the model identifies the following concepts:

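  • Entity (aka Product or Item) = a "record" in traditional relational lingo
  • Attribute = a "column" (aka "field") in RDBMS lingo
  • Value = the numeric (or string or other) value of a given column for a given record
  • Type (aka Category) = [loosely] a "table" in an RDBMS, i.e. a set of records which typically share the same set of attributes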

To illustrate this with, say, an electronics goods catalog, an entity could be a particular "Flat Screen Monitor", its Type could be "Display Devices", its Attributes "Size", "Resolution", "Price" etc.

With EAV, the bulk of the information is stored in two tables called, say, the Product table and the ProductAttributes table:

Product table
    "ProductID" (primary key, the "EntityId")
    "TypeId"
    optionally, some common attributes found in all/most other Products, say...
      Price
      ManufacturerId
      Photo

ProductAttributes table
    "ProductID" (Foreign Key to Product table)
    "AttributeID"  (FK to Attribute table)
    "Value"   (actual value; note: sometimes we can have several SQL fields for
               this, say IntValue, StringValue, DateValue, allowing values to be
               stored in their natural format)
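
To make the two-table layout concrete, here is a minimal sketch using Python's built-in sqlite3 module. The column types, the composite primary key (which assumes one value per attribute per product) and the sample attribute ids are assumptions of mine, not part of the design described above.

```python
import sqlite3

# Hypothetical SQLite rendering of the two tables; column types are assumptions.
SCHEMA = """
CREATE TABLE Product (
    ProductID      INTEGER PRIMARY KEY,   -- the "EntityId"
    TypeId         INTEGER NOT NULL,
    Price          REAL,                  -- optional common attributes
    ManufacturerId INTEGER,
    Photo          TEXT
);
CREATE TABLE ProductAttributes (
    ProductID   INTEGER NOT NULL REFERENCES Product(ProductID),
    AttributeID INTEGER NOT NULL,         -- FK to the Attribute table
    Value       TEXT,                     -- or IntValue / StringValue / DateValue
    PRIMARY KEY (ProductID, AttributeID)  -- assumption: one value per attribute
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# One monitor (type 1) with two attribute rows.
# Attribute ids 10 ("Size") and 11 ("Resolution") are made up for the example.
conn.execute("INSERT INTO Product (ProductID, TypeId, Price) VALUES (1, 1, 199.0)")
conn.executemany(
    "INSERT INTO ProductAttributes (ProductID, AttributeID, Value) VALUES (?, ?, ?)",
    [(1, 10, "24 in"), (1, 11, "1920x1080")],
)

# Reading a product back means collecting its attribute rows.
rows = conn.execute(
    "SELECT AttributeID, Value FROM ProductAttributes "
    "WHERE ProductID = 1 ORDER BY AttributeID"
).fetchall()
print(rows)
```

Adding a new attribute to a product is then an INSERT into ProductAttributes rather than an ALTER TABLE.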

The tables above constitute the bulk of the data; they are complemented by tables storing the [logical] schema of the catalog, also known as the "metadata". These tables include:

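  • The Attribute table, which defines each attribute: name, data type, isRequired etc.
  • The Type table, which defines each type (category): name, and possibly a parent type in the case of a hierarchical ontology.
  • Type_Attributes, which lists the possible attributes for a given type (for example, "TV Sets" have the attributes "Number of channels", "Screen size" etc., whereas "VCR" has the attributes "Number of heads", "Supported formats", "Body color" etc.).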
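
The metadata tables can be sketched in the same way. The snippet below is only an illustration with sqlite3: the column names, data types and the helper attributes_for_type are hypothetical, and the sample rows simply reuse the TV set / VCR example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Attribute (
    AttributeID INTEGER PRIMARY KEY,
    Name        TEXT NOT NULL,
    DataType    TEXT NOT NULL,          -- e.g. 'int', 'string', 'date'
    IsRequired  INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE Type (
    TypeId       INTEGER PRIMARY KEY,
    Name         TEXT NOT NULL,
    ParentTypeId INTEGER                -- NULL for a top-level type
);
CREATE TABLE Type_Attributes (
    TypeId      INTEGER NOT NULL,
    AttributeID INTEGER NOT NULL,
    PRIMARY KEY (TypeId, AttributeID)
);
""")

conn.executemany("INSERT INTO Type VALUES (?, ?, ?)",
                 [(1, "TV Sets", None), (2, "VCR", None)])
conn.executemany("INSERT INTO Attribute VALUES (?, ?, ?, ?)", [
    (1, "Number of channels", "int", 0),
    (2, "Screen size", "string", 1),
    (3, "Number of heads", "int", 0),
    (4, "Supported formats", "string", 0),
    (5, "Body color", "string", 0),
])
conn.executemany("INSERT INTO Type_Attributes VALUES (?, ?)",
                 [(1, 1), (1, 2), (2, 3), (2, 4), (2, 5)])


def attributes_for_type(conn, type_name):
    """Discover, from the metadata alone, which attributes a type can have."""
    rows = conn.execute("""
        SELECT A.Name
        FROM Type T
        JOIN Type_Attributes TA ON TA.TypeId = T.TypeId
        JOIN Attribute A ON A.AttributeID = TA.AttributeID
        WHERE T.Name = ?
        ORDER BY A.Name""", (type_name,))
    return [r[0] for r in rows]


tv_attributes = attributes_for_type(conn, "TV Sets")
vcr_attributes = attributes_for_type(conn, "VCR")
print(tv_attributes, vcr_attributes)
```

Nothing in attributes_for_type mentions a particular product type or column, which is the point of keeping the logical schema in data.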

All this may seem somewhat complicated compared with the traditional approach, whereby the logical schema is "hardcoded" within the SQL schema, i.e. we have one "TVSets" table with its set of columns, one per attribute, and then a "VCR" table with its own, different set of columns/attributes. However, with such an approach the application logic ends up hard-coding the table and column names in some fashion (if only through an indirection in a map of sorts).
In contrast, the EAV model allows the program to discover the list of possible types and, for each type, the list of possible attributes (either required or optional). Also, since the attribute values are all stored in the same table, it is possible to filter on attributes irrespective of the type (or sub-type) of the product, for example to get all items cheaper than 50 dollars (in the other approach we might have had to look in dozens of tables for that).
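
As a rough illustration of such a type-agnostic filter, the sketch below assumes that the price is stored as an ordinary attribute in a numeric NumValue column (my naming, standing in for the IntValue-style typed columns mentioned earlier) and that 1 is the AttributeID of "Price". Both are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Product (ProductID INTEGER PRIMARY KEY, TypeId INTEGER, Name TEXT);
CREATE TABLE ProductAttributes (
    ProductID   INTEGER,
    AttributeID INTEGER,
    NumValue    REAL,      -- hypothetical typed value column
    StringValue TEXT
);
""")

PRICE = 1  # hypothetical AttributeID for "Price"

# Three products of three different types.
conn.executemany("INSERT INTO Product VALUES (?, ?, ?)", [
    (1, 100, "HDMI cable"),
    (2, 200, "VCR"),
    (3, 300, "USB mouse"),
])
conn.executemany(
    "INSERT INTO ProductAttributes (ProductID, AttributeID, NumValue) VALUES (?, ?, ?)",
    [(1, PRICE, 9.99), (2, PRICE, 120.0), (3, PRICE, 24.5)],
)

# One query covers every product type: no per-type table has to be visited.
cheap = conn.execute("""
    SELECT P.Name, PA.NumValue
    FROM ProductAttributes PA
    JOIN Product P ON P.ProductID = PA.ProductID
    WHERE PA.AttributeID = ? AND PA.NumValue < ?
    ORDER BY PA.NumValue""", (PRICE, 50)).fetchall()
print(cheap)
```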

Back to the "drill-down" feature...
Once an initial search is performed (say, searching all products whose name [full-text indexed] contains the word "screen"), the ProductAttributes table can produce the distinct list of all the AttributeIDs (hence attribute names, by lookup in the Attribute table) found on the products satisfying this first search criterion.
Upon the user selecting a given attribute (say "Manufacturer"), the ProductAttributes table can produce the distinct list of manufacturers (along with the number of products for each manufacturer). (Alternatively, such info can be fetched up front rather than lazily, when the user requests it.)
The user then selects a given manufacturer (or several), and a new query is run to reduce the initial results list. The list of possible attributes (and, within each attribute, the list of possible values) decreases, since some products (entities) initially selected are now excluded.
The process continues, providing the end user with a guided search into the catalog. Of course, the user may backtrack etc.
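
The drill-down loop described here can be sketched roughly as follows with sqlite3. The function names are hypothetical, LIKE stands in for a real full-text index, and passing the current result set as an IN list is a simplification that only suits small result sets.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Product (ProductID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Attribute (AttributeID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE ProductAttributes (ProductID INTEGER, AttributeID INTEGER, Value TEXT);
""")
conn.executemany("INSERT INTO Product VALUES (?, ?)", [
    (1, "Flat screen monitor"), (2, "Touch screen tablet"),
    (3, "Screen cleaner"), (4, "Keyboard"),
])
conn.executemany("INSERT INTO Attribute VALUES (?, ?)",
                 [(1, "Manufacturer"), (2, "Size")])
conn.executemany("INSERT INTO ProductAttributes VALUES (?, ?, ?)", [
    (1, 1, "SAMSUNG"), (1, 2, "24 in"),
    (2, 1, "SAMSUNG"), (2, 2, "10 in"),
    (3, 1, "ACME"), (4, 1, "ACME"),
])


def _marks(ids):
    """One '?' placeholder per id, for use inside IN (...)."""
    return ",".join("?" * len(ids))


def search_by_name(conn, word):
    """Initial search; LIKE is a stand-in for a full-text index."""
    rows = conn.execute(
        "SELECT ProductID FROM Product WHERE Name LIKE ? ORDER BY ProductID",
        (f"%{word}%",))
    return [r[0] for r in rows]


def available_attributes(conn, ids):
    """Attributes still applicable to the current result set."""
    sql = f"""
        SELECT DISTINCT A.Name
        FROM ProductAttributes PA
        JOIN Attribute A ON A.AttributeID = PA.AttributeID
        WHERE PA.ProductID IN ({_marks(ids)})
        ORDER BY A.Name"""
    return [r[0] for r in conn.execute(sql, ids)]


def value_counts(conn, ids, attribute_name):
    """Possible values of one attribute, with the number of products for each."""
    sql = f"""
        SELECT PA.Value, COUNT(DISTINCT PA.ProductID)
        FROM ProductAttributes PA
        JOIN Attribute A ON A.AttributeID = PA.AttributeID
        WHERE A.Name = ? AND PA.ProductID IN ({_marks(ids)})
        GROUP BY PA.Value
        ORDER BY PA.Value"""
    return conn.execute(sql, [attribute_name, *ids]).fetchall()


def refine(conn, ids, attribute_name, value):
    """Keep only the products having the chosen value for the chosen attribute."""
    sql = f"""
        SELECT DISTINCT PA.ProductID
        FROM ProductAttributes PA
        JOIN Attribute A ON A.AttributeID = PA.AttributeID
        WHERE A.Name = ? AND PA.Value = ? AND PA.ProductID IN ({_marks(ids)})
        ORDER BY PA.ProductID"""
    return [r[0] for r in conn.execute(sql, [attribute_name, value, *ids])]


hits = search_by_name(conn, "screen")                     # step 1: initial search
facets = available_attributes(conn, hits)                 # step 2: attributes on offer
makers = value_counts(conn, hits, "Manufacturer")         # step 3: values and counts
narrowed = refine(conn, hits, "Manufacturer", "SAMSUNG")  # step 4: user picks a value
sizes = value_counts(conn, narrowed, "Size")              # facets shrink accordingly
print(facets, makers, narrowed, sizes)
```

Backtracking amounts to re-running the same functions with one criterion removed.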

To maybe help with this wordy explanation (or maybe to further confuse the reader...), the following snippet provides a more precise indication of the way this structure can be used to implement searches. This code is adapted to the table names used in the explanation above and may include a few typos, but it generally provides the flavor of things. Also, this is written with a Common Table Expression (CTE) but could well be written as a subquery. Also note that we do not join with the logical schema (metadata) tables, but that could be done too, to get the attribute names, type names and such directly in the result set.
As hinted earlier, the queries and logic supporting this architecture are more complicated, but also more versatile and tolerant of changes in the types of items stored and their attributes. Of course, this type of query is generated dynamically, based on the current list of search criteria supplied by the end-user.

-- Note: EntityId here is the "ProductID" key described earlier.
WITH SearchQry AS (
  SELECT ROW_NUMBER() OVER (ORDER BY P.EntityId ASC) AS RowNum,
         P.EntityId AS EId
  FROM  Products P
  -- one self-join on ProductAttributes per search criterion
  INNER JOIN ProductAttributes PA1 ON P.EntityId = PA1.EntityId AND PA1.AttributeID = <some attribute id, say for Manufacturer>
  INNER JOIN ProductAttributes PA2 ON P.EntityId = PA2.EntityId AND PA2.AttributeID = <some other attribute id, say for Color>
  -- additional PAn JOINs go here as more criteria are added
  WHERE  P.ProductType IN (ProdId_x, ProdId_y, ProdId_z)  -- for example where these x,y,z Ids correspond to say "TV Sets", "LapTop Computers" and "PDAs" respectively
     AND PA1.Value = 'SAMSUNG' -- for example
     AND PA2.Value = 'YELLOW' -- for example
  GROUP BY P.EntityId
)

-- return every attribute of the first page (15 products) of matches
SELECT  P.EntityId, PA.AttributeId, PA.Value -- PA.IntValue (if so structured)
FROM (SELECT * FROM SearchQry WHERE RowNum BETWEEN 1 AND 15) AS S
INNER JOIN ProductAttributes PA ON PA.EntityId = S.EId
INNER JOIN Products P ON P.EntityId = PA.EntityId
ORDER BY P.EntityId, PA.AttributeId  -- or some other sort order
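
Since such a query is generated dynamically, here is a hedged sketch of what the generation step might look like, again with sqlite3. The helper names build_search and run_search are hypothetical, the value test is moved into each JOIN's ON clause (equivalent for inner joins, and it keeps the bound parameters in order), and a plain LIMIT replaces the ROW_NUMBER paging to keep the example short.

```python
import sqlite3


def build_search(criteria, page_size=15):
    """Build (sql, params) with one ProductAttributes join per criterion.

    criteria is a list of (attribute_id, value) pairs chosen by the end-user.
    """
    joins, params = [], []
    for n, (attribute_id, value) in enumerate(criteria, start=1):
        # Only the alias number is interpolated; user data is always bound.
        joins.append(
            f"INNER JOIN ProductAttributes PA{n} "
            f"ON PA{n}.EntityId = P.EntityId "
            f"AND PA{n}.AttributeID = ? AND PA{n}.Value = ?"
        )
        params.extend([attribute_id, value])
    sql = "\n".join(
        ["SELECT P.EntityId FROM Products P", *joins,
         "ORDER BY P.EntityId", "LIMIT ?"]
    )
    params.append(page_size)
    return sql, params


def run_search(conn, criteria, page_size=15):
    sql, params = build_search(criteria, page_size)
    return [r[0] for r in conn.execute(sql, params)]


# Tiny made-up catalog; attribute ids 1 and 2 are assumptions for the example.
MANUFACTURER, COLOR = 1, 2
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Products (EntityId INTEGER PRIMARY KEY, ProductType INTEGER);
CREATE TABLE ProductAttributes (EntityId INTEGER, AttributeID INTEGER, Value TEXT);
""")
conn.executemany("INSERT INTO Products VALUES (?, ?)", [(1, 7), (2, 7), (3, 8)])
conn.executemany("INSERT INTO ProductAttributes VALUES (?, ?, ?)", [
    (1, MANUFACTURER, "SAMSUNG"), (1, COLOR, "YELLOW"),
    (2, MANUFACTURER, "SAMSUNG"), (2, COLOR, "BLACK"),
    (3, MANUFACTURER, "SONY"), (3, COLOR, "YELLOW"),
])

both = run_search(conn, [(MANUFACTURER, "SAMSUNG"), (COLOR, "YELLOW")])
samsung_only = run_search(conn, [(MANUFACTURER, "SAMSUNG")])
everything = run_search(conn, [])
print(both, samsung_only, everything)
```

Each criterion the user adds or removes changes only the criteria list; the SQL text is rebuilt from it on every request.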

Sorry for the long explanation, there's maybe [probably] a better description of this online, but I haven't found it...
