Dedicated faceted search engine for dealing with dynamic taxonomies - helps just with performance or also flexibility?

Question

I've been thinking for a while about modeling a typical e-commerce site with an eBay-like taxonomy, where the attributes depend on the particular product category.

My first attempt was to choose between EAV and table-per-class inheritance modeling in the database. I chose the latter for performance reasons, but that meant creating a dedicated table for each specific product category (each leaf in the category tree), with category-specific attributes (such as resolution for TVs) modeled as separate columns.

While performant, this setup is not flexible when you need to add attributes to existing categories or add new categories. Each such change requires the following:

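  • altering or creating a table
  • a new form for filtering that category by its specific attributes
  • new code to generate the DB queries for searching and filtering
  • some new view models/DTOs and views for presenting products of the new category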
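
The trade-off described above can be sketched with an in-memory SQLite database. This is only an illustration: the table and column names (`tv`, `product`, `product_attribute`, `screen_size_in`) are assumptions made up for the example, not part of any real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Table-per-class: one table per leaf category, one column per attribute.
cur.execute("CREATE TABLE tv (id INTEGER PRIMARY KEY, name TEXT, resolution TEXT)")
cur.execute("INSERT INTO tv (name, resolution) VALUES ('Model A', '1920x1080')")

# Adding an attribute to an existing category is a schema change (DDL).
cur.execute("ALTER TABLE tv ADD COLUMN screen_size_in INTEGER")
cur.execute("UPDATE tv SET screen_size_in = 42 WHERE name = 'Model A'")
tpc_rows = cur.execute(
    "SELECT name FROM tv WHERE resolution = ? AND screen_size_in = ?",
    ("1920x1080", 42),
).fetchall()

# EAV: attributes are rows, so a new attribute is just another INSERT.
# The price is that every value is stored as untyped text and each
# filtered attribute costs one more self-join.
cur.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, category TEXT, name TEXT)")
cur.execute(
    "CREATE TABLE product_attribute ("
    " product_id INTEGER REFERENCES product(id),"
    " attr_name TEXT, attr_value TEXT)"
)
cur.execute("INSERT INTO product VALUES (1, 'tv', 'Model A')")
cur.executemany(
    "INSERT INTO product_attribute VALUES (?, ?, ?)",
    [(1, "resolution", "1920x1080"), (1, "screen_size_in", "42")],
)
eav_rows = cur.execute(
    """
    SELECT p.name
    FROM product p
    JOIN product_attribute r
      ON r.product_id = p.id AND r.attr_name = 'resolution' AND r.attr_value = ?
    JOIN product_attribute s
      ON s.product_id = p.id AND s.attr_name = 'screen_size_in' AND s.attr_value = ?
    WHERE p.category = 'tv'
    """,
    ("1920x1080", "42"),
).fetchall()

print(tpc_rows, eav_rows)
```

Both queries find the same product, but only the table-per-class variant needed an `ALTER TABLE` to gain the new attribute.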

To cope with that complexity, I think some kind of meta-representation of those attributes is needed (possibly even outside the application, in an XML or even an Excel file), so that on each change all of the code mentioned above (SQL/ORM queries, application code, templates) could be auto-generated. That would help with development, but testing and an extra deployment would still be needed.

At that point I learned that eBay doesn't really use a relational DB for search, and that their taxonomy is so flexible that they can add new leaf categories quite quickly. Their categories are also probably not nodes of a hierarchical tree modeled in a relational DB, but just search attributes (facets).

After a quick look at the most promising dedicated faceted search setup (a separate Solr instance), I'm not sure whether it would help me stay flexible in the face of taxonomy changes. Usually Solr just mirrors the relational DB in some way, so category-specific attributes would still have to be modeled in the DB as DBMS metadata. Dynamically generating UI forms for filtering by attributes, for example, would therefore be hard unless:

1) I keep the data in the RDBMS in EAV fashion and overcome its performance problems by using Solr for search (but the other EAV problems would remain: messiness, no data integrity enforcement, etc.)

2) I keep just the attribute dictionary (i.e. just the names and types) in the RDBMS and store the specific attribute values in Solr, using it as a kind of non-relational data store in addition to a search facility. I'm not convinced by this solution either (even if it's possible), since the application would be coupled too tightly to Solr (i.e. the product-editing admin CRUD would interact with Solr directly).
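
As a rough sketch of option 2, an attribute dictionary holding only names and types can still enforce some integrity before a schemaless document is written to the search store. Everything here is hypothetical: `ATTRIBUTE_DICTIONARY` stands in for the RDBMS table, and `validate_product` is an illustrative helper, not part of Solr or any library.

```python
from typing import Any, Callable, Dict

# Hypothetical attribute dictionary: category -> attribute name -> type.
# In option 2 this would be loaded from the RDBMS rather than hard-coded.
ATTRIBUTE_DICTIONARY: Dict[str, Dict[str, Callable[[Any], Any]]] = {
    "tv": {"resolution": str, "screen_size_in": int},
    "laptop": {"ram_gb": int, "cpu": str},
}


def validate_product(category: str, raw_attributes: Dict[str, Any]) -> Dict[str, Any]:
    """Coerce raw values to their declared types and reject unknown names."""
    declared = ATTRIBUTE_DICTIONARY[category]
    unknown = set(raw_attributes) - set(declared)
    if unknown:
        raise ValueError(f"unknown attributes for {category!r}: {sorted(unknown)}")
    # int("abc") and similar failures surface here as ValueError too.
    return {name: declared[name](value) for name, value in raw_attributes.items()}


# The validated dict is what would be sent to the document store.
doc = validate_product("tv", {"resolution": "1920x1080", "screen_size_in": "42"})
print(doc)
```

Putting this check behind a small repository layer is one way to keep the admin CRUD from talking to the search store directly.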

What are your thoughts? Do you think code generation is inevitable for this kind of (performant) taxonomy flexibility? How would you handle it? Maybe with a separate data dictionary kept in EAV fashion in the DB, just for code generation purposes? I guess I could also use something like MongoDB, but the UI code generation (at runtime or not) would still need some kind of metadata.
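
To make the "runtime" variant concrete, here is a minimal sketch of building a filter form from attribute metadata instead of generating code. The `AttributeMeta` class, the `kind` values, and the widget names are all assumptions invented for this example.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class AttributeMeta:
    """Hypothetical metadata record for one category attribute."""
    name: str
    label: str
    kind: str  # "text", "number", or "choice"
    choices: Tuple[str, ...] = ()


def build_filter_form(attributes: List[AttributeMeta]) -> List[Dict]:
    """Return plain field descriptors that a generic template could render."""
    fields: List[Dict] = []
    for attr in attributes:
        if attr.kind == "choice":
            fields.append({"name": attr.name, "label": attr.label,
                           "widget": "select", "options": list(attr.choices)})
        elif attr.kind == "number":
            # Numeric attributes are filtered by range: two inputs.
            fields.append({"name": attr.name + "_min",
                           "label": attr.label + " (min)", "widget": "number"})
            fields.append({"name": attr.name + "_max",
                           "label": attr.label + " (max)", "widget": "number"})
        else:
            fields.append({"name": attr.name, "label": attr.label, "widget": "text"})
    return fields


TV_ATTRIBUTES = [
    AttributeMeta("resolution", "Resolution", "choice", ("1280x720", "1920x1080")),
    AttributeMeta("screen_size_in", "Screen size (in)", "number"),
]
form = build_filter_form(TV_ATTRIBUTES)
print([field["name"] for field in form])
```

With this shape, a new attribute or category means a new metadata record, with no new form code and no redeployment.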

There are a lot of questions here, but I didn't want to break this up into smaller questions, since I'm interested in a general design approach for dealing with a bigger class of such problems.

Recommended answer

I don't claim to have a definitive answer to all of this (it's a rather open-ended question that you should try to break into smaller parts, and it depends on your actual requirements; in fact, I'm tempted to vote to close it), but I will comment on a few things:

  1. I would forget about modeling this on an RDBMS. Faceted search just doesn't work in a relational schema.
  2. IMO this is not the right place for code generation. You should design your code so that it doesn't change when the data changes (I'm not talking about schema changes).
  3. Storing metadata/attributes in an Excel spreadsheet seems like a very bad idea. I'd build a UI to edit them, and store them in Solr / MongoDB / CouchDB / whatever you choose to manage this.
  4. Solr does not "just mirror the relational DB". In fact, Solr is completely independent of relational databases. One of the most common cases is dumping data from an RDBMS into Solr (denormalizing the data in the process), but Solr is flexible enough to work without any relational data source.
  5. Hierarchical faceting in Solr is still an open issue in research. Currently there are two separate approaches being researched (SOLR-64, SOLR-792).
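
Points 2 and 4 can be illustrated with a small in-memory stand-in for a faceted index over denormalized documents. This is plain Python, not Solr's API; `PRODUCTS`, `facet_search`, and the field names are assumptions made up for the sketch.

```python
from collections import Counter
from typing import Any, Dict, List, Tuple

# Denormalized, schemaless product documents: each one carries only the
# attributes that make sense for its category.
PRODUCTS: List[Dict[str, Any]] = [
    {"id": 1, "category": "tv", "resolution": "1920x1080", "brand": "Acme"},
    {"id": 2, "category": "tv", "resolution": "1280x720", "brand": "Acme"},
    {"id": 3, "category": "laptop", "ram_gb": 8, "brand": "Globex"},
]


def facet_search(
    docs: List[Dict[str, Any]],
    filters: Dict[str, Any],
    facet_fields: List[str],
) -> Tuple[List[Dict[str, Any]], Dict[str, Counter]]:
    """Filter by exact field values, then count values of each facet field."""
    matches = [d for d in docs if all(d.get(k) == v for k, v in filters.items())]
    facets = {
        field: Counter(d[field] for d in matches if field in d)
        for field in facet_fields
    }
    return matches, facets


matches, facets = facet_search(PRODUCTS, {"category": "tv"}, ["resolution", "brand"])
print(len(matches), dict(facets["brand"]))

# A brand-new category with brand-new attributes needs no code change.
PRODUCTS.append({"id": 4, "category": "phone", "storage_gb": 128, "brand": "Acme"})
phone_matches, phone_facets = facet_search(
    PRODUCTS, {"category": "phone"}, ["storage_gb"]
)
```

The search code never names a specific category or attribute, which is the property the answer is asking for: data changes do not force code changes.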
