Dedicated faceted search engine for dealing with dynamic taxonomies: does it help just with performance, or also with flexibility?

Problem description

I've been thinking for a while about modeling a typical e-commerce site with an eBay-like taxonomy and attributes that depend on a particular product category.

My first attempt was choosing between EAV and Table Per Class database inheritance modeling. I chose the latter because of performance, but it meant creating a dedicated table for each specific product category (a leaf in the category tree), with category-specific attributes (like resolution for TVs) modeled as separate columns.
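
For readers unfamiliar with the two designs, here is a minimal sketch using Python's built-in `sqlite3` module. The table and column names (`tv`, `product`, `attribute`, `product_value`) and the sample products are hypothetical, not taken from the question. It stores the same TVs in both layouts, runs the same filter against each, and shows what adding one new attribute costs in each design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Table Per Class: a dedicated table per leaf category; each
# category-specific attribute is a real, typed column.
conn.execute("CREATE TABLE tv (id INTEGER PRIMARY KEY, name TEXT, resolution TEXT)")
conn.executemany("INSERT INTO tv (name, resolution) VALUES (?, ?)",
                 [("Acme 42", "1080p"), ("Acme 55", "2160p")])

# EAV: generic tables; attributes and their values are rows, not columns.
conn.executescript("""
CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE attribute (id INTEGER PRIMARY KEY, category TEXT, name TEXT);
CREATE TABLE product_value (product_id INTEGER, attribute_id INTEGER, value TEXT);
INSERT INTO product VALUES (1, 'Acme 42', 'tv'), (2, 'Acme 55', 'tv');
INSERT INTO attribute VALUES (1, 'tv', 'resolution');
INSERT INTO product_value VALUES (1, 1, '1080p'), (2, 1, '2160p');
""")

def tpc_filter(resolution):
    # A plain column predicate: simple and indexable, but the column must exist.
    rows = conn.execute("SELECT name FROM tv WHERE resolution = ?", (resolution,))
    return [name for (name,) in rows]

def eav_filter(category, attr, value):
    # Every attribute predicate needs joins through the generic value table.
    rows = conn.execute(
        """SELECT p.name FROM product p
           JOIN product_value v ON v.product_id = p.id
           JOIN attribute a ON a.id = v.attribute_id
           WHERE p.category = ? AND a.name = ? AND v.value = ?""",
        (category, attr, value),
    )
    return [name for (name,) in rows]

# Adding a new attribute (refresh rate) to the TV category:
conn.execute("ALTER TABLE tv ADD COLUMN refresh_rate INTEGER")           # schema change
conn.execute("INSERT INTO attribute VALUES (2, 'tv', 'refresh_rate')")  # data change only

print(tpc_filter("1080p"))                      # ['Acme 42']
print(eav_filter("tv", "resolution", "1080p"))  # ['Acme 42']
```

The last two statements before the prints capture the trade-off described in the question: Table Per Class needs a schema migration (plus new forms, queries, and views) for every new attribute, while EAV only needs a new row, at the cost of join-heavy queries and untyped values.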

While performant, this setup is not flexible if you need to add attributes to existing categories or add new categories. Each such change requires:

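  • altering/creating tables
  • new forms for filtering such categories by their specific attributes
  • new code for generating database queries for searching and filtering
  • some new view models/DTOs and views for presenting products in the new categories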

To cope with that complexity, I think some kind of meta representation of those attributes is needed (possibly even outside the application), in XML or even an Excel file, so that on each change all the code mentioned above could be auto-generated (SQL/ORM queries, application code, templates). That would help with development, but testing and an extra deployment would still be needed.
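
A minimal sketch of the metadata-driven generation described above. It assumes the attribute metadata has already been loaded (from XML, a spreadsheet, or anywhere else) into a plain Python dictionary. `CATEGORIES`, `generate_ddl`, and `generate_filter_sql` are hypothetical names. A real generator would also emit ORM classes, forms, and templates.

```python
# Hypothetical metadata: category -> ordered list of (attribute name, SQL type).
CATEGORIES = {
    "tv": [("resolution", "TEXT"), ("screen_size_in", "REAL")],
    "camera": [("megapixels", "REAL"), ("sensor", "TEXT")],
}

def generate_ddl(category):
    """Emit a Table Per Class CREATE TABLE statement for one leaf category."""
    cols = ["id INTEGER PRIMARY KEY", "name TEXT NOT NULL"]
    cols += [f"{attr} {sql_type}" for attr, sql_type in CATEGORIES[category]]
    return f"CREATE TABLE {category} ({', '.join(cols)})"

def generate_filter_sql(category, filters):
    """Emit a parameterized SELECT for equality filters on declared attributes.

    Table and column names come from trusted metadata, so they are interpolated;
    user-supplied filter values stay as ? parameters.
    """
    allowed = {attr for attr, _ in CATEGORIES[category]}
    unknown = set(filters) - allowed
    if unknown:
        raise ValueError(f"unknown attributes for {category}: {sorted(unknown)}")
    sql = f"SELECT id, name FROM {category}"
    if filters:
        sql += " WHERE " + " AND ".join(f"{attr} = ?" for attr in filters)
    return sql, list(filters.values())

print(generate_ddl("tv"))
# CREATE TABLE tv (id INTEGER PRIMARY KEY, name TEXT NOT NULL, resolution TEXT, screen_size_in REAL)
print(generate_filter_sql("tv", {"resolution": "1080p"}))
# ('SELECT id, name FROM tv WHERE resolution = ?', ['1080p'])
```

This removes the hand-written part of each change, but as the question notes, every metadata change still produces new code that has to be tested and deployed.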

At that point I learned that eBay doesn't really use a relational DB for search, and that their taxonomy is so flexible that they can add new leaf categories quite quickly. Also, their categories probably aren't nodes in a hierarchical tree modeled in a relational DB, but just search attributes (facets).
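
The following makes "categories as facets" concrete. In a faceted engine, a category is just another attribute on a flat document, and the UI shows per-value counts for the current result set. The sketch below computes those counts in memory. The documents and field names are made up, and engines such as Solr compute these counts from an index rather than by scanning a list.

```python
from collections import Counter

# Flat, denormalized product documents; "category" is just another facet field.
DOCS = [
    {"name": "Acme 42", "category": "tv", "resolution": "1080p", "brand": "Acme"},
    {"name": "Acme 55", "category": "tv", "resolution": "2160p", "brand": "Acme"},
    {"name": "Zed 50", "category": "tv", "resolution": "2160p", "brand": "Zed"},
    {"name": "Zed Cam", "category": "camera", "sensor": "APS-C", "brand": "Zed"},
]

def search(docs, filters):
    """Return documents matching every field=value filter (AND semantics)."""
    return [d for d in docs if all(d.get(f) == v for f, v in filters.items())]

def facet_counts(docs, fields):
    """Count values per field over the given result set; missing fields are skipped."""
    counts = {f: Counter() for f in fields}
    for d in docs:
        for f in fields:
            if f in d:
                counts[f][d[f]] += 1
    return {f: dict(c) for f, c in counts.items()}

hits = search(DOCS, {"category": "tv"})
print(facet_counts(hits, ["brand", "resolution"]))
# {'brand': {'Acme': 2, 'Zed': 1}, 'resolution': {'1080p': 1, '2160p': 2}}
```

Giving new TV documents a `refresh_rate` key needs no schema change. The only remaining question is where the list of facet fields to display comes from, which is the metadata problem raised below.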

After a quick look at the most promising dedicated faceted search setup (a separate Solr instance), I'm not sure whether it would help me be flexible to taxonomy changes. Usually Solr just mirrors the relational DB in some way, so category-specific attributes would still have to be modeled in the DB as DBMS metadata, and, for example, dynamically generating UI forms for filtering by attributes would be hard, unless:

1) I would keep the data in the RDBMS in EAV fashion and overcome its performance problems by using Solr for search (but there would still be problems with EAV messiness, no data integrity enforcement, etc.)

2) I would keep only the attribute dictionary (i.e. just attribute names and types) in the RDBMS and store the specific attribute values in Solr, using it as a kind of non-relational data store in addition to a search facility. I'm not convinced by this solution either (even if it's possible), since the application would be coupled too tightly to Solr (i.e. the admin CRUD for editing products would interact with Solr directly).
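
Both options share one step: turning attribute rows into flat documents for the search index and checking them against an attribute dictionary that holds only names and types. The sketch below is minimal and assumes the rows have already been fetched from the EAV tables as `(product_id, attribute, value)` tuples with string values. `ATTRIBUTE_TYPES`, `parse`, and `denormalize` are hypothetical names, and sending the documents to Solr is left out. The type check is the kind of per-attribute integrity a generic EAV value column cannot enforce by itself.

```python
from collections import defaultdict

# Hypothetical attribute dictionary kept in the RDBMS: name -> Python type.
ATTRIBUTE_TYPES = {"resolution": str, "screen_size_in": float, "smart": bool}

def parse(attr, raw):
    """Convert a raw string from the EAV value column to the declared type."""
    typ = ATTRIBUTE_TYPES[attr]  # KeyError: attribute missing from the dictionary
    if typ is bool:
        if raw not in ("true", "false"):
            raise ValueError(f"{attr}: expected true/false, got {raw!r}")
        return raw == "true"
    return typ(raw)  # e.g. float("big") raises ValueError

def denormalize(rows):
    """Pivot (product_id, attribute, value) rows into one flat document per product."""
    docs = defaultdict(dict)
    for product_id, attr, raw in rows:
        docs[product_id]["id"] = product_id
        docs[product_id][attr] = parse(attr, raw)
    return list(docs.values())

rows = [
    (1, "resolution", "1080p"),
    (1, "screen_size_in", "42"),
    (2, "resolution", "2160p"),
    (2, "smart", "true"),
]
print(denormalize(rows))
# [{'id': 1, 'resolution': '1080p', 'screen_size_in': 42.0}, {'id': 2, 'resolution': '2160p', 'smart': True}]
```

In option 1 this would run in an indexing job that reads from the EAV tables. In option 2 the same validation would have to sit in front of every write to Solr, which is where the tight coupling comes from.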

What are your thoughts? Do you think code generation is inevitable for this kind of (performant) taxonomy flexibility? How would you handle it? Maybe with a separate data dictionary in EAV fashion in the DB, just for code generation purposes? I guess I could also use something like MongoDB, but UI code generation (at runtime or not) would still need some kind of metadata.
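
One alternative to generating code is to interpret the metadata at runtime. Generic code reads the attribute dictionary and the values currently present in the results, then builds the filter form description on every request. Adding a category or attribute then becomes a data change, not a deployment. The sketch below is minimal. `ATTRIBUTES`, `build_filter_form`, and the widget names are hypothetical, and the rendering layer (an HTML template or JSON for a front end) is omitted.

```python
# Hypothetical attribute dictionary, editable as data (DB rows, a document store, ...).
ATTRIBUTES = {
    "tv": [
        {"name": "resolution", "label": "Resolution", "type": "enum"},
        {"name": "screen_size_in", "label": "Screen size (in)", "type": "number"},
    ],
}

def build_filter_form(category, facet_values):
    """Describe one filter widget per attribute.

    facet_values maps attribute name -> values observed in the current results
    (e.g. from facet counts). Unknown categories yield an empty form.
    """
    fields = []
    for attr in ATTRIBUTES.get(category, []):
        values = facet_values.get(attr["name"], [])
        if attr["type"] == "enum":
            fields.append({"widget": "checkboxes", "name": attr["name"],
                           "label": attr["label"], "options": sorted(values)})
        elif attr["type"] == "number" and values:
            fields.append({"widget": "range", "name": attr["name"],
                           "label": attr["label"], "min": min(values), "max": max(values)})
    return fields

form = build_filter_form("tv", {"resolution": ["2160p", "1080p"], "screen_size_in": [42, 55, 50]})
for field in form:
    print(field)
# {'widget': 'checkboxes', 'name': 'resolution', 'label': 'Resolution', 'options': ['1080p', '2160p']}
# {'widget': 'range', 'name': 'screen_size_in', 'label': 'Screen size (in)', 'min': 42, 'max': 55}
```

The trade-off is that any category-specific behavior has to be expressible through the small set of attribute types the generic code understands. Anything beyond that still needs new code.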

There are a lot of questions here, but I didn't want to break them up into smaller ones, since I'm interested in a general design approach for dealing with a bigger class of such problems.

Recommended answer

I don't claim to have a definitive answer to all of this (it's a rather open-ended question that you should try to break into smaller parts, and the answer depends on your actual requirements; in fact, I'm tempted to vote to close it), but I will comment on a few things:

  1. I would forget about modelling this on an RDBMS. Faceted search just doesn't work in a relational schema.
  2. IMO this is not the right place for code generation. You should design your code so it doesn't change with data changes (I'm not talking about schema changes).
  3. Storing metadata/attributes in an Excel spreadsheet seems like a very bad idea. I'd build a UI to edit them and store them in Solr, MongoDB, CouchDB, or whatever you choose to manage this.
  4. Solr does not "just mirror relational DB". In fact, Solr is completely independent of relational databases. One of the most common cases is dumping data from an RDBMS to Solr (denormalizing the data in the process), but Solr is flexible enough to work without any relational data source.
  5. Hierarchical faceting in Solr is still an open research issue. Two separate approaches are currently being researched (SOLR-64, SOLR-792).
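
On point 5, one workaround often used for hierarchical facets in flat facet engines is to index each category path as several depth-prefixed tokens. A drill-down then becomes an ordinary facet over the next level's tokens. The sketch below shows that encoding in plain Python. It illustrates the general technique only and does not describe what the SOLR-64 or SOLR-792 patches implement. The function names are hypothetical, and category names are assumed not to contain "/".

```python
from collections import Counter

def path_tokens(path):
    """Encode 'Electronics/TV/LCD' as ['0/Electronics', '1/Electronics/TV', '2/Electronics/TV/LCD']."""
    parts = path.split("/")
    return [f"{depth}/{'/'.join(parts[:depth + 1])}" for depth in range(len(parts))]

def child_counts(doc_paths, parent=None):
    """Count documents per child category directly under `parent` (top level if None)."""
    if parent is None:
        prefix = "0/"
    else:
        depth = parent.count("/") + 1          # children sit one level below the parent
        prefix = f"{depth}/{parent}/"          # trailing "/" avoids matching 'ElectronicsX'
    counts = Counter()
    for path in doc_paths:
        for token in path_tokens(path):
            if token.startswith(prefix):
                counts[token.split("/", 1)[1]] += 1  # strip the depth prefix
    return dict(counts)

paths = ["Electronics/TV/LCD", "Electronics/TV/OLED", "Electronics/Camera", "Books/Fiction"]
print(path_tokens("Electronics/TV/LCD"))
# ['0/Electronics', '1/Electronics/TV', '2/Electronics/TV/LCD']
print(child_counts(paths))
# {'Electronics': 3, 'Books': 1}
print(child_counts(paths, "Electronics"))
# {'Electronics/TV': 2, 'Electronics/Camera': 1}
```

In Solr, the same drill-down would typically be a field facet on such a multi-valued token field, restricted with the `facet.prefix` parameter.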
