Solr DIH with multivalued fields and faceting

Question

I'm using Solr to index a dataset stored in a DBMS using the SQL DIH. One of the tables uses an n-to-n relationship. Just for the sake of simplicity (my app is much more complex than this), here is an example: a person has a name and 0..n associated roles (a role is described by a role_name string).

Table person:
- id: int
- name: string

Table roles:
- id: int
- role_name: string

Table association:
- id_person: int
- id_role: int

Two persons could be described as:

id=1, name=John Doe, roles=[programmer, father, soccer player]
id=2, name=Eric Smith, roles=[]

Here is what I would like to achieve with Solr:

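  1. Import the data with DIH (possibly using nested SQL queries?)
  2. Query and display the data with all of the person's information and roles
  3. Be able to query by a given role, e.g. "give me all persons with role = programmer"
  4. Set up faceting to produce a list of all roles, each with the number of times it occurs in the whole dataset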
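Once `roles` is indexed as a multivalued string field, requirements 3 and 4 correspond to ordinary Solr request parameters (`fq` for the filter, `facet.field` for the counts). The following is a minimal Python sketch that only builds the request URLs; the core name `people` and the `localhost:8983` address are hypothetical, and role values containing double quotes would need escaping.

```python
from urllib.parse import urlencode

# Hypothetical Solr endpoint: adjust host, port and core name to your setup.
SOLR_SELECT = "http://localhost:8983/solr/people/select"


def role_query_url(role: str) -> str:
    """Requirement 3: all persons holding a given role (exact match on a string field)."""
    params = {
        "q": "*:*",
        # Filter on the multivalued 'roles' field; quotes keep multi-word roles intact.
        "fq": f'roles:"{role}"',
        "fl": "id,name,roles",
    }
    return SOLR_SELECT + "?" + urlencode(params)


def role_facet_url() -> str:
    """Requirement 4: count how many persons carry each role across the whole index."""
    params = {
        "q": "*:*",
        "rows": 0,              # only the facet counts are needed, not the documents
        "facet": "true",
        "facet.field": "roles",
        "facet.mincount": 1,    # hide roles that occur in no document
        "facet.limit": -1,      # return every role, not just the default top 100
    }
    return SOLR_SELECT + "?" + urlencode(params)


print(role_query_url("soccer player"))
print(role_facet_url())
```

Putting the role filter in `fq` rather than `q` lets Solr cache it in the filter cache independently of the main query.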

I expect this to be possible with Solr (I am using version 6.4, but I can easily upgrade to the latest 6.5). Can anybody explain how to do it or point me to proper information or a tutorial?

Thanks

UMG

Recommended answer

Some tricky things you need to take into account:

  1. Define roles as multivalued:

 <field name="roles" type="string" indexed="true" stored="true" multiValued="true"/>

  2. In the DIH setup, for optimal performance, do it like this (this is for MySQL; modify as needed for your DB): use a LEFT JOIN so you run a single query (much faster than running an inner query per person), use SQL GROUP BY, and use a transformer to massage the roles into the multivalued field:

    <entity name="person" pk="id" transformer="RegexTransformer" query="
        SELECT p.id, ..., GROUP_CONCAT(DISTINCT COALESCE(r.role_name, '') SEPARATOR '|') AS roles
        FROM person p
        LEFT JOIN association a ON p.id = a.id_person
        LEFT JOIN roles r ON a.id_role = r.id
        WHERE ...
        GROUP BY p.id, ...
        ">
        <!-- splitBy (RegexTransformer) turns "programmer|father" into separate values of the multivalued field -->
        <field column="roles" name="roles" splitBy="\|"/>
    </entity>
    

    This is mostly about optimal indexing performance. Once the data is indexed, the queries you want to run are pretty basic.

    The config above is hand-written and untested, so there may be typos, but I hope you get the gist of it.
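To check which documents the DIH entity above should hand to Solr, the pipeline can be mimicked in plain Python. This is a minimal sketch, not DIH itself: it uses the question's sample data with made-up role ids, imitates MySQL's `GROUP_CONCAT(DISTINCT ... SEPARATOR '|')` followed by the `splitBy` split, and drops the empty string that `COALESCE(..., '')` yields for a person without roles (an assumption about what you want indexed).

```python
from collections import Counter

# Sample rows mirroring the question's tables (role ids are made up for illustration).
persons = [(1, "John Doe"), (2, "Eric Smith")]
roles = [(10, "programmer"), (11, "father"), (12, "soccer player")]
association = [(1, 10), (1, 11), (1, 12)]  # (id_person, id_role)

SEPARATOR = "|"


def group_concat_roles(person_id: int) -> str:
    """Mimic GROUP_CONCAT(DISTINCT COALESCE(r.role_name, '') SEPARATOR '|') for one person."""
    role_names = dict(roles)
    # LEFT JOIN: a person with no association rows still yields one NULL, which COALESCE turns into ''.
    names = [role_names[rid] for pid, rid in association if pid == person_id] or [""]
    # DISTINCT, keeping first-seen order (MySQL's order is unspecified without ORDER BY).
    distinct = list(dict.fromkeys(names))
    return SEPARATOR.join(distinct)


def split_roles(value: str) -> list[str]:
    """Mimic splitBy: one value per role; drop the empty string left by COALESCE (assumption)."""
    return [v for v in value.split(SEPARATOR) if v]


docs = [
    {"id": pid, "name": name, "roles": split_roles(group_concat_roles(pid))}
    for pid, name in persons
]

# What a facet on 'roles' should report: number of documents per role value.
facet_counts = Counter(role for doc in docs for role in doc["roles"])

for doc in docs:
    print(doc)
print(dict(facet_counts))
```

Because each document's roles are already distinct, counting role occurrences here equals counting documents per role, which is what field faceting returns.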