How to determine field-type for SOLR indexing?


Question

I have two fields in a MySQL table. One is a VARCHAR and holds the "headline" of a classified ad (this is a classifieds website). The other is a TEXT field that holds the "text" of the classified.

Two questions:

How should I determine how to index these two fields? (Which field type, which classes to use, etc.)

Currently I have an "ad_id" as a unique identifier for each ad, for example "bmw_m3_82398292".

How can I make SOLR return this identifier whenever SOLR finds a query match? (The first part of the identifier is actually the content of the headline field; the second part is a randomly chosen number.)

Thanks

Recommended answer

1. Schema

Your Solr schema is determined largely by your intended search behavior. In your schema.xml file, you'll see a number of field types to choose from, such as "text" and "string". They behave differently.

<fieldtype name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>

The string field type is a literal string match: the value is not tokenized or normalized, so it behaves like = in a SQL WHERE clause.

<fieldtype name="text_ws"   class="solr.TextField"          positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldtype>

The text_ws field type tokenizes on whitespace and does nothing more. The text field type differs in an important way: it adds filters for stop words, word delimiters, and lower-casing. Notice how these filters are designated for both the Lucene index and the Solr query. So when you search a text field, the query terms are passed through the same filters, which helps them match the indexed terms.

<fieldtype name="text"      class="solr.TextField"  positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter ..... />
    <filter ..... />
    <filter ..... />
  </analyzer>
</fieldtype>
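
To make the index/query pairing concrete, here is a minimal sketch of a text-like field type with both analyzers spelled out. The type name text_sketch and the particular filters and attributes are assumptions chosen to match the stop-word, delimiter, and lower-casing filters described above; they are not the elided filters from the snippet above. It also assumes a Solr release that still ships solr.WordDelimiterFilterFactory (newer releases provide a graph variant instead).

```xml
<!-- Hypothetical field type: the same filter chain is applied at index time and at query time -->
<fieldtype name="text_sketch" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- drop common words listed in stopwords.txt -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <!-- split tokens such as "bmw_m3" into "bmw" and "m3" -->
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1"/>
    <!-- make matching case-insensitive -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldtype>
```

Keeping the two chains identical is the simplest choice; they only need to diverge when, for example, synonyms should be expanded on one side only.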

When indexing things like news stories, for example, you probably want to search company names and headlines differently.

<field name="headline" type="text" />
<field name="coname" type="string" indexed="true" multiValued="false" omitNorms="true" />

The above example would allow you to do a search like &coname:Intel&headline:processor+specifications and retrieve only the stories whose company name is exactly Intel.
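
Applied to the classifieds case from the question, the same reasoning might lead to something like the sketch below. The field name ad_text and the stored/indexed choices are assumptions; ad_id and headline come from the question. The identifier is a string so it matches exactly, while the headline and body use the analyzed text type.

```xml
<!-- Sketch of schema.xml entries for the classifieds case; ad_text is a hypothetical field name -->
<fields>
  <!-- exact-match identifier; stored so it can be returned in results -->
  <field name="ad_id" type="string" indexed="true" stored="true" required="true"/>
  <!-- analyzed full-text fields built from the VARCHAR headline and the TEXT body -->
  <field name="headline" type="text" indexed="true" stored="true"/>
  <field name="ad_text" type="text" indexed="true" stored="false"/>
</fields>

<!-- tells Solr which field identifies a document -->
<uniqueKey>ad_id</uniqueKey>
```

Leaving ad_text unstored keeps the index smaller if the full text is always fetched from MySQL by ad_id; set stored="true" if Solr should return it directly.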


2. Result fields

You can define the default result fields in your RequestHandler:

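<requestHandler name="mumble" class="solr.DisMaxRequestHandler">
    <!-- parameters under "defaults" apply unless the request overrides them -->
    <lst name="defaults">
        <str name="fl">category,coname,headline</str>
    </lst>
</requestHandler>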

You can also specify the desired fields in the query string with the fl parameter:

/select?indent=on&version=2.2&q=coname%3AIn*&start=0&rows=10&fl=coname%2Cid&qt=standard
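
For the ad_id case in the question, a handler along these lines would return the identifier for every match. This is a sketch under assumptions: the handler name ads is hypothetical, the ad_id field must be declared with stored="true" in the schema, and solr.SearchHandler is assumed to be available in the Solr version in use.

```xml
<!-- Hypothetical handler that returns only the ad identifier and the relevance score -->
<requestHandler name="ads" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- fl lists the stored fields to include in each result document -->
    <str name="fl">ad_id,score</str>
  </lst>
</requestHandler>
```

A request can still override the default by passing its own fl parameter, as in the query string above.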

You can also select ranges in your query terms using the field:[x TO *] syntax. If you wanted to select certain ads by their date, you might include

ad_date:[20100101 TO 20100201]

in your query terms. (There are many ways to search ranges; this one stores the date as an integer in YYYYMMDD form instead of using the Date class.)
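
For an integer range like this to compare numerically, ad_date needs a numeric field type. A minimal sketch, assuming a Solr release that includes solr.TrieIntField (other releases use different numeric classes); the type name tint and the precisionStep value are illustrative choices, not something the original answer specifies.

```xml
<!-- Assumed numeric type for range queries; the class name depends on the Solr version -->
<fieldType name="tint" class="solr.TrieIntField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/>

<!-- the ad date stored as an integer, e.g. 20100115 -->
<field name="ad_date" type="tint" indexed="true" stored="true"/>
```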
