Solr does not search into integers?


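Problem description

I'm currently developing a search engine using Solr for an e-commerce website. I have these two fields in my schema.xml:

   <field name="sku" type="string" indexed="true" stored="true" required="false" />
   <field name="collection" type="string" indexed="true" stored="true" required="false" />

(The complete schema.xml is available below)

For information:

  • sku looks like this: 959620, 929345, 912365, ...
  • collection looks like this: Alcott, Spigrim, Tantal, ...

For instance, when I look for:

http://localhost:8080/solr/myindex/select/?q=Alcott

I get all products with collection "Alcott".

But when I look for:

http://localhost:8080/solr/myindex/select/?q=959620

I get nothing.

However, when I restrict the request to the sku field,

http://localhost:8080/solr/myindex/select/?q=sku:969520

I do get the product attached to this SKU.

Is there any way to have "q=969520" working? And even better: "q=96" returning all products whose SKU starts with "96"?

Thank you for your help!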




Solution

Yes, add a directive like this after your field definitions in schema.xml:

   <copyField source="sku" dest="text"/>

This assumes that your defaultSearchField is set to text.
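Why the unfielded query misses the SKU can be checked against the schema itself. The following is a minimal sketch in Python (standard library only) that parses a trimmed-down fragment of the schema from the question, keeping only the defaultSearchField and copyField lines, and lists which fields are copied into the default search field. The helper names are invented for illustration. Note that the posted schema sets defaultSearchField to global rather than text, so under that assumption the new copyField would target global; whether the text_fr analysis chain of global handles numeric SKUs as desired is not verified here.

```python
import xml.etree.ElementTree as ET

# Trimmed-down fragment of the schema from the question: only the lines
# that decide where an unfielded query such as q=959620 is searched.
SCHEMA_FRAGMENT = """
<schema name="example" version="1.2">
  <defaultSearchField>global</defaultSearchField>
  <copyField source="title" dest="global"/>
  <copyField source="description" dest="global"/>
</schema>
"""


def default_field_sources(schema_xml):
    """Return (default search field, list of fields copied into it)."""
    root = ET.fromstring(schema_xml)
    default_field = root.findtext("defaultSearchField").strip()
    sources = [
        cf.get("source")
        for cf in root.iter("copyField")
        if cf.get("dest") == default_field
    ]
    return default_field, sources


def with_copy_field(schema_xml, source):
    """Return the schema text with an extra copyField into the default field."""
    root = ET.fromstring(schema_xml)
    default_field = root.findtext("defaultSearchField").strip()
    # Hypothetical fix: copy the given source field into the default field.
    ET.SubElement(root, "copyField", {"source": source, "dest": default_field})
    return ET.tostring(root, encoding="unicode")


field, before = default_field_sources(SCHEMA_FRAGMENT)
_, after = default_field_sources(with_copy_field(SCHEMA_FRAGMENT, "sku"))
print(field, before, after)
```

Before the change only title and description feed the default field, which is consistent with q=959620 returning nothing while q=sku:969520 works. Documents have to be reindexed after a schema change for the copied values to become searchable.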



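To search for all SKUs starting with 96, you can search for 96*. Keep in mind, though, that this will match anything in the default search field that starts with 96 (not just SKUs). To restrict it to SKUs, you have to search for sku:96*.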
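The fielded and prefix queries can be built programmatically so that the colon and the asterisk are URL-encoded correctly. This is a small sketch using only the Python standard library; the base URL is the one from the question, and the helper name select_url is made up for illustration.

```python
from urllib.parse import urlencode

# Base URL taken from the question; adjust host, port and core as needed.
SELECT_URL = "http://localhost:8080/solr/myindex/select/"


def select_url(query):
    """Build a select URL with the q parameter URL-encoded."""
    return SELECT_URL + "?" + urlencode({"q": query})


exact = select_url("sku:969520")   # exact match on the sku field
prefix = select_url("sku:96*")     # every SKU starting with 96
print(exact)
print(prefix)
```

Because sku is declared with the string type in the question's schema, its values are indexed as single untokenized terms, so sku:96* should match on the literal start of the SKU.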

I'm currently developing a search engine using Solr for an e-commerce website. I have these two fields in my schema.xml:

   <field name="sku" type="string" indexed="true" stored="true" required="false" />
   <field name="collection" type="string" indexed="true" stored="true" required="false" />

(The complete schema.xml is available below)

For information:

  • sku looks like this: 959620, 929345, 912365, ...
  • collection looks like this: Alcott, Spigrim, Tantal,...

They are well indexed. For instance, when I look for:

http://localhost:8080/solr/myindex/select/?q=Alcott

I get all products with collection "Alcott".

But when I look for:

http://localhost:8080/solr/myindex/select/?q=959620

I get nothing.

However, when I restrict the request to the sku field,

http://localhost:8080/solr/myindex/select/?q=sku:969520

I do get the product attached to this SKU.

Is there any way to have "q=969520" working? And even better: "q=96" returning all products whose SKU starts with "96"?

Thank you for your help!

schema.xml:

<?xml version="1.0" encoding="UTF-8" ?>

<schema name="example" version="1.2">


  <types>

    <fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>

    <!-- boolean type: "true" or "false" -->
    <fieldType name="boolean" class="solr.BoolField" sortMissingLast="true" omitNorms="true"/>
    <!-- Binary data type. The data should be sent/retrieved as Base64-encoded strings -->
    <fieldtype name="binary" class="solr.BinaryField"/>


    <fieldType name="int" class="solr.TrieIntField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>
    <fieldType name="float" class="solr.TrieFloatField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>
    <fieldType name="long" class="solr.TrieLongField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>
    <fieldType name="double" class="solr.TrieDoubleField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>


    <fieldType name="tint" class="solr.TrieIntField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/>
    <fieldType name="tfloat" class="solr.TrieFloatField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/>
    <fieldType name="tlong" class="solr.TrieLongField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/>
    <fieldType name="tdouble" class="solr.TrieDoubleField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/>


    <fieldType name="date" class="solr.TrieDateField" omitNorms="true" precisionStep="0" positionIncrementGap="0"/>

    <!-- A Trie based date field for faster date range queries and date faceting. -->
    <fieldType name="tdate" class="solr.TrieDateField" omitNorms="true" precisionStep="6" positionIncrementGap="0"/>



    <fieldType name="pint" class="solr.IntField" omitNorms="true"/>
    <fieldType name="plong" class="solr.LongField" omitNorms="true"/>
    <fieldType name="pfloat" class="solr.FloatField" omitNorms="true"/>
    <fieldType name="pdouble" class="solr.DoubleField" omitNorms="true"/>
    <fieldType name="pdate" class="solr.DateField" sortMissingLast="true" omitNorms="true"/>



    <fieldType name="sint" class="solr.SortableIntField" sortMissingLast="true" omitNorms="true"/>
    <fieldType name="slong" class="solr.SortableLongField" sortMissingLast="true" omitNorms="true"/>
    <fieldType name="sfloat" class="solr.SortableFloatField" sortMissingLast="true" omitNorms="true"/>
    <fieldType name="sdouble" class="solr.SortableDoubleField" sortMissingLast="true" omitNorms="true"/>


    <fieldType name="random" class="solr.RandomSortField" indexed="true" />


    <!-- A text field that only splits on whitespace for exact matching of words -->
    <fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      </analyzer>
    </fieldType>


    <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <!-- in this example, we will only use synonyms at query time
        <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
        -->
        <!-- Case insensitive stop word removal.
          add enablePositionIncrements=true in both the index and query
          analyzers to leave a 'gap' for more accurate phrase queries.
        -->
        <filter class="solr.StopFilterFactory"
                ignoreCase="true"
                words="stopwords.txt"
                enablePositionIncrements="true"
                />
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory"
                ignoreCase="true"
                words="stopwords.txt"
                enablePositionIncrements="true"
                />
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
      </analyzer>
    </fieldType>

    <fieldType name="text_fr" class="solr.TextField" positionIncrementGap="100">

      <analyzer type="query">
        <!-- normalize accents, cedillas, the œ ligature, ... -->
        <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
        <!-- split on whitespace -->
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <!-- strip punctuation -->
        <filter class="solr.PatternReplaceFilterFactory" pattern="^(\p{Punct}*)(.*?)(\p{Punct}*)$" replacement="$2"/>
        <!-- drop empty tokens and oversized words -->
        <filter class="solr.LengthFilterFactory" min="1" max="100" />
        <!-- lowercase -->
        <filter class="solr.LowerCaseFilterFactory"/>
        <!-- strip elisions (l', qu', ...) -->
        <filter class="solr.ElisionFilterFactory" articles="elisionwords_fr.txt"/>
        <!-- split compound words -->
        <filter class="solr.WordDelimiterFilterFactory" splitOnCaseChange="1" splitOnNumerics="1" stemEnglishPossessive="1" generateWordParts="1"
                                                        generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1" preserveOriginal="1"/>
        <!-- remove stop words -->
        <filter class="solr.StopFilterFactory" ignoreCase="1" words="stopwords_fr.txt" enablePositionIncrements="true"/>
        <!-- synonym handling -->
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms_fr.txt" ignoreCase="true" expand="true"/>
        <!-- partial words (edge n-grams) -->
        <filter class="solr.EdgeNGramFilterFactory" minGramSize="3" maxGramSize="6"/>
        <!-- stemming (plurals, ...) -->
        <filter class="solr.SnowballPorterFilterFactory" language="French" protected="protwords_fr.txt"/>
        <!-- remove any duplicates -->
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>



      <analyzer type="index">
        <!-- normalize accents, cedillas, the œ ligature, ... -->
        <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
        <!-- split on whitespace -->
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <!-- strip punctuation -->
        <filter class="solr.PatternReplaceFilterFactory" pattern="^(\p{Punct}*)(.*?)(\p{Punct}*)$" replacement="$2"/>
        <!-- drop empty tokens and oversized words -->
        <filter class="solr.LengthFilterFactory" min="1" max="100" />
        <!-- lowercase -->
        <filter class="solr.LowerCaseFilterFactory"/>
        <!-- strip elisions (l', qu', ...) -->
        <filter class="solr.ElisionFilterFactory" articles="elisionwords_fr.txt"/>
        <!-- split compound words -->
        <filter class="solr.WordDelimiterFilterFactory" splitOnCaseChange="1" splitOnNumerics="1" stemEnglishPossessive="1" generateWordParts="1"
                                                        generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1" preserveOriginal="1"/>
        <!-- remove stop words -->
        <filter class="solr.StopFilterFactory" ignoreCase="1" words="stopwords_fr.txt" enablePositionIncrements="true"/>
        <!-- synonym handling -->
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms_fr.txt" ignoreCase="true" expand="true"/>
        <!-- partial words (edge n-grams) -->
        <filter class="solr.EdgeNGramFilterFactory" minGramSize="3" maxGramSize="6"/>
        <!-- stemming (plurals, ...) -->
        <filter class="solr.SnowballPorterFilterFactory" language="French" protected="protwords_fr.txt"/>
        <!-- remove any duplicates -->
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldType>




    <!-- Less flexible matching, but fewer false matches.  Probably not ideal for product names,
         but may be good for SKUs.  Can insert dashes in the wrong place and still match. -->
    <fieldType name="textTight" class="solr.TextField" positionIncrementGap="100" >
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="0" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
        <!-- this filter can remove any duplicate tokens that appear at the same position - sometimes
             possible with WordDelimiterFilter in conjunction with stemming. -->
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldType>


    <!-- A general unstemmed text field - good if one does not know the language of the field -->
    <fieldType name="textgen" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory"
                ignoreCase="true"
                words="stopwords.txt"
                enablePositionIncrements="true"
                />
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>


    <!-- A general unstemmed text field that indexes tokens normally and also
         reversed (via ReversedWildcardFilterFactory), to enable more efficient 
   leading wildcard queries. -->
    <fieldType name="text_rev" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.ReversedWildcardFilterFactory" withOriginal="true"
           maxPosAsterisk="3" maxPosQuestion="2" maxFractionAsterisk="0.33"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory"
                ignoreCase="true"
                words="stopwords.txt"
                enablePositionIncrements="true"
                />
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>


    <fieldType name="alphaOnlySort" class="solr.TextField" sortMissingLast="true" omitNorms="true">
      <analyzer>
        <!-- KeywordTokenizer does no actual tokenizing, so the entire
             input string is preserved as a single token
          -->
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <!-- The LowerCase TokenFilter does what you expect, which can be useful
             when you want your sorting to be case insensitive
          -->
        <filter class="solr.LowerCaseFilterFactory" />
        <!-- The TrimFilter removes any leading or trailing whitespace -->
        <filter class="solr.TrimFilterFactory" />

        <filter class="solr.PatternReplaceFilterFactory"
                pattern="([^a-z])" replacement="" replace="all"
        />
      </analyzer>
    </fieldType>

    <fieldtype name="phonetic" stored="false" indexed="true" class="solr.TextField" >
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.DoubleMetaphoneFilterFactory" inject="false"/>
      </analyzer>
    </fieldtype>

    <fieldtype name="payloads" stored="false" indexed="true" class="solr.TextField" >
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <!--
        The DelimitedPayloadTokenFilter can put payloads on tokens... for example,
        a token of "foo|1.4"  would be indexed as "foo" with a payload of 1.4f
        Attributes of the DelimitedPayloadTokenFilterFactory:
         "delimiter" - a one-character delimiter. Default is | (pipe)
         "encoder" - how to encode the following value into a payload
            float -> org.apache.lucene.analysis.payloads.FloatEncoder,
            integer -> o.a.l.a.p.IntegerEncoder
            identity -> o.a.l.a.p.IdentityEncoder
            Fully qualified class name implementing PayloadEncoder; the encoder must have a no-arg constructor.
         -->
        <filter class="solr.DelimitedPayloadTokenFilterFactory" encoder="float"/>
      </analyzer>
    </fieldtype>

    <!-- lowercases the entire field value, keeping it as a single token.  -->
    <fieldType name="lowercase" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory" />
      </analyzer>
    </fieldType>


    <!-- since fields of this type are by default not stored or indexed,
         any data added to them will be ignored outright.  --> 
    <fieldtype name="ignored" stored="false" indexed="false" multiValued="true" class="solr.StrField" /> 

 </types>


 <fields>
 <!-- Vu fields -->
   <field name="id" type="string" indexed="true" stored="true" required="true" /> 
   <field name="sku" type="string" indexed="true" stored="true" required="false" /> 
   <field name="collection" type="string" indexed="true" stored="true" required="false" /> 
   <field name="title" type="text_fr" required="false" />
   <field name="description" type="text_fr" required="false" />
   <field name="price" type="float" required="false" indexed="true" stored="false" />
   <field name="brand_id" type="text" required="false" />
   <field name="date_online" type="date" required="false" />
   <field name="product_type" type="text" required="false" />   
   <field name="selection_id" type="sint" required="false" multiValued="true" indexed="true" stored="false" />
   <field name="stock_delay" type="sint" required="false"  />
   <field name="stock" type="sint" required="false"  />
   <field name="price_type" type="sint" required="false"  />
   <field name="main_product_id" type="text" required="false"  />
   <field name="date_price" type="date" required="false" />
   <!-- attributes -->
   <dynamicField name="attr_*" type="sint" indexed="true" multiValued="true"/>

   <field name="attr_13" type="int" indexed="true" multiValued="false"/>
   <field name="attr_14" type="int" indexed="true" multiValued="false"/>
   <field name="attr_19" type="int" indexed="true" multiValued="false"/>

    <!-- This field will hold a copy of all the others, to make searching easier -->
   <field name="global" type="text_fr" required="false" multiValued="true" />


   <!-- Valid attributes for fields:
     name: mandatory - the name for the field
     type: mandatory - the name of a previously defined type from the 
       <types> section
     indexed: true if this field should be indexed (searchable or sortable)
     stored: true if this field should be retrievable
     compressed: [false] if this field should be stored using gzip compression
       (this will only apply if the field type is compressable; among
       the standard field types, only TextField and StrField are)
     multiValued: true if this field may contain multiple values per document
     omitNorms: (expert) set to true to omit the norms associated with
       this field (this disables length normalization and index-time
       boosting for the field, and saves some memory).  Only full-text
       fields or fields that need an index-time boost need norms.
     termVectors: [false] set to true to store the term vector for a
       given field.
       When using MoreLikeThis, fields used for similarity should be
       stored for best performance.
     termPositions: Store position information with the term vector.  
       This will increase storage costs.
     termOffsets: Store offset information with the term vector. This 
       will increase storage costs.
     default: a value that should be used if no value is specified
       when adding a document.
   -->
    <!--
   <field name="id" type="string" indexed="true" stored="true" required="true" /> 
   <field name="sku" type="textTight" indexed="true" stored="true" omitNorms="true"/>
   <field name="name" type="textgen" indexed="true" stored="true"/>
   <field name="alphaNameSort" type="alphaOnlySort" indexed="true" stored="false"/>
   <field name="manu" type="textgen" indexed="true" stored="true" omitNorms="true"/>
   <field name="cat" type="text_ws" indexed="true" stored="true" multiValued="true" omitNorms="true" />
   <field name="features" type="text" indexed="true" stored="true" multiValued="true"/>
   <field name="includes" type="text" indexed="true" stored="true" termVectors="true" termPositions="true" termOffsets="true" />

   <field name="weight" type="float" indexed="true" stored="true"/>
   <field name="price"  type="float" indexed="true" stored="true"/>
   <field name="popularity" type="int" indexed="true" stored="true" />
   <field name="inStock" type="boolean" indexed="true" stored="true" />
    -->

   <!-- Common metadata fields, named specifically to match up with
     SolrCell metadata when parsing rich documents such as Word, PDF.
     Some fields are multiValued only because Tika currently may return
     multiple values for them.
   -->
   <!--
   <field name="title" type="text" indexed="true" stored="true" multiValued="true"/>
   <field name="subject" type="text" indexed="true" stored="true"/>
   <field name="description" type="text" indexed="true" stored="true"/>
   <field name="comments" type="text" indexed="true" stored="true"/>
   <field name="author" type="textgen" indexed="true" stored="true"/>
   <field name="keywords" type="textgen" indexed="true" stored="true"/>
   <field name="category" type="textgen" indexed="true" stored="true"/>
   <field name="content_type" type="string" indexed="true" stored="true" multiValued="true"/>
   <field name="last_modified" type="date" indexed="true" stored="true"/>
   <field name="links" type="string" indexed="true" stored="true" multiValued="true"/>
    -->

   <!-- catchall field, containing all other searchable text fields (implemented
        via copyField further on in this schema) -->
   <!-- <field name="text" type="text" indexed="true" stored="false" multiValued="true"/> -->

   <!-- catchall text field that indexes tokens both normally and in reverse for efficient
        leading wildcard queries. -->
   <!-- <field name="text_rev" type="text_rev" indexed="true" stored="false" multiValued="true"/> -->

   <!-- non-tokenized version of manufacturer to make it easier to sort or group
        results by manufacturer.  copied from "manu" via copyField -->
   <!-- <field name="manu_exact" type="string" indexed="true" stored="false"/> -->

   <!-- <field name="payloads" type="payloads" indexed="true" stored="true"/> -->

   <!-- Uncommenting the following will create a "timestamp" field using
        a default value of "NOW" to indicate when each document was indexed.
     -->
   <!--
   <field name="timestamp" type="date" indexed="true" stored="true" default="NOW" multiValued="false"/>
     -->


   <!-- Dynamic field definitions.  If a field name is not found, dynamicFields
        will be used if the name matches any of the patterns.
        RESTRICTION: the glob-like pattern in the name attribute must have
        a "*" only at the start or the end.
        EXAMPLE:  name="*_i" will match any field ending in _i (like myid_i, z_i)
        Longer patterns will be matched first.  if equal size patterns
        both match, the first appearing in the schema will be used.  -->
   <!--
   <dynamicField name="*_i"  type="int"    indexed="true"  stored="true"/>
   <dynamicField name="*_s"  type="string"  indexed="true"  stored="true"/>
   <dynamicField name="*_l"  type="long"   indexed="true"  stored="true"/>
   <dynamicField name="*_t"  type="text"    indexed="true"  stored="true"/>
   <dynamicField name="*_b"  type="boolean" indexed="true"  stored="true"/>
   <dynamicField name="*_f"  type="float"  indexed="true"  stored="true"/>
   <dynamicField name="*_d"  type="double" indexed="true"  stored="true"/>
   <dynamicField name="*_dt" type="date"    indexed="true"  stored="true"/>
    -->

   <!-- some trie-coded dynamic fields for faster range queries -->
   <!--
   <dynamicField name="*_ti" type="tint"    indexed="true"  stored="true"/>
   <dynamicField name="*_tl" type="tlong"   indexed="true"  stored="true"/>
   <dynamicField name="*_tf" type="tfloat"  indexed="true"  stored="true"/>
   <dynamicField name="*_td" type="tdouble" indexed="true"  stored="true"/>
   <dynamicField name="*_tdt" type="tdate"  indexed="true"  stored="true"/>

   <dynamicField name="*_pi"  type="pint"    indexed="true"  stored="true"/>

   <dynamicField name="ignored_*" type="ignored" multiValued="true"/>
   <dynamicField name="attr_*" type="textgen" indexed="true" stored="true" multiValued="true"/>

   <dynamicField name="random_*" type="random" />
    -->
   <!-- uncomment the following to ignore any fields that don't already match an existing 
        field name or dynamic field, rather than reporting them as an error. 
        alternately, change the type="ignored" to some other type e.g. "text" if you want 
        unknown fields indexed and/or stored by default --> 
   <!--dynamicField name="*" type="ignored" multiValued="true" /-->

 </fields>

 <!-- Field to use to determine and enforce document uniqueness. 
      Unless this field is marked with required="false", it will be a required field
   -->
 <uniqueKey>id</uniqueKey>

 <!-- field for the QueryParser to use when an explicit fieldname is absent -->
 <defaultSearchField>global</defaultSearchField>

 <!-- SolrQueryParser configuration: defaultOperator="AND|OR" -->
 <solrQueryParser defaultOperator="OR"/>

  <!-- copyField commands copy one field to another at the time a document
        is added to the index.  It's used either to index the same field differently,
        or to add multiple fields to the same field for easier/faster searching.  -->

   <copyField source="title" dest="global"/>
   <copyField source="description" dest="global"/>


</schema>

Solution

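Yes. Add a directive like this to your schema.xml, after the field definitions:

<copyField source="sku" dest="text"/>

This assumes that the defaultSearchField is set to text. In the schema posted above it is set to global, so there the destination would be dest="global" instead.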
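
For the schema posted in the question, where `defaultSearchField` is `global` rather than `text`, the end of the file might look like the sketch below. This is an adaptation and is not confirmed by the original answer. It assumes that the `text_fr` analyzer behind `global` (not shown in this excerpt) leaves purely numeric tokens such as 959620 searchable.

```xml
<!-- Sketch: copy sku into the existing catch-all field "global" so that an
     unqualified query (q=959620) can reach it. Assumes the text_fr analysis
     chain keeps numeric tokens intact. -->
<copyField source="title" dest="global"/>
<copyField source="description" dest="global"/>
<copyField source="sku" dest="global"/>
```

Because copyField is applied when a document is added, existing documents would need to be reindexed before the new rule has any effect. The `global` field is already declared `multiValued="true"`, which a destination receiving several sources needs.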

To find all SKUs beginning with 96, you can search for 96*. Keep in mind, though, that this matches any document whose default search field contains a term beginning with 96, not just SKUs. To restrict the match to SKUs, search for sku:96*.
