How do I get French text FEMMES.COM to index as language variants of FEMMES?
Problem description
I need FEMMES.COM to get tokenized as singular + plural forms of the base word FEMME.
"analyzers":[{"@ odata.type":#Microsoft.Azure.Search.CustomAnalyzer","name":"text_language_search_custom_analyzer","tokenizer":"text_language_search_custom_analyzer_ms_tokenizer","tokenFilters":[[小写," asciifolding]," charFilters:[" html_strip]}]," tokenizers:[{" @ odata.type:"#Microsoft.Azure.Search.MicrosoftLanguageStemmingTokenizer," name:" text_language_search_custom_analyzer_ms_tokenizer","maxTokenLength":300,"isSearchTokenizer":false,"language":"english"}],"tokenFilters":[],"charFilters":[]}
"analyzers": [ { "@odata.type": "#Microsoft.Azure.Search.CustomAnalyzer", "name": "text_language_search_custom_analyzer", "tokenizer": "text_language_search_custom_analyzer_ms_tokenizer", "tokenFilters": [ "lowercase", "asciifolding" ], "charFilters": [ "html_strip" ] } ], "tokenizers": [ { "@odata.type": "#Microsoft.Azure.Search.MicrosoftLanguageStemmingTokenizer", "name": "text_language_search_custom_analyzer_ms_tokenizer", "maxTokenLength": 300, "isSearchTokenizer": false, "language": "english" } ], "tokenFilters": [], "charFilters": []}
{ "analyzer": "text_language_search_custom_analyzer", "text": "FEMMES" }
{ "@odata.context": "https://one-adscope-search-eu-stage.search.windows.net/$metadata#Microsoft.Azure.Search.V2016_09_01.AnalyzeResult", "tokens": [ { "token": "femme", "startOffset": 0, "endOffset": 6, "position": 0 }, { "token": "femmes", "startOffset": 0, "endOffset": 6, "position": 0 } ] }
{ "@odata.context": "https://one-adscope-search-eu-stage.search.windows.net/$metadata#Microsoft.Azure.Search.V2016_09_01.AnalyzeResult", "tokens": [ { "token": "femmes", "startOffset": 0, "endOffset": 6, "position": 0 }, { "token": "com", "startOffset": 7, "endOffset": 10, "position": 1 } ] }
{ "@odata.context": "https://one-adscope-search-eu-stage.search.windows.net/$metadata#Microsoft.Azure.Search.V2016_09_01.AnalyzeResult", "tokens": [ { "token": "femme", "startOffset": 0, "endOffset": 6, "position": 0 }, { "token": "femmes", "startOffset": 0, "endOffset": 6, "position": 0 }, { "token": "com", "startOffset": 7, "endOffset": 10, "position": 1 } ]}
Recommended answer
My previous answer was incorrect. Azure Search applies the language tokenizer BEFORE any token filters. Because the stemming happens in the tokenizer, anything a token filter splits off afterwards is not stemmed, which made the WordDelimiterToken filter useless in my use case.
What I ended up doing was pre-processing the data BEFORE uploading it to Azure for indexing. In my C# code, I added some regex logic that breaks apart text like FEMMES2017 into FEMMES 2017 before sending it to Azure. That way, when the text reaches Azure, the indexer sees FEMMES by itself and the language tokenizer properly tokenizes it as FEMME and FEMMES.
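The answer does not include the C# code, so the following is only a rough sketch of that kind of pre-processing, written in Python rather than C#. The function name split_for_indexing and both patterns are hypothetical. Splitting at letter/digit boundaries follows the FEMMES2017 example above; replacing a dot between two letters with a space is an added assumption aimed at the FEMMES.COM case from the question, and may not match what the author actually did.

```python
import re

# Hypothetical patterns, not taken from the original answer.
# A zero-width boundary between a letter and a digit (either order).
_LETTER_DIGIT_BOUNDARY = re.compile(
    r"(?<=[^\W\d_])(?=\d)|(?<=\d)(?=[^\W\d_])"
)
# A dot sitting directly between two letters, as in FEMMES.COM.
# Dots between digits (e.g. 3.14) are deliberately left alone.
_DOT_BETWEEN_LETTERS = re.compile(r"(?<=[^\W\d_])\.(?=[^\W\d_])")


def split_for_indexing(text: str) -> str:
    """Insert spaces so each word reaches the language tokenizer on its own."""
    text = _DOT_BETWEEN_LETTERS.sub(" ", text)
    return _LETTER_DIGIT_BOUNDARY.sub(" ", text)


if __name__ == "__main__":
    for sample in ("FEMMES2017", "FEMMES.COM", "2017FEMMES"):
        print(sample, "->", split_for_indexing(sample))
```

The cleaned string would then be sent to Azure in place of the raw value, so the stemming tokenizer receives FEMMES as a standalone word.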