Matching an approximate string in a Core Data store


Problem description

I have a small problem with the Core Data application I'm currently writing. I have two different models, contexts, and persistent stores. One is for my app data; the other is for a website with information relevant to me.

Most of the time, I match exactly one record from my app to one record from the other source. Sometimes, however, I have to fall back to fuzzy string matching to link the two records. I'm trying to match song titles. My local title could be the (made-up) "The French Idealist is in your pensée" and the remote song title could be "01 - 10 - French idealist in in you're pensee, The (dub remix, feat. DJ Objective-C)".

I searched Stack Overflow, Google, and the Cocoa documentation, and I can't find any clear answer on how to do fuzzy matching in these cases. My strings can start with anything, contain a bunch of special characters, and usually end with random characters or characters that should be ignored.

Regular expressions won't do, nor will NSPredicates; Soundex doesn't work well with foreign names; and Levenshtein distance alone may not be enough (or will it?).

I'm looking for a title in a set of about a dozen potential matches, but I have to do this operation quite a lot. 100% accuracy is not the goal.

I was thinking of removing the ignored words, extracting the keywords (in this example, "french, idealist, pensée"), concatenating them, and then using the Levenshtein distance (the words in a song title should be in the same order).

Would that work in my particular case? What is the industry standard for this problem? (I can't be the only one in the world who wants to match slightly different song names.) Can Core Data, Cocoa, or Objective-C help me?

Thanks a lot.

Solution

You want your search to be diacritic-insensitive so that the 'é' in pensée matches the 'e' in pensee. You get this by adding [d] after the comparison operator, like so:

    NSPredicate *predicate = [NSPredicate predicateWithFormat:@"(songTitle like[cd] %@)", yourSongSubstring];

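The 'c' in [cd] is for case insensitivity. Note that like compares against the whole attribute value, so to match a substring the pattern needs * wildcards around it (or use contains[cd] instead).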

Since your words could appear in any order in the string you are searching, you could tokenize your search string ([... componentsSeparatedByString:@" "]) and then create a predicate like

    NSPredicate *predicate = [NSPredicate predicateWithFormat:@"(songTitle like[cd] %@) and (songTitle like[cd] %@)", songToken1, songToken2];

The syntax for combining predicates above may be off; I'm going from memory.
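
For comparison, the approach proposed in the question (drop ignored words, concatenate the remaining keywords, then take the Levenshtein distance) could be sketched roughly as follows. This is plain C, so it should compile unchanged inside an Objective-C file. Everything here is illustrative rather than an established recipe: the stop-word list, the three-character minimum, the rule that a title ends at the first '(', and the function names are all assumptions. The input is also assumed to be already diacritic-folded to ASCII (for example with NSString's stringByFoldingWithOptions:locale:), since the tokenizer only understands ASCII letters and digits.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stop-word list; tune it for your own catalogue. */
static const char *const kIgnoredWords[] = { "the", "your", "you", NULL };

/* Returns 1 when the lowercase word of length len is in the stop list. */
static int is_ignored(const char *word, size_t len)
{
    for (size_t i = 0; kIgnoredWords[i] != NULL; i++) {
        if (strlen(kIgnoredWords[i]) == len &&
            strncmp(kIgnoredWords[i], word, len) == 0)
            return 1;
    }
    return 0;
}

/* Lowercases and concatenates the keywords of an ASCII-folded title into out.
 * Assumed rules: stop at the first '(', skip tokens shorter than 3 characters,
 * skip stop words. Returns the length written (excluding the terminator). */
size_t extract_keywords(const char *title, char *out, size_t cap)
{
    size_t n = 0;
    const char *p = title;
    if (cap == 0) return 0;
    while (*p != '\0' && *p != '(') {
        if (!isalnum((unsigned char)*p)) { p++; continue; } /* separator */
        const char *start = p;
        while (isalnum((unsigned char)*p)) p++;
        size_t len = (size_t)(p - start);
        if (len < 3 || n + len >= cap) continue; /* too short, or no room */
        for (size_t i = 0; i < len; i++)
            out[n + i] = (char)tolower((unsigned char)start[i]);
        if (!is_ignored(out + n, len)) n += len; /* keep the token */
    }
    out[n] = '\0';
    return n;
}

/* Two-row Levenshtein distance; returns -1 if allocation fails. */
int levenshtein(const char *a, const char *b)
{
    size_t la = strlen(a), lb = strlen(b);
    int *prev = malloc((lb + 1) * sizeof *prev);
    int *curr = malloc((lb + 1) * sizeof *curr);
    if (!prev || !curr) { free(prev); free(curr); return -1; }
    for (size_t j = 0; j <= lb; j++) prev[j] = (int)j;
    for (size_t i = 1; i <= la; i++) {
        curr[0] = (int)i;
        for (size_t j = 1; j <= lb; j++) {
            int cost = (a[i - 1] == b[j - 1]) ? 0 : 1;
            int del = prev[j] + 1;
            int ins = curr[j - 1] + 1;
            int sub = prev[j - 1] + cost;
            int best = del < ins ? del : ins;
            curr[j] = best < sub ? best : sub;
        }
        int *tmp = prev; prev = curr; curr = tmp; /* reuse the two rows */
    }
    int d = prev[lb];
    free(prev);
    free(curr);
    return d;
}

/* Distance between the keyword strings of two titles (lower is closer). */
int title_distance(const char *local, const char *remote)
{
    char a[256], b[256];
    extract_keywords(local, a, sizeof a);
    extract_keywords(remote, b, sizeof b);
    return levenshtein(a, b);
}
```

With about a dozen candidates per lookup, calling title_distance on each candidate and keeping the smallest result is probably cheap enough. Because the keywords are concatenated in order, this variant only helps when the words appear in the same order in both titles, which is the assumption stated in the question.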
