How to optimize Core Data query for full text search


Question

Can I optimize a Core Data query when searching for matching words in a text? (This question also pertains to the wisdom of using custom SQL versus Core Data on an iPhone.)

I'm working on a new iPhone app that is a handheld reference tool for a scientific database. The main interface is a standard searchable table view, and I want as-you-type response as the user types new words. Each search word must match as a prefix of a word in the text. The text is composed of hundreds of thousands of words.

In my prototype I coded SQL directly. I created a separate "words" table containing every word in the text fields of the main entity. I indexed the words and performed searches along the lines of

SELECT id, * FROM textTable 
  JOIN (SELECT DISTINCT textTableId FROM words 
         WHERE word BETWEEN 'foo' AND 'fooz' ) 
    ON id=textTableId
 LIMIT 50

This runs very fast. Using an IN would probably work just as well, i.e.

SELECT * FROM textTable
 WHERE id IN (SELECT textTableId FROM words 
               WHERE word BETWEEN 'foo' AND 'fooz' ) 
 LIMIT 50

The LIMIT is crucial and allows me to display results quickly. If the limit is reached, I notify the user that there are too many results to display. This is kludgy.
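As a rough illustration of the words-table approach described above, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names follow the queries in the question; the index name, the sample rows, and the helper functions `prefix_bounds` and `search` are assumptions made for the example. One thing worth noting: `BETWEEN 'foo' AND 'fooz'` would skip a word such as "foozzball" (it sorts after "fooz"), so the sketch uses a half-open range from the prefix up to the prefix with its last character incremented.

```python
import sqlite3

def prefix_bounds(prefix):
    """Return (low, high) so that low <= word < high covers every word starting with prefix."""
    # Increment the last character by one code point: 'foo' -> 'fop'.
    # Assumes a non-empty prefix whose last character is not the maximum code point.
    return prefix, prefix[:-1] + chr(ord(prefix[-1]) + 1)

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE textTable (id INTEGER PRIMARY KEY, body TEXT);
    CREATE TABLE words (word TEXT, textTableId INTEGER);
    -- Hypothetical index name; indexing the word column is what makes the range lookup cheap.
    CREATE INDEX words_word_idx ON words (word, textTableId);
""")

# Made-up sample data standing in for the real text.
rows = [(1, "food web dynamics"),
        (2, "a footnote on foozzball"),
        (3, "unrelated entry")]
for row_id, body in rows:
    conn.execute("INSERT INTO textTable VALUES (?, ?)", (row_id, body))
    conn.executemany("INSERT INTO words VALUES (?, ?)",
                     [(w.lower(), row_id) for w in body.split()])

def search(prefix, limit=50):
    """Return up to `limit` (id, body) rows that contain a word starting with prefix."""
    if not prefix:
        return []
    low, high = prefix_bounds(prefix.lower())
    cur = conn.execute(
        """SELECT id, body FROM textTable
            WHERE id IN (SELECT textTableId FROM words
                          WHERE word >= ? AND word < ?)
            ORDER BY id
            LIMIT ?""",
        (low, high, limit))
    return cur.fetchall()

print(search("foo"))
```

The IN form returns each matching row once even when several of its words share the prefix, which is the job DISTINCT does in the JOIN variant.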

I've spent the last several days pondering the advantages of moving to Core Data, but I worry about the lack of control over the schema, indexing, and querying for such an important query.

Theoretically an NSPredicate of textField MATCHES '.*\bfoo.*' would just work, but I'm sure it will be slow. This sort of text search seems so common that I wonder what the usual approach is. Would you create a words entity as I did above and use a predicate of "word BEGINSWITH 'foo'"? Will that work as fast as my prototype? Will Core Data automatically create the right indexes? I can't find any explicit means of advising the persistent store about indexes.
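The sketch below does not involve Core Data and says nothing about the SQL that Core Data generates. It only illustrates, in plain SQLite through Python's sqlite3 module, why an unanchored pattern match over the full text and a range lookup on an indexed word column would be expected to differ in cost. The table names come from the prototype queries; the index name and the `plan` helper are assumptions for the example, and the exact wording of the plan output varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE textTable (id INTEGER PRIMARY KEY, body TEXT);
    CREATE TABLE words (word TEXT, textTableId INTEGER);
    -- Hypothetical index name.
    CREATE INDEX words_word_idx ON words (word);
""")

def plan(sql, params=()):
    """Return the detail column of EXPLAIN QUERY PLAN, joined into one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable detail text is the last column of each row.
    return " | ".join(row[-1] for row in rows)

# Prefix search expressed as a range on the indexed word column.
indexed_plan = plan(
    "SELECT textTableId FROM words WHERE word >= ? AND word < ?",
    ("foo", "fop"))

# Pattern match with a leading wildcard over the full text column.
scan_plan = plan(
    "SELECT id FROM textTable WHERE body LIKE ?",
    ("%foo%",))

print(indexed_plan)  # expected to report a SEARCH that uses words_word_idx
print(scan_plan)     # expected to report a SCAN of textTable
```

A SEARCH through the index touches only the matching range of words, while a SCAN has to examine every row's text, which is the cost the question anticipates for the MATCHES predicate.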

I see some nice advantages of Core Data in my iPhone app. The faulting and other memory considerations allow for efficient database retrievals for tableview queries without setting arbitrary limits. The object graph management allows me to easily traverse entities without writing lots of SQL. Migration features will be nice in the future. On the other hand, in a limited-resource environment (iPhone) I worry that an automatically generated database will be bloated with metadata, unnecessary inverse relationships, inefficient attribute datatypes, etc.

Should I dive in or proceed with caution?

Recommended answer

I made a workaround solution. I think it's similar to this post. I added the SQLite amalgamation source code to my Core Data project, then created a full-text search (FTS) class that was not a managed object subclass. In the FTS class I #import "sqlite3.h" (the source file) instead of the sqlite framework. The FTS class saves to a different .sqlite file than the Core Data persistent store.

When I import my data, the Core Data object stores the rowid of the related FTS object as an integer attribute. I have a static dataset, so I don't worry about referential integrity, but the code to maintain integrity should be trivial.

To perform FTS, I MATCH query the FTS class, returning a set of rowids. In my managed object class, I query for the corresponding objects with [NSPredicate predicateWithFormat:@"rowid IN %@", rowids]. I avoid traversing any many-to-many relationships this way.
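The two-store arrangement can be sketched outside of Core Data as follows. This is only an approximation under stated assumptions: two in-memory SQLite databases stand in for the FTS file and the Core Data persistent store, the `entry` table with its `ftsRowid` column stands in for the managed objects and their integer attribute, and the sample records are made up. It also assumes the SQLite build behind Python's sqlite3 module has FTS3/FTS4 enabled, which is common but not guaranteed.

```python
import sqlite3

# Two separate stores, mirroring the split described above.
fts = sqlite3.connect(":memory:")   # stand-in for the separate FTS .sqlite file
main = sqlite3.connect(":memory:")  # stand-in for the Core Data persistent store

fts.execute("CREATE VIRTUAL TABLE docs USING fts4(body)")
main.execute(
    "CREATE TABLE entry (id INTEGER PRIMARY KEY, title TEXT, ftsRowid INTEGER)")

# Made-up sample records: (title kept in the main store, text kept in the FTS store).
records = [("Food webs", "food web dynamics in lakes"),
           ("Footnotes", "a footnote on citation style"),
           ("Other", "unrelated entry")]

# Import step: each main-store row remembers the docid of its FTS row.
for docid, (title, body) in enumerate(records, start=1):
    fts.execute("INSERT INTO docs (docid, body) VALUES (?, ?)", (docid, body))
    main.execute("INSERT INTO entry (title, ftsRowid) VALUES (?, ?)", (title, docid))

def search(prefix):
    """Return the titles of entries whose text has a word starting with prefix."""
    # A trailing '*' is the FTS prefix-query syntax; real user input would need
    # sanitizing before being passed to MATCH.
    rowids = [row[0] for row in fts.execute(
        "SELECT docid FROM docs WHERE docs MATCH ?", (prefix + "*",))]
    if not rowids:
        return []
    # Equivalent in spirit to the "rowid IN %@" predicate against the main store.
    placeholders = ",".join("?" * len(rowids))
    cur = main.execute(
        f"SELECT title FROM entry WHERE ftsRowid IN ({placeholders}) ORDER BY id",
        rowids)
    return [row[0] for row in cur]

print(search("foo"))
```

Because the link between the stores is just an integer, nothing keeps the two files consistent automatically; with a static dataset, as in the answer, that is not a concern.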

The performance improvement is dramatic. My dataset is 142287 rows, comprising 194 MB (Core Data) and 92 MB (FTS with stopwords removed). Depending on the search term frequency, my searches went from several seconds to 0.1 seconds for infrequent terms (<100 hits) and 0.2 seconds for frequent terms (>2000 hits).

I'm sure there are myriad problems with my approach (code bloat, possible namespace collisions, loss of some Core Data features), but it seems to be working.
