How do I create a "like" filter view in CouchDB?


Question

Here's an example of what I need in SQL:


SELECT name FROM employ WHERE name LIKE '%bro%'

How do I create a view like that in CouchDB?

Recommended answer

The simple answer is that CouchDB views aren't ideal for this.

The more complicated answer is that this type of query also tends to be very inefficient in typical SQL engines, so if you grant that any solution involves tradeoffs, CouchDB at least has the benefit of letting you choose your tradeoff.

1. The SQL way

When you do SELECT ... WHERE name LIKE '%bro%', all the SQL engines I'm familiar with must do what's called a "full table scan". This means the server reads every row in the relevant table and brute-force scans the field to see if it matches.

You can do this in CouchDB 2.x with a Mango query using the $regex operator. For the basic case, the query would look something like this:

{"selector":{
  "name": {
    "$regex": "bro"
  }
}}

There do not appear to be any options exposed for case sensitivity and the like, but you could extend the pattern to match only at the beginning or end of the field, or to express more complicated patterns. If you can also restrict your query with some other (indexable) field operator, that will likely help performance. As the documentation warns:


Regular expressions do not work with indexes, so they should not be used to filter large data sets. […]
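
As a rough illustration of extending the pattern to match at the beginning or end, here is a minimal sketch that turns a SQL LIKE pattern into a Mango selector. The helper name likeToSelector is hypothetical and not part of CouchDB; it only handles the % and _ wildcards and assumes "." in the resulting regex is good enough for single-line field values.

```javascript
// Hypothetical helper (not a CouchDB API): translate a SQL LIKE pattern
// into a Mango selector that uses the $regex operator.
function likeToSelector(field, likePattern) {
  const body = Array.from(likePattern)
    .map(function (ch) {
      if (ch === '%') return '.*'; // % matches any run of characters
      if (ch === '_') return '.';  // _ matches exactly one character
      // Escape anything that is special inside a regular expression.
      return ch.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
    })
    .join('');
  // Anchor both ends, so 'bro%' means "starts with bro".
  return { selector: { [field]: { $regex: '^' + body + '$' } } };
}

const query = likeToSelector('name', '%bro%');
console.log(JSON.stringify(query));
// {"selector":{"name":{"$regex":"^.*bro.*$"}}}
```

The resulting object would be sent as the JSON body of a POST to /some_database/_find. It is still a full scan unless the selector also includes a condition on an indexed field.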

You can do a full scan in CouchDB 1.x too, using a temporary view:

POST /some_database/_temp_view

{"map": "function (doc) { if (doc.name && doc.name.indexOf('bro') !== -1) emit(null); }"}

This will look through every single document in the database and give you a list of matching documents. You can tweak the map function to also match on a document type, to emit a certain key for ordering, as in emit(doc.timestamp), or to emit some data value useful to your purpose, as in emit(null, doc.name).
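
To make those tweaks concrete, here is a small sketch that combines them. The "type" field, its 'employee' value, and the sample documents are assumptions for illustration only; the emit() defined here is a local stand-in so the map function can be run outside CouchDB.

```javascript
// Local stand-in for CouchDB's emit(), only so the sketch runs on its own.
const rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value === undefined ? null : value });
}

// Map function variant: restrict to one document type (the "type" field is
// an assumption), use the timestamp as the key, and emit the name as value.
function map(doc) {
  if (doc.type === 'employee' &&
      typeof doc.name === 'string' &&
      doc.name.indexOf('bro') !== -1) {
    emit(doc.timestamp, doc.name);
  }
}

// Made-up sample documents.
const docs = [
  { _id: 'a', type: 'employee', name: 'Dobros', timestamp: 30 },
  { _id: 'b', type: 'employee', name: 'Ambrose', timestamp: 10 },
  { _id: 'c', type: 'invoice', name: 'bro invoice', timestamp: 20 },
  { _id: 'd', type: 'employee', name: 'Smith', timestamp: 5 }
];
docs.forEach(map);

// CouchDB returns view rows ordered by key; mimic that for numeric keys.
rows.sort(function (a, b) { return a.key - b.key; });
console.log(rows);
```

Only the map function body would go into the "map" string of the temporary view request; the rest is scaffolding.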

2. The "lots of available disk space" way

Depending on the size of your source data, you could create an index that emits every possible "interior string" as a key of a permanent (on-disk) view. That is to say, for a name like "Dobros" you would emit("dobros"); emit("obros"); emit("bros"); emit("ros"); emit("os"); emit("s");. Then for a term like '%bro%' you could query your view with startkey="bro"&endkey="bro\uFFFF" to get all occurrences of the lookup term. Your index will be approximately the size of your text content squared, but if you need an arbitrary "find in string" that is faster than the full database scan above, and you have the space, this might work. You'd be better served by a data structure designed for substring searching, though.
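
A minimal sketch of that suffix-emitting map function follows. The emit() and queryRange() helpers are local stand-ins so the idea can be run outside CouchDB; queryRange compares plain JavaScript strings, which only approximates CouchDB's view collation, and emitting doc.name as the value is a convenience for the demo rather than something the approach requires.

```javascript
// Local stand-in for CouchDB's emit(), only so the sketch runs on its own.
const rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value === undefined ? null : value });
}

// Emit every suffix of the lower-cased name, so that any interior
// substring of the name is a prefix of at least one key.
function map(doc) {
  if (typeof doc.name !== 'string') return;
  var name = doc.name.toLowerCase();
  for (var i = 0; i < name.length; i++) {
    emit(name.slice(i), doc.name);
  }
}

// Rough simulation of startkey="bro"&endkey="bro\uFFFF" (assumption:
// plain string comparison stands in for CouchDB's collation).
function queryRange(startkey, endkey) {
  return rows
    .filter(function (r) { return r.key >= startkey && r.key <= endkey; })
    .map(function (r) { return r.value; });
}

// Made-up sample documents.
[{ name: 'Dobros' }, { name: 'Ambrose' }, { name: 'Smith' }].forEach(map);

const matches = queryRange('bro', 'bro\uFFFF');
console.log(matches);
```

A name that contains the term more than once would appear once per occurrence, so callers may want to de-duplicate by document.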

Which brings us to...

3. The full-text search way

You could use a CouchDB plugin (couchdb-lucene, now via Dreyfus/Clouseau for 2.x; ElasticSearch; SQLite's FTS) to generate an auxiliary text-oriented index into your documents.

Note that most full-text search indexes don't naturally support arbitrary wildcard prefixes either, likely for reasons of space efficiency similar to those we saw above. Full-text search usually doesn't imply "brute-force binary search" but rather "word search". YMMV though, so take a look at the options available in your full-text engine.

If you don't really need to find "bro" anywhere in a field, you can implement a basic "find a word starting with X" search with regular CouchDB views by splitting on various locale-specific word separators and emitting the resulting "words" as your view keys. This will be more efficient than the approach above, with the index scaling in proportion to the amount of data indexed.
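
A minimal sketch of such a word-prefix view is below. The separator set in the regular expression is an assumption and would need adjusting for your locale and data; emit() and queryRange() are local stand-ins (using plain string comparison rather than CouchDB's collation) so the sketch runs outside CouchDB.

```javascript
// Local stand-in for CouchDB's emit(), only so the sketch runs on its own.
const rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value === undefined ? null : value });
}

// Emit each lower-cased word of the name as a key. The separator set
// (whitespace , . ; : -) is an assumption for this example.
function map(doc) {
  if (typeof doc.name !== 'string') return;
  doc.name.toLowerCase().split(/[\s,.;:\-]+/).forEach(function (word) {
    if (word) emit(word, doc.name);
  });
}

// Rough simulation of startkey="bro"&endkey="bro\uFFFF".
function queryRange(startkey, endkey) {
  return rows
    .filter(function (r) { return r.key >= startkey && r.key <= endkey; })
    .map(function (r) { return r.value; });
}

// Made-up sample documents.
[
  { name: 'Bronson, Ann' },
  { name: 'Mary-Beth Brown' },
  { name: 'Dobros' }
].forEach(map);

const found = queryRange('bro', 'bro\uFFFF');
console.log(found);
```

Here "Dobros" is not found, because "bro" is not at the start of any of its words; that is the tradeoff for the much smaller index.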
