Self-join on DocumentDB: syntax error


Question

I'm having trouble running an otherwise valid SQL self-join query on DocumentDB.

The following query works: SELECT * FROM c AS c1 WHERE c1.obj="car"

But this simple self-join query does not: SELECT c1.url FROM c AS c1 JOIN c AS c2 WHERE c1.obj="car" AND c2.obj="person" AND c1.url = c2.url. It fails with the error Identifier 'c' could not be resolved.

It seems that DocumentDB supports self-joins within a document, but I'm asking about joins at the collection level.

I looked at the official syntax doc and understand that the collection name is basically inferred. I tried changing c to my explicit collection name and to root, but neither worked.

Am I missing something obvious? Thanks!

Recommended answer

A few things to clarify:

1.) Regarding Identifier 'c' could not be resolved

Queries are scoped to a single collection. In the example above, c is an implicit alias for the collection, which is re-aliased to c1 with the AS keyword. After that re-aliasing, c is no longer a name the JOIN can refer to, hence the error.

You can fix the example query by changing the JOIN to reference c1:

SELECT c1.url
FROM c AS c1
JOIN c1 AS c2
WHERE c1.obj="car" AND c2.obj="person" AND c1.url = c2.url

This is also equivalent to:

SELECT c1.url
FROM c1
JOIN c1 AS c2
WHERE c1.obj="car" AND c2.obj="person" AND c1.url = c2.url

2.) Understanding JOIN and revisiting your data model

With that said, I don't think fixing the query syntax issue above will produce the behavior you are expecting. The JOIN keyword in DocumentDB SQL is designed for forming a cross product with a denormalized array of elements within a document (as opposed to forming cross products across other documents in the same collection). If you run into trouble here, it may be worth taking a step back and revisiting how to model your data for Azure Cosmos DB.
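To make the intra-document behavior concrete, here is a minimal sketch in plain Python that mimics what a query of the form SELECT ... FROM c JOIN o IN c.objects does. It is an emulation over in-memory dictionaries, not the Cosmos DB engine, and the objects array and sample documents are hypothetical. Each document is paired only with the elements of its own array, never with other documents.

```python
from typing import Any, Dict, Iterator, List, Tuple

Document = Dict[str, Any]


def intra_document_join(
    docs: List[Document], array_field: str
) -> Iterator[Tuple[Document, Any]]:
    """Pair each document with every element of its own array field.

    This mirrors the shape of `FROM c JOIN item IN c.<array_field>`:
    the cross product is formed inside one document at a time.
    """
    for doc in docs:
        # A document without the array contributes no rows in this sketch.
        for item in doc.get(array_field, []):
            yield doc, item


# Hypothetical documents with a denormalized array of detected objects.
docs = [
    {"id": "1", "url": "a.jpg", "objects": ["car", "person"]},
    {"id": "2", "url": "b.jpg", "objects": ["car"]},
    {"id": "3", "url": "c.jpg"},
]

pairs = [(doc["id"], obj) for doc, obj in intra_document_join(docs, "objects")]
print(pairs)
```

Note that document "1" never gets paired with the array of document "2". That is the reason a collection-level self-join cannot be expressed with JOIN.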

In an RDBMS, you are trained to think entity-first and to normalize your data model based on entities. You rely heavily on the query engine to optimize queries to fit your workload (which typically does a good, but not always optimal, job of retrieving data). The challenge is that many relational benefits are lost as scale increases and scaling out to multiple shards/partitions becomes a requirement.

For a scale-out distributed database like Cosmos DB, you will want to start by understanding the workload and then optimize your data model to fit it (as opposed to thinking entity-first). Keep in mind that collections are merely a logical abstraction composed of many replicas that live within partition sets. They do not enforce schema, and they are the boundary for queries.

When designing your model, you will want to incorporate the following questions into your thought process:

  • What is the scale, in terms of size and throughput, for the broader solution (an order-of-magnitude estimate is sufficient)?
  • What is the ratio of reads to writes?
  • For writes: what is the pattern? Is it mostly inserts, or are there a lot of updates?
  • For reads: what do the top N queries look like?

The above should influence your choice of partition key as well as what your data / object model should look like. For example:

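  • The ratio of requests will help guide how you make tradeoffs (use the Pareto principle and optimize for the bulk of your workload).
  • For read-heavy workloads, commonly filtered properties become candidates for the partition key.
  • Properties that tend to be updated together frequently should be grouped together in the data model, and kept away from properties that are updated at a slower cadence (to lower the RU charge for updates).
  • Don't be afraid to duplicate properties across different record types to enrich queryability, and to annotate types. For example, say we have two types of documents: cat and person.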

    {
       "id": "Andrew",
       "type": "Person",
       "familyId": "Liu",
       "employer": "Microsoft"
    }
     
    {
       "id": "Ralph",
       "type": "Cat",
       "familyId": "Liu",
       "fur": {
             "length": "short",
             "color": "brown"
       }
    }
     

We can query both types of documents without needing a JOIN simply by running a query without a filter on type:

SELECT * FROM c WHERE c.familyId = "Liu"

And if we wanted to filter on type = "Person", we can simply add a filter on type to our query:

SELECT * FROM c WHERE c.familyId = "Liu" AND c.type = "Person"
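The same idea can be applied to the car/person case from the question. One possible approach, sketched here under assumptions rather than taken from the answer, is to fetch the candidate documents with a single unjoined query such as SELECT c.url, c.obj FROM c WHERE c.obj IN ("car", "person") and then combine the rows in application code. The function name urls_with_all and the sample rows below are hypothetical, and the rows stand in for whatever your client returns.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def urls_with_all(rows: Iterable[Dict[str, str]], required: Set[str]) -> List[str]:
    """Return the urls whose rows cover every value in `required`.

    This is a client-side stand-in for the collection-level self-join:
    group rows by url, then keep urls that have all required obj values.
    """
    seen: Dict[str, Set[str]] = defaultdict(set)
    for row in rows:
        seen[row["url"]].add(row["obj"])
    # `required <= objs` is a subset test: every required value was seen.
    return sorted(url for url, objs in seen.items() if required <= objs)


# Hypothetical query results: one row per (url, obj) document.
rows = [
    {"url": "a.jpg", "obj": "car"},
    {"url": "a.jpg", "obj": "person"},
    {"url": "b.jpg", "obj": "car"},
    {"url": "c.jpg", "obj": "person"},
]

result = urls_with_all(rows, {"car", "person"})
print(result)
```

If this lookup is one of your top queries, the modeling advice above suggests storing the detected objects as an array on a single document per url instead, so the check happens inside one document.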
