Azure Analysis Services vs Direct Query

Question

I'm trying to assess the best approach, in terms of cost and performance, for accessing data and running queries on a dataset using either Power BI with Azure Analysis Services or Power BI with Direct Query.

I have tried to illustrate both approaches with the following diagram.

In the diagram, steps 4 and 5 describe accessing data in Azure Data Lake using Power BI with Direct Query, whereas steps 4 and 6 describe accessing data using Power BI with Azure Analysis Services.

From my own research, Direct Query is notorious for having performance issues, e.g.:

All DirectQuery requests are sent to the source database, so the time required to refresh a visual is dependent on how long that back-end source takes to respond with the results from the query (or queries).

The above statement is well documented. However, in my design DirectQuery requests shouldn't be an issue, because most of the logic and transformation will take place in Databricks (although I don't want this question to focus on Databricks).

On the other hand, with Azure Analysis Services (AAS) all requests are served from memory, as opposed to DirectQuery, and are therefore much faster.

So, I would appreciate it if you could share your experience of using DirectQuery and AAS, and let me know if I have missed any advantages or disadvantages of using one technology over the other.

Recommended answer

The Power BI (PBI) data model is a lighter-weight version of Analysis Services. If you have PBI Desktop open, you can open Task Manager and see an Analysis Services instance running in the background. In Power BI the dataset size is limited to 1GB; in Premium it is 10GB, with the ability to refresh to 12GB.

Analysis Services can hold more data, since it is not bound by those dataset size limits, and it also offers other features aimed at enterprise organisations. Like Power BI, Analysis Services can either sit over a data source in Direct Query mode or import the data.

In your question you mentioned that Direct Query mode "is notorious for having performance issues"; however, that depends on the structure and size of the data source. On a number of projects that I have deployed, I have used Direct Query over data sources of at least 50-100GB. These have mostly been standard star schema data warehouses or a defined reporting table, and in both cases they have the relevant indexes, covering indexes, or columnstore indexes to allow more efficient retrieval of data. Direct Query mode slows down because of the number of queries it has to run on the data source, which is driven by the measures, the relationships, and the connection overhead. Another factor can be the number of visuals on a page, as each visual is a query and each one has to run on the data source.
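To make the point about visuals concrete, here is a toy Python model of how page refresh time grows when every visual sends its own query to the source. The function name, the per-query timings, and the `max_parallel` parameter are illustrative assumptions, not Power BI settings or measured figures.

```python
import heapq

def estimate_page_refresh_seconds(query_seconds, max_parallel=1):
    """Estimate how long a report page takes to refresh in a toy model.

    query_seconds: one source-query duration per visual (hypothetical values).
    max_parallel:  how many queries the source is assumed to run at once.
    """
    if max_parallel < 1:
        raise ValueError("max_parallel must be at least 1")

    # Each entry is the time at which one query slot becomes free.
    slots = [0.0] * max_parallel
    heapq.heapify(slots)

    # Hand each visual's query to whichever slot frees up first.
    for seconds in query_seconds:
        free_at = heapq.heappop(slots)
        heapq.heappush(slots, free_at + seconds)

    # The page is done when the last slot finishes.
    return max(slots)

# Four visuals, each needing a 2-second query on the source.
print(estimate_page_refresh_seconds([2, 2, 2, 2], max_parallel=1))  # 8.0
print(estimate_page_refresh_seconds([2, 2, 2, 2], max_parallel=2))  # 4.0
```

In this simplified model, removing a visual or making the slowest query cheaper (for example through better indexing on the source) directly shortens the refresh.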

Another way to increase the speed of Direct Query is to use Aggregations in Power BI, which store an imported subset of the data in Power BI. If a query can be answered by the aggregation layer, it will be answered more quickly. Microsoft demonstrated this with the "Trillion Row Demo".
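The idea behind an aggregation layer can be sketched in a few lines of Python. This is a conceptual illustration only, not how Power BI implements Aggregations: the table contents, column names, and the `answer` function are all hypothetical. A small pre-summarized table is kept locally, and a query is routed to it only when every column it groups by is available there; otherwise it falls back to the detail rows at the source.

```python
from collections import defaultdict

# Hypothetical detail rows that would live in the Direct Query source.
DETAIL_ROWS = [
    {"year": 2023, "region": "EU", "product": "A", "sales": 10},
    {"year": 2023, "region": "EU", "product": "B", "sales": 5},
    {"year": 2023, "region": "US", "product": "A", "sales": 7},
    {"year": 2024, "region": "EU", "product": "A", "sales": 3},
    {"year": 2024, "region": "US", "product": "B", "sales": 8},
]

def summarize(rows, group_by, measure="sales"):
    """Sum `measure` over `rows`, grouped by the columns in `group_by`."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[column] for column in group_by)
        totals[key] += row[measure]
    return dict(totals)

# The imported aggregation: sales pre-summed by year and region.
AGG_COLUMNS = ("year", "region")
AGG_ROWS = [
    {"year": year, "region": region, "sales": total}
    for (year, region), total in summarize(DETAIL_ROWS, AGG_COLUMNS).items()
]

def answer(group_by):
    """Return (where the query ran, result) for a sum of sales by `group_by`."""
    if set(group_by) <= set(AGG_COLUMNS):
        # Every requested column exists in the small imported table.
        return "aggregation", summarize(AGG_ROWS, group_by)
    # Otherwise the query has to go to the detail rows at the source.
    return "source", summarize(DETAIL_ROWS, group_by)

print(answer(("year",)))      # served by the aggregation table
print(answer(("product",)))   # falls back to the source
```

Sums can be safely re-aggregated from a summary table like this; measures such as distinct counts generally cannot, which is one reason they tend to be expensive in Direct Query.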

As for Direct Query issues in Power BI, among the clients I work with, those who do have problems with Direct Query typically have a mash-up of tables in an inefficient schema, run suboptimal queries on the data source, perform a number of data transformations in DAX, and have badly written DAX measures, for example with lots of DISTINCTCOUNT and SWITCH calls.

So, if you wish to import the data and it is over the dataset size limits, then Analysis Services is your best option. If you can set up the data structure well, there should be no issues with Power BI and Direct Query.
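As a rough summary, that rule of thumb can be written down as a small Python helper. This is only one reading of the advice above: the function name and the ordering of the checks are assumptions, and the limits are the 1GB and 10GB figures quoted in this answer, which may not match current Power BI limits.

```python
# Dataset size limits as quoted in this answer (assumed, may be outdated).
SHARED_LIMIT_GB = 1
PREMIUM_LIMIT_GB = 10

def suggest_approach(data_size_gb, premium=False, source_well_structured=False):
    """Suggest an access approach using the rule of thumb from this answer."""
    limit_gb = PREMIUM_LIMIT_GB if premium else SHARED_LIMIT_GB
    if source_well_structured:
        # A clean star schema with suitable indexes suits Direct Query.
        return "Power BI with Direct Query"
    if data_size_gb > limit_gb:
        # Importing more data than Power BI allows points to Analysis Services.
        return "Import into Azure Analysis Services"
    return "Import into Power BI"

print(suggest_approach(50, premium=True))
print(suggest_approach(50, premium=True, source_well_structured=True))
```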

Hope that helps.
