Azure Analysis Services vs Direct Query


Question

I'm trying to assess the best approach, in terms of cost and performance, for accessing data and running queries on a dataset: Power BI with Azure Analysis Services, or Power BI with Direct Query.

I have tried to illustrate both approaches with the following diagram.

In the diagram, steps 4 and 5 describe accessing data in Azure Data Lake using Power BI with Direct Query, whereas steps 4 and 6 describe accessing data using Power BI with Azure Analysis Services.

From my own research, Direct Query is notorious for having performance issues, e.g.:

All DirectQuery requests are sent to the source database, so the time required to refresh a visual is dependent on how long that back-end source takes to respond with the results from the query (or queries).

The above statement is well documented. However, in my design DirectQuery requests shouldn't be an issue, because most of the logic and transformation will take place in Databricks (although I don't want this question to focus on Databricks).

On the other hand, with Azure Analysis Services (AAS) all requests are served in memory, as opposed to DirectQuery, and are therefore much faster.

So, I would appreciate it if you could share your experience using DirectQuery and AAS, and let me know whether I have missed any advantages or disadvantages of using one technology over the other.

Recommended answer

The Power BI (PBI) data model is a lighter-weight version of Analysis Services. If you have PBI Desktop open, you can open Task Manager and see an Analysis Services instance running in the background. In Power BI the dataset size is limited to 1GB; in Premium it is 10GB, with the ability to refresh up to 12GB.

Analysis Services can hold more data, is not bound by those dataset size limits, and also offers other features aimed at enterprise organisations. Like Power BI, Analysis Services can either sit over a data source in Direct Query mode or import the data.

In your question you mentioned that Direct Query mode "is notorious for having performance issues"; however, that depends on the structure and size of the data source. On a number of projects that I have deployed, I have used Direct Query over data sources of at least 50-100GB. These have mostly been standard star schema data warehouses or a defined reporting table, both of which have the relevant indexes, covering indexes, or columnstore indexes to allow more efficient retrieval of data. Direct Query mode slows down because of the number of queries it has to run on the data source, driven by the measures and relationships, plus the connection overhead. Another factor can be the number of visuals on a page, as each visual is a query and each one has to run on the data source.
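To make the point about visuals concrete, here is a rough back-of-the-envelope sketch. It assumes each visual issues one query, that the source can serve a fixed number of queries at once, and that every query pays a fixed connection overhead. The function name `estimate_page_seconds` and the parameters `max_concurrent` and `overhead_seconds` are hypothetical and are not Power BI settings; the numbers are illustrative only.

```python
import heapq


def estimate_page_seconds(query_seconds, max_concurrent=4, overhead_seconds=0.05):
    """Estimate wall-clock time for a page whose visuals each run one query.

    query_seconds:    time the source needs for each visual's query
    max_concurrent:   assumed number of queries the source serves in parallel
    overhead_seconds: assumed fixed connection cost paid by every query
    """
    if max_concurrent < 1:
        raise ValueError("max_concurrent must be at least 1")

    # Each heap entry is the time at which one connection becomes free.
    free_at = [0.0] * max_concurrent
    heapq.heapify(free_at)

    page_done = 0.0
    for seconds in query_seconds:
        start = heapq.heappop(free_at)          # earliest free connection
        end = start + overhead_seconds + seconds
        page_done = max(page_done, end)
        heapq.heappush(free_at, end)
    return page_done


if __name__ == "__main__":
    # Hypothetical page: the same one-second query behind 4, 8 and 16 visuals.
    for visuals in (4, 8, 16):
        print(visuals, "visuals:", estimate_page_seconds([1.0] * visuals))
```

Under these assumptions the page time grows in steps as the number of visuals exceeds what the source can run in parallel, which is why trimming visuals or speeding up the slowest query both help.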

One other way to increase the speed of Direct Query is to use Aggregations in Power BI, which store an imported subset of the data in Power BI. If a query can be answered by the aggregation layer, it will be answered more quickly. Microsoft demonstrated this with the 'Trillion Row Demo'.
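The idea behind an aggregation layer can be sketched in a few lines. This is a toy model of the concept, not Power BI's engine: the `SALES` rows, the `AggregatedModel` class, and its `total_by` method are all hypothetical. A pre-summed in-memory table answers any query whose grouping columns it covers, and everything else falls back to the detail rows, which stand in for the Direct Query source.

```python
from collections import defaultdict

# Hypothetical detail rows standing in for a large Direct Query fact table.
SALES = [
    {"year": 2019, "region": "EU", "product": "A", "amount": 10.0},
    {"year": 2019, "region": "EU", "product": "B", "amount": 20.0},
    {"year": 2019, "region": "US", "product": "A", "amount": 30.0},
    {"year": 2020, "region": "EU", "product": "A", "amount": 40.0},
    {"year": 2020, "region": "US", "product": "B", "amount": 50.0},
]


def sum_by(rows, group_cols, measure):
    """Sum `measure` over `rows`, grouped by `group_cols`."""
    totals = defaultdict(float)
    for row in rows:
        totals[tuple(row[col] for col in group_cols)] += row[measure]
    return dict(totals)


class AggregatedModel:
    """Detail rows plus one pre-summed table held in memory."""

    def __init__(self, detail_rows, agg_cols, measure):
        self.detail_rows = detail_rows
        self.agg_cols = tuple(agg_cols)
        self.measure = measure
        # Built once up front, like an imported aggregation table.
        self.agg = sum_by(detail_rows, self.agg_cols, measure)

    def total_by(self, group_cols):
        """Return (totals, where_it_was_answered)."""
        group_cols = tuple(group_cols)
        if set(group_cols) <= set(self.agg_cols):
            # Hit: roll the small aggregate up further. This is valid for SUM
            # because sums are additive; a distinct count could not be
            # rolled up this way.
            positions = [self.agg_cols.index(col) for col in group_cols]
            totals = defaultdict(float)
            for key, value in self.agg.items():
                totals[tuple(key[i] for i in positions)] += value
            return dict(totals), "aggregation"
        # Miss: the query needs a column the aggregate lacks, so scan the source.
        return sum_by(self.detail_rows, group_cols, self.measure), "source"


if __name__ == "__main__":
    model = AggregatedModel(SALES, ["year", "region"], "amount")
    print(model.total_by(["year"]))      # answered from the aggregation
    print(model.total_by(["product"]))   # falls back to the detail rows
```

The design choice is which columns the aggregate keeps: the more queries it covers, the fewer reach the source, at the cost of a larger imported table.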

In terms of the Power BI Direct Query issues, among the clients I work with, those that do have issues with Direct Query tend to have a mash-up of tables in an inefficient schema, run suboptimal queries on the data source, perform a number of data transformations in DAX, and have badly written DAX measures, for example lots of DISTINCTCOUNT and SWITCH calls.

So, if you wish to import the data and it is over the dataset size limits, then Analysis Services is your best option. If you can set up the data structure well, there should be no issues with Power BI and Direct Query.
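That rule of thumb can be written down as a small helper. This is only a sketch of the advice in this answer: the function name `suggest_option` is hypothetical, the limits are the figures quoted above (which may have changed since), and the Direct Query branch assumes the source is already well structured.

```python
# Import size limits in GB, as quoted in the answer (they may have changed since).
PBI_LIMIT_GB = 1
PBI_PREMIUM_LIMIT_GB = 10


def suggest_option(dataset_gb, want_import, has_premium=False):
    """Return the option this answer would point to for a given dataset."""
    if not want_import:
        # Assumes a well-structured source (star schema, suitable indexes).
        return "Power BI Direct Query"
    limit_gb = PBI_PREMIUM_LIMIT_GB if has_premium else PBI_LIMIT_GB
    if dataset_gb <= limit_gb:
        return "Power BI import"
    # Imported data larger than the Power BI limit.
    return "Azure Analysis Services"


if __name__ == "__main__":
    print(suggest_option(0.5, want_import=True))
    print(suggest_option(50, want_import=True, has_premium=True))
    print(suggest_option(50, want_import=False))
```

Cost is deliberately left out, since the answer gives no figures for it.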

Hope that helps.
