Optimizing DAX & model for "where date between" type queries


Problem description

I am building a model to allow reporting on two separate datasets; for this example we'll say a Students dataset and a Staff dataset.

The datasets are pretty separate and the only real link between the two is Date, so from a model perspective there is a Students star schema and a Staff star schema.

The data displayed is snapshot-type data, answering questions like:
- For a selected date, show all active employees.
- For a selected date, show all enrolled students.

This means that when a single date is selected, the model finds all employees where the selected date falls within the employment start and end dates, and all students where the selected date falls within the enrolment start and end dates.
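
The snapshot rule is just an inclusive interval test applied to each dataset independently. The following is a minimal Python sketch of that rule outside Tabular, assuming a hypothetical `Spell` record with inclusive start and end dates; the record shape and sample data are invented for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Spell:
    """One enrolment or employment period (hypothetical record shape)."""
    person_id: int
    start: date
    end: date  # inclusive


def active_on(spells: list[Spell], selected: date) -> set[int]:
    """IDs whose [start, end] range contains the selected date."""
    return {s.person_id for s in spells if s.start <= selected <= s.end}


students = [
    Spell(1, date(2016, 9, 1), date(2017, 6, 30)),
    Spell(2, date(2017, 1, 9), date(2017, 3, 31)),
    Spell(3, date(2015, 9, 1), date(2016, 6, 30)),
]
staff = [
    Spell(100, date(2010, 4, 1), date(2030, 12, 31)),
    Spell(101, date(2016, 1, 1), date(2016, 12, 31)),
]

selected = date(2017, 2, 1)
print(sorted(active_on(students, selected)))  # [1, 2]
print(sorted(active_on(staff, selected)))     # [100]
```

The same selected date drives both lookups, which is why a single shared Date dimension is attractive even though neither schema can relate to it by equality.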

This means I had to decide how to return the correct data from each schema with a single date dimension. Creating a relationship would not work, because relationships in Tabular don't allow "between"-type queries, so instead I have one unrelated Date dimension, and the DAX for each model finds the applicable rows.

The problem is that it's not the most performant: for perhaps 50k rows, adding a measure can take 5-10 seconds.

I'm asking if there is a better way to either write the queries, or alter the model, that still lets me do "between"-style queries but gives better performance.

Below is an example of a DAX measure that counts the students who were enrolled on a particular date.

All Enrolled Students :=
IF (
    HASONEVALUE ( 'Date'[Date] ),
    CALCULATE (
        DISTINCTCOUNT ( 'Students'[StudentID] ),
        FILTER (
            'Students',
            'Students'[StudentStartDateID] <= MIN ( 'Date'[DateID] )
                && 'Students'[StudentEndDateID] >= MAX ( 'Date'[DateID] )
        )
    ),
    BLANK ()
)
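
To make the semantics of this measure concrete, here is a rough Python model of what it computes (not of how the Tabular engine executes it). Plain `datetime.date` values stand in for the DateID keys, `None` stands in for BLANK, and the `(student_id, start, end)` tuple layout is an assumption made for the example.

```python
from datetime import date


def all_enrolled_students(selected_dates, students):
    """Sketch of the 'All Enrolled Students' measure's logic.

    selected_dates: dates still in the filter context of the unrelated Date table.
    students: (student_id, start, end) tuples; start and end are inclusive.
    Returns None to stand for BLANK ().
    """
    distinct_dates = set(selected_dates)
    if len(distinct_dates) != 1:        # HASONEVALUE ( 'Date'[Date] ) is false
        return None                     # BLANK ()
    lo = min(distinct_dates)            # MIN ( 'Date'[DateID] )
    hi = max(distinct_dates)            # MAX ( 'Date'[DateID] ), equal to lo here
    # FILTER ( 'Students', start <= MIN && end >= MAX ) followed by DISTINCTCOUNT
    matching = {sid for sid, start, end in students if start <= lo and end >= hi}
    return len(matching) or None        # DISTINCTCOUNT gives BLANK when no rows match


students = [
    (1, date(2016, 9, 1), date(2017, 6, 30)),
    (1, date(2017, 1, 1), date(2017, 2, 28)),  # second, overlapping spell for student 1
    (2, date(2017, 1, 9), date(2017, 3, 31)),
]
print(all_enrolled_students([date(2017, 2, 1)], students))                    # 2
print(all_enrolled_students([date(2017, 2, 1), date(2017, 2, 2)], students))  # None
```

Because the date condition is a range comparison rather than an equality, the unrelated Date table cannot simply propagate a filter to Students; the measure has to test the Students rows against the selected date itself.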


Recommended answer

This type of scenario is often called "Events in progress" or "Events with a duration". Take a look at the links below. The answer will depend on your version of SSAS and the event duration length.

https://www.sqlbi.com/articles/analyzing-events-with-a-duration-in-dax/
https://www.sqlbi.com/articles/understanding-dax-query-plans/
https://blog.gbrueckl.at/2014/12/events-in-progress-for-time-periods-in-dax/

If these measures don't perform well (Which can happen with events that have a long duration), it may be necessary to generate a table containing a row for each day of the event. The SQL would look something like this:

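SELECT
    d.CalendarDate,
    s.StudentID
FROM dbo.Students AS s
CROSS JOIN dbo.DimDate AS d
-- Assumes dbo.DimDate has a DateID key in the same format as
-- StudentStartDateID/StudentEndDateID, so like is compared with like
WHERE d.DateID >= s.StudentStartDateID
  AND d.DateID <= s.StudentEndDateID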
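
The expansion the query performs can be mimicked in a few lines of Python, which also shows how the row count grows with spell length. This is a sketch only: `date_range` is a hypothetical stand-in for `dbo.DimDate`, and the `(student_id, start, end)` tuples are an assumed row shape. As with the cross join, only days that exist in the calendar appear in the output.

```python
from datetime import date, timedelta


def date_range(first: date, last: date) -> list[date]:
    """Every calendar day from first to last inclusive (stand-in for dbo.DimDate)."""
    return [first + timedelta(days=i) for i in range((last - first).days + 1)]


def expand_to_days(students, calendar):
    """One (calendar_date, student_id) row per day a student is enrolled.

    Mirrors CROSS JOIN + range filter: only dates present in `calendar` appear.
    """
    return [
        (d, sid)
        for sid, start, end in students
        for d in calendar
        if start <= d <= end
    ]


calendar = date_range(date(2017, 1, 1), date(2017, 1, 31))
students = [
    (1, date(2016, 12, 1), date(2017, 1, 2)),   # starts before the calendar begins
    (2, date(2017, 1, 10), date(2017, 1, 12)),
]
rows = expand_to_days(students, calendar)
print(len(rows))  # 5
```

The output has one row per student per enrolled day, so a student enrolled for a full academic year contributes a few hundred rows.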

Create a relationship from this table to the date/calendar table.
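
As a loose analogy only (Tabular does not store a relationship as a Python dictionary), grouping the daily rows by date shows why the relationship helps: once each date is its own row, selecting a date becomes an equality match, and counting distinct students no longer needs a between comparison. The function names below are hypothetical.

```python
from collections import defaultdict
from datetime import date


def build_date_index(rows):
    """Group (calendar_date, student_id) rows by date."""
    index = defaultdict(set)
    for d, sid in rows:
        index[d].add(sid)
    return index


def enrolled_on(index, selected: date) -> int:
    """Distinct students for one date: an equality lookup, no range test.

    Returns 0 where the DAX measure would show BLANK.
    """
    return len(index.get(selected, set()))


rows = [
    (date(2017, 1, 1), 1),
    (date(2017, 1, 1), 1),  # e.g. from two overlapping spells; counted once
    (date(2017, 1, 2), 1),
    (date(2017, 1, 2), 2),
]
index = build_date_index(rows)
print(enrolled_on(index, date(2017, 1, 1)), enrolled_on(index, date(2017, 1, 2)))  # 1 2
```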

With this design you can use a simple DISTINCTCOUNT(Students[StudentID]) measure, which should perform better. The trade-off is that this table can become quite large. Keep it as narrow as possible for best performance and memory conservation. Another optimization could be to use a different granularity such as week or month instead of day.
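
To get a feel for the granularity trade-off, the sketch below compares daily and monthly row counts for one enrolment spell. It assumes (as a modelling choice, not something the answer specifies) that a student counts for a month if they are enrolled on at least one day of it. That changes the question being answered from "enrolled on this date" to "enrolled at some point in this month". The helper names are hypothetical.

```python
from datetime import date


def months_between(start: date, end: date):
    """(year, month) keys from start's month to end's month inclusive."""
    y, m = start.year, start.month
    while (y, m) <= (end.year, end.month):
        yield (y, m)
        m += 1
        if m == 13:
            y, m = y + 1, 1


def expand_to_months(students):
    """One (year, month, student_id) row per month with at least one enrolled day."""
    return [
        (y, m, sid)
        for sid, start, end in students
        for (y, m) in months_between(start, end)
    ]


students = [(1, date(2016, 9, 1), date(2017, 6, 30))]
daily_rows = (date(2017, 6, 30) - date(2016, 9, 1)).days + 1
monthly_rows = len(expand_to_months(students))
print(daily_rows, monthly_rows)  # 303 10
```

For this spell the monthly table is about 30 times narrower in row count, at the cost of no longer being able to answer per-day questions.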
