Multiple Linear Regression in Power BI
Problem description
Suppose I have a set of returns and I want to compute its beta values versus different market indices. Let's use the following set of data in a table named Returns for the sake of having a concrete example:
Date Equity Duration Credit Manager
-----------------------------------------------
01/31/2017 2.907% 0.226% 1.240% 1.78%
02/28/2017 2.513% 0.493% 1.120% 3.88%
03/31/2017 1.346% -0.046% -0.250% 0.13%
04/30/2017 1.612% 0.695% 0.620% 1.04%
05/31/2017 2.209% 0.653% 0.480% 1.40%
06/30/2017 0.796% -0.162% 0.350% 0.63%
07/31/2017 2.733% 0.167% 0.830% 2.06%
08/31/2017 0.401% 1.083% -0.670% 0.29%
09/30/2017 1.880% -0.857% 1.430% 2.04%
10/31/2017 2.151% -0.121% 0.510% 2.33%
11/30/2017 2.020% -0.137% -0.020% 3.06%
12/31/2017 1.454% 0.309% 0.230% 1.28%
Now in Excel, I can just use the LINEST function to get the beta values:
= LINEST(Returns[Manager], Returns[[Equity]:[Credit]], TRUE, TRUE)
It spits out an array that looks like this:
0.077250253 -0.184974002 0.961578127 -0.001063971
0.707796954 0.60202895 0.540811546 0.008257129
0.50202386 0.009166729 #N/A #N/A
2.688342242 8 #N/A #N/A
0.000677695 0.000672231 #N/A #N/A
The betas are in the top row and using them gives me the following linear estimate:
Manager = 0.962 * Equity - 0.185 * Duration + 0.077 * Credit - 0.001
The question is how can I get these values in Power BI using DAX (preferably without having to write a custom R script)?
For simple linear regression against one column, I can go back to the mathematical definition and write a least squares implementation similar to the one given in this post.
However, when more columns become involved (I need to be able to do up to 12 columns, but not always the same number), this gets messy really quickly and I'm hoping there's a better way.
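For reference, the least-squares estimates behind both cases can be written in closed form. This is standard ordinary-least-squares notation rather than anything taken from the linked post: y is the Manager column, x a single index column, and X the matrix of all index columns with a leading column of ones for the intercept.

```latex
% Simple regression of y on one column x (n observations)
\hat{\beta} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
\hat{\alpha} = \bar{y} - \hat{\beta}\,\bar{x}

% Multiple regression on k columns, X of size n x (k+1)
y = X\beta + \varepsilon,
\qquad
\hat{\beta} = (X^{\top} X)^{-1} X^{\top} y
```

The second formula involves solving a (k+1) by (k+1) linear system, which is where a hand-written implementation starts to grow with the number of columns.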
The essence:
DAX is not the way to go. Use Home > Edit Queries and then Transform > Run R Script. Insert the following R snippet to run a regression analysis using all available variables in a table:
model <- lm(Manager ~ ., dataset)
df <- data.frame(coef(model))
names(df)[names(df) == "coef.model."] <- "coefficients"
df["variables"] <- row.names(df)
Replace Manager with any of the other available variable names to change the dependent variable.
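To try the snippet above outside Power BI, a stand-in for the dataset variable is needed. The sketch below builds one from the question's table. Entering the values in percent (2.907 for 2.907%) is an assumption made here. Because every column is scaled by the same factor of 100, the slopes are the same as they would be for decimal returns, while the intercept is expressed in percent. The results may differ slightly from the LINEST output in the question if the printed table is rounded.

```r
# Stand-in for the 'dataset' data frame that Power BI passes to the script.
# Values are the question's returns, entered in percent (assumption).
dataset <- data.frame(
  Equity   = c(2.907, 2.513, 1.346, 1.612, 2.209, 0.796,
               2.733, 0.401, 1.880, 2.151, 2.020, 1.454),
  Duration = c(0.226, 0.493, -0.046, 0.695, 0.653, -0.162,
               0.167, 1.083, -0.857, -0.121, -0.137, 0.309),
  Credit   = c(1.240, 1.120, -0.250, 0.620, 0.480, 0.350,
               0.830, -0.670, 1.430, 0.510, -0.020, 0.230),
  Manager  = c(1.78, 3.88, 0.13, 1.04, 1.40, 0.63,
               2.06, 0.29, 2.04, 2.33, 3.06, 1.28)
)

# The same four lines as in the answer.
model <- lm(Manager ~ ., dataset)
df <- data.frame(coef(model))
names(df)[names(df) == "coef.model."] <- "coefficients"
df["variables"] <- row.names(df)

# One row per term: the intercept plus one slope per index column.
print(df)
```

Running this in a plain R session is a quick way to check the script before pasting it into the Run R Script window.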
The details:
Good question! Why Microsoft has not introduced more flexible solutions is beyond my understanding. For the time being, though, you won't find very good approaches without using R in Power BI.
My suggested approach will therefore ignore your request regarding:
"The question is how can I get these values in Power BI using DAX (preferably without having to write a custom R script)?"
My answer will however meet your requirements regarding:
"A good answer should generalize to more than 3 columns (probably by working on an unpivoted data table with the indices as values rather than column headers)."
Here we go:
I'm on a system using comma as a decimal separator, so I'm going to be using the following as the data source. (If you copy the numbers directly into Power BI, the column separation will not be maintained. If you first paste them into Excel, copy them again, and then paste them into Power BI, the columns will be fine.)
Date Equity Duration Credit Manager
31.01.2017 2,907 0,226 1,24 1,78
28.02.2017 2,513 0,493 1,12 3,88
31.03.2017 1,346 -0,046 -0,25 0,13
30.04.2017 1,612 0,695 0,62 1,04
31.05.2017 2,209 0,653 0,48 1,4
30.06.2017 0,796 -0,162 0,35 0,63
31.07.2017 2,733 0,167 0,83 2,06
31.08.2017 0,401 1,083 -0,67 0,29
30.09.2017 1,88 -0,857 1,43 2,04
31.10.2017 2,151 -0,121 0,51 2,33
30.11.2017 2,02 -0,137 -0,02 3,06
31.12.2017 1,454 0,309 0,23 1,28
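If numbers with a decimal comma ever reach R as text rather than as numeric columns, they can be converted before running the regression. The following is only a sketch. The raw vector is a hypothetical example of such text values, and Power BI will often have typed the columns correctly already, in which case this step is unnecessary.

```r
# Hypothetical text values written with a decimal comma.
raw <- c("2,907", "-0,046", "1,24")

# Swap the comma for a dot, then convert to numeric.
values <- as.numeric(sub(",", ".", raw, fixed = TRUE))

print(values)
```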
Starting from scratch in Power BI (for reproducibility purposes), I'm inserting the data using Enter Data:
Now, go to Edit Queries > Edit Queries and check that you have this:
In order to maintain flexibility with regard to the number of columns to include in your analysis, I find it is best to remove the Date column. This will not have an impact on your regression results. Simply right-click the Date column and select Remove:
Notice that this will add a new step under Query Settings > Applied Steps:
And this is where you are going to be able to edit the few lines of R code we're going to use. Now, go to Transform > Run R Script to open this window:
Notice the line # 'dataset' holds the input data for this script. Thankfully, your question is only about ONE input table, so things aren't going to get too complicated (for multiple input tables check out this post). The dataset variable is a data.frame in R and is a good (in fact the only) starting point for further analysis.
Insert the following script:
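# 'dataset' is the input table that Power BI passes to the script
model <- lm(Manager ~ ., dataset)                        # regress Manager on all other columns
df <- data.frame(coef(model))                            # coefficients as a one-column data frame
names(df)[names(df) == "coef.model."] <- "coefficients"  # give the column a readable name
df["variables"] <- row.names(df)                         # expose the row names as a regular column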
Click OK, and if all goes well you should end up with this:
Click Table, and you'll get this:
Under Applied Steps you'll see that a Run R Script step has been inserted. Click the gear icon on the right to edit it, or click on df to format the output table.
This is it! For the Edit Queries part at least.
Click Home > Close & Apply to get back to the Power BI Report section and verify that you have a new table under Visualizations > Fields:
Insert a Table or Matrix and activate Coefficients and Variables to get this:
I hope this is what you were looking for!
Now for some details about the R script:
As long as it's possible, I would avoid using numerous different R libraries. This way you'll reduce the risk of dependency issues.
The function lm() handles the regression analysis. The key to obtaining the required flexibility with regard to the number of explanatory variables lies in the Manager ~ . , dataset part. This simply says to run a regression analysis on the Manager variable in the dataframe dataset, and use all remaining columns (~ .) as explanatory variables. The coef(model) part extracts the coefficient values from the estimated model. The result is a dataframe with the variable names as row names. The last line simply adds these names to the dataframe itself as a regular column.
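The behaviour of the ~ . shorthand can be checked on a small example. The sketch below uses made-up data, and the column names y, x1, x2 and the target variable are hypothetical and not part of the original answer. It shows that y ~ . gives the same coefficients as listing the remaining columns by hand, and that the dependent variable can be picked by name with as.formula().

```r
# Made-up illustration data (hypothetical values, not from the post).
dataset <- data.frame(
  y  = c(1.0, 2.1, 2.9, 4.2, 5.1, 5.8),
  x1 = c(1, 2, 3, 4, 5, 6),
  x2 = c(0.5, 0.1, 0.9, 0.3, 0.7, 0.2)
)

# "y ~ ." means: regress y on every other column of the data frame.
fit_dot <- lm(y ~ ., data = dataset)

# Naming the columns explicitly gives the same coefficients.
fit_explicit <- lm(y ~ x1 + x2, data = dataset)
same_fit <- isTRUE(all.equal(coef(fit_dot), coef(fit_explicit)))
print(same_fit)

# The dependent variable can also be chosen by name at run time.
target <- "x1"  # hypothetical choice
fit_other <- lm(as.formula(paste(target, "~ .")), data = dataset)
print(names(coef(fit_other)))
```

Building the formula from a string is one way to switch the dependent variable without editing the formula itself. Adding or removing columns in the input table changes the set of explanatory variables automatically.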