Operations on multiple tables / datasets with Edit Queries and R in Power BI


Problem description


I have two tables tbl_A and tbl_B in a Power BI file that I'd like to transform and analyze using the Run R Script functionality in Edit Queries.

This would include handling missing values and joining the tables. However, when starting R, it seems I'm only able to do operations on one table at a time. This is because the Run R Script functionality only imports data from the table that is active when you click the Run R Script button. This data is then stored in the dataset variable.

If this is correct, it seems to me that the practical use of R in Power BI would be very limited. I know I could join the tables before I unleash R. That would be a feasible solution for a simple case like this, but certainly not for more complex data structures. Any suggestions on how to do operations on multiple tables with R in Power BI?

Solution

Short version:

In Edit Queries, when inserting an R script, just add [dataset = #"Renamed Columns", dataset2 = tbl_A] in the Formula bar. In this case Renamed Columns refers to the state of your table (under APPLIED STEPS) where you're inserting your R script, and tbl_A refers to another table that is available to you. And check all your settings with regard to Privacy.


Long version

Following up on my comment, here is a solution that builds on suggestions from a business intelligence blog and contributions in the PowerBI forum:

First you'll have to edit a few settings. Go to Options and Settings | Options. Under Privacy, select Always ignore Privacy Level settings. At your own risk of course...

Now, go to Options and Settings | Data Source Settings. Select the source and click Edit Permissions. Set it to Public:


Now we're good to go:

I'm gonna go from scratch here since I don't know what quirks any other data loading method would trigger in PowerBI. I've got two separate Excel files, each containing one worksheet called tbl_A and tbl_B, respectively. The data for the two tables look like this:

tbl_A Data

Date        Price1  Price2
05.05.2016  23,615  24,775
04.05.2016  23,58   24,75
03.05.2016  0       24,35
02.05.2016  22,91   24,11
29.04.2016  22,93   24,24


tbl_B Data

Date        Price3  Price4
02.06.2016  19,35   22,8
01.06.2016  19      22,35
31.05.2016  19,35   22,71
30.05.2016  15,5    21,85
27.05.2016  19,43   22,52



In the main window in PowerBI, load tbl_A using Get Data:

Do the same thing with tbl_B so that you end up with two separate tables under the Fields menu:

Click Edit Queries under the Home tab and make sure that the Formula Bar is visible. If not, you can activate it under View:

Depending on how your tables are loaded, PowerBI will add a few steps in the process. Those steps are visible under Query Settings:

Among other things, PowerBI changes the data type of dates to, you guessed it, Date. This can trigger problems later. To avoid this, we can change the data type for date in both tables to Text:
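If the dates are kept as Text, they can still be parsed inside the R script whenever a real date is needed. The following is only a minimal sketch, assuming the day.month.year format shown in the tables above; the small data frame is a stand-in for the dataset variable that Power BI supplies, and parsed_dates is a hypothetical helper name:

```r
# Stand-in for the 'dataset' data frame that Power BI passes to the script
# (illustrative values taken from tbl_B, with Date already typed as Text).
dataset <- data.frame(
  Date   = c("02.06.2016", "01.06.2016", "31.05.2016"),
  Price1 = c(19.35, 19, 19.35),
  stringsAsFactors = FALSE
)

# Parse the day.month.year text into a working vector of class Date.
parsed_dates <- as.Date(dataset$Date, format = "%d.%m.%Y")

# Use the parsed dates for ordering, but leave the Date column itself as text.
output <- dataset[order(parsed_dates), ]
```

Keeping the parsed dates in a separate vector means the column handed back to Power BI stays Text, which is the type this walkthrough relies on.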

After you've done this for both tables, make sure tbl_B is active, and have a look at the Query Settings. You'll see that a new step Changed Type has been added in the data loading process:

We're going to add another step in order to keep our upcoming R script as simple as possible. In that script we're going to join the tables using the rbind() function. This will trigger an error unless the column names in the different tables are the same. So go ahead and change the column names in tbl_B from Price3 and Price4 to Price1 and Price2, respectively:
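To see why the renaming matters, here is a rough sketch in plain R, outside Power BI. The two small data frames are illustrative stand-ins built from the first rows of tbl_A and tbl_B, and attempt is a hypothetical name:

```r
# Illustrative stand-ins for the first rows of tbl_A and tbl_B.
df_A <- data.frame(
  Date   = c("05.05.2016", "04.05.2016"),
  Price1 = c(23.615, 23.58),
  Price2 = c(24.775, 24.75),
  stringsAsFactors = FALSE
)
df_B <- data.frame(
  Date   = c("02.06.2016", "01.06.2016"),
  Price3 = c(19.35, 19),
  Price4 = c(22.8, 22.35),
  stringsAsFactors = FALSE
)

# With differing column names rbind() fails; capture the error message
# as text instead of stopping the script.
attempt <- tryCatch(rbind(df_A, df_B), error = function(e) conditionMessage(e))

# Giving df_B the same column names as df_A makes the row bind work.
names(df_B) <- names(df_A)
combined <- rbind(df_A, df_B)
```

Renaming inside R like this would be an alternative to the extra query step, but doing it in the Query Editor keeps the R script shorter, which is the approach taken here.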

Now, the Applied steps under Query settings should look like this:

The name of the last step is crucial since you're going to have to reference Renamed Columns (or whatever else you'd like to call it) when you write your R script. And finally we can do exactly that.

Under Transform, click Run R Script. As the picture below describes, the variable dataset will contain the original data for your script. In this case, it will be tbl_B in the form of a dataframe if tbl_B was the active table when you clicked Run R Script:

For now, leave the script as it is, click OK, and have a look at the formula bar:

The picture above tells us two important things. First, we can see that the process has gone smoothly so far and that we have an empty table. Second, we can see that dataset refers to tbl_B in the state that we left it after the step Renamed Columns. And this is the part that can be confusing if you've read about these things elsewhere. In the Formula bar, you can enter a second dataset by adding , dataset2=tbl_A, so that the formula now looks like this:

Hit Enter

Under Query Settings, you will now see that there's a new step where you can edit your R script:

Click it to get back into R and add this little snippet:

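# 'dataset' is tbl_B as of the Renamed Columns step
df_B <- dataset
# 'dataset2' is tbl_A, passed in through the Formula bar
df_A <- dataset2
# Stack the rows of both tables; the column names must match
df_temp <- rbind(df_A, df_B)

# Power BI offers the data frame named 'output' as the result
output <- df_temp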

When you click OK, this is what you'll see:

Never mind that the formula bar looks like a mess, just go ahead and click Table next to output.

This is it!!

Go to Home and click Close & Apply to get out of the Query Editor. Now you can inspect the output from your R script under Fields, or in the Data tab like in the picture below:

The end result will be a single table containing the rows from tbl_A followed by the rows from your original tbl_B. Not too fancy, but now that you've combined two datasets in your R script you're able to unleash a bigger part of R on your workflow.
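As one possible next step, the combined data can be cleaned in the same script. This sketch assumes that the 0 in Price1 on 03.05.2016 is a placeholder for a missing value, which the data does not confirm; the combined data frame is an illustrative stand-in for the result of rbind():

```r
# Illustrative stand-in for part of the combined table.
combined <- data.frame(
  Date   = c("05.05.2016", "04.05.2016", "03.05.2016", "02.05.2016"),
  Price1 = c(23.615, 23.58, 0, 22.91),
  Price2 = c(24.775, 24.75, 24.35, 24.11),
  stringsAsFactors = FALSE
)

# Assumption: a price of 0 marks a missing observation, so flag it as NA.
combined$Price1[combined$Price1 == 0] <- NA

# Keep only the rows without missing values and hand them back to Power BI.
output <- combined[complete.cases(combined), ]
```

Dropping incomplete rows is only one option; filling the gap from neighbouring dates would be another, depending on what the analysis needs.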
