Operations on multiple tables / datasets with Edit Queries and R in Power BI
Question
I have two tables, tbl_A and tbl_B, in a Power BI file that I'd like to transform and analyze using the Run R Script functionality in Edit Queries.
This would include handling missing values and joining the tables. However, when starting R, it seems I'm only able to do operations on one table at a time. This is because the Run R Script functionality only imports data from the table that is active when you click the Run R Script button. This data is then stored in the dataset variable.
If this is correct, it seems to me that the practical use of R in Power BI would be very limited. I know I could join the tables before I unleash R. That would be a feasible solution for a simple case like this, but certainly not for more complex data structures. Any suggestions on how to do operations on multiple tables with R in Power BI?
Short version:
In Edit Queries, after inserting an R script, extend the input record in the formula bar so that it reads [dataset = #"Renamed Columns", dataset2 = tbl_A]. In this case, Renamed Columns refers to the state of your table (under APPLIED STEPS) at the point where you're inserting your R script, and tbl_A refers to another table that is available to you. Also check all your settings with regard to privacy.
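The record in the formula bar is what decides which tables the script sees, so the same idea should extend past two tables: each extra name you add (for example a hypothetical dataset3 = tbl_C) would arrive in R as another data frame. The sketch below assumes that behaviour and that all inputs share the same column names. It stacks whichever of dataset, dataset2 and the hypothetical dataset3 exist, and defines local stand-ins (one row from each sample table) so it also runs outside Power BI.

```r
# Local stand-ins (one real row from each sample table) so the sketch runs outside Power BI.
if (!exists("dataset"))  dataset  <- data.frame(Date = "02.06.2016", Price1 = 19.35,  Price2 = 22.8,   stringsAsFactors = FALSE)
if (!exists("dataset2")) dataset2 <- data.frame(Date = "05.05.2016", Price1 = 23.615, Price2 = 24.775, stringsAsFactors = FALSE)

# Collect every input that actually exists; 'dataset3' is a hypothetical extra table
# and is simply skipped (returned as NULL, then filtered out) when it is not there.
inputs <- mget(c("dataset", "dataset2", "dataset3"), ifnotfound = list(NULL))
inputs <- Filter(Negate(is.null), inputs)

# Stack all inputs at once; rbind() needs identical column names in every input.
output <- do.call(rbind, unname(inputs))
rownames(output) <- NULL
print(output)
```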
Long version
Following up on my comment, here is a solution that builds on suggestions from a business intelligence blog and contributions in the PowerBI forum:
First you'll have to edit a few settings. Go to Options and Settings | Options. Under Privacy, select Always ignore Privacy Level settings. At your own risk, of course...
Now, go to Options and Settings | Data Source Settings. Select the source and click Edit Permissions. Set the privacy level to Public:
Now we're good to go:
I'm gonna go from scratch here since I don't know what quirks any other data loading method would trigger in PowerBI. I've got two separate Excel files, each containing one worksheet called tbl_A and tbl_B, respectively.
The data for the two tables look like this:
tbl_A Data
Date Price1 Price2
05.05.2016 23,615 24,775
04.05.2016 23,58 24,75
03.05.2016 0 24,35
02.05.2016 22,91 24,11
29.04.2016 22,93 24,24
tbl_A Screenshot
tbl_B Data
Date Price3 Price4
02.06.2016 19,35 22,8
01.06.2016 19 22,35
31.05.2016 19,35 22,71
30.05.2016 15,5 21,85
27.05.2016 19,43 22,52
tbl_B Screenshot
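To try the R side outside Power BI first, the two sample tables can be recreated as data frames. A minimal sketch, assuming the dates are kept as text (as recommended further down) and writing the comma decimals above as ordinary R numbers:

```r
# Stand-ins for the two Excel worksheets, with values copied from the tables above.
# Dates stay as dd.mm.yyyy strings; prices use a dot as the decimal separator.
tbl_A <- data.frame(
  Date   = c("05.05.2016", "04.05.2016", "03.05.2016", "02.05.2016", "29.04.2016"),
  Price1 = c(23.615, 23.58, 0, 22.91, 22.93),
  Price2 = c(24.775, 24.75, 24.35, 24.11, 24.24),
  stringsAsFactors = FALSE
)

tbl_B <- data.frame(
  Date   = c("02.06.2016", "01.06.2016", "31.05.2016", "30.05.2016", "27.05.2016"),
  Price3 = c(19.35, 19, 19.35, 15.5, 19.43),
  Price4 = c(22.8, 22.35, 22.71, 21.85, 22.52),
  stringsAsFactors = FALSE
)

# Show column names and types of both tables.
str(tbl_A)
str(tbl_B)
```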
In the main window in PowerBI, load tbl_A using Get Data:
Do the same thing with tbl_B so that you end up with two separate tables under the Fields menu:
Click Edit Queries under the Home tab and make sure that the Formula Bar is visible. If not, you can activate it under View:
Depending on how your tables are loaded, PowerBI will add a few steps in the process. Those steps are visible under Query Settings:
Among other things, PowerBI changes the data type of dates to, you guessed it, Date. This can trigger problems later. To avoid this, we can change the data type for date in both tables to Text:
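Keeping the dates as text simply moves the conversion into R, where you control the format explicitly. A small sketch, assuming the dd.mm.yyyy layout of the sample data:

```r
# Dates arrive from Power BI as text in day.month.year form.
date_text <- c("05.05.2016", "04.05.2016", "29.04.2016")

# Convert with an explicit format string instead of relying on automatic detection.
dates <- as.Date(date_text, format = "%d.%m.%Y")
print(dates)

# Sorting now follows calendar order rather than the alphabetical order of the strings.
print(sort(dates))
```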
After you've done this for both tables, make sure tbl_B is active, and have a look at the Query Settings. You'll see that a new step, Changed Type, has been added in the data loading process:
We're going to add another step in order to keep our upcoming R script as simple as possible. In that script we're going to join the tables using the rbind() function, which will trigger an error unless the column names in the different tables are the same. So go ahead and change the column names in tbl_B from Price3 and Price4 to Price1 and Price2, respectively:
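If you would rather not rename anything in the query, the alignment can also be done inside the R script. A minimal sketch with one-row stand-ins for the two tables, assuming the columns appear in the same order in both:

```r
df_A <- data.frame(Date = "05.05.2016", Price1 = 23.615, Price2 = 24.775,
                   stringsAsFactors = FALSE)
df_B <- data.frame(Date = "02.06.2016", Price3 = 19.35, Price4 = 22.8,
                   stringsAsFactors = FALSE)

# rbind() on data frames matches columns by name, so mismatched names raise an error.
# Capture the error message instead of stopping the script.
res <- tryCatch(rbind(df_A, df_B), error = function(e) conditionMessage(e))
print(res)

# Copy tbl_A's column names onto tbl_B by position; stacking then works.
names(df_B) <- names(df_A)
combined <- rbind(df_A, df_B)
print(combined)
```

Renaming positionally like this silently pairs Price3 with Price1 and Price4 with Price2, so it is only safe if that pairing is really what you want.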
Now, the Applied steps under Query settings should look like this:
The name of the last step is crucial since you're going to have to reference Renamed Columns (or whatever else you'd like to call it) when you write your R script. And finally we can do exactly that.
Under Transform, click Run R Script. As the picture below describes, the variable dataset will contain the original data for your script. In this case, it will be tbl_B in the form of a data frame if tbl_B was the active table when you clicked Run R Script:
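Not needed for this walkthrough, but handy while developing the script: since dataset is just a data frame, a guard like the one below lets the same script run both in Power BI and in a local R session. The stand-in row is taken from tbl_B after renaming. Printing the column classes shows how each column arrived; depending on your R version, text columns may show up as factor rather than character.

```r
# Outside Power BI there is no 'dataset'; create a stand-in so the script also runs locally.
# (Stand-in: one row of tbl_B after the Changed Type and Renamed Columns steps.)
if (!exists("dataset")) {
  dataset <- data.frame(Date = "02.06.2016", Price1 = 19.35, Price2 = 22.8,
                        stringsAsFactors = FALSE)
}

# 'dataset' is an ordinary data frame, so the usual inspection tools work on it.
print(dim(dataset))
print(sapply(dataset, class))
```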
For now, leave the script as it is, click OK, and have a look at the formula bar:
The picture above tells us two important things. First, we can see that the process has gone smoothly so far and that we have an empty table. Second, we can see that dataset refers to tbl_B in the state that we left it after the step Renamed Columns. And this is the part that can be confusing if you've read about these things elsewhere. In the formula bar, you can enter a second dataset by appending , dataset2=tbl_A to the bracketed record, so that the formula now looks like this:
Hit Enter
Under Query Settings, you will now see that there's a new step where you can edit your R script:
Click it to get back into R and add this little snippet:
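```r
# 'dataset' is the table this step belongs to (tbl_B after Renamed Columns);
# 'dataset2' is the extra table passed in via dataset2=tbl_A in the formula bar.
df_B <- dataset
df_A <- dataset2
# Stack the rows of both tables; this requires identical column names.
df_temp <- rbind(df_A, df_B)
output <- df_temp
```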
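rbind(df_A, df_B) keeps the order of its arguments: all of tbl_A's rows come first, then tbl_B's. If you want the result in date order, one possible extension of the snippet is sketched below, assuming the dd.mm.yyyy text dates and using local stand-ins with two rows from each sample table.

```r
# Local stand-ins so the sketch also runs outside Power BI.
if (!exists("dataset"))  dataset  <- data.frame(Date = c("02.06.2016", "27.05.2016"), Price1 = c(19.35, 19.43),  Price2 = c(22.8, 22.52),   stringsAsFactors = FALSE)
if (!exists("dataset2")) dataset2 <- data.frame(Date = c("05.05.2016", "29.04.2016"), Price1 = c(23.615, 22.93), Price2 = c(24.775, 24.24), stringsAsFactors = FALSE)

df_B <- dataset
df_A <- dataset2
df_temp <- rbind(df_A, df_B)

# Order the stacked rows by calendar date; the Date column itself stays text.
ord <- order(as.Date(df_temp$Date, format = "%d.%m.%Y"))
output <- df_temp[ord, ]
rownames(output) <- NULL
print(output)
```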
When you click OK, this is what you'll see:
Never mind that the formula bar looks like a mess; just go ahead and click Table next to output.
This is it!!
Go to Home and click Close & Apply to get out of the Query Editor. Now you can inspect the output from your R script under Fields, or in the Data tab like in the picture below:
The end result will be a version of your original tbl_B with the rows from tbl_A added to it (rbind() stacks rows; it does not add columns). Not too fancy, but now that you've combined two datasets in your R script, you're able to unleash a bigger part of R in your workflow.
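rbind() appends rows. If what you need is a join that puts the columns side by side, base R's merge() is the usual tool, and with all = TRUE it also surfaces missing values, the other task mentioned in the question. The sketch uses the original, un-renamed columns; treating the 0 in tbl_A as a missing price is an assumption about what that zero means. With this sample data the two tables share no dates, so the outer join mostly illustrates where NAs appear.

```r
tbl_A <- data.frame(
  Date   = c("05.05.2016", "04.05.2016", "03.05.2016", "02.05.2016", "29.04.2016"),
  Price1 = c(23.615, 23.58, 0, 22.91, 22.93),
  Price2 = c(24.775, 24.75, 24.35, 24.11, 24.24),
  stringsAsFactors = FALSE
)
tbl_B <- data.frame(
  Date   = c("02.06.2016", "01.06.2016", "31.05.2016", "30.05.2016", "27.05.2016"),
  Price3 = c(19.35, 19, 19.35, 15.5, 19.43),
  Price4 = c(22.8, 22.35, 22.71, 21.85, 22.52),
  stringsAsFactors = FALSE
)

# Assumption: a price of 0 means "no quote", so recode it as missing.
tbl_A$Price1[tbl_A$Price1 == 0] <- NA

# Full outer join on Date: every date from either table is kept, and the
# columns from the table that lacks that date are filled with NA.
joined <- merge(tbl_A, tbl_B, by = "Date", all = TRUE)

# Count missing values per column.
print(colSums(is.na(joined)))
```

Inside the Power BI script you would use dataset2 and dataset in place of tbl_A and tbl_B (without the Renamed Columns step, so that Price3 and Price4 keep their names) and assign the result to output.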