在 Power BI 中使用编辑查询和 R 对多个表/数据集进行操作 [英] Operations on multiple tables / datasets with Edit Queries and R in Power BI

查看:12
本文介绍了在 Power BI 中使用编辑查询和 R 对多个表/数据集进行操作的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我在 Power BI 文件中有两个表 tbl_A 和 tbl_B,我想使用 Edit Queries 中的 Run R Script 功能对其进行转换和分析.

这将包括处理缺失值和加入表格.但是,在启动 R 时,似乎我一次只能对一张表进行操作.这是因为 Run R Script 功能仅从您单击 Run R Script 按钮时处于活动状态的表中导入数据.然后将该数据存储在 dataset 变量中.

如果这是正确的,在我看来,R` 在 Power BI 中的实际使用将非常有限.我知道我可以在释放 R 之前加入表.对于像这样的简单情况,这将是一个可行的解决方案,但对于更复杂的数据结构肯定不是.关于如何在 Power BI 中使用 R 对多个表进行操作有什么建议吗?

解决方案

短版:

Edit Queries中,插入R脚本时,只需在公式栏中添加[dataset = "Renamed Columns", dataset2 = tbl_A]kbd>.在这种情况下,Renamed Columns 是指您要插入 R 脚本的表的状态(在 APPLIED STEPS 下),而 tbl_A 是指到您可以使用的另一张桌子.并检查有关隐私的所有设置.

<小时>

加长版

根据我的评论,这是一个基于

现在,转到选项和设置|数据源设置.选择源并单击编辑权限.将其设置为 Public:

<小时>

现在我们可以开始了:

我将在这里从头开始,因为我不知道任何其他数据加载方法会在 PowerBI 中触发什么怪癖.我有两个单独的 Excel 文件,每个文件分别包含一个名为 tbl_Atbl_B 的工作表.这两个表的数据如下所示:

tbl_A 数据

日期 Price1 Price205.05.2016 23,615 24,77504.05.2016 23,58 24,7503.05.2016 0 24,3502.05.2016 22,91 24,1129.04.2016 22,93 24,24

tbl_A 截图

tbl_B 数据

日期 Price3 Price402.06.2016 19,35 22,801.06.2016 19 22,3531.05.2016 19,35 22,7130.05.2016 15,5 21,8527.05.2016 19,43 22,52

tbl_B 截图

<小时>

在 PowerBI 的主窗口中,使用 Get Data 加载 tbl_A:

tbl_B 做同样的事情,这样你最终会在 Fields 菜单下得到两个单独的表格:

单击 Home 选项卡下的 Edit Queries 并确保 Formula Bar 可见.如果没有,可以在View下激活:

根据您的表的加载方式,PowerBI 将在该过程中添加几个步骤.这些步骤在 Query Settings 下可见:

除其他外,PowerBI 将日期的数据类型更改为,您猜对了,日期.这可能会

对两个表都完成此操作后,确保 tbl_B 处于活动状态,并查看 查询设置.您会看到在数据加载过程中添加了一个新步骤Changed Type:

我们将添加另一个步骤,以使我们即将推出的 R 脚本尽可能简单.在该脚本中,我们将使用 rbind() 函数连接表.这将触发错误,除非不同表中的列名相同.因此,继续将 B 列中的名称从 Price3Price4 分别更改为 Price1Price2:

现在,查询设置下的应用步骤应该如下所示:

最后一步的名称至关重要,因为在编写 R 脚本时,您将不得不引用 重命名的列(或您想调用的任何其他名称).最后我们可以做到这一点.

Transform 下,单击 运行 R 脚本.如下图所示,变量 dataset 将包含脚本的原始数据.在这种情况下,如果当您单击 Run R Scripttbl_B 是活动表,它将是数据框形式的 tbl_B:

现在,让脚本保持原样,单击确定,然后查看编辑栏:

上图告诉我们两个重要的事情.首先,我们可以看到该过程到目前为止进展顺利,并且我们有一张空桌子.其次,我们可以看到 dataset 指的是 tbl_B我们在 step 之后离开它的状态 Renamed Columns.如果你在其他地方读过这些东西,这部分可能会让人感到困惑.在公式栏中,您可以通过添加来输入第二个数据集,dataset2=tbl_A,因此公式现在看起来像这样:

点击回车

Query Settings 下,您现在会看到有一个新步骤可以编辑您的 R 脚本:

点击它返回 R 并添加这个小片段:

df_B <- 数据集df_A <- 数据集2df_temp <- rbind(df_A, df_B)输出 <- df_temp

当您单击确定时,您将看到以下内容:

不用担心公式栏看起来很乱,直接点击output旁边的Table.

就是这样!!

转到主页并点击关闭&应用 退出查询编辑器.现在您可以在 Fields 下或在 Data 选项卡中检查 R 脚本的输出,如下图所示:

最终结果将是原始 tbl_B 的一个版本,其中添加了 tbl_A 中的列.不是太花哨,但是现在您已经在 R 脚本中组合了两个数据集,您可以将 R 的更大部分释放到您的工作流程中.

I have two tables tbl_A and tbl_B in a Power BI file that I'd like to transform and analyze using the Run R Script functionality in Edit Queries.

This would include handling missing values and joining the tables. However, when starting R, it seems I'm only able to do operations on one table at a time. This is because the Run R Script functionality only imports data from the table that is active when you click the Run R Script button. This data is then stored in the dataset variable.

If this is correct, it seems to me that the practical use of R` in Power BI would be very limited. I know I could join the tables before I unleash R. That would be a feasible solution for a simple case like this, but certainly not for more complex data structures. Any suggestions on how to do operations on multiple tables with R in Power BI?

解决方案

Short version:

In Edit Queries, when inserting an R script, just add [dataset = "Renamed Columns", dataset2 = tbl_A] in the Formula bar. In this case Renamed Columns refers to the state of your table (under APPLIED STEPS) where you're inserting your R script, and tbl_A refers to another table that is available to you. And check all your settings with regards to Privacy.


Long version

Following up on my comment, here is a solution that builds on suggestions from a business intelligence blog and contributions in the PowerBI forum:

First you'll have to edit a few settings. Go to Options and Settings | Options. Under Privacy, select Always ignore Privacy Level settings. On your own risk of course...

Now, go to Options and Settings | Data Source Settings. Select source and click Edit permissons. Set it to Public:


Now we're good to go:

I'm gonna go from scratch here since I don't know what quirks any other data loading method would trigger in PowerBI. I've got two separate Excel files, each containing one worksheet called tbl_A and tbl_B, respectively. The data for the two tables look like this:

tbl_A Data

Date        Price1  Price2
05.05.2016  23,615  24,775
04.05.2016  23,58   24,75
03.05.2016  0       24,35
02.05.2016  22,91   24,11
29.04.2016  22,93   24,24

tbl_A Screenshot

tbl_B Data

Date        Price3  Price4
02.06.2016  19,35   22,8
01.06.2016  19      22,35
31.05.2016  19,35   22,71
30.05.2016  15,5    21,85
27.05.2016  19,43   22,52

tbl_B Screenshot


In the main window in PowerBI, load tbl_A using Get Data:

Do the same thing with tbl_B so that you end up with two separate tables under the Fields menu:

Click Edit Queries under the Home tab and make sure that the Formula Bar is visible. If not, you can activate it under View:

Depending on how your tables are loaded, PowerBI will add a few steps in the process. Those steps are visible under Query Settings:

Among other things, PowerBI changes the data type of dates to, you guessed it, Date. This can trigger problems later. To avoid this, we can change the data type for date in both tables to Text:

After you've done this for both tables, make sure tbl_B is active, and have a look at the Query Settings. You'll se that a new step Changed Type has been added in the data loading process:

We're going to add another step in order to keep our up-coming R script as simple as possible. In that script we're going to join the tables using the rbind() function. This will trigger an error unless the column names in the different tables are the same. So go ahead and change the names in column B from Price3 and Price4 to Price1 and Price2, respectively:

Now, the Applied steps under Query settings should look like this:

The name of the last step is crucial since you're going to have to reference Renamed Columns (or whatever else you'd like to call it) when you write your R script. And finally we can do exactly that.

Under Transform, click Run R Script. As the picture below describes, the variable dataset will contain the original data for your script. In this case, it will be tbl_B in the form of a dataframe if tbl_B was the active table when you clicked Run R Script:

For now, leave the script as it is, click OK, and have a look at the formula bar:

The picture above tells us two important things. First, we can see that the process has gone smoothly so far and that we have an empty table. Second, we can see that dataset refers to tbl_B in the state that we left it after the step Renamed Columns. And this is the part that can be confusing if you've read about these things elsewhere. In the Formula bar, you can enter a second dataset by adding , dataset2=tbl_A, so that the formula now looks like this:

Hit Enter

Under Query Settings, you will now see that there's a new step where you can edit your R script:

Click it to get back into R and add this little snippet:

df_B <- dataset
df_A <- dataset2
df_temp <- rbind(df_A, df_B)

output <- df_temp

When you click OK, this is what you'll see:

Nevermind that the formula bar looks like a mess, just go ahead and click Table next to output.

This is it!!

Go to Home and click Close & Apply to get out of the Query Editor. Now you can inspect the output from your R script under Fields, or in the Data tab like in the picture below:

The end result will be a version of your original tbl_B with the columns from tbl_A added to it. Not too fancy, but now that you've combined two datasets in you R script you're able to unleash a bigger part of R to your work flow.

这篇关于在 Power BI 中使用编辑查询和 R 对多个表/数据集进行操作的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆