在 Power BI 中使用编辑查询和 R 对多个表/数据集进行操作 [英] Operations on multiple tables / datasets with Edit Queries and R in Power BI
问题描述
我在 Power BI 文件中有两个表 tbl_A 和 tbl_B
,我想使用 Edit Queries 中的
Run R Script
功能对其进行转换和分析代码>.
这将包括处理缺失值和加入表格.但是,在启动 R 时,似乎我一次只能对一张表进行操作.这是因为 Run R Script
功能仅从您单击 Run R Script
按钮时处于活动状态的表中导入数据.然后将该数据存储在 dataset
变量中.
如果这是正确的,在我看来,R` 在 Power BI 中的实际使用将非常有限.我知道我可以在释放 R 之前加入表.对于像这样的简单情况,这将是一个可行的解决方案,但对于更复杂的数据结构肯定不是.关于如何在 Power BI 中使用 R 对多个表进行操作有什么建议吗?
短版:
在Edit Queries中,插入R脚本时,只需在公式栏中添加[dataset = "Renamed Columns", dataset2 = tbl_A]
kbd>.在这种情况下,Renamed Columns
是指您要插入 R 脚本的表的状态(在 APPLIED STEPS 下),而 tbl_A
是指到您可以使用的另一张桌子.并检查有关隐私的所有设置.
加长版
根据我的评论,这是一个基于
现在,转到选项和设置|数据源设置.选择源并单击编辑权限.将其设置为 Public:
<小时>现在我们可以开始了:
我将在这里从头开始,因为我不知道任何其他数据加载方法会在 PowerBI 中触发什么怪癖.我有两个单独的 Excel 文件,每个文件分别包含一个名为 tbl_A
和 tbl_B
的工作表.这两个表的数据如下所示:
tbl_A 数据
日期 Price1 Price205.05.2016 23,615 24,77504.05.2016 23,58 24,7503.05.2016 0 24,3502.05.2016 22,91 24,1129.04.2016 22,93 24,24
tbl_A 截图
tbl_B 数据
日期 Price3 Price402.06.2016 19,35 22,801.06.2016 19 22,3531.05.2016 19,35 22,7130.05.2016 15,5 21,8527.05.2016 19,43 22,52
tbl_B 截图
<小时>在 PowerBI 的主窗口中,使用 Get Data 加载 tbl_A
:
对 tbl_B
做同样的事情,这样你最终会在 Fields 菜单下得到两个单独的表格:
单击 Home 选项卡下的 Edit Queries 并确保 Formula Bar 可见.如果没有,可以在View下激活:
根据您的表的加载方式,PowerBI 将在该过程中添加几个步骤.这些步骤在 Query Settings 下可见:
除其他外,PowerBI 将日期的数据类型更改为,您猜对了,日期.这可能会
对两个表都完成此操作后,确保 tbl_B 处于活动状态,并查看 查询设置.您会看到在数据加载过程中添加了一个新步骤Changed Type
:
我们将添加另一个步骤,以使我们即将推出的 R 脚本尽可能简单.在该脚本中,我们将使用 rbind()
函数连接表.这将触发错误,除非不同表中的列名相同.因此,继续将 B 列中的名称从 Price3
和 Price4
分别更改为 Price1
和 Price2
:
现在,查询设置下的应用步骤应该如下所示:
最后一步的名称至关重要,因为在编写 R 脚本时,您将不得不引用 重命名的列(或您想调用的任何其他名称).最后我们可以做到这一点.
在 Transform 下,单击 运行 R 脚本.如下图所示,变量 dataset
将包含脚本的原始数据.在这种情况下,如果当您单击 Run R Script 时 tbl_B
是活动表,它将是数据框形式的 tbl_B
:
现在,让脚本保持原样,单击确定,然后查看编辑栏:
上图告诉我们两个重要的事情.首先,我们可以看到该过程到目前为止进展顺利,并且我们有一张空桌子.其次,我们可以看到 dataset
指的是 tbl_B
在 我们在 step 之后离开它的状态 Renamed Columns
.如果你在其他地方读过这些东西,这部分可能会让人感到困惑.在公式栏中,您可以通过添加来输入第二个数据集,dataset2=tbl_A
,因此公式现在看起来像这样:
点击回车
在 Query Settings 下,您现在会看到有一个新步骤可以编辑您的 R 脚本:
点击它返回 R 并添加这个小片段:
df_B <- 数据集df_A <- 数据集2df_temp <- rbind(df_A, df_B)输出 <- df_temp
当您单击确定时,您将看到以下内容:
不用担心公式栏看起来很乱,直接点击output旁边的Table.
就是这样!!
转到主页并点击关闭&应用 退出查询编辑器.现在您可以在 Fields 下或在 Data 选项卡中检查 R 脚本的输出,如下图所示:
最终结果将是原始 tbl_B
的一个版本,其中添加了 tbl_A
中的列.不是太花哨,但是现在您已经在 R 脚本中组合了两个数据集,您可以将 R 的更大部分释放到您的工作流程中.
I have two tables tbl_A and tbl_B
in a Power BI file that I'd like to transform and analyze using the Run R Script
functionality in Edit Queries
.
This would include handling missing values and joining the tables. However, when starting R, it seems I'm only able to do operations on one table at a time. This is because the Run R Script
functionality only imports data from the table that is active when you click the Run R Script
button. This data is then stored in the dataset
variable.
If this is correct, it seems to me that the practical use of R` in Power BI would be very limited. I know I could join the tables before I unleash R. That would be a feasible solution for a simple case like this, but certainly not for more complex data structures. Any suggestions on how to do operations on multiple tables with R in Power BI?
Short version:
In Edit Queries, when inserting an R script, just add [dataset = "Renamed Columns", dataset2 = tbl_A]
in the Formula bar. In this case Renamed Columns
refers to the state of your table (under APPLIED STEPS) where you're inserting your R script, and tbl_A
refers to another table that is available to you. And check all your settings with regards to Privacy.
Long version
Following up on my comment, here is a solution that builds on suggestions from a business intelligence blog and contributions in the PowerBI forum:
First you'll have to edit a few settings. Go to Options and Settings | Options. Under Privacy, select Always ignore Privacy Level settings. On your own risk of course...
Now, go to Options and Settings | Data Source Settings. Select source and click Edit permissons. Set it to Public:
Now we're good to go:
I'm gonna go from scratch here since I don't know what quirks any other data loading method would trigger in PowerBI. I've got two separate Excel files, each containing one worksheet called tbl_A
and tbl_B
, respectively.
The data for the two tables look like this:
tbl_A Data
Date Price1 Price2
05.05.2016 23,615 24,775
04.05.2016 23,58 24,75
03.05.2016 0 24,35
02.05.2016 22,91 24,11
29.04.2016 22,93 24,24
tbl_A Screenshot
tbl_B Data
Date Price3 Price4
02.06.2016 19,35 22,8
01.06.2016 19 22,35
31.05.2016 19,35 22,71
30.05.2016 15,5 21,85
27.05.2016 19,43 22,52
tbl_B Screenshot
In the main window in PowerBI, load tbl_A
using Get Data:
Do the same thing with tbl_B
so that you end up with two separate tables under the Fields menu:
Click Edit Queries under the Home tab and make sure that the Formula Bar is visible. If not, you can activate it under View:
Depending on how your tables are loaded, PowerBI will add a few steps in the process. Those steps are visible under Query Settings:
Among other things, PowerBI changes the data type of dates to, you guessed it, Date. This can trigger problems later. To avoid this, we can change the data type for date in both tables to Text:
After you've done this for both tables, make sure tbl_B is active, and have a look at the Query Settings. You'll se that a new step Changed Type
has been added in the data loading process:
We're going to add another step in order to keep our up-coming R script as simple as possible. In that script we're going to join the tables using the rbind()
function. This will trigger an error unless the column names in the different tables are the same. So go ahead and change the names in column B from Price3
and Price4
to Price1
and Price2
, respectively:
Now, the Applied steps under Query settings should look like this:
The name of the last step is crucial since you're going to have to reference Renamed Columns (or whatever else you'd like to call it) when you write your R script. And finally we can do exactly that.
Under Transform, click Run R Script. As the picture below describes, the variable dataset
will contain the original data for your script. In this case, it will be tbl_B
in the form of a dataframe if tbl_B
was the active table when you clicked Run R Script:
For now, leave the script as it is, click OK, and have a look at the formula bar:
The picture above tells us two important things. First, we can see that the process has gone smoothly so far and that we have an empty table. Second, we can see that dataset
refers to tbl_B
in the state that we left it after the step Renamed Columns
. And this is the part that can be confusing if you've read about these things elsewhere. In the Formula bar, you can enter a second dataset by adding , dataset2=tbl_A
, so that the formula now looks like this:
Hit Enter
Under Query Settings, you will now see that there's a new step where you can edit your R script:
Click it to get back into R and add this little snippet:
df_B <- dataset
df_A <- dataset2
df_temp <- rbind(df_A, df_B)
output <- df_temp
When you click OK, this is what you'll see:
Nevermind that the formula bar looks like a mess, just go ahead and click Table next to output.
This is it!!
Go to Home and click Close & Apply to get out of the Query Editor. Now you can inspect the output from your R script under Fields, or in the Data tab like in the picture below:
The end result will be a version of your original tbl_B
with the columns from tbl_A
added to it. Not too fancy, but now that you've combined two datasets in you R script you're able to unleash a bigger part of R to your work flow.
这篇关于在 Power BI 中使用编辑查询和 R 对多个表/数据集进行操作的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!