Extract table from SAP BW to Azure Data Lake Gen2 using data factory


Question

I would like to know the procedure for extracting a table from SAP BW installed on the Azure cloud into Azure Data Lake Gen2. I want to use ADF to copy data from SAP BW to the Data Lake.

Can we connect ADF to SAP directly with an SAP connector? Do I have to install an Integration Runtime, and do I need a VM for this connection? What's the difference between the SAP BW Open Hub connector and SAP BW via MDX?

I would like to hear from experts on how to extract data from SAP BW when SAP is also hosted on Azure. Thanks.

Answer

I am not an expert, but the difference was explained to me by a BW person: you can use either, but with Open Hub you can run an extract on a BW query without involving a BW person, although performance will not be great. With MDX, I believe there is additional development that needs to be set up on the BW side, but performance is better.
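For reference, these two options map to two different linked service types in ADF. Below is a minimal sketch of each in the ADF Python SDK (azure-mgmt-datafactory); the host, system number, client, and credential values are placeholders, not values from the original setup:

```python
from azure.mgmt.datafactory.models import (
    SapBWLinkedService,       # the "SAP BW via MDX" connector
    SapOpenHubLinkedService,  # the "SAP BW Open Hub" connector
    SecureString,
)

# SAP BW via MDX: connects straight to the BW application server
# and issues MDX against InfoCubes/queries.
bw_mdx = SapBWLinkedService(
    server="sapbw.example.internal",   # placeholder host
    system_number="00",
    client_id="100",
    user_name="ADF_SVC",
    password=SecureString(value="<secret>"),
)

# SAP BW Open Hub: same connection details, but data is read
# through an Open Hub Destination defined in BW.
bw_open_hub = SapOpenHubLinkedService(
    server="sapbw.example.internal",
    system_number="00",
    client_id="100",
    user_name="ADF_SVC",
    password=SecureString(value="<secret>"),
)
```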

Also keep in mind that when I was running those queries I found it hard to parallelize them, and while the Microsoft docs did not provide a good example, I found that whatever I pushed to BW was sent as a single query.
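For contrast, the SAP Table connector (which this answer moves to below) does expose partition settings that let ADF split one copy into several parallel RFC reads. A minimal sketch; the column name and bounds are illustrative, not from my pipeline:

```python
from azure.mgmt.datafactory.models import (
    SapTablePartitionSettings, SapTableSource,
)

# Ask the connector to split the extract into up to 8 parallel
# reads over a numeric column's value range.
source = SapTableSource(
    partition_option="PartitionOnInt",
    partition_settings=SapTablePartitionSettings(
        partition_column_name="DOCNUM",      # illustrative column
        partition_lower_bound="0000000001",
        partition_upper_bound="0009999999",
        max_partitions_number=8,
    ),
)
```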

Alternatively, my recent use case was to get data out of a table in SAP BW rather than a cube, so the following might work for you.

I followed the instructions listed in the "SAP Table" connector documentation.
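To show where this ends up, here is a rough sketch of registering an SAP Table linked service through the ADF Python SDK; the subscription, resource group, factory, host, and credentials are all placeholders:

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    IntegrationRuntimeReference, LinkedServiceResource,
    SapTableLinkedService, SecureString,
)

client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

sap_table_ls = SapTableLinkedService(
    server="sapbw.example.internal",   # placeholder BW host
    system_number="00",
    client_id="100",
    user_name="ADF_SVC",               # service account set up by Basis
    password=SecureString(value="<secret>"),
    # Route the connection through the self-hosted IR described next.
    connect_via=IntegrationRuntimeReference(reference_name="SelfHostedIR"),
)

client.linked_services.create_or_update(
    "my-resource-group", "my-data-factory", "SapTableLinkedService",
    LinkedServiceResource(properties=sap_table_ls),
)
```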

For this process to work you will need a self-hosted IR (either on your laptop or on a VM that is attached to the ADF), and you will need to install the SAP drivers (the SAP Connector for Microsoft .NET, NCo 3.0) on that machine.
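The logical IR itself can be registered from the portal or scripted; a minimal sketch reusing the client from the sketch above (the IR name "SelfHostedIR" is an assumption). The printed key is what the IR installer on the laptop/VM asks for:

```python
from azure.mgmt.datafactory.models import (
    IntegrationRuntimeResource, SelfHostedIntegrationRuntime,
)

# `client` is the DataFactoryManagementClient from the earlier sketch.
client.integration_runtimes.create_or_update(
    "my-resource-group", "my-data-factory", "SelfHostedIR",
    IntegrationRuntimeResource(
        properties=SelfHostedIntegrationRuntime(description="IR for SAP BW extracts")
    ),
)

# Fetch the key the on-machine installer asks for during registration.
keys = client.integration_runtimes.list_auth_keys(
    "my-resource-group", "my-data-factory", "SelfHostedIR"
)
print(keys.auth_key1)
```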

To get those drivers you will probably need to reach out to your Basis team. They will also need to create an Interface role (especially if this is your first time making this connection and you want a service account that can be reused by other processes).

After all of that, you also need to have RFC authorizations added to this Interface role. The ones below are the ones that worked for me. The Microsoft website does give a suggested RFC authorization list, but those are almost at admin level and our Basis team basically did not want to do that:

S_RFC:
FUGR - RFC1, SYST, SYSU
FUNC - RFCPING, RFC_FUNCTION_SEARCH
ACTVT - 16

In addition to the above, we had to run a couple of tests and found that, depending on which tables you want to pull data from, Basis might need to add further authorizations so that you can read from exactly those tables.

The above process is the one I followed, so yours might look a little different, but to make this work you need: a self-hosted IR, the SAP drivers installed on that IR machine, firewall rules allowing you to access the BW system ID, an Interface role created by Basis, and then the RFC authorizations.

I have opened an issue on the Microsoft GitHub documentation about the incorrect RFC authorization list: https://github.com/MicrosoftDocs/azure-docs/issues/60637

Also keep in mind how ADF pulls the data: it first sends the query to BW, BW then creates a file on its end collecting that info, and the file is then sent back to the self-hosted IR, which writes the data into a storage account through ADF. What might happen is that if the file is too large the pipeline can fail, not because of ADF but because of limitations on the BW side.
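One way to reduce the chance of hitting that limit is to push a row filter down to BW so each run extracts a smaller slice. A minimal sketch of such a copy activity, assuming datasets named SapTableDataset and AdlsParquetDataset already exist in the factory; the filter expression and all names are illustrative:

```python
from azure.mgmt.datafactory.models import (
    CopyActivity, DatasetReference, ParquetSink,
    PipelineResource, SapTableSource,
)

copy = CopyActivity(
    name="CopySapTableToLake",
    inputs=[DatasetReference(reference_name="SapTableDataset")],
    outputs=[DatasetReference(reference_name="AdlsParquetDataset")],
    # rfc_table_options is handed to BW as a row filter (WHERE-like),
    # which keeps the intermediate file on the BW side small.
    source=SapTableSource(rfc_table_options="GJAHR EQ '2020'"),
    sink=ParquetSink(),
)

# `client` is the DataFactoryManagementClient from the earlier sketch.
client.pipelines.create_or_update(
    "my-resource-group", "my-data-factory", "SapBwExtractPipeline",
    PipelineResource(activities=[copy]),
)
```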

Hopefully my experience can help someone else who is stuck :)
