Extracting Data from SAP to SQL Server

Problem Description

I am using SSIS packages to extract data from SAP database tables into SQL Server tables. I am using OLEDB source/destination connections to achieve this.

The problem now is that one table in SAP has 5 million records, and it's taking around 2 hours to extract this data into my SQL Server table. I have used the trunc-dump method (truncating the table in SQL Server and dumping data into it from the SAP table) and have also tried using the Multiple Hash component to bring in the updated/new records.

The problem with the hash key is that it still has to scan the entire table to look for changed/new records, and hence takes almost the same time as the trunc-dump method.

I am looking for a new approach, or a way to change the existing one, to reduce the time taken to complete this extraction.

Recommended Answer

As you mentioned, you are using an OLEDB source connection to access SAP. If that means you are accessing SAP's underlying database directly, you should pause doing so until you have explicit IT approval, for three reasons:

  1. You bypass SAP's application-layer security, which may raise corporate security compliance issues;
  2. Your company's SAP license may not allow it. If your company only has an SAP indirect-access license, you may have to stay at the application layer;
  3. Direct access to the underlying database is not officially supported by SAP.

You have multiple options to fetch data with SSIS through the SAP application layer:

  1. Use a commercial SSIS custom component for this work (disclaimer: AecorSoft is one of the leading vendors offering such connectivity components);
  2. Look into SAP's own OData Gateway interface to consume the data;
  3. Ask your SAP ABAP team to write a custom ABAP program to dump SAP data into CSV files, and then use SSIS to fetch them.
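For option #3, the CSV dumps handed off by the ABAP program can be staged in batches before the bulk load into SQL Server. A minimal sketch of the staging side (the column names MANDT, VBELN, ERDAT and the delimiter are assumptions for illustration; the real layout depends on your ABAP program):

```python
import csv
import io

def read_sap_dump(fileobj, batch_size=10000):
    """Read an ABAP-produced CSV dump in batches, ready for a bulk insert."""
    reader = csv.DictReader(fileobj)
    batch = []
    for row in reader:
        batch.append(row)
        if len(batch) >= batch_size:
            yield batch
            batch = []
    if batch:  # flush the final partial batch
        yield batch

# Example with an in-memory dump; a real run would open the CSV file from disk.
sample = io.StringIO(
    "MANDT,VBELN,ERDAT\n"
    "100,0000000001,20230101\n"
    "100,0000000002,20230102\n"
)
batches = list(read_sap_dump(sample, batch_size=10000))
```

Batching keeps memory flat on large dumps and maps naturally onto a parameterized bulk insert on the SQL Server side.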

Let's now look at the performance side:

SAP ETL performance depends on many factors, but in general, even for SAP transactional tables with 100+ columns, extracting 5 million rows over a couple of hours is considered very slow. For example, we have seen cases of extracting the standard SAP General Ledger header table BKPF (almost 100 columns) at a consistent rate of 1M rows every 1-2 minutes. Of course, such performance is achieved through a commercial component and SSIS, but even for option #3 above, going through an intermediate CSV file, you should expect at least 1M rows per 10 minutes. Under the hood, all three options go through the SAP application layer and leverage SAP Open SQL (in contrast to the "Native SQL" the underlying database offers) to access SAP tables; therefore, if you experience application-layer performance issues, you can analyze the Open SQL side.
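Putting rough arithmetic on the figures above (these rates are just the ones quoted in the answer, not guarantees):

```python
rows = 5_000_000

# Observed: 5M rows in ~2 hours with the current OLEDB setup.
observed_minutes = 120
observed_rate = rows / observed_minutes  # rows per minute

# Quoted floor for the CSV route: at least 1M rows per 10 minutes.
csv_rate = 1_000_000 / 10                # rows per minute
csv_minutes = rows / csv_rate

print(f"current:  {observed_rate:,.0f} rows/min, {observed_minutes} min total")
print(f"CSV path: {csv_rate:,.0f} rows/min, {csv_minutes:.0f} min total")
```

Even the slowest suggested route is roughly twice as fast as the current extraction.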

As you also mentioned the update/new-records scenario, this is a typical delta extraction problem. Normally, SAP transactional tables have Create Date and Changed Date fields that can help you capture the delta. In this case, to avoid a full table scan, apply indices through the SAP application layer on those "delta fields". For example, if you need to extract the Sales Document Header table VBAK, you can filter by ERDAT (Created on) and AEDAT (Changed on). Delta is a complex subject in SAP. There is no simple statement that describes the delta solution, as SAP data models are complex and differ greatly across functional modules; delta analysis is always a case-by-case effort. Some people may simply recommend using "delta extractors", but don't treat those as a silver bullet, because extractors have their own problems. In short, if you look into table-based extraction, focus on that, and work with your SAP functional team to determine suitable delta fields. Try to avoid full table scans and hashing. Do an incremental load with some optional overlap with the previous extract (e.g., loading today's and yesterday's records), and do a MERGE to absorb the changes.
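The incremental pattern above can be sketched as follows. The field names VBELN, ERDAT, and AEDAT come from the VBAK example; in a real pipeline the upsert would be a MERGE statement running in SQL Server, not application code:

```python
from datetime import date, timedelta

def extract_delta(source_rows, since):
    """Pull only rows created or changed on/after `since` (the overlap window)."""
    return [r for r in source_rows
            if r["ERDAT"] >= since or r["AEDAT"] >= since]

def merge(target, delta_rows):
    """Upsert delta rows into the target, keyed by document number (VBELN).
    This mimics what a SQL Server MERGE would absorb."""
    for r in delta_rows:
        target[r["VBELN"]] = r
    return target

today = date(2023, 6, 2)
yesterday = today - timedelta(days=1)

# Simulated SAP source table and an already-loaded target.
source = [
    {"VBELN": "001", "ERDAT": date(2023, 1, 1), "AEDAT": yesterday},            # changed
    {"VBELN": "002", "ERDAT": today,            "AEDAT": today},                # new
    {"VBELN": "003", "ERDAT": date(2023, 1, 5), "AEDAT": date(2023, 1, 5)},     # untouched
]
target = {
    "001": {"VBELN": "001", "ERDAT": date(2023, 1, 1), "AEDAT": date(2023, 1, 1)},
    "003": source[2],
}

delta = extract_delta(source, since=yesterday)  # overlap: yesterday + today
merge(target, delta)
```

The one-day overlap deliberately re-reads rows already loaded; the MERGE makes that re-read harmless, which is safer than trying to cut the window exactly at the last load time.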

There are a few cases where you may not be able to find any delta field, and doing a full load every time is not practical. One good example is the address master data table ADRC. In this case, if you are required to do a delta load on such a table, you either have to ask your SAP functional team to figure out the delta for you (meaning they inject custom logic into every place where address master data can be created, updated, or deleted), or you have to ask your SAP Basis team to create a DB trigger on the underlying database table and expose the trigger table at the application layer. This way, you can create an application-layer view over the main table and the trigger table to do the delta. There is still no direct database access from your solution: the DB-layer trigger is fully managed and controlled by your SAP Basis team, who also supports the database.
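The trigger-table idea can be sketched with a toy in-memory model. In reality the trigger is a database object created by the Basis team and the view is built in the SAP application layer; the point here is only the data flow:

```python
# Toy model of delta capture via a trigger table: every write to the main
# table also records the key in a change log, so extraction only visits
# logged keys instead of scanning the whole table.

main_table = {}   # ADRC-like master data, keyed by address number
trigger_log = []  # keys touched since the last extraction

def write(key, row):
    """Simulates the DB trigger: the write lands in the log as a side effect."""
    main_table[key] = row
    trigger_log.append(key)

def extract_delta():
    """Read only the changed rows, then clear the log."""
    changed = {k: main_table[k] for k in set(trigger_log)}
    trigger_log.clear()
    return changed

write("0000100001", {"NAME1": "ACME Corp"})
write("0000100002", {"NAME1": "Initech"})
first = extract_delta()                              # both rows are new

write("0000100001", {"NAME1": "ACME Corporation"})   # one update
second = extract_delta()                             # only the updated row
```

Note the second extraction touches a single key, regardless of how large the main table is; that is the whole benefit over hashing the full table.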

Hope this helps!
