如何拆分平面文件数据并加载到数据库中的父子表? [英] How do I split flat file data and load into parent-child tables in database?

查看:156
本文介绍了如何拆分平面文件数据并加载到数据库中的父子表?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有需要导入父子表的非规范化数据(来自文件)。源数据是这样的:

I have denormalized data (coming from a file) that needs to be imported into parent-child tables. The source data is something like this:

Account#    Name        Membership    Email
101         J Burns     Gold          alpha@foo.com
101         J Burns     Gold          bravo@foo.com
101         J Burns     Gold          charlie@yay.com
227         H Gordon    Silver        red@color.com
350         B Clyde     Silver        italian@food.com
350         B Clyde     Silver        mexican@food.com

什么是零件,零件或者SSIS的策略我应该用前三列读入父表,第四列(Email)读入子表?我可以为父键提供多种选项:

What are the pieces, parts, or tactics of SSIS I should use to read the first three columns into a parent table, and the 4th column (Email) into a child table? I have several options for the parent key which I am permitted to take:


  • 直接使用帐号#作为主键

  • 在导入过程中使用SSIS生成的代理密钥

  • 配置身份主键

我确信我已经按照不断增加的难度列出了我的主键选项。我有兴趣知道如何做第一个和最后一个选项 - 我将推断如何实现中间选项。再次强调,我对一个明确的SSIS解决方案很感兴趣;我正在寻找一个使用SSIS语言的答案,而不是程序性的,技术中立的答案。

I'm sure I've listed my primary key options in increasing order of difficulty. I'd be interested in knowing how to do the first and the last option - I'll infer how to achieve the middle option. To emphasize again, I'm interested in a decidedly SSIS solution; I'm looking for an answer that uses the language of SSIS, rather than a procedural, technology neutral answer.

我的问题有点类似于另一个SO问题,具有模糊可行性的答案。我希望能给出更详细的指导。我已经知道如何通过创建一个临时中间步骤来解决这个问题,其中父子分离实际上是用直接SQL处理的。但是,我很好奇如果没有这种中间步骤可以做到这一点。

My question is somewhat similar to another SO question, having an answer of vague viability. I'm hoping more detailed guidance could be given. I already know how to solve this problem by creating a "staging" middle-step, where the parent-child separation is actually handled with straight SQL. However, I'm curious about how this can be done without that kind of middle-step.

在我看来,这种导入会如此普遍,以至于会有一种公布的方法来处理它 - 这是SSIS擅长的一种技术。到目前为止,我还没有看到任何直接答案。

It seems to me this kind of import would be so common, that there would be a well-published formulaic way to handle it - a technique that SSIS excels at. As yet, I've not quite seen any straight up answer to this.

更新#1 :根据评论,我调整了样本数据以更明显地非规范化。我还从平面文件中删除了flat,因此语义不会干扰问题。

Update #1: Based on comments, I've adjusted the sample data to be more obviously denormalized. I also removed "flat" from "flat file," so that semantics don't interfere with the question.

更新#2 :我对使用SSIS语言的解决方案感兴趣。

Update #2: I've amplified my interest in a solution spoken in the language of SSIS.

推荐答案

以下是加载父子数据时可以考虑的一个可能选项。此选项包含两个步骤。在第一个步骤中,读取源文件并将数据写入父表。在第二个步骤中,再次读取源文件并使用查找转换来获取父信息,以便将数据写入子表。以下示例使用问题中提供的数据。此示例是使用SSIS 2008 R2和SQL Server 2008数据库创建的。

Here is one possible option that you can consider in loading parent-child data. This option consists of two steps. In the first step, read the source file and write data to parent table. In the second step, read the source file again and use lookup transformation to fetch the parent info in order to write data to the child table. Following example uses the data provided in the question. This example was created using SSIS 2008 R2 and SQL Server 2008 database.

分步流程:


  1. 创建一个名为 Source.txt 的示例平面文件,如屏幕截图# 1

  1. Create a sample flat file named Source.txt as shown in screenshot #1.

在SQL数据库中,创建两个名为 dbo.Parent 和<$的表c $ c> dbo.Child 使用 SQL Scripts 部分下提供的脚本。这两个表都有一个自动生成的标识列。

In the SQL database, create two tables named dbo.Parent and dbo.Child using the scripts given under SQL Scripts section. Both the tables have an auto generated identity column.

在包上,放置一个 OLE DB连接连接到SQL Server和平面文件连接以读取源文件,如屏幕截图# 2 所示。配置平面文件连接,如屏幕截图# 3 - # 9 所示。

On the package, place an OLE DB connection to connect to the SQL Server and Flat File connection to read the source file as shown in screenshot #2. Configure the flat file connection as shown in screenshots #3 - #9.

在控件上Flow选项卡,放置两个数据流任务,如屏幕截图# 10 所示。

On the Control Flow tab, place two Data Flow Tasks as shown in screenshot #10.

在名为 Parent 的数据流任务中,放置一个平面文件源,排序转换和OLE DB目标,如屏幕截图# 11 所示。

Inside the data flow task named Parent, place a Flat File source, Sort transformation and an OLE DB destination as shown in screenshot #11.

配置平面文件源,如屏幕截图# 12 和# 13 所示。我们需要阅读平面文件来源。

Configure the flat file source as shown in screenshots #12 and #13. We need to read the flat file source.

配置排序转换,如屏幕截图# 14 所示。我们需要消除重复值,以便只将唯一记录插入父表 dbo.Parent

Configure the sort transformation as shown in screenshot #14. We need to eliminate the duplicate values so that only the unique records are inserted into the parent table dbo.Parent.

配置ole db目标,如屏幕截图# 15 和# 16 所示。我们需要将数据插入父表 dbo.Parent

Configure the ole db destination as shown in screenshots #15 and #16. We need to insert the data into the parent table dbo.Parent.

在数据流任务中命名为 Child ,放置一个Flat File源,Lookup转换和一个OLE DB目标,如屏幕截图# 17 所示。

Inside the data flow task named Child, place a Flat File source, Lookup transformation and an OLE DB destination as shown in screenshot #17.

配置平面文件源,如屏幕截图# 12 和# 13 所示。此配置与上一个数据流任务中的平面文件源相同。

Configure the flat file source as shown in screenshots #12 and #13. This configuration is same as the flat file source in the previous data flow task.

配置查找转换,如屏幕截图# 18 和# 20 。我们需要使用文件中存在的其他键列从表 dbo.Parent 中找到父ID。这里的关键列是帐户,名称和电子邮件。如果文件碰巧有一个唯一列,您可以单独使用该列来获取父ID。

Configure the lookup transformation as shown in screenshots #18 and #20. We need to find the parent id from the table dbo.Parent using the other key columns present in the file. The key columns here are the Account, Name and Email. If the file happened to have a unique column, you could just use that column alone to fetch the parent id.

配置ole db目标,如屏幕截图所示# 21 和# 22 。我们需要将电子邮件列和父ID一起插入表 dbo.Child

Configure the ole db destination as shown in screenshots #21 and #22. We need to insert the Email column along with the Parent id into the table dbo.Child.

屏幕截图# 23 会在执行

Screenshot #23 shows data in the tables before the package execution.

屏幕截图# 24 和# 25 显示示例包执行。

Screenshots #24 and #25 show sample package execution.

屏幕截图# 26 显示数据在表之后执行包。

Screenshot #26 shows data in the tables after the package execution.

希望有所帮助。

SQL脚本:

CREATE TABLE [dbo].[Child](
    [ChildId] [int] IDENTITY(1,1) NOT NULL,
    [ParentId] [int] NULL,
    [Email] [varchar](21) NULL,
CONSTRAINT [PK_Child] PRIMARY KEY CLUSTERED ([ChildId] ASC)) ON [PRIMARY]
GO

CREATE TABLE [dbo].[Parent](
    [ParentId] [int] IDENTITY(1,1) NOT NULL,
    [Account] [varchar](12) NULL,
    [Name] [varchar](12) NULL,
    [Membership] [varchar](14) NULL,
CONSTRAINT [PK_Parent] PRIMARY KEY CLUSTERED ([ParentId] ASC)) ON [PRIMARY]
GO

屏幕截图#1:

屏幕截图#2:

屏幕截图#3:

屏幕截图#4:

屏幕截图#5:

屏幕截图#6:

屏幕截图#7:

屏幕截图#8:

屏幕截图#9:

屏幕截图#10:

屏幕截图#11:

屏幕截图#12:

屏幕截图#13:

屏幕截图#14:

屏幕截图#15:

屏幕截图#16:

屏幕截图#17:

屏幕截图#18:

屏幕截图#19:

屏幕截图#20:

屏幕截图#21:

屏幕截图#22:

< img src =https://i.stack.imgur.com/om2O7.pngalt =22>

屏幕截图#23:

屏幕截图#24:

屏幕截图#25:

屏幕截图#26:

这篇关于如何拆分平面文件数据并加载到数据库中的父子表?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆