使用 SSIS 加载多个表保持外键关系 [英] Loading Multiple Tables using SSIS keeping foreign key relationships

查看:32
本文介绍了使用 SSIS 加载多个表保持外键关系的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在尝试使用 SSIS 将数据从单个文件(包含数百万条记录)加载到 SQL Server 上的多个表中,同时维护文件中定义的关系.

I am trying to load data from a single file (with million+ records) into multiple tables on SQL Server using SSIS while maintaining the relationships defined in the file.

为了更好地举例说明,假设我正在尝试加载一个文件,其中包含员工姓名、他们过去占用的办公室以及用选项卡分隔的职位历史.

To better elaborate with an example, lets assume I am trying to load a file containing employee name, the offices they have occupied in the past and their Job title history separated by a tab.

文件:

EmployeeName<tab>OfficeHistory<tab>JobLevelHistory
John Smith<tab>501<tab>Engineer
John Smith<tab>601<tab>Senior Engineer
John Smith<tab>701<tab>Manager
Alex Button<tab>601<tab>Senior Assistant
Alex Button<tab>454<tab>Manager

如果我的 Office 数据库架构具有以下表格:

If my Office database schema has the following tables:

Employee (nId, name)
Office (nId, number)
JobTitle (nId, titleName)
Employee2Office (nEmpID, nOfficeId)
Employee2JobTitle (nEmpId, nJobTitleID)

如何使用 SSIS 将文件加载到上面的架构中?自动生成员工、办公室和职位的 ID,并维护员工和办公室、员工和职位之间的关系?

How can I use SSIS to load the file into the schema above Auto-Generating IDs for Employee, Office and JobTitle and maintaining the relationship between the employee and offices, and employee and Job Titles?

所以在这种情况下.表格应如下所示:

So in this case. the tables should look like:

Employee
1 John Smith
2 Alex Button

Office
1 501
2 601
3 701
4 454

JobTitle
1 Engineer
2 Senior Engineer
3 Manager
4 Senior Assistant

Employee2Office
1 1
1 2
1 3
2 2
2 4

Employee2JobTitle
1 1
1 2
1 3
2 4
2 3

我是 SSIS 的新手,并没有在执行数据流任务时使用自动生成 ID 和建立外键关系.任何指针将不胜感激.

I am a newbie to SSIS and have not played around with auto-generating IDs and establishing foreign key relationships while executing a Data Flow Task. Any pointers would be appreciated.

谢谢!

推荐答案

一个有趣的问题.这是我的方法(Sql Server 2005).(我假设这是一份月度工作,而不仅仅是一次,所以我添加了代码以实现可重复性.)

An interesting question. Here is how I would do it (Sql Server 2005). (I'm assuming this is a monthly job and not just a one time, so I added code for repeatability.)

  1. 为 Employee、JobTitle 和 Office 表创建三个变量(type = Object)
  2. 使用三个 sql 任务将这三个表的行选择到相应的变量中.
  3. 添加数据流任务.
  4. 使用平面文件目的地从您的平面文件中进行选择.
  5. 输出进入脚本组件,以平面文件中的三列作为输入,将三个表变量导入脚本,脚本组件中的五个输出均具有相同的排除组编号,输入标记为对于该输出同步,七个新列(3 个用于 emp,每个输出一个,2 个用于工作,2 个用于办公室)添加到输出中,并使用以下代码(对 System.xml.dll 的引用必须添加以使这一切正常工作.):

  1. Create three variables for the Employee, JobTitle and Office table (type = Object)
  2. Use three sql tasks to select the rows for of those three tables into the corresponding variables.
  3. Add a dataflow task.
  4. Select from your flat file using a flat file destination.
  5. Output goes into a script component with the three columns in the flat file as input, the three table variable imported into the script, five outputs in the script component each with the same exclusion group number and the input marked as synchronous for that output, seven new columns (3 for emp one for each output it will be in, 2 for job, 2 for office) added to the output, and with the following code (A Reference to System.xml.dll had to be added to make this all work.):

Imports System
Imports System.Data
Imports System.Math
Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper
Imports Microsoft.SqlServer.Dts.Runtime.Wrapper
Imports System.Collections
Imports System.Data.OleDb

Public Class ScriptMain
    Inherits UserComponent

    Private da As New OleDbDataAdapter
    Private emp As New DataTable
    Private emph As New Hashtable()
    Private job As New DataTable
    Private jobh As New Hashtable()
    Private off As New DataTable
    Private offh As New Hashtable()
    Private maxempid As Integer
    Private maxjobid As Integer
    Private maxoffid As Integer

    Public Overrides Sub PreExecute()
        maxempid = 0
        maxjobid = 0
        maxoffid = 0
        da.Fill(emp, Me.Variables.EmpTab)
        For Each dr As DataRow In emp.Rows
            emph.Add(dr.Item("Name"), dr.Item("nID"))
            If (CInt(dr.Item("nID").ToString) > maxempid) Then
                maxempid = CInt(dr.Item("nID").ToString)
            End If
        Next
        da.Fill(job, Me.Variables.JobTab)
        For Each dr As DataRow In job.Rows
            jobh.Add(dr.Item("titleName"), dr.Item("nID"))
            If (CInt(dr.Item("nID").ToString) > maxempid) Then
                maxjobid = CInt(dr.Item("nID").ToString)
            End If
        Next
        da.Fill(off, Me.Variables.OffTab)
        For Each dr As DataRow In off.Rows
            offh.Add(dr.Item("number"), dr.Item("nID"))
            If (CInt(dr.Item("nID").ToString) > maxempid) Then
                maxoffid = CInt(dr.Item("nID").ToString)
            End If
        Next
        emp.Dispose()
        job.Dispose()
        off.Dispose()
        da.Dispose()
        MyBase.PreExecute()
    End Sub

Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)

    If Not emph.ContainsKey(Row.EmployeeName) Then
        maxempid += 1
        emph.Add(Row.EmployeeName, maxempid)
        Row.EmpId = maxempid
        Row.Emp2Id = maxempid
        Row.Emp3Id = maxempid
        Row.DirectRowToEmployee()
    Else
        Row.EmpId = CInt(emph.Item(Row.EmployeeName).ToString)
        Row.Emp2Id = CInt(emph.Item(Row.EmployeeName).ToString)
        Row.Emp3Id = CInt(emph.Item(Row.EmployeeName).ToString)
    End If
    If Not jobh.ContainsKey(Row.JobLevelHistory) Then
        maxjobid += 1
        jobh.Add(Row.JobLevelHistory, maxjobid)
        Row.JobId = maxjobid
        Row.Job2Id = maxjobid
        Row.DirectRowToJobTitle()
    Else
        Row.JobId = CInt(jobh.Item(Row.JobLevelHistory).ToString)
        Row.Job2Id = CInt(jobh.Item(Row.JobLevelHistory).ToString)
    End If
    If Not offh.ContainsKey(Row.OfficeHistory) Then
        maxoffid += 1
        offh.Add(Row.OfficeHistory, maxoffid)
        Row.OffId = maxoffid
        Row.Off2Id = maxoffid
        Row.DirectRowToOfficeNumber()
    Else
        Row.OffId = CInt(offh.Item(Row.OfficeHistory).ToString)
        Row.Off2Id = CInt(offh.Item(Row.OfficeHistory).ToString)
    End If
    Row.DirectRowToEmp2Job()
    Row.DirectRowToEmp2Off()
End Sub        
End Class

  • 此脚本的结果(脚本为输入数据中的新值生成 id.它通过将现有表加载到脚本预执行部分的哈希表中,然后检查是否存在name 并基于它增加 maxid 并将其添加到散列(如果它添加到散列中),它还会将行添加到适当的(emp、job 或 off)输出,或者从散列中检索每行的 maxid.) 无论上述状态如何,所有行都将写入剩余的两个输出(emp2job 和 emp2off).

  • The results from this script (The script generates id's for new values in the input data. It does this by loading up the existing table into hashtables in the preexecute portion of the script, then by checking for existance of the name and based upon that either increments the maxid and adds it to the hash if it adds to the hash, it also adds the row to the appropriate (emp, job, or off) output, or retrieves the maxid from the hash for each row.) all row regardless of status above will be written to the two remaining outputs (emp2job and emp2off).

    这篇关于使用 SSIS 加载多个表保持外键关系的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

  • 查看全文
    登录 关闭
    扫码关注1秒登录
    发送“验证码”获取 | 15天全站免登陆