BULK INSERT into a Self-Referencing Table

Question

This question is part of a much more complex problem that I am breaking down into smaller chunks (for my sanity).

Let's say I have a Parts table that is self-referencing and looks something like this:

PartId  ParentPartId  Description               PartNumber    IsCatHeader  ProviderId
----------------------------------------------------------------------------------------------------------------
9292    null          'Engine Parts'            null          1            'Engine Parts||1'
9293    9292          'Engine Bolts'            null          1            'Engine Bolts||1'
9294    9293          '6mm Engine Bolt'         'X1-234-ABC'  0            '6mm Engine Bolt|X1-234-ABC|0'
9295    9293          '5mm Engine Bolt'         'X2-934-BCD'  0            '5mm Engine Bolt|X2-934-BCD|0'
9296    9295          '5mm Engine Bolt Washer'  'X2-934-GED'  0            '5mm Engine Bolt Washer|X2-934-GED|0'

You get the idea. Now... we are importing entire books (massive CSV files) of these parts, to the tune of several hundred line items in a single book.

Parts are often duplicated across books, and part of our job is to keep duplicates out of the database.

The source does not provide any kind of unique ID for these parts, so we have created a ProviderId column that concatenates several fields from each record into a unique string. We can then use this to check for duplicates as we do the import. (The actual data in this column is more complex than what I've shown here.)
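A minimal sketch of how such a key could be built, assuming the simplified `Description|PartNumber|IsCatHeader` layout shown in the table above (the real column is described as more complex). The function name `make_provider_id` is hypothetical, and Python is used here only for illustration in place of the C# importer.

```python
from typing import Optional


def make_provider_id(description: str, part_number: Optional[str], is_cat_header: bool) -> str:
    """Join the identifying fields with '|'; a missing part number becomes an empty field."""
    fields = [description, part_number or "", "1" if is_cat_header else "0"]
    return "|".join(fields)


print(make_provider_id("Engine Parts", None, True))
print(make_provider_id("6mm Engine Bolt", "X1-234-ABC", False))
```

The two calls reproduce the ProviderId values of rows 9292 and 9294 in the sample table.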

So, now to my problem. I am trying to figure out the best way to do this in bulk. One option (NOT a good one) is to cycle through each item one at a time from the C# application: insert a parent, get the SCOPE_IDENTITY(), insert all of its children, and so on. Yuck. In a large book, this would result in thousands of DB calls per book. Not an option.

We need a bulk insert solution. But we have a real conundrum with the self-referencing aspect of this.

Our original thought was to build the ENTIRE data model in C#, including all of the PartIds and ParentPartIds, and then bulk insert directly into the Parts table. The problem with this, however, is knowing what ID to start with. Keep in mind, multiple processes will be running simultaneously, and many of the parts will be duplicates. We tried using a SEQUENCE object, but that presented problems: it's entirely possible that duplicate books will be processed, which would result in giant gaps in the IDs if we use a SEQUENCE.

The course I am chasing now is this... We have created a Parts_Staging table that looks almost like the actual Parts table. And we can do a bulk insert to it, no problem. Then it's a simple query to use the ProviderId column to look for records in Parts_Staging that don't exist in Parts and move them over.
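The staging step described above could look roughly like the following. This is a sketch, not the poster's code: SQLite (via Python's standard `sqlite3` module) stands in for SQL Server so the example is runnable on its own, and the two tables are trimmed to the columns the duplicate check needs. Note that it deliberately leaves ParentPartId unset, which is exactly the open problem.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Parts (
        PartId INTEGER PRIMARY KEY AUTOINCREMENT,
        ParentPartId INTEGER,
        Description TEXT,
        ProviderId TEXT UNIQUE
    );
    CREATE TABLE Parts_Staging (
        Description TEXT,
        ProviderId TEXT
    );
    -- One part is already in the database from an earlier book.
    INSERT INTO Parts (Description, ProviderId)
    VALUES ('Engine Parts', 'Engine Parts||1');
""")

# Bulk load the staging table: one duplicate and one new part.
conn.executemany(
    "INSERT INTO Parts_Staging (Description, ProviderId) VALUES (?, ?)",
    [
        ("Engine Parts", "Engine Parts||1"),
        ("Engine Bolts", "Engine Bolts||1"),
    ],
)

# Copy over only the staged rows whose ProviderId is not already in Parts.
conn.execute("""
    INSERT INTO Parts (Description, ProviderId)
    SELECT s.Description, s.ProviderId
    FROM Parts_Staging AS s
    WHERE NOT EXISTS (
        SELECT 1 FROM Parts AS p WHERE p.ProviderId = s.ProviderId
    )
""")

provider_ids = [row[0] for row in conn.execute("SELECT ProviderId FROM Parts ORDER BY PartId")]
print(provider_ids)
```

The `INSERT ... SELECT ... WHERE NOT EXISTS` form is standard SQL and should carry over to T-SQL largely unchanged, though that has not been verified here.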

But with this path, I am not creative (or experienced) enough to imagine a way to do this move / merge while keeping the self-referencing IDs intact.

I've been reading threads like "How to Insert data into self reference table in sql server?" and "T-SQL - Insert Data into Parent and Child Tables" but so far I am still not seeing the vision.

Recommended answer

OK, here's what I would do. First, make up the sequence in your collection in C#, along with its parent/child relationships. But I would put that in different columns, maybe something like BatchPartId and BatchParentPartId. (Maybe a different associated table. It doesn't matter.)

| PartId | ParentPartId | BatchPartId | BatchParentPartId |
|--------|--------------|-------------|-------------------|
|        |              | XX1901      |                   |
|        |              | XX1902      | XX1901            |
|        |              | XX1903      | XX1901            |
|        |              | XX1904      | XX1903            |
|        |              | XX1905      | XX1903            |
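A small sketch of how those batch-local IDs might be assigned in memory before the bulk insert. It is written in Python for illustration rather than C#, and the helper name `assign_batch_ids`, the `XX` prefix, and the starting number are all assumptions chosen to reproduce the table above. With several importers running at once, the prefix would presumably need to be unique per batch.

```python
from itertools import count


def assign_batch_ids(rows, prefix="XX", start=1901):
    """rows: list of (key, parent_key) tuples; parent_key is None for root rows.

    Returns one dict per row with BatchPartId and BatchParentPartId filled in.
    """
    counter = count(start)
    # First pass: give every row its own batch-local id.
    batch_ids = {key: f"{prefix}{next(counter)}" for key, _ in rows}
    # Second pass: translate each parent key into the parent's batch-local id.
    return [
        {
            "Key": key,
            "BatchPartId": batch_ids[key],
            "BatchParentPartId": batch_ids[parent] if parent is not None else None,
        }
        for key, parent in rows
    ]


# Same shape as the table above: one root, two children, two grandchildren.
batch = assign_batch_ids([("A", None), ("B", "A"), ("C", "A"), ("D", "C"), ("E", "C")])
for row in batch:
    print(row["BatchPartId"], row["BatchParentPartId"])
```

Because the IDs only have to be unique within the batch, no database round trip is needed to produce them.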

Then, insert the entire list, allowing the PartId to get created.

| PartId | ParentPartId | BatchPartId | BatchParentPartId |
|--------|--------------|-------------|-------------------|
| 55     |              | XX1901      |                   |
| 56     |              | XX1902      | XX1901            |
| 57     |              | XX1903      | XX1901            |
| 58     |              | XX1904      | XX1903            |
| 59     |              | XX1905      | XX1903            |

After the insert, fill in the ParentPartId of each child row with the PartId of the row whose BatchPartId matches the child's BatchParentPartId.

| PartId | ParentPartId | BatchPartId | BatchParentPartId |
|--------|--------------|-------------|-------------------|
| 55     |              | XX1901      |                   |
| 56     | 55           | XX1902      | XX1901            |
| 57     | 55           | XX1903      | XX1901            |
| 58     | 57           | XX1904      | XX1903            |
| 59     | 57           | XX1905      | XX1903            |
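The insert-then-resolve sequence can be sketched end to end as follows. As an assumption for the sake of a runnable example, SQLite (through Python's `sqlite3` module) replaces SQL Server, and `AUTOINCREMENT` plays the role of the identity column. The generated PartId values therefore start at 1 rather than 55, but the parent links come out the same way.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Parts (
        PartId INTEGER PRIMARY KEY AUTOINCREMENT,
        ParentPartId INTEGER REFERENCES Parts (PartId),
        BatchPartId TEXT,
        BatchParentPartId TEXT
    )
""")

# Step 1: bulk insert with only the batch-local ids; the database assigns PartId.
batch = [
    ("XX1901", None),
    ("XX1902", "XX1901"),
    ("XX1903", "XX1901"),
    ("XX1904", "XX1903"),
    ("XX1905", "XX1903"),
]
conn.executemany(
    "INSERT INTO Parts (BatchPartId, BatchParentPartId) VALUES (?, ?)", batch
)

# Step 2: one set-based UPDATE resolves every ParentPartId by matching the
# child's BatchParentPartId against the parent's BatchPartId.
conn.execute("""
    UPDATE Parts
    SET ParentPartId = (
        SELECT parent.PartId
        FROM Parts AS parent
        WHERE parent.BatchPartId = Parts.BatchParentPartId
    )
    WHERE BatchParentPartId IS NOT NULL
""")

resolved = conn.execute(
    "SELECT PartId, ParentPartId, BatchPartId FROM Parts ORDER BY PartId"
).fetchall()
for row in resolved:
    print(row)
```

The whole batch costs two statements regardless of how many parts it contains. In a real import the lookup would likely also need to be scoped to the current batch so that IDs from other runs cannot collide.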
