Import/Normalize approach - column-DML, or loop


Problem description

Hi all,

I've just finished almost all of what has turned out to be a real bear of a
project. It has to import data from a monthly spreadsheet export from another
program, and convert that into normalized data. The task is made more
difficult by the fact that the structure itself can vary from month to month
(in well-defined ways).

So, I used the SQL-centric approach, taking vertical stripes at a time so
that, for instance, for each field with simple, repeating text data, I make a
group-by pass, inserting into the destination lookup table, then I do another
query, joining from the input table to the text fields in the lookups to get
the foreign keys for the main destination table. When I have multiple columns
that need to become 1-M, I make a pass for each of those columns, inserting
a lookup record that identifies the split, then inserting the rows into the
many-side table, yadda, yadda, yadda.
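
As an illustration of the two passes described above (a GROUP BY pass that fills a lookup table, then a join that resolves the foreign keys), here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for Access/Jet, and the table and column names (staging, category, main) are hypothetical, not taken from the original project.

```python
import sqlite3

# Hypothetical schema: SQLite stands in for the Jet staging/target tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE staging (row_id INTEGER PRIMARY KEY, category TEXT, amount TEXT);
    CREATE TABLE category (category_id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE main (row_id INTEGER PRIMARY KEY, category_id INTEGER, amount REAL);
""")
conn.executemany(
    "INSERT INTO staging (category, amount) VALUES (?, ?)",
    [("Hardware", "10.5"), ("Software", "3"), ("Hardware", "7")],
)

# Pass 1: one GROUP BY pass inserts each distinct text value into the lookup.
conn.execute("""
    INSERT INTO category (name)
    SELECT category FROM staging
    WHERE category IS NOT NULL
    GROUP BY category
""")

# Pass 2: join the input rows to the lookup on the text value to obtain
# the foreign key for the main destination table.
conn.execute("""
    INSERT INTO main (row_id, category_id, amount)
    SELECT s.row_id, c.category_id, CAST(s.amount AS REAL)
    FROM staging AS s
    JOIN category AS c ON c.name = s.category
""")

lookup_count = conn.execute("SELECT COUNT(*) FROM category").fetchone()[0]
main_rows = conn.execute(
    "SELECT row_id, category_id, amount FROM main ORDER BY row_id"
).fetchall()
```

Each pass is a single set-based statement, so the number of statements grows with the number of columns being normalized rather than with the number of input rows.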

All that was going swimmingly, and performing pretty well until I got to the
fields containing multiple, delimited values. My whole design is based on
using SQL/DML passes for everything, but the only way I could figure out to
make that work was to call a user-defined function from within the query to
pull out, in successive query slices, argument 1, argument 2, etc. I was
going to use a where clause to exclude null results (input doesn't have
arguments n or above), and quit after the first pass with .RecordsAffected =
0.
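
The pass-per-argument plan described above could look roughly like the following sketch. It is written in Python with sqlite3 rather than VBA and Jet, so its timing says nothing about the slowdown reported in this post. The function nth_item, the ';' delimiter, and the staging/item tables are all assumptions made for illustration; cursor.rowcount plays the role of .RecordsAffected.

```python
import sqlite3

def nth_item(text, n):
    """Return the n-th (1-based) ';'-delimited item, or None if there is none."""
    if text is None:
        return None
    parts = [p.strip() for p in text.split(";") if p.strip()]
    return parts[n - 1] if n <= len(parts) else None

conn = sqlite3.connect(":memory:")
# Register the Python function so it can be called from inside SQL.
conn.create_function("nth_item", 2, nth_item)
conn.executescript("""
    CREATE TABLE staging (row_id INTEGER PRIMARY KEY, tags TEXT);
    CREATE TABLE item (row_id INTEGER, position INTEGER, value TEXT);
""")
conn.executemany(
    "INSERT INTO staging (row_id, tags) VALUES (?, ?)",
    [(1, "red;green;blue"), (2, "red"), (3, None)],
)

n = 0
passes = 0
while True:
    n += 1
    # One set-based pass per argument position; rows lacking an n-th
    # argument are excluded by the WHERE clause.
    cur = conn.execute(
        "INSERT INTO item (row_id, position, value) "
        "SELECT row_id, ?, nth_item(tags, ?) FROM staging "
        "WHERE nth_item(tags, ?) IS NOT NULL",
        (n, n, n),
    )
    passes += 1
    if cur.rowcount == 0:  # analogous to .RecordsAffected = 0
        break

items = conn.execute(
    "SELECT row_id, position, value FROM item ORDER BY row_id, position"
).fetchall()
```

Note that the function is called for every input row on every pass, so the total number of calls is roughly the row count times the largest number of arguments in any field, plus one final empty pass.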

Sounds good, but with a mere 8000 input rows, I had to cancel the first pass
after waiting about 20 minutes. Note that the whole import process takes about
3 minutes if this part is omitted, and that includes about 40 vertical split
passes. The UDF is very simple, and performs quite fast from VB, but I guess
the overhead of calling this function from a query is VERY SEVERE. Just to
get this thing out the door, I've decided to just not split these for now
since we're not doing queries of that data yet.

So, my next thought is that perhaps this step is better done by simply,
brute-force, cycling through a recordset of the input table, and inserting
rows into the destination tables. If I'm doing that, though, then am I really
getting any benefit worth speaking of by doing everything -else- as SQL-DML in
vertical slices, or would I have been much better off just doing it the
procedural way, walking through the input table, and creating destination rows
one row at a time? I know it would have been easier to write, but here I was
trying to do things the "right" way.
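
For comparison, the brute-force alternative described above (walk the input once, split each delimited field in application code, insert destination rows one at a time) might be sketched as follows. Again this is Python with sqlite3 standing in for a VBA recordset loop, and the table names and the ';' delimiter are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE staging (row_id INTEGER PRIMARY KEY, tags TEXT);
    CREATE TABLE item (row_id INTEGER, position INTEGER, value TEXT);
""")
conn.executemany(
    "INSERT INTO staging (row_id, tags) VALUES (?, ?)",
    [(1, "red;green;blue"), (2, "red"), (3, None)],
)

# Read the input once, then create destination rows one at a time.
rows = conn.execute("SELECT row_id, tags FROM staging ORDER BY row_id").fetchall()
inserted = 0
with conn:  # commit all inserts together as one transaction
    for row_id, tags in rows:
        if tags is None:
            continue
        # Split in application code instead of calling a function per query row.
        values = [v.strip() for v in tags.split(";") if v.strip()]
        for position, value in enumerate(values, start=1):
            conn.execute(
                "INSERT INTO item (row_id, position, value) VALUES (?, ?, ?)",
                (row_id, position, value),
            )
            inserted += 1

items = conn.execute(
    "SELECT row_id, position, value FROM item ORDER BY row_id, position"
).fetchall()
```

Here each input field is split exactly once, whatever the number of arguments it holds, at the cost of one insert statement per destination row.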

If I do something like this again, would my code end up performing just as
well, and being easier to write and maintain using simple iteration rather
than SQL-DML?

Thanks for any opinions,

- Steve J

Recommended answer

> If I do something like this again, would my code end up performing just as
> well, and being easier to write and maintain using simple iteration rather
> than SQL-DML?

My guess is that you can build a fairly specialized optimization here.
After all, the Jet engine is not the heaviest in the world, and its
query optimizer can do only so much for true DML.
I have a query that calculates several values by simple IIFs, and that
takes three minutes for a simple 20,000-record table!

Somewhere down the line any SQL statement must be executed
procedurally. SQL was invented only so that we humans can focus on
the what, not the how.

I'd say: use the procedure. Bonus: you can display your own progress
indicator. A slow function with feedback is often perceived as quicker
than a slow function without feedback, even if the latter runs several
seconds quicker for not having to update the display.

Steve Jorgensen wrote:
> All that was going swimmingly, and performing pretty well until I got to the
> fields containing multiple, delimited values. My whole design is based on
> using SQL/DML passes for everything, but the only way I could figure out to
> make that work was to call a user-defined function from within the query to
> pull out, in successive query slices, argument 1, argument 2, etc. I was
> going to use a where clause to exclude null results (input doesn't have
> arguments n or above), and quit after the first pass with .RecordsAffected =
> 0.

Sometimes you can invent a preprocessor (this time inside the
spreadsheet); is that possible? I'd like to explore the point but see
little value in doing that in the group (eating up everyone's bandwidth,
verbose as I tend to be). If you have time, mail me. My domain is not
org but nl.

--
Bas Cost Budde


On Thu, 29 Jan 2004 09:44:01 +0100, Bas Cost Budde <ba*@heuveltop.org> wrote:
>> If I do something like this again, would my code end up performing just as
>> well, and being easier to write and maintain using simple iteration rather
>> than SQL-DML?
> My guess is that you can build a fairly specialized optimization here.
> After all, the Jet engine is not the heaviest in the world, and its
> query optimizer can do only so much for true DML.
> I have a query that calculates several values by simple IIFs, and that
> takes three minutes for a simple 20,000-record table!




I generally have much better performance than that. In fact, if we're not
talking about a multi-processor or multi-user system, I generally find that
JET is faster than most SQL Servers since it has fewer contingencies to deal
with.

> I'd say: use the procedure. Bonus: you can display your own progress
> indicator. A slow function with feedback is often perceived as quicker
> than a slow function without feedback, even if the latter runs several
> seconds quicker for not having to update the display.

Actually, since most of my vertical slice insert queries run in under 2 or 3
seconds, I have a pretty good progress bar already - until I get to the
multi-value split, that is.

> Steve Jorgensen wrote:
>> All that was going swimmingly, and performing pretty well until I got to the
>> fields containing multiple, delimited values. My whole design is based on
>> using SQL/DML passes for everything, but the only way I could figure out to
>> make that work was to call a user-defined function from within the query to
>> pull out, in successive query slices, argument 1, argument 2, etc. I was
>> going to use a where clause to exclude null results (input doesn't have
>> arguments n or above), and quit after the first pass with .RecordsAffected =
>> 0.
> Sometimes you can invent a preprocessor (this time inside the
> spreadsheet); is that possible? I'd like to explore the point but see




Actually, I do a small amount of pre-processing in the spreadsheet (just in
the top few rows) to make it into something Access will import properly, then I
import that into a staging database, and link to that from the Access importer
database app (also linked to the target back-end). The transformation,
import, and link steps take about 6 or 7 seconds altogether. From there, it's
an ordinary table (albeit with all Text columns), so I have my choice of DML
or looping to do the complicated part from there.

> little value in doing that in the group (eating up everyone's bandwidth,
> verbose as I tend to be). If you have time, mail me. My domain is not
> org but nl.




I don't need that kind of detail. I was really just looking for people's
experience/opinions of the values of DML vs looping for this kind of job.


On Thu, 29 Jan 2004 09:44:01 +0100, Bas Cost Budde <ba*@heuveltop.org> wrote:

Ah - forgot a point...
> Somewhere down the line any SQL statement must be executed
> procedurally. SQL was invented only so that we humans can focus on
> the what, not the how.




Well, it's more than that, though. Using procedural code, every single
read/write must go through several code layers. The SQL engine is designed to
handle set operations optimally without crossing all those layers, and it can
use covering indexes (a concept hidden from VB code), etc. Still, it may be
that for a data transformation operation, those advantages are not realized,
or worse.

