How to achieve scalability


Problem description

I have a fairly large table (700,000 rows or so) that I'd like to run a
process on. However, the procedure we have right now was designed with
tables of more like 20,000 rows and isn't able to handle it. Access will
always crash before it can complete.

Some background: the procedure is used in the process of data cleanup and
is designed to process company names into a standardized form, so that we
can use it to confirm data across various datasets. The procedure takes
input like, for example, "The Yummy and Tasty Waffle Corporation" or
"Yummy & Tasty Waffle, Incorporated" and turns both into "TASTYWAFFLE".
We can then sort, link, and filter, etc. on this field along with others
to see if there are duplicates or check if companies that have different
IDs are in fact the same company.

Specifically, another procedure takes a specified table and field and
creates a new field, filling it with the contents of the original field.
That procedure then passes a DAO recordset and the name of the new field
to the main procedure which then performs 11 operations to arrive at the
"TASTYWAFFLE" stage. Filling the new field is clearly double work and
I'll be trimming that part out.

Now for my specific questions: Currently, the procedure takes the whole
contents of the field at once, and uses a lot of InStr, Mid, Left, and
Right functions to perform all of the operations, then moves on to the
next row. It seems more direct to just read character by character until
I have a complete word (i.e. I hit a space or other delimiter), process
that bit, then move on to the next part of the field. Which of these
approaches is more efficient? Also, where could I go to find some
guidelines on writing the most scalable VBA code? I know Access has
limitations, but I'd like to be limited by those and not by our own
inefficiencies.
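
The word-at-a-time idea described above can be sketched as a single pass over each name. This is a minimal illustration in Python rather than VBA, and everything specific in it is an assumption: the thread never states the actual 11 operations, so the noise-word list and the name `normalize_company` are hypothetical.

```python
# Hypothetical noise words; the real rules are not given in the thread.
NOISE_WORDS = frozenset({"THE", "AND", "CORPORATION", "INCORPORATED"})


def normalize_company(name: str) -> str:
    """Scan the name once, collecting characters into words and
    keeping only the words that are not in NOISE_WORDS."""
    kept = []   # words that survive filtering
    word = []   # characters of the word currently being read
    # The appended space acts as a final delimiter so the last word is flushed.
    for ch in name.upper() + " ":
        if ch.isalnum():
            word.append(ch)
        elif word:
            token = "".join(word)
            if token not in NOISE_WORDS:
                kept.append(token)
            word = []
    return "".join(kept)


if __name__ == "__main__":
    for raw in ("The Yummy and Tasty Waffle Corporation",
                "Yummy & Tasty Waffle, Incorporated"):
        print(raw, "->", normalize_company(raw))
```

With this word list both example names reduce to the same key, though that key still contains YUMMY; whatever rule drops that word in the original procedure is not described in the post. Each character is visited once and each word costs one set lookup, which is the property that matters as the row count grows.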

Thanks in advance,

Carlos

解决方案

lyle fairfield, replying to the post by "Carlos Nunes-Ueno" <sulla...@athotmaildot.com> of Jul 1, 8:45 pm:

The functions you name are old and stale; they were always
inefficient. My guess is that Regular Expressions would solve your
problem and be hundreds, possibly thousands of times faster than
straight VBA code. Of course, those who don't know Regular Expressions
may disagree. No doubt, initially Regular Expressions can be
bewildering. But a little work and a modicum of patience can result in
the tedious being made simple, and the impossible, only challenging.


Thanks, Lyle. I didn't know that regular expressions were available at all
in VBA, but after some checking around I see the reference for the VBS
regular expressions library.

So, basically, you're recommending that I step through the recordset like
now but using regular expression objects to perform the operations instead
of the VBA functions? Or is there an even more efficient way?
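
One way to picture "step through the rows, but with regular-expression objects" is the sketch below. It is written in Python rather than VBA, the rows are plain `(id, name)` tuples standing in for a recordset, and the patterns and the names `NOISE`, `NON_ALNUM`, and `normalize_rows` are assumptions made for illustration. The point it tries to show is that the patterns are built once, outside the loop, and reused for every row.

```python
import re

# Built once and reused for every row. The word list is hypothetical.
NOISE = re.compile(r"\b(?:THE|AND|CORPORATION|INCORPORATED)\b", re.IGNORECASE)
NON_ALNUM = re.compile(r"[^A-Za-z0-9]+")


def normalize_rows(rows):
    """Yield (row_id, key) pairs one row at a time, so the whole
    table never has to be held in memory."""
    for row_id, name in rows:
        without_noise = NOISE.sub("", name)              # drop whole noise words
        key = NON_ALNUM.sub("", without_noise).upper()   # drop spaces, punctuation
        yield row_id, key


if __name__ == "__main__":
    sample = [
        (1, "The Yummy and Tasty Waffle Corporation"),
        (2, "Yummy & Tasty Waffle, Incorporated"),
    ]
    for row_id, key in normalize_rows(sample):
        print(row_id, key)
```

The `\b` word boundaries keep the noise pattern from eating parts of longer words (for example, the "and" inside "Brand"). Whether this beats hand-written string functions in Access is something to measure on real data rather than assume.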

Thanks,

Carlos


Tom, replying to the post by "Carlos Nunes-Ueno" <su******@athotmaildot.com> of Wed, 2 Jul 2008 00:45:11 +0000 (UTC):

Was that a real example? It seems difficult to come up with consistent
rules that convert "The Yummy and Tasty Waffle Corporation" into
"TASTYWAFFLE". Something like "Take the 4th and 5th word, and omit the
spaces"? Can you tell us what kinds of rules you're applying for the
conversion?

I have had good success with the Ratcliff/Obershelp algorithm that
returns a similarity (a number between 0 and 1) between two strings. I
checked and for your two company names the similarity is 0.75. Using
some cutoff value you can narrow down the most similar companies and
bunch them up that way.
We recently implemented this algorithm as a .NET assembly in SQL
Server 2005, and it is very fast. 10,000 comparisons in way less than
1 second.
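
The similarity-plus-cutoff idea can be tried out quickly with Python's standard library, whose `difflib.SequenceMatcher` uses an algorithm closely related to Ratcliff/Obershelp. This is a sketch, not the .NET implementation mentioned above: the helper names `similarity` and `similar_pairs` and the default cutoff of 0.7 are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a score between 0 and 1; 1 means the strings are identical."""
    return SequenceMatcher(None, a, b).ratio()


def similar_pairs(names, cutoff=0.7):
    """Compare every pair of names and keep those scoring at or above cutoff.
    The cutoff value is a hypothetical starting point, not a recommendation."""
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            score = similarity(names[i], names[j])
            if score >= cutoff:
                pairs.append((names[i], names[j], score))
    return pairs


if __name__ == "__main__":
    companies = [
        "The Yummy and Tasty Waffle Corporation",
        "Yummy & Tasty Waffle, Incorporated",
        "Acme Hardware",
    ]
    for first, second, score in similar_pairs(companies):
        print(f"{score:.2f}  {first}  <->  {second}")
```

Comparing every pair is quadratic in the number of names, so for 700,000 rows that is on the order of 10^11 comparisons. In practice the names would probably need to be grouped first (for example by a normalized prefix) so that only names within the same group are compared.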

I'm not at all convinced RegEx is the ticket here.

-Tom.

