When to opt for caching in the Lookup component in SSIS


Question

The SSIS Lookup transformation has three cache modes: Full cache, Partial cache, and No cache.

Our solution has always used the default (Full cache). Are there particular scenarios where Partial cache or No cache would be the better choice? Our lookup tables are always small (for example, we look up small tables to get a type or a description). Is that why everything has been left in the default (Full cache) mode?

Any suggestions or opinions would be appreciated.

Answer

Let's cover the basics.

Full Cache - Before the data flow actually executes, every Lookup component in full cache mode runs its query against its source and caches all of that data locally. Once the data flow begins, the source systems for these transformations could be removed, because Integration Services already has all the data as of that point in time.
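To make the full cache behaviour concrete, here is a minimal Python sketch. It illustrates the idea only and is not how SSIS is implemented internally. The reference set is read once, before any rows flow, and every later probe is an in-memory dictionary lookup. `FullCacheLookup` and the type/description rows are hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple

# Hypothetical reference row: (type_id, description).
RefRow = Tuple[int, str]


class FullCacheLookup:
    """Illustrative full cache: one query up front, then memory only."""

    def __init__(self, load_reference: Callable[[], Iterable[RefRow]]) -> None:
        # Runs once, before any data-flow rows are processed.
        self.cache: Dict[int, str] = dict(load_reference())
        self.queries = 1

    def lookup(self, key: int) -> Optional[str]:
        # In-memory probe; a key absent from the snapshot is a "no match".
        return self.cache.get(key)


reference: List[RefRow] = [(1, "Retail"), (2, "Wholesale"), (3, "Online")]
lookup = FullCacheLookup(lambda: reference)

# The source can "go away" now: the cache holds its own copy of the data.
reference.clear()

incoming = [1, 2, 2, 3, 4]
results = [lookup.lookup(k) for k in incoming]
print(results)         # ['Retail', 'Wholesale', 'Wholesale', 'Online', None]
print(lookup.queries)  # 1
```

Five incoming rows cost exactly one query against the source. That is why full cache is a sensible default when the reference tables are small, as in the question.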

Partial Cache - No data is pre-cached for a partial cache. For each row that flows through the transform, a partial cache checks its internal cache to see whether the lookup key(s) have already been seen. If they have, the local copy is used. Otherwise, a singleton query is fired against the referenced system to find the value. That can get quite expensive if the lookup query is not optimized, if you are pulling back lots of data, or if the source keys are highly unique. If a match is found in the remote system, that data is cached locally until the package completes, or until enough new lookups have generated matches that the cache is full.
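A minimal Python sketch of the partial cache flow follows. `PartialCacheLookup` is hypothetical. Capping the cache by row count and evicting the least recently used entry are assumptions made for illustration and do not describe SSIS's actual memory management. Only matches are cached here, so a missing key is queried again every time it appears.

```python
from collections import OrderedDict
from typing import Callable, Optional


class PartialCacheLookup:
    """Illustrative partial cache: query on a miss, keep matches in memory."""

    def __init__(self, query_source: Callable[[int], Optional[str]], max_entries: int) -> None:
        self.query_source = query_source  # stands in for a singleton query
        self.max_entries = max_entries    # assumption: capacity counted in rows
        self.cache: "OrderedDict[int, str]" = OrderedDict()
        self.queries = 0

    def lookup(self, key: int) -> Optional[str]:
        if key in self.cache:
            self.cache.move_to_end(key)  # mark as recently used
            return self.cache[key]
        self.queries += 1
        value = self.query_source(key)
        if value is not None:
            # Only matches are cached; misses are queried again next time.
            self.cache[key] = value
            if len(self.cache) > self.max_entries:
                # Assumption: evict the least recently used entry when full.
                self.cache.popitem(last=False)
        return value


source = {1: "Retail", 2: "Wholesale", 3: "Online"}
lookup = PartialCacheLookup(source.get, max_entries=2)
results = [lookup.lookup(k) for k in [1, 2, 2, 3, 1, 4]]
print(results)         # ['Retail', 'Wholesale', 'Wholesale', 'Online', 'Retail', None]
print(lookup.queries)  # 5
```

Only the repeated key 2 was served from memory. Key 1 had already been pushed out of the full cache when it came back, so it cost a second query.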

No Cache - Similar to Partial Cache, but it always performs a query against the source system, even if your entire import set has only a single unique value for the key(s).
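For contrast, a no-cache lookup can be sketched in a few lines of Python. `NoCacheLookup` is hypothetical, and a plain dictionary stands in for the source system. Every row triggers a query, even when all rows carry the same key.

```python
from typing import Callable, Optional


class NoCacheLookup:
    """Illustrative no-cache lookup: every row triggers a query."""

    def __init__(self, query_source: Callable[[int], Optional[str]]) -> None:
        self.query_source = query_source  # stands in for a singleton query
        self.queries = 0

    def lookup(self, key: int) -> Optional[str]:
        self.queries += 1  # no memory of earlier answers
        return self.query_source(key)


source = {1: "Retail"}
lookup = NoCacheLookup(source.get)
# Five rows that all carry the same single key value.
results = [lookup.lookup(1) for _ in range(5)]
print(results)         # ['Retail', 'Retail', 'Retail', 'Retail', 'Retail']
print(lookup.queries)  # 5
```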

I use a Full cache unless I have a specific reason not to.

In a data warehouse, there is a scenario known as a late arriving dimension. You are loading something that should have a value in a reference table, but you didn't know the value existed until NOW! The general resolution is to punch that value into the reference table during the load. With a full cache, every row that references the missing value would fail to find it and would then try to insert it, which causes duplicates. A partial or no cache solves this: the first miss results in an insert into the reference table, and subsequent lookups find the new row (and, with a partial cache, add it to the cache).
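The duplicate problem shows up clearly in a deliberately simplified Python simulation. Everything in it (`DimensionTable`, `load_with_full_cache`, `load_with_partial_cache`, the business keys) is hypothetical. It also processes one row at a time, whereas a real SSIS data flow moves rows through in buffers.

```python
from typing import Dict, List, Optional, Tuple


class DimensionTable:
    """Hypothetical reference table; a list so duplicate rows stay visible."""

    def __init__(self, business_keys: List[str]) -> None:
        self.rows: List[Tuple[int, str]] = []  # (surrogate_key, business_key)
        for bk in business_keys:
            self.insert(bk)

    def insert(self, business_key: str) -> int:
        surrogate_key = len(self.rows) + 1
        self.rows.append((surrogate_key, business_key))
        return surrogate_key

    def find(self, business_key: str) -> Optional[int]:
        for sk, bk in self.rows:
            if bk == business_key:
                return sk
        return None


def load_with_full_cache(dim: DimensionTable, facts: List[str]) -> List[int]:
    snapshot = {bk: sk for sk, bk in dim.rows}  # taken once, before rows flow
    keys = []
    for bk in facts:
        sk = snapshot.get(bk)
        if sk is None:
            sk = dim.insert(bk)  # the snapshot never sees this new row
        keys.append(sk)
    return keys


def load_with_partial_cache(dim: DimensionTable, facts: List[str]) -> List[int]:
    cache: Dict[str, int] = {}
    keys = []
    for bk in facts:
        sk = cache.get(bk)
        if sk is None:
            sk = dim.find(bk)  # singleton query against the live table
            if sk is None:
                sk = dim.insert(bk)  # first miss: insert the late arriving member
            else:
                cache[bk] = sk  # found in the table: keep it in memory
        keys.append(sk)
    return keys


facts = ["A", "C", "C", "B", "C"]  # "C" is not in the dimension yet

full_dim = DimensionTable(["A", "B"])
print(load_with_full_cache(full_dim, facts))         # [1, 3, 4, 2, 5]
print(sum(bk == "C" for _, bk in full_dim.rows))     # 3 (duplicates)

partial_dim = DimensionTable(["A", "B"])
print(load_with_partial_cache(partial_dim, facts))   # [1, 3, 3, 2, 3]
print(sum(bk == "C" for _, bk in partial_dim.rows))  # 1
```

The full cache inserts "C" three times because its snapshot never changes. The partial cache inserts it once. The next row that needs "C" finds it with a query and caches it, and the third row is then served from memory.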

Another scenario where I've needed partial/no cache is a range query. I had tables with SurrogateKey|BusinessKey|StartDate|StopDate and needed to look up BusinessKey + MyDate, where MyDate falls between StartDate and StopDate. I use the GUI to drag MyDate to StartDate, and then, in the advanced editor, I modify the existing query to do a BETWEEN StartDate and StopDate (but of course, I wouldn't actually use BETWEEN).
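The range lookup can be sketched as a parameterized query that runs once per incoming row. In this Python sketch, `sqlite3` stands in for the reference database. The `DimCustomer` table, its rows, and `range_lookup` are hypothetical, and the `?` placeholders play the role of the values the lookup supplies for each row. The half-open predicate `StartDate <= ? AND ? < StopDate` is shown as one common alternative to `BETWEEN`. `BETWEEN` is inclusive at both ends, so a date equal to one row's StopDate and the next row's StartDate matches both rows.

```python
import sqlite3
from typing import Optional

# In-memory stand-in for the reference table (hypothetical names and data).
# Dates are ISO-8601 strings, which compare correctly as text.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE DimCustomer (
           SurrogateKey INTEGER PRIMARY KEY,
           BusinessKey  TEXT NOT NULL,
           StartDate    TEXT NOT NULL,
           StopDate     TEXT NOT NULL
       )"""
)
conn.executemany(
    "INSERT INTO DimCustomer VALUES (?, ?, ?, ?)",
    [
        (1, "CUST-1", "2020-01-01", "2021-01-01"),
        (2, "CUST-1", "2021-01-01", "9999-12-31"),
        (3, "CUST-2", "2020-01-01", "9999-12-31"),
    ],
)

# Half-open range instead of BETWEEN: StartDate <= MyDate < StopDate.
RANGE_LOOKUP = """
    SELECT SurrogateKey
    FROM DimCustomer
    WHERE BusinessKey = ? AND StartDate <= ? AND ? < StopDate
"""


def range_lookup(business_key: str, my_date: str) -> Optional[int]:
    # One parameterized query per incoming row, as a no-cache lookup would issue.
    row = conn.execute(RANGE_LOOKUP, (business_key, my_date, my_date)).fetchone()
    return row[0] if row is not None else None


print(range_lookup("CUST-1", "2020-06-15"))  # 1
print(range_lookup("CUST-1", "2021-01-01"))  # 2 (BETWEEN would match rows 1 and 2)
print(range_lookup("CUST-2", "2019-12-31"))  # None
```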
