找出导致错误的行 [英] Find out which row caused the error

查看:171
本文介绍了找出导致错误的行的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个很大的胖查询,它动态编写以集成一些数据.基本上,它的工作是查询一些表,联接其他表,处理一些数据,然后将其插入到最终表中.

问题在于数据太多,我们不能真正信任数据源,因为可能存在一些错误或不一致的数据.

例如,在使用客户数据库进行开发时,我花了近一个小时来寻找错误,因为在我的大型胖查询中间的某个地方,将某些varchar转换为日期时间时出错.原来,他们的某些销售约会日期为"2009-02-29",这是超出范围的日期. 是的,我知道.为什么将其存储为varchar?好吧,源数据库有3个日期列,分别是月",日"和年".我不知道为什么会这样,但是仍然如此.

但是,如果来源不可靠,我将如何处理呢?

我无法处理异常,我真的需要它与原始消息一起提升到另一个层次,但是我想提供更多信息,以便用户至少可以尝试在致电给我们之前解决它.

因此,我想到了向用户显示行号或一些ID,这至少会使他对他必须更正的记录有所了解.这也是一项艰巨的工作,因为有时集成会运行多达80000条记录. 在80000条记录的集成中,只有一条虚拟错误消息:将varchar数据类型转换为datetime数据类型会导致超出范围的datetime值" 毫无意义. /p>

因此,任何想法都将不胜感激.

哦,我将SQL Server 2005与Service Pack 3一起使用.


好吧,对于我所读的答案,最好的办法是检查可能对引发错误至关重要的每一列,如果它们确实符合条件,我应该自己提出一条错误,并显示以下信息:查找更具描述性的信息,并添加一些可能存储在单独的表中的信息或一些变量,例如行的ID或其他一些根信息.

解决方案

这听起来像是标准的ETL问题:提取,转换和加载. (除非您必须针对同一组数据反复执行此查询,否则,您几乎会一次又一次地执行相同的操作.那么性能有多重要?)

您被允许提供什么样的错误处理和/或不良数据报告"?如果您将所有内容都视为一个大型胖查询",则您的选择将变得非常有限-该查询有效还是无效,或者如果没有,我猜您最多只能得到一条RAISERROR消息来告诉调用者什么.

在这种情况下,我尝试设置的通用框架是:

  • 从源表开始
  • 生成一个临时表集(SQLMenace暂存表),您知道这些表是一致的且格式正确(有效数据,键等)
  • 针对这些表编写不太庞大的查询"

通过这种方式,即使它为空,您也应该总是能够返回(或存储)有效的数据集.诀窍在于确定例程何时失败-数据何时损坏得太厉害以至于无法处理并无法产生所需的结果,因此您将返回措辞正确的错误消息?

I have a big fat query that's written dynamically to integrate some data. Basically what it does is query some tables, join some other ones, treat some data, and then insert it into a final table.

The problem is that there's too much data, and we can't really trust the sources, because there could be some errored or inconsistent data.

For example, I've spent almost an hour looking for an error while developing using a customer's database because somewhere in the middle of my big fat query there was an error converting some varchar to datetime. It turned out to be that they had some sales dating '2009-02-29', an out-of-range date. And yes, I know. Why was that stored as varchar? Well, the source database has 3 columns for dates, 'Month', 'Day' and 'Year'. I have no idea why it's like that, but still, it is.

But how the hell would I treat that, if the source is not trustable?

I can't HANDLE exceptions, I really need that it comes up to another level with the original message, but I wanted to provide some more info, so that the user could at least try to solve it before calling us.

So I thought about displaying to the user the row number, or some ID that would at least give him some idea of what record he'd have to correct. That's also a hard job because there will be times when the integration will run up to 80000 records. And in an 80000 records integration, a single dummy error message: 'The conversion of a varchar data type to a datetime data type resulted in an out-of-range datetime value' means nothing at all.

So any idea would be appreciated.

Oh I'm using SQL Server 2005 with Service Pack 3.


EDIT:

Ok, so for what I've read as answers, best thing to do is check each column that could be critical to raising errors, and if they do attend the condition, I should myself raise an error, with the message I find more descriptive, and add some info that could have been stored in a separate table or some variables, for example the ID of the row, or some other root information.

解决方案

This sounds like a standard ETL issue: Extract, Transform, and Load. (Unless you have to run this query over and over again against the same set of data, in which case you'd pretty much do the same thing, only over and over again. So how critical is performance?)

What kind of error handling and/or "reporting of bad data" are you allowed to provide? If you have everything as "one big fat query", your options become very limited -- either the query works or it doesn't, and if it doesn't I'm guessing you get at best one RAISERROR message to tell the caller what's what.

In a situation like this, the general framework I'd try to set up is:

  • Starting with the source table(s)
  • Produce an interim set of tables (SQLMenace's staging tables) that you know are consistant and properly formed (valid data, keys, etc.)
  • Write the "not quite so big and fat query" against those tables

Done this way, you should always be able to return (or store) a valid data set... even if it is empty. The trick will be in determining when the routine fails -- when is the data too corrupt to process and produce the desired results, so you return a properly worded error message instead?

这篇关于找出导致错误的行的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆