SQL原型设计:使用varchar(N)面向静默截断数据 - 任何更好的选择? (Teradata) [英] SQL prototype design: facing silent truncation of data using varchar(N) -- any better alternatives? (Teradata)

查看:175
本文介绍了SQL原型设计:使用varchar(N)面向静默截断数据 - 任何更好的选择? (Teradata)的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

情况



varchar(20)似乎 truncate在Teradata中静默地,而会在遇到长度超过20个字符的字符串时展开或投诉...这是一个惊喜,因为我希望自动扩展列以适应更大的字符串,例如30个字符,如果遇到较大的字符串,则返回错误。沉默的截断似乎让我成为世界上最差的一个...



并发症:



对于我的应用程序(原型分析设计),我不知道我将在几周内吸收的数据有多大。这似乎排除了使用varchar(N),除了最大值



问题:



现在我有几个选择,我正在寻找一些指导:



Q1。用户错误?我误解了关于 varchar(N)的关键概念?



如果这实际上是Teradata处理 varchar 字段,然后



Q2。为什么任何人都不能指定任何小于 varchar(max)的事情,特别是如果事先不明确,可能需要在该字段中存储多少个字符。



Q3。是否有不同的数据类型,允许灵活的字符串大小 - 即一个真正的变量长度字符串?



如果我记得,其他SQL方言实现 varchar(n)作为字符串的推荐初始大小,但允许它根据需要扩展以适应投入的数据字符串的最大长度。是否有Teradata中的类似数据类型?



(注意:由于我是原型表,所以我不太在意这个时候的性能效率;更关心快速但安全的设计允许原型进行。)

解决方案

我不熟悉任何实现varchar(n)的SQL方言按照您的建议行为 - 推荐的初始尺寸,然后让其增长。这将适用于Oracle,SQL Server,MySQL和Postgres。在所有这些数据库中,varchar(n)的行为几乎与您在Teradata中在具有显式转换的SELECT语句中的行为相似。当一个较长的字符串被放入一个较短的字符串中时,我不相信会导致截断错误。



正如Branko在他的评论中注意到的,数据修改的行为是不同的步骤,隐式转换确实导致错误。



我不熟悉Teradata的所有细节。在SQL Server中,varchar(max)和varchar(8000)之间存在一个差异的世界。前者将分配在单独的数据页面上,后者分配在与数据相同的页面上。 (这些规则已经在更新的版本中被修改,所以varchars可以溢出数据页面。)



换句话说,当使用varchar(max) ,涉及数据如何存储在页面上,如何构建索引,以及其他注意事项。



我的建议是,你选择一个相当大的大小,说1000等等,让应用程序从那里继续。如果你想要真正的灵活性,那么使用varchar(max)。您还应该通过Teradata文档和/或技术联系人调查问题是如何声明非常大的字符串。


Situation:

varchar(20) seems to truncate silently in Teradata and not to expand or complain when encountering strings larger than 20 characters long... This is a bit of a surprise as I expected either automatic expansion of the column to fit larger strings, say 30 characters, OR for an error to be thrown if a larger string were encountered. Silent truncation seems to get me the worst of all worlds...

Complication:

For my application (prototype analytics design) I don't know in advance how large will be the data I will be ingesting over the course of a few weeks. That seems to rule out using varchar(N), except for max

Questions:

So now I have a few choices, and am looking for some guidance:

Q1. User error? Am I misunderstanding a key concept about varchar(N)?

If this is in fact how Teradata handles varchar fields, then

Q2. why would anyone specify anything less than varchar(max) especially when it is not clear in advance how many characters might need to be stored in the field.

Q3. Is there a different data type that permits flexible sizing of the string -- i.e. a true variable length character string?

If I recall, other SQL dialects implement varchar(n) as a recommended initial size for the string but allow it to expand as needed to fit the maximum length of the data strings thrown in. Is there a similar data type in Teradata?

(Note: since I'm prototyping the tables, I am less concerned about performance efficiency at this point; more concerned about quick but safe designs that allow the prototype to progress.)

解决方案

I am not familiar with any dialect of SQL that implements a varchar(n) that behaves as you suggest -- a recommended initial size and then letting it grow. This would apply to Oracle, SQL Server, MySQL, and Postgres. In all these databases, varchar(n) behaves pretty much as you see it behave in Teradata in SELECT statements with explicit casts. I don't believe any cause a truncation error when a longer string is placed into a shorter string.

As Branko notes in his comment, the behavior is different in data modification steps, where an implicit cast does cause an error.

I am not familiar with all the details of Teradata. In SQL Server, there is historically a world of difference between varchar(max) and varchar(8000). The former would be allocated on a separate data page, with the latter allocated on the same page as the data. (The rules have been modified in more recent versions so varchars can spill off the data page.)

In other words, there may be other considerations when using varchar(max), involving how the data is stored on pages, how indexes are built on them, and perhaps other considerations.

My suggestion is that you pick a reasonably large size, say 1000 or so, and let the application continue from there. If you want real flexibility, then use varchar(max). You should also investigate through Teradata documentation and/or technical contacts what the issues are with declaring very large strings.

这篇关于SQL原型设计:使用varchar(N)面向静默截断数据 - 任何更好的选择? (Teradata)的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆