Are there disadvantages to using a generic varchar(255) for all text-based fields?

Question

I have a contacts table which contains fields such as postcode, first name, last name, town, country, phone number etc, all of which are defined as VARCHAR(255) even though none of these fields will ever come close to having 255 characters. (If you're wondering, it's this way because Ruby on Rails migrations map String fields to VARCHAR(255) by default and I never bothered to override it).

Since VARCHAR will only store the number of actual characters of the field (along with the field length), is there any distinct advantage (performance or otherwise) to using, say, VARCHAR(16) over VARCHAR(255)?

Additionally, most of these fields have indexes on them. Does a larger VARCHAR size on the field affect the size or performance of the index at all?

FYI I'm using MySQL 5.

Answer

In storage, VARCHAR(255) is smart enough to store only the length you need on a given row, unlike CHAR(255) which would always store 255 characters.

But since you tagged this question with MySQL, I'll mention a MySQL-specific tip: as rows are copied from the storage engine layer to the SQL layer, VARCHAR fields are converted to CHAR to gain the advantage of working with fixed-width rows. So the strings in memory become padded out to the maximum length of your declared VARCHAR column.

When your query implicitly generates a temporary table, for instance while sorting or evaluating a GROUP BY, this can use a lot of memory. If you use a lot of VARCHAR(255) fields for data that doesn't need to be that long, this can make the temporary table very large.

You may also like to know that this "padding out" behavior means that a string declared with the utf8 character set pads out to three bytes per character, even for strings you store with single-byte content (e.g. ascii or latin1 characters). Likewise, the utf8mb4 character set causes the string to pad out to four bytes per character in memory.

So a VARCHAR(255) in utf8 storing a short string like "No opinion" takes 11 bytes on disk (ten single-byte characters, plus one byte for length), but it takes 765 bytes (255 × 3) in memory, and thus in temp tables or sorted results.
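The arithmetic in that example can be sketched in a few lines of Python. This only illustrates the figures quoted above, not MySQL's internals: the function names are made up for this sketch, and the one-byte length prefix simply follows the accounting used in the answer.

```python
# Maximum bytes per character that each character set can need.
# Per the answer, in-memory rows are padded to this width for every
# character of the declared column length.
MAX_BYTES_PER_CHAR = {"latin1": 1, "utf8": 3, "utf8mb4": 4}


def padded_bytes(declared_length, charset):
    """Bytes used in memory when a value is padded to the declared length."""
    return declared_length * MAX_BYTES_PER_CHAR[charset]


def stored_bytes(value, length_prefix=1):
    """Bytes used on disk: the encoded value plus an assumed 1-byte length prefix."""
    return len(value.encode("utf-8")) + length_prefix


if __name__ == "__main__":
    print(stored_bytes("No opinion"))   # on-disk size of the short string
    print(padded_bytes(255, "utf8"))    # in-memory size as VARCHAR(255)
    print(padded_bytes(16, "utf8"))     # in-memory size as VARCHAR(16)
```

Under these assumptions the same ten-character value costs 11 bytes on disk either way, but 765 bytes in memory as VARCHAR(255) versus 48 bytes as VARCHAR(16).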

I have helped MySQL users who unknowingly created 1.5GB temp tables frequently and filled up their disk space. They had lots of VARCHAR(255) columns that in practice stored very short strings.
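To see how short strings can add up to a temp table of that size, here is a rough back-of-the-envelope estimate in Python. The column and row counts are hypothetical numbers chosen for illustration (they are not taken from the cases mentioned above), and the estimate ignores per-row overhead and any non-string columns.

```python
# Bytes reserved per character when values are padded in memory,
# following the padding behavior described in the answer.
BYTES_PER_CHAR = {"latin1": 1, "utf8": 3, "utf8mb4": 4}


def temp_table_bytes(varchar_lengths, rows, charset="utf8"):
    """Rough temp table size: padded row width times the number of rows."""
    row_width = sum(n * BYTES_PER_CHAR[charset] for n in varchar_lengths)
    return row_width * rows


if __name__ == "__main__":
    rows = 200_000  # hypothetical result set size
    wide = temp_table_bytes([255] * 10, rows)    # ten VARCHAR(255) columns
    narrow = temp_table_bytes([32] * 10, rows)   # ten VARCHAR(32) columns
    print(f"{wide / 1e9:.2f} GB vs {narrow / 1e9:.2f} GB")
```

With these made-up numbers, ten VARCHAR(255) columns come to about 1.53 GB, while ten VARCHAR(32) columns holding the same data come to about 0.19 GB.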

It's best to define the column based on the type of data that you intend to store. As other folks have mentioned, this helps enforce application-related constraints. It also has the physical benefit of avoiding the memory waste I described above.

It's hard to know what the longest postal address is, of course, which is why many people choose a long VARCHAR that is certainly longer than any address. 255 is customary because it is the maximum length of a VARCHAR for which the length can be encoded with one byte. It was also the maximum VARCHAR length in MySQL versions before 5.0.3.
