Storing Base64-encoded data as BLOB or TEXT datatype

Question

We have a MySQL InnoDB table holding ~10 columns of small Base64-encoded JavaScript files and PNG images (under 2 KB each), which are also Base64-encoded.

There are few inserts and comparatively many reads; however, the output is cached in a Memcached instance for a few minutes to avoid repeated reads.

Right now we are using BLOB for those columns, but I am wondering whether switching to the TEXT datatype would offer any advantage in terms of performance or snapshot backups.

My research suggests that BLOB and TEXT are close to identical for my case, and since I do not know beforehand what type of data will actually be stored, I went for BLOB.

Do you have any pointers on the TEXT vs BLOB debate for this specific case?

Answer

One shouldn't store Base64-encoded data in one's database...

Base64 is a means of representing arbitrary binary data using only printable text characters: it was designed for situations where one needs to transfer such binary data across a protocol or medium that can handle only printable text (e.g. SMTP/email). It increases the data size by about 33% (every 3 input bytes become 4 output characters) and adds the computational cost of encoding/decoding, so it should be avoided unless absolutely necessary.
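
As a quick check of the size figure above, here is a small Python sketch (standard library only). It compares the actual padded Base64 length with the rule 4 * ceil(n / 3). The 2048-byte input is arbitrary filler standing in for a ~2 KB PNG, and `base64_size` is a helper name introduced just for this illustration.

```python
import base64


def base64_size(n_bytes: int) -> int:
    """Length of the padded, standard Base64 encoding of n_bytes of input."""
    # Every 3-byte group becomes 4 characters; a final partial group is padded to 4.
    return 4 * ((n_bytes + 2) // 3)


raw = bytes(range(256)) * 8      # 2048 bytes of arbitrary binary data
encoded = base64.b64encode(raw)

print(len(raw), len(encoded))    # 2048 2732
assert len(encoded) == base64_size(len(raw))
print(f"overhead: {len(encoded) / len(raw) - 1:.1%}")   # overhead: 33.4%
```

A 2 KB image therefore costs roughly 2.7 KB once encoded. That overhead applies to every row, and to every backup that copies it.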

By contrast, the whole point of BLOB columns is that they store raw binary strings. So just go ahead and store your stuff directly into your BLOB columns without first Base64-encoding them. Usually you'll want to store related metadata in other columns, such as file version/last modified date, media type, and (in the case of text files, such as JavaScript sources) character encoding. You might decide to use TEXT type columns for the text files, not only so that MySQL will natively track character encoding for you, but also so that it can transcode to alternative character sets and/or inspect/manipulate the text as may be required (now or in the future).
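
The layout suggested above can be sketched as follows. The sketch keeps raw content in a binary column and puts media type, modification date and character encoding in ordinary columns beside it. SQLite is used only because it ships with Python. The table name `asset` and its column names are hypothetical. In MySQL the JavaScript sources could instead go into a TEXT column so that the server tracks their encoding natively. This sketch shows the other option, a BLOB plus an explicit charset column.

```python
import sqlite3
from datetime import datetime, timezone

# SQLite stands in for MySQL; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE asset (
        name          TEXT PRIMARY KEY,
        media_type    TEXT NOT NULL,  -- e.g. 'image/png', 'text/javascript'
        charset       TEXT,           -- only meaningful for text assets
        last_modified TEXT NOT NULL,  -- ISO 8601 timestamp
        body          BLOB NOT NULL   -- raw content, NOT Base64-encoded
    )
    """
)

# A PNG signature plus filler bytes, standing in for a real image file.
png_bytes = b"\x89PNG\r\n\x1a\n" + bytes(16)
js_source = "console.log('h\u00e9llo');"
now = datetime.now(timezone.utc).isoformat()

conn.executemany(
    "INSERT INTO asset VALUES (?, ?, ?, ?, ?)",
    [
        ("logo.png", "image/png", None, now, png_bytes),
        ("app.js", "text/javascript", "utf-8", now, js_source.encode("utf-8")),
    ],
)

# Raw bytes come back exactly as stored; no Base64 decoding step is needed.
(stored,) = conn.execute("SELECT body FROM asset WHERE name = ?", ("logo.png",)).fetchone()
assert stored == png_bytes

# For a text asset, the recorded charset says how to turn the bytes back into characters.
body, charset = conn.execute(
    "SELECT body, charset FROM asset WHERE name = ?", ("app.js",)
).fetchone()
assert body.decode(charset) == js_source
```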

The erroneous idea that SQL databases require printable-text encodings like Base64 to handle arbitrary binary data has been perpetuated by a large number of ill-informed tutorials. It appears to be rooted in the mistaken belief that, because SQL comprises only printable text in other contexts, it must surely require printable text for binary data too (at least for data transfer, if not for data storage). This is simply not true: SQL can convey binary data in a number of ways, including plain string literals (provided that they are properly quoted and escaped like any other string). Of course, the preferred way to pass data of any type to your database is through parameterised queries, and parameters can contain binary data just as easily as anything else.
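
To illustrate the parameterised-query point, here is a minimal sketch that uses Python's bundled sqlite3 module as a stand-in for a MySQL driver. It binds bytes that would break a naively concatenated string literal and reads them back unchanged. MySQL DB-API drivers use their own placeholder style (often `%s` rather than `?`), but the principle is the same.

```python
import sqlite3

# Bytes that would break a naively built SQL literal: a NUL, a single quote,
# a backslash and a byte that is not valid UTF-8, followed by all 256 byte values.
awkward = b"\x00'\\\xff" + bytes(range(256))

conn = sqlite3.connect(":memory:")   # SQLite as a stand-in for MySQL
conn.execute("CREATE TABLE t (data BLOB)")

# The bytes travel as a bound parameter: nothing is quoted, escaped
# or Base64-encoded by hand.
conn.execute("INSERT INTO t (data) VALUES (?)", (awkward,))

(roundtrip,) = conn.execute("SELECT data FROM t").fetchone()
assert roundtrip == awkward
print(len(roundtrip))  # 260
```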

For what it's worth, I usually altogether avoid storing items like this in the RDBMS and prefer instead to use those highly optimised file storage databases known as filesystems—but that's another matter altogether.

The only situation in which there might be some benefit from storing Base64-encoded data is where the data is frequently retrieved from the database and transmitted across a protocol that requires that encoding. In that case, storing the Base64-encoded representation saves having to encode the otherwise raw data on every fetch.

Note, however, that in this case the Base64-encoded storage merely acts as a cache, much as one might store denormalised data for performance reasons.
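
The sketch below treats the Base64 form as a derived cache. The raw bytes remain the source of truth, the encoded text is computed once and reused, and it is discarded whenever the raw data changes. For brevity it keeps the cached copy in memory. The same invalidation rule would apply to an extra database column. The `Asset` class and its methods are hypothetical.

```python
import base64
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Asset:
    """Hypothetical record: the raw bytes are the source of truth and the
    Base64 text is a derived, denormalised copy kept only to save work."""

    raw: bytes
    _b64: Optional[str] = field(default=None, repr=False)
    encode_calls: int = 0  # counts how often the encoding work actually ran

    def set_raw(self, raw: bytes) -> None:
        self.raw = raw
        self._b64 = None  # invalidate the cache whenever the source data changes

    def as_base64(self) -> str:
        if self._b64 is None:  # encode once, then reuse on every later fetch
            self._b64 = base64.b64encode(self.raw).decode("ascii")
            self.encode_calls += 1
        return self._b64


asset = Asset(b"Hello world!")
for _ in range(3):
    print(asset.as_base64())  # SGVsbG8gd29ybGQh (encoded only once)
asset.set_raw(b"bye")
print(asset.as_base64())      # Ynll
print(asset.encode_calls)     # 2
```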

如上所述,TEXTBLOB之间的差异实际上归因于TEXT列与特定于文本的元数据(例如字符编码归类),而BLOB列则不是.这些额外的元数据使MySQL可以在存储和连接字符集之间(适当时)对字符进行代码转换,并执行花式字符等效/排序.

As alluded to above, the difference between TEXT and BLOB really comes down to the fact that TEXT columns are stored together with text-specific metadata (such as character encoding and collation), whereas BLOB columns are not. This additional metadata enables MySQL to transcode characters between storage and connection character sets (where appropriate) and perform fancy character equivalence/ordering.

Generally speaking: if two clients working in different character sets should see the same bytes, then you want a BLOB column; if they should see the same characters then you want a TEXT column.
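
The rule of thumb above can be modelled in a few lines. In this toy model, a "BLOB fetch" hands back the stored bytes untouched, while a "TEXT fetch" knows the column's character set and transcodes into the client's. `fetch_blob` and `fetch_text` are hypothetical helpers, not a MySQL API, and the charset names are Python codec names.

```python
def fetch_blob(stored: bytes) -> bytes:
    # BLOB semantics: every client receives the same bytes.
    return stored


def fetch_text(stored: bytes, column_charset: str, client_charset: str) -> bytes:
    # TEXT semantics: every client receives the same characters,
    # re-encoded in the client's character set.
    return stored.decode(column_charset).encode(client_charset)


stored = "café".encode("utf-8")  # written by a UTF-8 client

for client in ("utf-8", "latin-1"):
    blob = fetch_blob(stored)
    text = fetch_text(stored, "utf-8", client)
    print(client, blob.decode(client), text.decode(client))
# utf-8 café café
# latin-1 cafÃ© café
```

The Latin-1 client reading the BLOB sees the same bytes but the wrong characters. Through the TEXT path it sees the right characters.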

With Base64, those two clients must ultimately find that the data decodes to the same bytes; but they should see that the encoded data has the same characters. For example, suppose one wishes to insert the Base64-encoding of 'Hello world!' (which is 'SGVsbG8gd29ybGQh'). If the inserting application is working in the UTF-8 character set, then it will send the byte sequence 0x53475673624738676432397962475168 to the database.

  • if that byte sequence is stored in a BLOB column and later retrieved by an application that is working in UTF-16*, the same bytes will be returned—which represent '升噳扇㡧搲㥹扇全' and not the desired Base64-encoded value; whereas

  • if that byte sequence is stored in a TEXT column and later retrieved by an application that is working in UTF-16, MySQL will transcode on the fly to return the byte sequence 0x0053004700560073006200470038006700640032003900790062004700510068, which represents the original Base64-encoded value 'SGVsbG8gd29ybGQh' as desired.
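
The byte sequences in this example can be reproduced with Python's standard library. The sketch uses the `utf-16-be` codec (big-endian, no byte-order mark) because that matches the byte sequences quoted above. Python's plain `utf-16` codec would prepend a BOM.

```python
import base64

encoded = base64.b64encode(b"Hello world!").decode("ascii")
print(encoded)                            # SGVsbG8gd29ybGQh

utf8_bytes = encoded.encode("utf-8")      # what a UTF-8 client sends
print(utf8_bytes.hex())                   # 53475673624738676432397962475168

# BLOB case: a UTF-16 client gets the same bytes back and reads them as UTF-16.
print(utf8_bytes.decode("utf-16-be"))     # 升噳扇㡧搲㥹扇全

# TEXT case: the characters are transcoded to UTF-16 before being returned.
print(encoded.encode("utf-16-be").hex())  # 0053004700560073006200470038006700640032003900790062004700510068
```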

Of course, you could nevertheless use BLOB columns and track the character encoding in some other way—but that would just needlessly reinvent the wheel, with added maintenance complexity and risk of introducing unintentional errors.

* Actually MySQL doesn't support using client character sets that are not byte-compatible with ASCII (and therefore Base64 encodings will always be consistent across any combination of them), but this example nevertheless serves to illustrate the difference between BLOB and TEXT column types and thus explains why TEXT is technically correct for this purpose even though BLOB will actually work without error (at least until MySQL adds support for non-ASCII compatible client character sets).
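
The footnote's claim can be checked directly. Every character Base64 emits lies in the ASCII range, so any ASCII-compatible character set encodes a Base64 string to the same bytes. UTF-16, the kind of client charset MySQL does not permit, does not. The codecs listed below are a small, arbitrary sample of Python codec names.

```python
import base64
import string

# The 64 Base64 symbols plus the '=' padding character.
alphabet = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/="

# Real Base64 output never leaves this alphabet.
sample = base64.b64encode(bytes(range(256))).decode("ascii")
assert set(sample) <= set(alphabet)

# ASCII-compatible character sets all produce identical bytes for it ...
for codec in ("ascii", "latin-1", "cp1252", "utf-8"):
    assert alphabet.encode(codec) == alphabet.encode("ascii")

# ... whereas UTF-16 does not.
assert alphabet.encode("utf-16-be") != alphabet.encode("ascii")
print("Base64 text is byte-identical in every ASCII-compatible codec tested")
```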
