Why does Delphi IBX TWideMemoField convert the byte order in a UTF8 string, and how can I avoid it?


Problem description

I am using Delphi 2009 with IBX on a Firebird 3 database (I cannot choose other technologies; I have to adapt to the situation). I have the following definitions:

The Firebird BLOB field is defined as:

BLOB SUB_TYPE 0 SEGMENT SIZE 80

The TWideMemoField is defined as:

object MainQryNOTES: TWideMemoField
  FieldName = 'NOTES'
  Origin = 'INVOICES.NOTES'
  ProviderFlags = [pfInUpdate]
  BlobType = ftWideMemo
end

The test string is "Цель по инфляции, %", and it can be read from the BLOB field in IBExpert as:

26 04 35 04 3B 04 4C 04 20 00 3F 04 3E 04 20 00
38 04 3D 04 44 04 3B 04 4F 04 46 04 38 04 38 04
2C 00 20 00 25 00

The strange thing is that Delphi inverts the byte order. For example, the Cyrillic character Ц has the hex UTF8 representation 04 26, but it is stored in the database as 26 04, and the same holds for the other characters (one can check this against the tables at https://www.w3schools.com/charsets/ref_utf_basic_latin.asp and https://www.w3schools.com/charsets/ref_utf_cyrillic.asp). In my case I have only 2-byte characters, but I guess the situation will be similar for 3-byte and 4-byte UTF8 characters.

So how can I configure TWideMemoField so that it does not convert the byte order of UTF8 strings?

Recommended answer

Your text is not encoded as UTF8; it is encoded as UTF16. The character Ц is U+0426, and by convention the 16-bit code unit is stored in little-endian byte order: $26 $04. The value 04 26 is the code point itself, not a UTF8 byte sequence; in UTF8, Ц would be the two bytes $D0 $A6.
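The difference between the encodings can be checked outside Delphi. The following is a minimal sketch in Python, used here only as a neutral way to inspect bytes (it is not part of the original answer, and the helper name hex_bytes is made up for illustration). It encodes the single character Ц in three ways:

```python
# Encode the single character Ц (U+0426) in three encodings and compare the bytes.

def hex_bytes(data: bytes) -> str:
    """Format bytes as space-separated uppercase hex pairs (illustrative helper)."""
    return " ".join(f"{b:02X}" for b in data)

ch = "\u0426"  # Ц

utf16_le = ch.encode("utf-16-le")  # little-endian UTF-16: low byte first
utf16_be = ch.encode("utf-16-be")  # big-endian UTF-16: same digit order as U+0426
utf8 = ch.encode("utf-8")          # actual UTF-8 bytes, for comparison

print(hex_bytes(utf16_le))  # 26 04
print(hex_bytes(utf16_be))  # 04 26
print(hex_bytes(utf8))      # D0 A6
```

Only the little-endian UTF-16 form matches the 26 04 seen in the BLOB, and neither UTF-16 form is a UTF8 byte sequence.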

In other words, everything is behaving as expected and as designed, and I see no need for you to try to fix anything, because nothing is broken.
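As a sanity check, the hex dump from the question can be decoded directly. This is a small Python sketch (again only an illustration, assuming the dump was copied from IBExpert exactly as shown in the question) that interprets the bytes as little-endian UTF-16:

```python
# Decode the BLOB bytes shown in the question as UTF-16 little-endian.
dump = (
    "26 04 35 04 3B 04 4C 04 20 00 3F 04 3E 04 20 00 "
    "38 04 3D 04 44 04 3B 04 4F 04 46 04 38 04 38 04 "
    "2C 00 20 00 25 00"
)

raw = bytes.fromhex(dump)       # 38 bytes, two per character
text = raw.decode("utf-16-le")  # read each pair of bytes low byte first

print(text)  # Цель по инфляции, %
```

The decoded text is the original test string, which is consistent with the field storing UTF16 data intact rather than byte-swapped UTF8.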
