Getting char value in Delphi 7
Problem description
I am making a program in Delphi 7 that is supposed to encode a Unicode string into an HTML entity string.
For example, "ABCģķī" would result in "ABC&#291;&#311;&#299;".
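Now, there are 2 basic requirements:
- Delphi 7 is non-Unicode, so I can't just write Unicode characters directly in the code to encode them.
- Code pages consist of 255 entries, each holding a character specific to that code page, except the first 127, which are the same for all code pages.
So, how do I get the value of a char that is in the range 1-255?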
I tried Ord(Integer), but it also returns values way past 255. Basically, everything is fine (A returns 65 and so on) until my string reaches non-Latin Unicode.
Is there any other method for returning a char value? Any help is appreciated.
Recommended answer
In HTML 4, numeric character references are relative to the charset used by the HTML. It does not matter whether that charset is specified in the HTML itself via a <meta> tag, or out-of-band via an HTTP/MIME Content-Type header or other means. As such, "ABC&#291;&#311;&#299;" would be an accurate representation of "ABCģķī" only if the HTML were using UTF-16. If the HTML were using UTF-8, the correct representation would be either "ABC&#196;&#163;&#196;&#183;&#196;&#171;" or "ABC&#xC4;&#xA3;&#xC4;&#xB7;&#xC4;&#xAB;" instead. Most other charsets do not support those particular Unicode characters.
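As a rough illustration of the UTF-8 case above, the Delphi sketch below converts a WideString with UTF8Encode (a Delphi 7 RTL function, used here in place of WideCharToMultiByte() for a UTF-8 page) and writes one decimal reference per non-ASCII byte. The function name EncodeHtml4Utf8, the assumption that the page charset is UTF-8, and the choice to also escape & < > " are all assumptions of this sketch, not part of the answer. The {$IFDEF FPC} line only lets the same file be tried with Free Pascal in Delphi mode.

```pascal
program Html4Utf8Refs;
{$IFDEF FPC}{$MODE DELPHI}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

uses
  SysUtils;

// Hypothetical helper: HTML 4 references for a page whose charset is
// assumed to be UTF-8. The WideString is converted with UTF8Encode, and
// every byte of 128 or above (or one of & < > ") becomes a decimal
// reference to that byte value; other bytes are copied unchanged.
function EncodeHtml4Utf8(const S: WideString): string;
var
  Bytes: UTF8String;
  I: Integer;
begin
  Bytes := UTF8Encode(S);
  Result := '';
  for I := 1 to Length(Bytes) do
    if (Ord(Bytes[I]) >= 128) or (Bytes[I] in ['&', '<', '>', '"']) then
      Result := Result + '&#' + IntToStr(Ord(Bytes[I])) + ';'
    else
      Result := Result + Bytes[I];
end;

var
  S: WideString;
begin
  // Delphi 7 source files are ANSI, so the accented letters are built
  // from their code points: U+0123, U+0137 and U+012B.
  S := 'ABC';
  S := S + WideChar($0123) + WideChar($0137) + WideChar($012B);
  Writeln(EncodeHtml4Utf8(S));
  // prints ABC&#196;&#163;&#196;&#183;&#196;&#171;
end.
```

The printed values are the UTF-8 byte pairs C4 A3, C4 B7 and C4 AB for the three accented letters, which is why an HTML 4 page in UTF-8 needs six references rather than three.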
In HTML 5, numeric character references contain the original Unicode code point values, regardless of the charset used by the HTML. As such, "ABCģķī" would be represented as either "ABC&#291;&#311;&#299;" or "ABC&#x123;&#x137;&#x12B;".
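For HTML 5, when the WideString holds no surrogate pairs, Ord() of each WideChar already is the code point, which is exactly the "value past 255" the question runs into. A minimal sketch, assuming a hypothetical function EncodeHtml5Bmp and treating "reserved" as any non-ASCII value plus & < > " (an assumption, chosen so the result is pure ASCII and fits in Delphi 7's AnsiString-based string type):

```pascal
program Html5BmpRefs;
{$IFDEF FPC}{$MODE DELPHI}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

uses
  SysUtils;

// Assumption for this sketch: "reserved" means any non-ASCII value plus
// the HTML-special characters & < > ".
function NeedsReference(Code: Integer): Boolean;
begin
  Result := (Code >= 128) or (Code = Ord('&')) or (Code = Ord('<')) or
    (Code = Ord('>')) or (Code = Ord('"'));
end;

// Hypothetical helper, valid only when S holds no surrogate pairs: each
// WideChar is then a whole code point, so Ord() gives the reference value
// directly. UseHex selects &#x...; instead of &#...; notation.
function EncodeHtml5Bmp(const S: WideString; UseHex: Boolean): string;
var
  I, Code: Integer;
begin
  Result := '';
  for I := 1 to Length(S) do
  begin
    Code := Ord(S[I]);
    if not NeedsReference(Code) then
      Result := Result + Char(Code)  // plain ASCII is copied as-is
    else if UseHex then
      Result := Result + '&#x' + IntToHex(Code, 1) + ';'
    else
      Result := Result + '&#' + IntToStr(Code) + ';';
  end;
end;

var
  S: WideString;
begin
  S := 'ABC';
  S := S + WideChar($0123) + WideChar($0137) + WideChar($012B);
  Writeln(EncodeHtml5Bmp(S, False));  // ABC&#291;&#311;&#299;
  Writeln(EncodeHtml5Bmp(S, True));   // ABC&#x123;&#x137;&#x12B;
end.
```

The decimal branch of the same loop also matches what an HTML 4 page declared as UTF-16 would need, since there the UTF-16 code units are the reference values.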
So, to answer your question, the first thing you have to do is decide whether you need HTML 4 or HTML 5 semantics for numeric character references. Then, assign your Unicode data to a WideString, which uses UTF-16 (it is the only Unicode string type that Delphi 7 natively supports), and then:
- If you need HTML 4:
  A. If the HTML charset is not UTF-16, use WideCharToMultiByte() (or equivalent) to convert the WideString to that charset, then loop through the resulting values, outputting unreserved characters as-is and character references for reserved values, using IntToStr() for decimal notation or IntToHex() for hex notation.
  B. If the HTML charset is UTF-16, simply loop through each WideChar in the WideString, outputting unreserved characters as-is and character references for reserved values, using IntToStr() for decimal notation or IntToHex() for hex notation.
- If you need HTML 5:
  A. If the WideString does not contain any surrogate pairs, simply loop through each WideChar in the WideString, outputting unreserved characters as-is and character references for reserved values, using IntToStr() for decimal notation or IntToHex() for hex notation.
  B. Otherwise, convert the WideString to UTF-32 using WideStringToUCS4String(), then loop through the resulting values, outputting unreserved code points as-is and character references for reserved code points, using IntToStr() for decimal notation or IntToHex() for hex notation.
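For the case where surrogate pairs can occur, the answer suggests WideStringToUCS4String(). The sketch below swaps in a different technique: it combines each high/low surrogate pair by hand with the standard UTF-16 formula, code point = $10000 + ((high - $D800) shl 10) + (low - $DC00), so the result does not depend on how a particular RTL version implements that conversion. EncodeHtml5 and NeedsReference are hypothetical names, and treating non-ASCII plus & < > " as reserved is an assumption of this sketch.

```pascal
program Html5Refs;
{$IFDEF FPC}{$MODE DELPHI}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

uses
  SysUtils;

// Assumption: "reserved" means every non-ASCII code point plus & < > ",
// which keeps the output pure ASCII.
function NeedsReference(Code: Integer): Boolean;
begin
  Result := (Code >= 128) or (Code = Ord('&')) or (Code = Ord('<')) or
    (Code = Ord('>')) or (Code = Ord('"'));
end;

// Hypothetical helper: HTML 5 decimal references for any WideString.
// A high surrogate followed by a low surrogate is first combined into one
// code point, so U+1F600 yields &#128512; instead of two references to the
// separate UTF-16 code units. An unpaired surrogate is emitted as its own
// value, which a stricter encoder might replace with U+FFFD.
function EncodeHtml5(const S: WideString): string;
var
  I, Code, Trail: Integer;
begin
  Result := '';
  I := 1;
  while I <= Length(S) do
  begin
    Code := Ord(S[I]);
    if (Code >= $D800) and (Code <= $DBFF) and (I < Length(S)) then
    begin
      Trail := Ord(S[I + 1]);
      if (Trail >= $DC00) and (Trail <= $DFFF) then
      begin
        Code := $10000 + ((Code - $D800) shl 10) + (Trail - $DC00);
        Inc(I);  // skip the low surrogate that was just consumed
      end;
    end;
    if NeedsReference(Code) then
      Result := Result + '&#' + IntToStr(Code) + ';'
    else
      Result := Result + Char(Code);
    Inc(I);
  end;
end;

var
  S: WideString;
begin
  // 'a<b ' followed by U+1F600, stored as the pair $D83D $DE00
  S := 'a<b ';
  S := S + WideChar($D83D) + WideChar($DE00);
  Writeln(EncodeHtml5(S));  // a&#60;b &#128512;
end.
```

For hex notation, the IntToStr() call can be replaced with '&#x' + IntToHex(Code, 1) + ';', as the answer describes.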