Is 0xF8 a valid byte in a UTF-8 encoded XML document?

Question

I am receiving a document that claims to be UTF-8 (<?xml version="1.0" encoding="UTF-8"?>). I've had problems in the past where the sender's encoding declaration was not all that reliable (i.e. documents were declared to have a given encoding when in fact they did not), so I check them using http://utf8checker.codeplex.com/. According to this tool, a 0xF8 byte means that this document is not UTF-8 encoded.
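
The same check can also be run locally. Below is a minimal Python sketch, assuming the raw bytes of the document are already in memory: it attempts a strict UTF-8 decode and reports the offset and value of the first byte that fails. The function name `first_invalid_utf8_byte` and the sample payload are hypothetical, made up for illustration.

```python
def first_invalid_utf8_byte(data: bytes):
    """Return (offset, byte value) of the first byte that breaks strict UTF-8
    decoding, or None if the whole buffer is valid UTF-8.

    Hypothetical helper for illustration; relies only on bytes.decode().
    """
    try:
        data.decode("utf-8")  # strict mode: raises on any malformed sequence
    except UnicodeDecodeError as err:
        return err.start, data[err.start]
    return None


# Hypothetical payload: declared as UTF-8, but 'ø' stored as the single byte 0xF8.
sample = b'<?xml version="1.0" encoding="UTF-8"?><name>S\xf8ren</name>'
result = first_invalid_utf8_byte(sample)
if result is not None:
    offset, value = result
    print(f"not UTF-8: byte 0x{value:02X} at offset {offset}")
```

Read the file in binary mode (open(path, "rb").read()) before calling a check like this; opening it in text mode would make Python decode it first, using whatever default encoding is configured.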

However, this page lists the Norwegian character 'ø' as being represented in UTF-8 as 0xF8. (The page is in Norwegian; the data I am referring to comes from the table at the bottom of the page.)

Can anyone help me sort this out? I'm feeling rather confused here.

Thanks!

Answer

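ø is U+00F8, and since it lies outside ASCII (U+0000 to U+007F), it cannot be encoded as a single UTF-8 code unit. In UTF-8 it is represented by the two bytes 0xC3 0xB8. Therefore, if you have 0xF8 standing alone in a document somewhere, yes, it is invalid UTF-8. In fact, the byte 0xF8 never appears anywhere in well-formed UTF-8.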
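
The two-byte form can be derived by hand: UTF-8 stores code points from U+0080 to U+07FF as 110xxxxx 10xxxxxx, splitting the 11 payload bits across the two bytes. The Python sketch below applies that rule to U+00F8 and cross-checks the result against the built-in encoder; it also shows that a lone 0xF8 is rejected. The helper name `utf8_two_byte` is hypothetical and only handles the two-byte range.

```python
def utf8_two_byte(cp: int) -> bytes:
    """Encode a code point in U+0080..U+07FF using the two-byte UTF-8 pattern
    110xxxxx 10xxxxxx. Hypothetical helper; other ranges are out of scope."""
    if not 0x80 <= cp <= 0x7FF:
        raise ValueError("code point needs a different UTF-8 sequence length")
    lead = 0xC0 | (cp >> 6)    # '110' followed by the top 5 payload bits
    cont = 0x80 | (cp & 0x3F)  # '10' followed by the low 6 payload bits
    return bytes([lead, cont])


cp = ord("ø")                       # 0xF8
encoded = utf8_two_byte(cp)
print(hex(cp), encoded.hex(" "))    # 0xf8 c3 b8
assert encoded == "ø".encode("utf-8")

# A bare 0xF8 (binary 11111000) matches no valid UTF-8 lead-byte pattern.
try:
    b"\xf8".decode("utf-8")
except UnicodeDecodeError as err:
    print("0xF8 alone:", err.reason)
```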

It seems that the document actually uses either Latin-1 (ISO-8859-1) or Windows code page 1252; in both, ø is the single byte 0xF8.
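
If the sender really is producing Latin-1 or cp1252, one pragmatic option is to try UTF-8 first and fall back to those encodings. This is a heuristic, not something the answer prescribes: a successful fallback decode does not prove the guess was right, because Latin-1 accepts every byte value and will turn genuinely corrupt data into (wrong) text. The name `decode_lenient` and the fallback order are assumptions for illustration.

```python
from typing import Tuple


def decode_lenient(data: bytes) -> Tuple[str, str]:
    """Decode bytes, returning (text, encoding used).

    Hypothetical helper: tries strict UTF-8, then cp1252, then latin-1.
    cp1252 leaves a few bytes (e.g. 0x81) undefined, so latin-1 is the
    final fallback; it maps every byte 0x00-0xFF and therefore never fails.
    """
    for enc in ("utf-8", "cp1252", "latin-1"):
        try:
            return data.decode(enc), enc
        except UnicodeDecodeError:
            continue
    raise AssertionError("unreachable: latin-1 decodes any byte sequence")


raw = b"S\xf8ren"                  # 'ø' as the single byte 0xF8
text, used = decode_lenient(raw)
print(used, text)                  # cp1252 Søren
fixed = text.encode("utf-8")       # now genuinely UTF-8: b'S\xc3\xb8ren'
```

Transcoding like this makes the bytes match the document's own encoding="UTF-8" declaration; the alternative is to change the declaration to the encoding that was actually used.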