Looking for samples to validate UTF-8
Problem description
Suppose I have a byte stream (array), and I want to write code (using .NET C#) to check whether it is a valid UTF-8 byte sequence. I want to write the code from scratch because I need to report the exact location of each invalid byte sequence, and possibly remove the invalid bytes; a plain yes/no answer about whether the byte stream/array is valid is not enough.
Is there any sample code I can refer to? If there is no C# code, simple samples in C++ or Java are also appreciated. Thanks!
By invalid UTF-8 byte sequences, I mean those described here:
http://en.wikipedia.org/wiki/UTF-8#Invalid_byte_sequences
Thanks in advance, George
Recommended answer
What you need is DecoderFallback. When the Encoding class decodes a sequence of bytes, you can specify the fallback behaviour it uses when it meets an invalid sequence:
- Either report the error and stop processing (DecoderExceptionFallback).
- Or find the error and replace it (DecoderReplacementFallback).
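The first option can be sketched as follows. This is a minimal illustration, not code from the original answer: the class name `Utf8Check` and the method `IsValid` are hypothetical, while `Encoding.GetEncoding`, `DecoderExceptionFallback`, and `DecoderFallbackException.Index` are existing .NET APIs. It stops at the first invalid sequence only, so reporting every bad position would need a loop that resumes decoding after each error.

```csharp
using System;
using System.Text;

public static class Utf8Check   // hypothetical helper name
{
    // Returns true when the whole buffer decodes as UTF-8.
    // On failure, errorIndex receives the position reported by the decoder
    // for the first invalid sequence; on success it is set to -1.
    public static bool IsValid(byte[] data, out int errorIndex)
    {
        // DecoderExceptionFallback makes the decoder throw instead of
        // silently substituting a replacement character.
        Encoding strict = Encoding.GetEncoding(
            "utf-8",
            new EncoderExceptionFallback(),
            new DecoderExceptionFallback());
        try
        {
            strict.GetString(data);
            errorIndex = -1;
            return true;
        }
        catch (DecoderFallbackException ex)
        {
            // ex.BytesUnknown also holds the offending bytes, if needed.
            errorIndex = ex.Index;
            return false;
        }
    }
}
```

Calling `Utf8Check.IsValid(bytes, out int pos)` then gives both the yes/no answer and a position to report. How exactly `Index` is counted for multi-byte errors is worth checking against the runtime you target.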
Using UTF8Encoding with DecoderReplacementFallback, you can achieve just what you're looking for.
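A minimal sketch of the replacement approach, under the assumption that decoding to a string is acceptable for your use case. The names `Utf8Clean` and `DecodeLenient` are hypothetical; `DecoderReplacementFallback` and `Encoding.GetEncoding` are the real .NET APIs.

```csharp
using System;
using System.Text;

public static class Utf8Clean   // hypothetical helper name
{
    // Decodes the buffer as UTF-8, substituting `replacement` for each
    // invalid byte sequence instead of throwing.
    public static string DecodeLenient(byte[] data, string replacement)
    {
        Encoding lenient = Encoding.GetEncoding(
            "utf-8",
            new EncoderReplacementFallback(replacement),
            new DecoderReplacementFallback(replacement));
        return lenient.GetString(data);
    }
}
```

For example, `Utf8Clean.DecodeLenient(new byte[] { 0x41, 0xFF, 0x42 }, "?")` yields `"A?B"`. Passing an empty replacement string should drop the invalid bytes altogether, though that behaviour is worth confirming on your runtime before relying on it.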