Looking for samples to validate UTF-8


Question

Suppose I have a byte stream (array), and I want to write code (using .NET C#) to validate whether it is a valid UTF-8 byte sequence. I want to write the code from scratch because I need to report the exact location of each invalid byte sequence, and possibly remove the invalid bytes, rather than just get a yes/no answer about whether the byte stream/array is valid.

Is there any sample code I can refer to? If there is no C# code, simple samples in C++/Java are also appreciated. Thanks!

By invalid UTF-8 byte sequences, I mean those described here:

http://en.wikipedia.org/wiki/UTF-8#Invalid_byte_sequences

Thanks in advance, George

Recommended answer

What you need is DecoderFallback. When the Encoding class decodes a sequence of bytes, you can specify the fallback behaviour for bytes it cannot decode:

  • Either report the error and stop processing.
  • Or find the error and replace it.

Using UTF8Encoding and DecoderReplacementFallback, you can achieve just what you're looking for.
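As a rough illustration of the two fallback behaviours, here is a minimal sketch. The class name Utf8Check and its two helper methods are made up for this example; only UTF8Encoding, Encoding.GetEncoding, DecoderReplacementFallback and DecoderFallbackException come from the framework. It assumes the whole input is already in memory as a byte array, and it only reports the first invalid sequence, since the exception fallback stops at the first error. The index value comes from the decoder itself, so it is worth checking against your own data and runtime version.

```csharp
using System;
using System.Text;

// Hypothetical helper class, written only for this illustration.
static class Utf8Check
{
    // Returns -1 if the bytes decode cleanly, otherwise the index the decoder
    // reports for the first invalid sequence. The offending bytes are returned
    // through the out parameter.
    public static int FindFirstInvalid(byte[] data, out byte[] badBytes)
    {
        badBytes = new byte[0];
        // Second argument (throwOnInvalidBytes) = true: throw instead of replacing.
        var strict = new UTF8Encoding(false, true);
        try
        {
            strict.GetString(data);
            return -1;
        }
        catch (DecoderFallbackException ex)
        {
            badBytes = ex.BytesUnknown ?? new byte[0];
            return ex.Index;
        }
    }

    // Decodes the bytes, substituting the given string for each invalid sequence.
    public static string ReplaceInvalid(byte[] data, string replacement)
    {
        Encoding lenient = Encoding.GetEncoding(
            "utf-8",
            EncoderFallback.ReplacementFallback,
            new DecoderReplacementFallback(replacement));
        return lenient.GetString(data);
    }
}

class Program
{
    static void Main()
    {
        // "ab", then 0xFF (never valid in UTF-8), then "cd".
        byte[] data = { 0x61, 0x62, 0xFF, 0x63, 0x64 };

        byte[] bad;
        int index = Utf8Check.FindFirstInvalid(data, out bad);
        if (index >= 0)
        {
            Console.WriteLine("Invalid sequence reported at index " + index +
                              ": " + BitConverter.ToString(bad));
        }

        Console.WriteLine(Utf8Check.ReplaceInvalid(data, "?")); // prints "ab?cd"
    }
}
```

If you need every invalid position rather than just the first, one option is to derive your own classes from DecoderFallback and DecoderFallbackBuffer and record each index passed to the Fallback method, but that is beyond this sketch.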
