Is there a list of ASCII-extending encodings?

Views: 224 · Published: 2016/11/19 14:20:50 · Tags: character-encoding, ascii


Question

I need to decide when (not) to convert a text file based on the known file encoding and the desired output encoding.

If the text is US-ASCII, I don't need to convert it if the output encoding is ASCII, UTF-8, Latin1, ...

Obviously I need to convert a US-ASCII file to UTF-16 or UTF-32.
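
To make the two statements above concrete, here is a minimal Python sketch that encodes the same ASCII-only string in several encodings and compares the resulting bytes. The sample text and the choice of encodings are illustrative assumptions, not part of the original question.

```python
# Encode the same ASCII-only text in several encodings and compare the bytes.
# The sample text and the list of encodings are arbitrary illustrations.
text = "Hello"
encodings_to_try = ("ascii", "utf-8", "latin-1", "utf-16-le", "utf-32-le")
encoded = {name: text.encode(name) for name in encodings_to_try}

for name, data in encoded.items():
    same = data == encoded["ascii"]
    print(f"{name:10} {data.hex(' '):45} same as ASCII: {same}")
```

The ASCII, UTF-8 and Latin-1 results are byte-for-byte identical, while UTF-16 and UTF-32 use two and four bytes per character, so an ASCII file has to be rewritten for those targets.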

A list of standard encodings exists at

http://www.iana.org/assignments/character-sets/character-sets.xml

A conversion is necessary if:


- the minimal character size is > 1 byte, or
- the first 128 code points (0x00 - 0x7F) are not the same as US-ASCII.
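
The two criteria can be folded into a single check: encode all 128 ASCII characters in the target encoding and see whether the bytes come out unchanged. The sketch below does that in Python; the function names `is_ascii_transparent` and `needs_conversion` are hypothetical, and the rule for non-ASCII sources (always convert unless the codec is the same) is a deliberately conservative assumption.

```python
import codecs

# The 128 byte values that make up US-ASCII.
ASCII_BYTES = bytes(range(128))

def is_ascii_transparent(encoding):
    """True if every ASCII character encodes to its own single byte.

    This covers both criteria at once: a larger code unit or a different
    mapping for the low 128 code points both change the encoded bytes.
    """
    try:
        return ASCII_BYTES.decode("ascii").encode(encoding) == ASCII_BYTES
    except (LookupError, UnicodeError):
        # Unknown codec, or a codec that cannot represent some ASCII character.
        return False

def needs_conversion(source, target):
    """Decide whether a file in `source` must be rewritten for `target`."""
    src = codecs.lookup(source).name  # normalizes aliases such as 'US-ASCII'
    dst = codecs.lookup(target).name
    if src == dst:
        return False
    if src == "ascii":
        return not is_ascii_transparent(target)
    # Assumption: for any other source, convert to be safe.
    return True

for target in ("ASCII", "UTF-8", "Latin1", "UTF-16", "UTF-32"):
    print(target, needs_conversion("US-ASCII", target))
```

Note that `'UTF-16'` and `'UTF-32'` also prepend a byte order mark in Python, which is one more reason the encoded bytes differ from plain ASCII.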


I'd like to know:


Is there a similar list with details (byte length, ASCII compatibility) about the implementation of each encoding?

I'd be happy with a list containing only the codecs supported by Qt5.





EDIT

I already found an answer to the question


Are all 8-bit or variable-length 8-bit codecs supersets of ASCII?

In other words: can US-ASCII text be interpreted as any 8-bit or variable-length 8-bit encoding?



here: "Character set that is not a superset of ASCII"

Instead, it would be helpful to know:


Is there a list of character sets which are supersets of ASCII?


This looks promising:

mime.charsets - list of character sets which are ASCII supersets,

but I couldn't find an actual mime.charsets file.
Solution
An alternative approach is to decode the bytes 0x00 - 0x7F in the given encoding, and check that the characters match ASCII.  For example, in Python 3.x:
def is_ascii_superset(encoding):
    for codepoint in range(128):
        if bytes([codepoint]).decode(encoding, 'ignore') != chr(codepoint):
            return False
    return True
This gives:
>>> is_ascii_superset('US-ASCII')
True
>>> is_ascii_superset('windows-1252')
True
>>> is_ascii_superset('ISO-8859-15')
True
>>> is_ascii_superset('UTF-8')
True
>>> is_ascii_superset('UTF-16')
False
>>> is_ascii_superset('IBM500') # a variant of EBCDIC
False
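
The same check can be run over every codec that ships with Python to produce the kind of list the question asks for. This is a sketch under assumptions: it takes the codec names from the standard library's `encodings.aliases.aliases` table, the helper name `ascii_superset_codecs` is hypothetical, and it treats any codec that fails to decode (for example a non-text codec or one unavailable on the current platform) as not being an ASCII superset.

```python
import encodings.aliases

def is_ascii_superset(encoding):
    """True if bytes 0x00-0x7F decode to the same characters as in ASCII."""
    try:
        return all(
            bytes([codepoint]).decode(encoding, "ignore") == chr(codepoint)
            for codepoint in range(128)
        )
    except Exception:
        # Some entries are not text encodings or do not exist on this
        # platform; count them as incompatible rather than failing.
        return False

def ascii_superset_codecs():
    """Return the sorted names of Python's bundled ASCII-superset codecs."""
    names = set(encodings.aliases.aliases.values())
    return sorted(name for name in names if is_ascii_superset(name))

if __name__ == "__main__":
    for name in ascii_superset_codecs():
        print(name)
```

The result reflects Python's codec implementations only; another library, such as Qt, may map some encodings differently, so the list is best regenerated with the library that will actually do the conversion.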




EDIT: Get US-ASCII compatibility for each encoding supported by your Qt version in C++:
#include <QTextCodec>
#include <QMap>

typedef enum
{
    eQtCodecUndefined,
    eQtCodecAsciiIncompatible,
    eQtCodecAsciiCompatible,
} tQtCodecType;

QMap<QByteArray, tQtCodecType> QtCodecTypes()
{
    QMap<QByteArray, tQtCodecType> CodecTypes;
    // How to test Qt's interpretation of ASCII data?
    QList<QByteArray> available = QTextCodec::availableCodecs();
    QTextCodec *referenceCodec = QTextCodec::codecForName("UTF-8"); // because Qt has no US-ASCII, but we only test bytes 0-127 and UTF-8 is a superset of US-ASCII
    if(referenceCodec == 0)
    {
        qDebug("Unable to get reference codec 'UTF-8'");
        return CodecTypes;
    }
    for(int i = 0; i < available.count(); i++)
    {
        const QByteArray name = available.at(i);
        QTextCodec *currCodec = QTextCodec::codecForName(name);
        if(currCodec == NULL)
        {
            qDebug("Unable to get codec for '%s'", qPrintable(QString(name)));
            CodecTypes.insert(name, eQtCodecUndefined);
            continue;
        }
        tQtCodecType type = eQtCodecAsciiCompatible;
        for(uchar j = 0; j < 128; j++) // UTF-8 == US-ASCII in the lower 7 bit
        {
            const char c = (char)j; // character to test < 2^8
            QString sRef, sTest;
            sRef = referenceCodec->toUnicode(&c, 1); // convert character to UTF-16 (QString internal) assuming it is ASCII (via UTF-8)
            sTest = currCodec->toUnicode(&c, 1); // convert character to UTF-16 assuming it is of type [currCodec]
            if(sRef != sTest) // compare both UTF-16 representations -> if they are equal, these codecs are transparent for Qt
            {
                type = eQtCodecAsciiIncompatible;
                break;
            }
        }
        CodecTypes.insert(name, type);
    }

    return CodecTypes;
}


                        