Where can I find a table of all the characters for every C99 Character Set?


Question

I'm looking for a table (or a way to generate one) for every character in each of the following C Character Sets:


  • Basic Character Set
  • Basic Execution Character Set
  • Basic Source Character Set
  • Execution Character Set
  • Extended Character Set
  • Source Character Set

C99 mentions all six of these under section 5.2.1. However, I've found it extremely cryptic to read and lacking in detail.

The only character sets that it clearly defines are the Basic Execution Character Set and the Basic Source Character Set:


52 uppercase and lowercase letters of the Latin alphabet:

A B C D E F G H I J K L M N O P Q R S T U V W X Y Z

a b c d e f g h i j k l m n o p q r s t u v w x y z

Ten decimal digits:

0 1 2 3 4 5 6 7 8 9

29 graphic characters:

! " # % & ' ( ) * + , - . / : ; < = > ? [ \ ] ^ _ { | } ~

4 whitespace characters:

space, horizontal tab, vertical tab, form feed

I believe these are the same as the Basic Character Set, though I'm guessing as C99 does not explicitly state this. The remaining Character Sets are a bit of a mystery to me.

Thanks for any help you can offer! :)

Recommended answer

Except for the Basic Character Set as you mentioned, all of the rest of the character sets are implementation-defined. That means that they could be anything, but the implementation (that is, the C compiler/libraries/toolchain implementation) must document those decisions. The key paragraphs here are:


§3.4.1 implementation-defined behavior
unspecified behavior where each implementation documents how the choice is made

§3.4.2 locale-specific behavior
behavior that depends on local conventions of nationality, culture, and language that each implementation documents

§5.2.1 Character sets (paragraph 1)
Two sets of characters and their associated collating sequences shall be defined: the set in which source files are written (the source character set), and the set interpreted in the execution environment (the execution character set). Each set is further divided into a basic character set, whose contents are given by this subclause, and a set of zero or more locale-specific members (which are not members of the basic character set) called extended characters. The combined set is also called the extended character set. The values of the members of the execution character set are implementation-defined.

So, look at your C compiler's documentation to find out what the other character sets are. For example, in my man page for gcc, some of the command line options state:


   -fexec-charset=charset
       Set the execution character set, used for string and character
       constants.  The default is UTF-8.  charset can be any encoding
       supported by the system's "iconv" library routine.

   -fwide-exec-charset=charset
       Set the wide execution character set, used for wide string and
       character constants.  The default is UTF-32 or UTF-16, whichever
       corresponds to the width of "wchar_t".  As with -fexec-charset,
       charset can be any encoding supported by the system's "iconv"
       library routine; however, you will have problems with encodings
       that do not fit exactly in "wchar_t".

   -finput-charset=charset
       Set the input character set, used for translation from the
       character set of the input file to the source character set used by
       GCC.  If the locale does not specify, or GCC cannot get this
       information from the locale, the default is UTF-8.  This can be
       overridden by either the locale or this command line option.
       Currently the command line option takes precedence if there's a
       conflict.  charset can be any encoding supported by the system's
       "iconv" library routine.

To get a list of the encodings supported by iconv, run iconv -l. My system has 143 different encodings to choose from.
