How to check if a locale is UTF-8?


Question

I'm working with Yocto to create an embedded Linux distribution for an ARM device (i.MX 6Quad processor).

I've configured the list of desired locales with the variable:

IMAGE_LINGUAS = "de-de fr-fr en-gb en-gb.iso-8859-1 en-us en-us.iso-8859-1 zh-cn"

As a result, I've obtained a file system that contains the following folders:

root@lam_icu:/usr/lib/locale# cd /usr/share/locale/
root@lam_icu:/usr/share/locale# ls -la
total 0
drwxr-xr-x  6 root root  416 Nov 17  2016 .
drwxr-xr-x 30 root root 2056 Nov 17  2016 ..
drwxr-xr-x  4 root root  296 Nov 17  2016 de
drwxr-xr-x  3 root root  232 Nov 17  2016 en_GB
drwxr-xr-x  4 root root  296 Nov 17  2016 fr
drwxr-xr-x  4 root root  296 Nov 17  2016 zh_CN

and:

root@lam_icu:/usr/share/locale# cd /usr/lib/locale/
root@lam_icu:/usr/lib/locale# ls -la
total 0
drwxr-xr-x  9 root root   640 Mar 13  2017 .
drwxr-xr-x 32 root root 40000 Mar 13  2017 ..
drwxr-xr-x  3 root root  1016 Mar 13  2017 de_DE
drwxr-xr-x  3 root root  1016 Mar 13  2017 en_GB
drwxr-xr-x  3 root root  1016 Mar 13  2017 en_GB.ISO-8859-1
drwxr-xr-x  3 root root  1016 Mar 13  2017 en_US
drwxr-xr-x  3 root root  1016 Mar 13  2017 en_US.ISO-8859-1
drwxr-xr-x  3 root root  1016 Mar 13  2017 fr_FR
drwxr-xr-x  3 root root  1016 Mar 13  2017 zh_CN

What is the encoding of the locales that are not ISO-8859-1? Can I assume that "en_GB" and "en_US" use the UTF-8 encoding?

I tried opening the "LC_IDENTIFICATION" file; the result is:

Hc�������������cEnglish locale for the USAFree Software Foundation, Inc.http://www.gnu.org/software/libc/bug-glibc-locales@gnu.orgEnglishUSA1.02000-06-24en_US:2000en_US:2000en_US:2000en_US:2000en_US:2000en_US:2000en_US:2000en_US:2000en_US:2000en_US:2000en_US:2000en_US:2000UTF-8

The end of the file contains the string "UTF-8". Is this enough to assume that the encoding is UTF-8?

How to check if a locale is UTF-8?

Solution

LC_IDENTIFICATION doesn't tell you much:

LC_IDENTIFICATION - this is not a user-visible category, it contains information about the locale itself and is rarely useful for users or developers (but is listed here for completeness sake).

You'd have to look at the complete set of files.

There appears to be no standard command-line utility for doing this, but there is a runtime call (added a little later than the original locale functions). Here is a sample program which illustrates the function nl_langinfo:

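#include <stdio.h>
#include <locale.h>
#include <langinfo.h>

/* For each locale name given on the command line, print its codeset. */
int
main(int argc, char **argv)
{
    int n;
    for (n = 1; n < argc; ++n) {
        /* Switch the whole process to the named locale. */
        if (setlocale(LC_ALL, argv[n]) != 0) {
            /* CODESET asks for the character encoding of that locale. */
            char *code = nl_langinfo(CODESET);
            if (code != 0)
                printf("%s ->%s\n", argv[n], code);
            else
                printf("?%s (nl_langinfo)\n", argv[n]);
        } else {
            /* The locale is unknown or not installed. */
            printf("? %s (setlocale)\n", argv[n]);
        }
    }
    return 0;
}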

and some output, e.g., from running foo $(locale -a):

aa_DJ ->ISO-8859-1
aa_DJ.iso88591 ->ISO-8859-1
aa_DJ.utf8 ->UTF-8
aa_ER ->UTF-8
aa_ER@saaho ->UTF-8
aa_ER.utf8 ->UTF-8
aa_ER.utf8@saaho ->UTF-8
aa_ET ->UTF-8
aa_ET.utf8 ->UTF-8
af_ZA ->ISO-8859-1
af_ZA.iso88591 ->ISO-8859-1
af_ZA.utf8 ->UTF-8
am_ET ->UTF-8
am_ET.utf8 ->UTF-8
an_ES ->ISO-8859-15
an_ES.iso885915 ->ISO-8859-15
an_ES.utf8 ->UTF-8
ar_AE ->ISO-8859-6
ar_AE.iso88596 ->ISO-8859-6
ar_AE.utf8 ->UTF-8
ar_BH ->ISO-8859-6
ar_BH.iso88596 ->ISO-8859-6
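
Building on the sample program, a yes/no answer to "is this locale UTF-8?" can be sketched by comparing the CODESET value against "UTF-8". The helper names is_utf8_codeset and locale_is_utf8 are hypothetical, and ignoring case and accepting the "UTF8" spelling are assumptions made here for robustness; the output above shows glibc reporting "UTF-8".

```c
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>
#include <strings.h>

/* Hypothetical helper: returns 1 if a codeset name denotes UTF-8,
 * 0 otherwise. Ignoring case and accepting "UTF8" are assumptions. */
int
is_utf8_codeset(const char *codeset)
{
    if (codeset == NULL)
        return 0;
    return strcasecmp(codeset, "UTF-8") == 0
        || strcasecmp(codeset, "UTF8") == 0;
}

/* Hypothetical helper: returns 1 if the named locale uses UTF-8,
 * 0 if it uses another codeset, -1 if setlocale() rejects the name.
 * Side effect: the process-wide locale is changed. */
int
locale_is_utf8(const char *name)
{
    if (setlocale(LC_ALL, name) == NULL)
        return -1;
    return is_utf8_codeset(nl_langinfo(CODESET));
}
```

On the target, calling locale_is_utf8("en_GB") or locale_is_utf8("en_US") would settle the question for the locales actually installed there, rather than relying on the directory names.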

The directory names you're referring to are often (but are not required to be) the same as the encoding names. That is the assumption made in the example program. A related question, "How to get terminal's Character Encoding", has no useful answers. One answer is interesting, though, since it asserts that

locale charmap

will give the locale encoding. According to the standard, that's not necessarily so:

  • The command locale charmap gives the name used in localedef -f.

  • However, localedef attaches no special meaning to the name given in the -f option.

  • localedef has a different option -u which identifies the codeset, but locale (in the standard) mentions no method for displaying this information.

As usual, implementations may (or may not) treat unspecified features in different ways. The GNU C library's documentation differs in some respects from the standard (see locale and localedef), but offers no explicit options for showing the codeset name.
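
Returning to the directory names: the part of a locale name between '.' and an optional '@modifier' is only a hint about the encoding. The sketch below extracts that hint so it can be compared with what nl_langinfo(CODESET) reports. The function name codeset_hint is hypothetical, and the sketch assumes the usual language_TERRITORY.codeset@modifier layout of locale names.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the codeset hint of a locale name (the text
 * between '.' and an optional '@modifier') into buf.
 * Returns 1 if a hint was found and fits in buf, 0 otherwise. */
int
codeset_hint(const char *name, char *buf, size_t size)
{
    const char *dot, *end;
    size_t len;

    if (name == NULL || buf == NULL || size == 0)
        return 0;
    buf[0] = '\0';

    dot = strchr(name, '.');
    if (dot == NULL)
        return 0;               /* e.g. "en_GB": the name carries no hint */

    end = strchr(dot + 1, '@'); /* the hint stops at an optional modifier */
    len = (end != NULL) ? (size_t)(end - (dot + 1)) : strlen(dot + 1);
    if (len == 0 || len >= size)
        return 0;

    memcpy(buf, dot + 1, len);
    buf[len] = '\0';
    return 1;
}
```

A name such as en_GB yields no hint at all, which is exactly the case in the question, and a hint such as utf8 is spelled differently from the UTF-8 that nl_langinfo returned in the output above, so the runtime call remains the more reliable check.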
