Different UTF-8 encoding in filenames on OS X


Question

I have a small shellscript in .x

$ cat .x
u="Böhmáí"
touch "$u"
ls > .list
echo "$u" >.text

cat .list .text
diff .list .text
od -bc .list
od -bc .text

When I run this script with sh -x .x (-x is only there to show the commands):

$ sh -x .x
+ u=Böhmáí
+ touch Böhmáí
+ ls
+ echo Böhmáí
+ cat .list .text
Böhmáí
Böhmáí
+ diff .list .text
1c1
< Böhmáí
---
> Böhmáí
+ od -bc .list
0000000   102 157 314 210 150 155 141 314 201 151 314 201 012            
           B   o   ̈    **   h   m   a   ́    **   i   ́    **  \n            
0000015
+ od -bc .text
0000000   102 303 266 150 155 303 241 303 255 012                        
           B   ö  **   h   m   á  **   í  **  \n                        
0000012

The same string Böhmáí is encoded into different bytes in the filename than in the content of the file. In the terminal (UTF-8 encoded) the string looks the same in both variants.

Where is the rabbit?

Solution

(This is mostly stolen from a previous answer of mine...)

Unicode allows some accented characters to be represented in several different ways: as a single "code point" representing the accented character, or as a series of code points representing the unaccented version of the character, followed by the accent(s). For example, "ä" could be represented either precomposed as U+00E4 (UTF-8 0xc3a4, Latin small letter a with diaeresis) or decomposed as U+0061 U+0308 (UTF-8 0x61cc88, Latin small letter a + combining diaeresis).
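
To see the two representations side by side, the bytes can be written out explicitly and dumped with od. This is only a sketch: the variable names precomposed and decomposed are made up for illustration, and the octal escapes are simply the UTF-8 bytes quoted above.

```shell
#!/bin/sh
# Precomposed "ä": U+00E4, UTF-8 bytes c3 a4 (octal 303 244).
precomposed=$(printf '\303\244')
# Decomposed "ä": U+0061 U+0308, UTF-8 bytes 61 cc 88 (octal 141 314 210).
decomposed=$(printf 'a\314\210')

# Dump each variant as hex bytes; column spacing differs between od builds.
printf '%s' "$precomposed" | od -An -tx1
printf '%s' "$decomposed" | od -An -tx1

# A plain string comparison sees two different byte sequences,
# even though a terminal renders both as the same glyph.
if [ "$precomposed" = "$decomposed" ]; then
    echo "same bytes"
else
    echo "different bytes"
fi
```

Both variables display as "ä" in a UTF-8 terminal, which is why cat shows identical lines in the question while diff and od do not.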

OS X's HFS+ filesystem requires that all filenames be stored in the UTF-8 representation of their fully decomposed form. In an HFS+ filename, "ä" MUST be encoded as 0x61cc88, and "ö" MUST be encoded as 0x6fcc88.

So what's happening here is that your shell script contains "Böhmáí" in precomposed form, so it gets stored that way in the variable u, and stored that way in the .text file. But when you create a file with that name (with touch), the filesystem converts it to the decomposed form for the actual filename. And when you ls it, it shows the form the filesystem has: the decomposed form.
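
If the two forms need to compare equal, one option is to convert the listed name back to precomposed form before comparing. The following is a rough sketch under a loud assumption: it relies on an iconv build that offers a UTF-8-MAC converter (Apple's iconv ships one; many Linux builds do not, so the script checks first). The helper name to_precomposed is hypothetical, and the two strings are spelled out with the octal bytes from the od dumps in the question.

```shell
#!/bin/sh
# Hypothetical helper: read decomposed (HFS+-style) UTF-8 on stdin and
# write precomposed UTF-8. ASSUMPTION: the local iconv knows UTF-8-MAC.
to_precomposed() {
    iconv -f UTF-8-MAC -t UTF-8
}

# "Böhmáí" as ls printed it (decomposed, 12 bytes) ...
listed=$(printf 'Bo\314\210hma\314\201i\314\201')
# ... and as echo wrote it into .text (precomposed, 9 bytes).
stored=$(printf 'B\303\266hm\303\241\303\255')

# The assignment's exit status is iconv's, so a missing converter
# falls through to the last branch instead of comparing garbage.
if normalized=$(printf '%s' "$listed" | to_precomposed 2>/dev/null); then
    if [ "$normalized" = "$stored" ]; then
        echo "equal after normalization"
    else
        echo "still different"
    fi
else
    echo "this iconv has no UTF-8-MAC converter"
fi
```

Normalizing in this direction leaves the script's own string untouched; the same idea works the other way round if the decomposed form is preferred.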
