Encoding of filenames containing non-Latin characters while extracting from .tar.gz packed by Ant tar task
Problem description
I'm building a tar.gz archive using Ant:
<tar destfile="${linux86.zip.file}" compression="gzip" longfile="gnu">
    <tarfileset dir="${work.dir}/data" dirmode="755" filemode="755"
                prefix="${app.folder}/data"/>
</tar>
The archive is built on Windows. After it is extracted on Ubuntu 12, files whose names contain non-Latin (for example, Cyrillic) characters have broken names.
Is there any way to fix or work around that?
I found some relevant information on Ant's developer mailing list (30 Jun 2009, 01 Jul 2009) and in ASF Bugzilla (36851, 53811). The problem is old and well known; it has not been fixed, mainly for ideological reasons, because not all untar implementations support non-ASCII entry names.
The patch mentioned in the Bugzilla issue was applied in revision 1350857. As a result, TarOutputStream has a constructor that takes the name of the encoding to use for entry names:
public TarOutputStream(OutputStream os, String encoding) { ... }
However, the Tar task never uses this constructor. So I added an encoding attribute to the Tar task, rebuilt Ant from the modified sources, and used UTF-8 as the encoding for entry names.
Extraction was tested under Ubuntu 11/12 and Mandriva.
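For illustration, with such a patched build the target from the question might look like the sketch below. The encoding attribute here is the one added in the modified Tar task described above. Whether your own Ant build accepts it is an assumption, so check the tar task documentation for your version before relying on it.

```xml
<!-- Sketch only: assumes a Tar task that accepts an "encoding" attribute
     (the patched build described above). -->
<tar destfile="${linux86.zip.file}" compression="gzip" longfile="gnu"
     encoding="UTF-8">
    <tarfileset dir="${work.dir}/data" dirmode="755" filemode="755"
                prefix="${app.folder}/data"/>
</tar>
```

UTF-8 is chosen because it matches the default locale encoding on typical Linux systems, so the extracting side does not need to be told which encoding the archive uses.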