OSX perl to batch write filename as first line in txt file in UTF-16LE
Problem description
I found a really useful bit of Perl here that writes the filename of a text file to the first line of the file. I am running this from Terminal in OS X Yosemite:
perl -i -pe 'BEGIN{undef $/;} s/^/\nFilename:$ARGV\n/' `find . -name '*.TXT'`
With some modification I thought it had solved my specific problem. However, the files I'm picking up are UTF-16LE, and I've since discovered that this command is writing in UTF-8 and making a real mess of the output (the text is visibly correct but is not recognised in calculations in Excel, FileMaker etc.).
After several attempts, I need help getting this script to write the filename in UTF-16LE at the start of the file. (Note: I do have a workaround now: batch convert the files to UTF-8, then run this. However, I'd prefer to have this workflow in one step.)
Recommended answer
reinierpost was correct: it was more about removing the original Unicode byte order mark (BOM). What worked in the end was:
perl -i -pe 'BEGIN{undef $/;} s/\xFF\xFE/Filename:$ARGV\n/' `find . -name '*.TXT'`
where the UTF-16LE BOM \xFF\xFE is replaced by my new string. For reference, some other BOMs are:
- iso-10646-1 > \xFE\xFF
- UTF-16BE > \xFE\xFF
- UTF-8 > \xEF\xBB\xBF
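Since everything here hinges on which BOM a file actually starts with, a quick check before rewriting anything may help. This is only a rough sketch: bom_of is a made-up helper name, and it looks at nothing but the first three bytes.

```shell
# bom_of is a hypothetical helper name, not something from the original answer.
# It prints which BOM (if any) the given file starts with.
bom_of() {
  # First three bytes as lowercase hex, e.g. "fffe41" or "efbbbf".
  hex=$(head -c 3 "$1" | od -An -tx1 | tr -dc '0-9a-f')
  case "$hex" in
    efbbbf*) echo "UTF-8" ;;
    fffe*)   echo "UTF-16LE" ;;
    feff*)   echo "UTF-16BE" ;;
    *)       echo "none" ;;
  esac
}
```

Because only three bytes are inspected, a UTF-32LE file (BOM \xFF\xFE\x00\x00) would be reported as UTF-16LE, and a file with no BOM at all says nothing about its real encoding.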
I was also able to write the new text in UTF-16LE with:
perl -i -pe 'BEGIN{binmode STDIN,":encoding(utf8)";binmode STDOUT,":encoding(utf16)"; undef $/;} s/\xFF\xFE/\xFF\xFE\nFilename:$ARGV\n/' `find . -name '*.TXT'`
However, I now believe that my source data is a mixed bag of UTF-8 and UTF-16, as this last version creates a mixed set of characters between the new header and the data. Thanks, reinierpost, for steering me in the right direction. I remain interested if others can improve this.