OS X Perl: batch-write the filename as the first line of .txt files in UTF-16LE

Question

I found a really useful bit of Perl here that writes the filename of a text file into the first line of the file. I am running this from Terminal on OS X Yosemite:

perl -i -pe 'BEGIN{undef $/;} s/^/\nFilename:$ARGV\n/' `find . -name '*.TXT'`

With some modification I thought it had solved my specific problem. However, the files I'm picking up are UTF-16LE, and I've since discovered that this command writes the header in UTF-8, which makes a real mess of the output (the text looks correct but is not recognised in calculations in Excel, FileMaker, etc.).
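To illustrate why the header can look right yet break Excel or FileMaker, here is a small sketch (not part of the original one-liner) that encodes the same header both ways with Perl's core Encode module. In UTF-8 each ASCII character is one byte; in UTF-16LE it is two bytes, low byte first, so a program expecting UTF-16LE is likely to misread single-byte header text. The file name a.TXT is a hypothetical example.

```perl
use strict;
use warnings;
use Encode qw(encode);

# The same header line, as raw bytes, in the two encodings discussed above.
my $header   = "Filename:a.TXT\n";
my $as_utf8  = encode('UTF-8', $header);     # one byte per ASCII character
my $as_utf16 = encode('UTF-16LE', $header);  # two bytes per character, low byte first

# Print both byte sequences in hex so the difference is visible.
printf "UTF-8:    %s\n", unpack('H*', $as_utf8);
printf "UTF-16LE: %s\n", unpack('H*', $as_utf16);
```

In the UTF-16LE line every ASCII byte is followed by 00, which is exactly what the single-byte header written by the one-liner lacks.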

After several attempts, I need help getting this script to write the filename in UTF-16LE at the start of the file. (Note: I do now have a workaround: batch-convert the files to UTF-8, then run this. However, I'd prefer to do the whole workflow in one step.)

Answer

reinierpost was correct: it was more about removing the original Unicode byte order mark (BOM). What worked in the end was:

perl -i -pe 'BEGIN{undef $/;} s/\xFF\xFE/Filename:$ARGV\n/' `find . -name '*.TXT'`

where the UTF-16LE BOM \xFF\xFE is replaced by my new string. For reference, some other BOMs are:
- iso-10646-1: \xFE\xFF
- UTF-16BE: \xFE\xFF
- UTF-8: \xEF\xBB\xBF
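The BOM table above can be turned into a small check that reports which encoding a file's first bytes imply. This is a minimal sketch; detect_bom is a hypothetical helper name, and only the BOMs listed above are recognised. Note that a UTF-32LE BOM (FF FE 00 00) also begins with FF FE and would be reported as UTF-16LE here.

```perl
use strict;
use warnings;

# Hypothetical helper: name the encoding implied by a file's leading bytes.
# Returns undef when no known BOM is present.
sub detect_bom {
    my ($bytes) = @_;
    return 'UTF-8'    if $bytes =~ /\A\xEF\xBB\xBF/;
    return 'UTF-16LE' if $bytes =~ /\A\xFF\xFE/;
    return 'UTF-16BE' if $bytes =~ /\A\xFE\xFF/;
    return;    # no BOM: the encoding has to be guessed some other way
}

# Usage: report the BOM of each file named on the command line.
for my $path (@ARGV) {
    open my $fh, '<:raw', $path or die "Cannot read $path: $!";
    read $fh, my $head, 3;    # the longest BOM checked here is 3 bytes
    close $fh;
    printf "%s: %s\n", $path, detect_bom($head // '') // 'no BOM';
}
```

Running it over the same file list as the one-liners (for example with find ... -exec perl script.pl {} +) would show whether the source data really is a mix of UTF-8 and UTF-16 files.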

I was also able to write the new text in UTF-16LE with:

perl -i -pe 'BEGIN{binmode STDIN,":encoding(utf8)";binmode STDOUT,":encoding(utf16)"; undef $/;} s/\xFF\xFE/\xFF\xFE\nFilename:$ARGV\n/' `find . -name '*.TXT'`

However, I now believe my source data is a mixed bag of UTF-8 and UTF-16 files, because this last version produces a mix of encodings between the new header and the data. Thanks to reinierpost for steering me in the right direction. I'd still be interested if others can improve on this.
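One possible way to improve on this, sketched under stated assumptions: decode each file to characters first, prepend the header, then re-encode everything as UTF-16LE with a BOM, so the header and the data always share one encoding. The sketch assumes every input is UTF-16LE, UTF-16BE or UTF-8, and treats a file with no BOM as UTF-8; prepend_filename is a hypothetical name, and only the core Encode module is used. Besides mixed source data, another likely contributor to the mixed output above is that under -i the one-liner reads through the ARGV handle and prints to ARGVOUT, so the binmode STDIN / binmode STDOUT calls probably do not apply to the files being edited, leaving a single-byte header next to UTF-16LE data.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: rewrite $path so it starts with "Filename:$path"
# and the whole file (header + original text) is UTF-16LE with a BOM.
sub prepend_filename {
    my ($path) = @_;

    # Slurp the raw bytes; decoding is done explicitly below.
    open my $in, '<:raw', $path or die "Cannot read $path: $!";
    my $bytes = do { local $/; <$in> };
    $bytes = '' unless defined $bytes;
    close $in;

    # Pick a decoder from the BOM and strip it. Assumption: no BOM means UTF-8.
    # (A UTF-32LE file would also start with FF FE and be misread here.)
    my $text;
    if ($bytes =~ s/\A\xFF\xFE//) {
        $text = decode('UTF-16LE', $bytes);
    } elsif ($bytes =~ s/\A\xFE\xFF//) {
        $text = decode('UTF-16BE', $bytes);
    } else {
        $bytes =~ s/\A\xEF\xBB\xBF//;
        $text = decode('UTF-8', $bytes);
    }

    # Build the new content as characters, then encode it once.
    # Use "\r\n" instead of "\n" if the data uses Windows line endings.
    my $new = "Filename:$path\n" . $text;

    open my $out, '>:raw', $path or die "Cannot write $path: $!";
    print {$out} "\xFF\xFE", encode('UTF-16LE', $new);
    close $out or die "Cannot close $path: $!";
}

# Process every file named on the command line.
prepend_filename($_) for @ARGV;
```

Saved as, say, prepend_filename.pl (a hypothetical name), it could be run as find . -name '*.TXT' -exec perl prepend_filename.pl {} + which also copes with spaces in file names. Because it overwrites files in place, try it on copies first.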
