Strange characters at beginning of file

Question

There are strange characters at the beginning of a file I'm editing (using TextMate). I don't know when they appeared. They're invisible in TextMate, but my script that reads the file goes crazy.

These are the first few characters in the file (as shown by the od command):

0000000 177377 000120 000105 000117 000120 000114 000105 000072

The first 2 shouldn't be there, I think. Maybe they were caused by some strange Dropbox sync? Or something else. But they tend to reappear (I don't yet know when).

My question: what is that 177377, and is there a simple way to remove it in my Ruby script? Thanks.

Recommended answer

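The 177377 is octal for 0xFEFF, the Unicode byte-order mark (BOM). The 0000000 in front of it is od's offset column, not part of the file. By default od prints 2-byte words in octal, so, assuming a little-endian machine, the file starts with the bytes FF FE, and each following word holds a single character in 16 bits (000120 is 'P', 000105 is 'E', and so on). That is what a little-endian UTF-16 file with a BOM looks like. Reading 0000000 177377 together as 0x0000FEFF would suggest big-endian UTF-32, but the first field is only the offset.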
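
One way to check what the dump contains is to turn od's words back into bytes. The sketch below is only an illustration under stated assumptions: od ran in its default mode (octal, 2-byte units) on a little-endian machine, and the leading 0000000 is od's offset column, not file data. The variable names are made up for this example.

```ruby
# Words exactly as od printed them; the 0000000 offset column is left out.
words = %w[177377 000120 000105 000117 000120 000114 000105 000072]

# Assumption: each word is a 16-bit value that od read in little-endian order.
values = words.map { |w| w.to_i(8) }

# Rebuild the raw bytes of the file: "v" packs 16-bit unsigned little-endian.
bytes = values.pack("v*")

# The first two bytes are the BOM; the rest is the actual text.
bom_hex = bytes.byteslice(0, 2).unpack1("H*")
body    = bytes.byteslice(2, bytes.bytesize - 2)
text    = body.force_encoding(Encoding::UTF_16LE).encode(Encoding::UTF_8)

puts format("first word: 0x%04X", values.first)
puts "BOM bytes:  #{bom_hex}"
puts "text:       #{text}"
```

Under those assumptions the first word is the BOM and the remaining words are ordinary characters, one per 16-bit unit.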

What to do with it is a little tricky. In general, the BOM does accurately describe the encoding of the data that follows. Detecting and skipping it, then treating the rest of the content as if it were in your local default charset, is usually the wrong thing to do, even when it appears to work for a particular file. Instead, I'd try to figure out why your editor is writing this BOM and whether there's a way to disable it.
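
If changing the editor setting is not an option, the script can honor the BOM and decode accordingly, which avoids the problem of skipping it blindly. The following is a minimal sketch, not a definitive implementation: `decode_with_bom`, the `BOMS` table, and the UTF-8 fallback for files without a BOM are all assumptions made for this example.

```ruby
# Known BOMs, longest first so UTF-32LE (FF FE 00 00) is tested
# before UTF-16LE (FF FE), which is a prefix of it.
BOMS = [
  ["\x00\x00\xFE\xFF".b, Encoding::UTF_32BE],
  ["\xFF\xFE\x00\x00".b, Encoding::UTF_32LE],
  ["\xEF\xBB\xBF".b,     Encoding::UTF_8],
  ["\xFE\xFF".b,         Encoding::UTF_16BE],
  ["\xFF\xFE".b,         Encoding::UTF_16LE],
].freeze

# Hypothetical helper: takes the raw bytes of a file and returns UTF-8 text.
# If a BOM is present it is removed and used to pick the source encoding;
# otherwise the bytes are assumed to be in `fallback` (an assumption, UTF-8 here).
def decode_with_bom(raw, fallback: Encoding::UTF_8)
  raw = raw.b # binary copy, so the caller's string is not modified
  bom, encoding = BOMS.find { |marker, _| raw.start_with?(marker) }
  body = bom ? raw.byteslice(bom.bytesize, raw.bytesize - bom.bytesize) : raw
  body.force_encoding(encoding || fallback).encode(Encoding::UTF_8)
end

# Example with in-memory bytes: a UTF-16LE BOM followed by "PE".
puts decode_with_bom("\xFF\xFEP\x00E\x00".b)
```

In a script this would be called on the file's raw bytes, for example `decode_with_bom(File.binread(path))`, where `path` is whatever file the script reads. Ruby's file open modes also accept a `BOM|` prefix on the external encoding (for example `"r:BOM|UTF-8"`), which may be enough when the expected encoding is known in advance.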
