Visual Studio 2008 project file does not load because of an unexpected encoding change


Problem description

In our team we have a database project in Visual Studio 2008 which is under source control in Team Foundation Server. Every two weeks or so, after one co-worker checks in, the project file won't load on the other developers' machines. The error message is:

The project file could not be loaded. Data at the root level is invalid. Line 1, position 1.

When I look at the project file in Notepad++, the file looks like this:

��<NUL?NULxNULmNULlNUL NULvNULeNULrNULsNULiNULoNULnNUL ...

and so on (you can see <?xml version in this), whereas a normal project file looks like:

<?xml version="1.0" encoding="utf-16"?> ...

So probably something is wrong with the encoding of the file. This is a problem for us because it turns out to be impossible to get the file encoding correct again. The 'solution' is to throw away the project file and get the last known working version from source control.

According to the file, the encoding should be UTF-16. According to Notepad++, the corrupted file is actually UTF-8.

My questions are:

  • Why is Visual Studio messing up the encoding of the project file, apparently at random times and at random machines?
  • What should we do to prevent this?
  • When it has happened, is there a possibility to restore the current file in the correct encoding instead of pulling an older version from source control?

As a last note: the problem is with one single project file; all other project files don't exhibit this problem.

UPDATE: Thanks to Jon Skeet's suggestion I have the answer to question number three. When I replace the first nine bytes EF BB BF EF BF BD EF BF BD with the two bytes FF FE, the project file will load again.

This still leaves the question of why Visual Studio corrupts the file.

Solution

I think I can provide some insight into what's happening, if not why.

FF FE is a BOM; its presence at the beginning of the file indicates that the file's encoding is UTF-16, little-endian. And it sounds like the original file really is UTF-16, but something is ignoring the BOM and reading it as if it were UTF-8.

When that happens, each of the bytes FF and FE is treated as invalid and converted to U+FFFD, the official Unicode garbage character. Then, when the text is written to a file again, each of the garbage characters gets converted to its UTF-8 encoding (EF BF BD) and the UTF-8 BOM (EF BB BF) is added in front of them, resulting in the nine-byte sequence you reported:

EF BB BF  # UTF-8 BOM
EF BF BD  # U+FFFD in UTF-8
EF BF BD  # ditto
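
The round trip described above can be reproduced in a few lines of Python. This is only a sketch of the suspected mechanism, not a claim about what Visual Studio or TFS actually does internally; the sample XML declaration is taken from the question, and the use of Python's "replace" error handler and "utf-8-sig" codec is an assumption standing in for whatever tool misread and rewrote the file.

```python
# Simulate the suspected corruption: a UTF-16 LE file is misread as UTF-8
# and then saved again as UTF-8 with a BOM.
original_text = '<?xml version="1.0" encoding="utf-16"?>'

# What a healthy project file starts with: FF FE, then UTF-16 LE code units.
utf16_bytes = b"\xff\xfe" + original_text.encode("utf-16-le")

# Misread step: decode as UTF-8, substituting U+FFFD for each invalid byte.
misread = utf16_bytes.decode("utf-8", errors="replace")

# Rewrite step: the "utf-8-sig" codec prepends the UTF-8 BOM (EF BB BF).
corrupted = misread.encode("utf-8-sig")

# Requires Python 3.8+ for the separator argument of bytes.hex().
prefix_hex = corrupted[:9].hex(" ").upper()
print(prefix_hex)
```

The first nine bytes printed are the sequence reported in the question. Everything after them is still the untouched UTF-16 LE payload, which is why an editor shows a NUL after every character.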

If this is the case, simply replacing those nine bytes with FF FE is not safe. There's no guarantee those are the only bytes in the file that would be invalid when interpreted as UTF-8. As long as the file contains only ASCII characters you're okay, but anything else, like accented characters (é) or curly quotes (’), can be irretrievably mangled.
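
A more cautious version of the byte replacement from the question might look like the sketch below. The function name `repair` is hypothetical, and the check assumes the damage came from exactly the decode-with-replacement and re-encode round trip described above: in that case every lost byte shows up as EF BF BD, so finding that sequence anywhere after the prefix means the original bytes cannot be recovered.

```python
CORRUPT_PREFIX = bytes.fromhex("EF BB BF EF BF BD EF BF BD")
REPLACEMENT = "\ufffd".encode("utf-8")  # EF BF BD, i.e. U+FFFD in UTF-8
UTF16_LE_BOM = b"\xff\xfe"


def repair(data: bytes) -> bytes:
    """Swap the nine-byte prefix for FF FE, refusing if data was lost."""
    if not data.startswith(CORRUPT_PREFIX):
        raise ValueError("file does not start with the nine-byte signature")
    body = data[len(CORRUPT_PREFIX):]
    # Any further U+FFFD means some non-ASCII byte was already replaced.
    # This check is conservative: it may also reject a file that
    # legitimately contained these three bytes.
    if REPLACEMENT in body:
        raise ValueError("U+FFFD found after the prefix; original bytes are lost")
    return UTF16_LE_BOM + body
```

When the function raises, pulling the last good version from source control is still the only reliable fix.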

Are the project files really supposed to be UTF-16? If not, maybe that one developer's system is generating UTF-16 when the version-control system is expecting UTF-8. I notice in my Visual C# Express install there's an option under Environment->Documents called "Save documents as Unicode when data cannot be saved in codepage". That sounds like something that could cause the encoding to change at apparently random times.
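
To find out which check-in changes the encoding, it may help to look at the BOM of the project file in each version. The helper below is a hypothetical sketch (`sniff_bom` is not part of any Visual Studio or TFS tooling) that classifies a file by its leading bytes only. It deliberately ignores UTF-32, whose little-endian BOM also starts with FF FE.

```python
import codecs

# BOMs to look for, paired with the matching Python codec name.
BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),      # EF BB BF
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF
]


def sniff_bom(data: bytes) -> str:
    """Return the codec name implied by the BOM, or "no BOM"."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return "no BOM"
```

A healthy copy of the file from the question would report "utf-16-le", while a corrupted copy would report "utf-8-sig".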
