Opening a file with a Unicode path


Problem description

Hi everyone, I am in need of some serious help. I have a file that I want to open, but the problem is that the file can be in a directory with a Unicode name (the path can be Cyrillic or Latin). I did an extensive search, read a lot, and tried almost 10 Stack Overflow solutions, but came up empty. At this point I am really desperate.

Here is my exact problem:

I get a path, for example:

čovećž/test_file.txt



I can open this with _wfopen, but the problem is that this function takes wchar_t.
It works if I hard-code the path with Unicode escapes:

wchar_t path[100] = L"\u010d\u006f\u0076\u0065\u0107\u017e/test_file.txt";






Once I knew I needed a Unicode wchar_t string, I tried converting the path.

The things I tried are listed below, under "What I have tried".


I am asking anyone to help me out with this: either convert the string to Unicode or use some other function (not _wfopen)!
You can also use the Boost libraries; I already have them set up.

Target platform: Windows only!

I would appreciate it if somebody could code an example, because links to articles won't help much; I think I have read EVERYTHING there is on this topic. :(

So basically I need this:
https://www.branah.com/unicode-converter

Thank you in advance.

What I have tried:

Manual conversion:
I tried converting it manually, going through the string and replacing characters with their Unicode codes. But in C/C++ most of the characters come out the same, for example ć = č = š, so this did not work.

Then I turned to Stack Overflow.

I tried this function:

std::wstring s2ws(const std::string& s)
{
    int len;
    int slength = (int)s.length() + 1;
    len = MultiByteToWideChar(CP_ACP, 0, s.c_str(), slength, 0, 0);
    wchar_t* buf = new wchar_t[len];
    MultiByteToWideChar(CP_ACP, 0, s.c_str(), slength, buf, len);
    std::wstring r(buf);
    delete[] buf;
    return r;
}
std::wstring stemp = s2ws(x);
LPCWSTR result = stemp.c_str();



That did not work. Then someone suggested this:

#include <codecvt>
#include <iostream>
#include <iomanip>
#include <string>

int main() {
    std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>, wchar_t> convert;

    std::string s = "test";

    std::cout << std::hex << std::setfill('0');
    std::cout << "Input `char` data: ";
    for (char c : s) {
      std::cout << std::setw(2) << static_cast<unsigned>(static_cast<unsigned char>(c)) << ' ';
    }
    std::cout << '\n';

    std::wstring ws = convert.from_bytes(s);

    std::cout << "Output `wchar_t` data: ";
    for (wchar_t wc : ws) {
      std::cout << std::setw(4) << static_cast<unsigned>(wc) << ' ';
    }
    std::cout << '\n';
}






But this only works for ASCII. When I put in a Unicode character (ć), it is converted to 003f, but it should be 0107.

So at this point I started thinking it wasn't possible in C++, so I installed Boost and tried a few things with wpath:

boost::filesystem::wpath dirPath



That did not work either.

Recommended answer

The problem was that I was saving the .cpp file as ANSI; I had to convert it to UTF-8. I had tried this before posting, but VS 2015 turned it back into ANSI. I had to change the encoding inside VS to get it working.

I tried opening the .cpp file with Notepad++ and changing the encoding, but when I opened it in VS it automatically reverted. So I looked at the Save As option, but there is no encoding option there. Finally I found it in Visual Studio 2015:

File -> Advanced Save Options: in the Encoding dropdown, change it to Unicode.
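The 003f from the question is consistent with ć having been replaced by ? (0x3F) when the source file was saved as ANSI, although that is an inference, not something the asker confirmed. To show what a correct conversion should produce, here is a minimal sketch of a UTF-8 to UTF-16 decoder. The helper name utf8_to_utf16 is hypothetical (it is not from the question or from any library), and the test bytes are written as hex escapes so the result does not depend on how the source file is saved.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper: decode UTF-8 bytes into UTF-16 code units
// (the encoding used by wchar_t strings on Windows).
// Simplified sketch: it rejects bad lead/continuation bytes and truncated
// input, but does not check for overlong encodings or out-of-range values.
std::u16string utf8_to_utf16(const std::string& in)
{
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        std::uint32_t cp = 0;   // decoded code point
        std::size_t extra = 0;  // number of continuation bytes expected

        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (extra > in.size() - i - 1)
            throw std::invalid_argument("truncated UTF-8 sequence");

        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        i += extra + 1;

        if (cp >= 0x10000) {
            // Code points above the BMP become a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        } else {
            out.push_back(static_cast<char16_t>(cp));
        }
    }
    return out;
}
```

With this sketch, the UTF-8 bytes C4 87 (ć) decode to the single code unit 0107, which is the value the question expected. On Windows, where wchar_t is 16 bits wide, the Win32 function MultiByteToWideChar called with CP_UTF8 (instead of CP_ACP) performs the same conversion, and the result can be passed to _wfopen.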

One thing is still strange to me: how did VS display the characters normally, while Notepad++ showed ? when I opened the same file (as expected, because of ANSI)?


This is mainly an answer to the question raised in Solution 1.

When a file is opened in an editor, the editor tries to identify the encoding. Unicode files may start with a byte order mark (BOM); see the Wikipedia article "Byte order mark". If one is present, the editor knows the encoding.

If there is no BOM, the editor may try to identify Unicode files by checking whether the file contains bytes outside printable ASCII (0x00 and 0x80 to 0xFF). But this depends on the editor. If, for example, you create a UTF-8 file without a BOM in Notepad++, other editors such as the VS editor might not detect this, and will then assume the file is encoded in the current code page.

If the file has not been identified as Unicode, it is treated on Windows as ASCII/ANSI using the current code page.

So it is always a good idea to use a BOM when creating Unicode files. If you look at the encodings offered in the VS Advanced Save Options dialog, you will notice that all Unicode encodings come with a BOM except UTF-16LE (called Unicode there). This suggests that the VS editor has a detection method for this encoding (which is not difficult, because ASCII, ANSI, and UTF-8 text files do not contain zero bytes, while UTF-16 files almost always do).
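The two checks described above can be sketched roughly as follows. This is only an illustration of the idea, not how Visual Studio or Notepad++ actually implement detection; the names Bom, detect_bom, and contains_zero_byte are hypothetical.

```cpp
#include <cstddef>
#include <string>

// Hypothetical result type for the sketch below.
enum class Bom { None, Utf8, Utf16LE, Utf16BE };

// Look at the first bytes of a file's contents and report which byte
// order mark, if any, is present.
// Simplification: a UTF-32LE BOM (FF FE 00 00) is reported as UTF-16LE.
Bom detect_bom(const std::string& bytes)
{
    auto at = [&bytes](std::size_t i) {
        return static_cast<unsigned char>(bytes[i]);
    };
    if (bytes.size() >= 3 && at(0) == 0xEF && at(1) == 0xBB && at(2) == 0xBF)
        return Bom::Utf8;
    if (bytes.size() >= 2 && at(0) == 0xFF && at(1) == 0xFE)
        return Bom::Utf16LE;
    if (bytes.size() >= 2 && at(0) == 0xFE && at(1) == 0xFF)
        return Bom::Utf16BE;
    return Bom::None;
}

// Fallback heuristic for files without a BOM: ASCII, ANSI, and UTF-8 text
// normally contains no zero bytes, while UTF-16 text almost always does.
bool contains_zero_byte(const std::string& bytes)
{
    return bytes.find('\0') != std::string::npos;
}
```

A caller would read the start of the file in binary mode into a std::string, try detect_bom first, and fall back to contains_zero_byte only when it returns Bom::None.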
