How to read a CSV file (Google Doc) line by line in C++

Problem description

Hi All,

I have to read a CSV file that was created as a Google Doc and contains some French accented characters. When I read this file with the fgets() function, the French characters are replaced with garbage values.

I am new to C++ and have no idea how to read a Unicode-encoded file.
Please guide me towards a solution. It would be even better if you could provide some sample code.


Thanks a lot in advance.


Rajesh

Solution

The first thing is to determine which character set the text file is using. The "garbage values" are how the French characters are encoded in that character set. From your comments I am pretty sure it is not Unicode, but if you posted the values (in hex) of the string that you read in, it would be much easier to be sure. I suspect the text uses an MBCS (multi-byte character set), in which case you may need to use the correct code page to display it properly on Windows.
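To get those hex values, something like the following could be run on each line as it is read. This is only a minimal sketch: the helper names toHex and looksLikeUtf8 are made up for illustration, and the UTF-8 check is a rough structural test, not a full validator. As a reference point, the letter é is the single byte E9 in Windows-1252 and Latin-1, but the two bytes C3 A9 in UTF-8.

```cpp
#include <cstddef>
#include <string>

// Render each byte of a line as two hex digits, separated by spaces.
// (Hypothetical helper, not part of any library.)
std::string toHex(const std::string& bytes)
{
    static const char digits[] = "0123456789ABCDEF";
    std::string out;
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        unsigned char b = static_cast<unsigned char>(bytes[i]);
        if (i != 0) out += ' ';
        out += digits[b >> 4];
        out += digits[b & 0x0F];
    }
    return out;
}

// Rough structural check: true if every lead byte is followed by the right
// number of continuation bytes. It does not reject overlong forms or
// surrogates, so treat the result as a hint only.
bool looksLikeUtf8(const std::string& bytes)
{
    std::size_t i = 0;
    while (i < bytes.size()) {
        unsigned char b = static_cast<unsigned char>(bytes[i]);
        std::size_t extra;
        if (b < 0x80)                extra = 0;  // plain ASCII
        else if ((b & 0xE0) == 0xC0) extra = 1;  // lead byte of a 2-byte sequence
        else if ((b & 0xF0) == 0xE0) extra = 2;  // lead byte of a 3-byte sequence
        else if ((b & 0xF8) == 0xF0) extra = 3;  // lead byte of a 4-byte sequence
        else return false;                       // stray continuation or invalid byte
        if (i + extra >= bytes.size() && extra != 0) return false;  // truncated
        for (std::size_t k = 1; k <= extra; ++k) {
            unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            if ((c & 0xC0) != 0x80) return false;  // not a continuation byte
        }
        i += extra + 1;
    }
    return true;
}
```

If looksLikeUtf8 returns true for lines that contain accented characters, the file is most likely UTF-8 and not an 8-bit code page, and it should be decoded as such.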

BTW, the "r" mode is correct, as fopen defaults to text mode, but you can use "rt" if you want. In any case, reading a text file as text or as binary makes little difference (except for the way line terminators are handled).


From your comment:
I am observing the garbage characters inside Visual Studio IDE while debugging. I am using VS6.0.

OK, so I think it is a character set problem. As Andrew said, your text file is probably not a Unicode text file, and since your computer is not set up for French you can't display the characters properly. (Visual Studio takes the current locale settings and loads the corresponding character set for 8-bit characters.)

I suggest that you convert your input strings into Unicode strings using these functions:

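//headers needed by the code below
#include <windows.h>
#include <stdio.h>
#include <tchar.h>

//just to set a max size for the string buffers
#define MAX_SIZE 1000
//change this value to use a different character set
//(1250 is Central European; French text is more often 1252, or CP_UTF8
//for a UTF-8 file, in which case MB_PRECOMPOSED below must become 0)
#define CODE_PAGE 1250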

//converts an ansi string (8 bits per character)
//into a unicode string (16 bits per character)
//using the code page provided by constant CODE_PAGE
//Note: do not delete or free the returning pointer!
WCHAR* AnsiToUnicode(LPCSTR ansiString)
{
    static WCHAR unicodeString[MAX_SIZE];
    MultiByteToWideChar(
        CODE_PAGE,          // code page
        MB_PRECOMPOSED,     // character-type options
        ansiString,         // address of string to map
        -1,                 // -1: input string is null-terminated
        unicodeString,      // address of wide-character buffer
        MAX_SIZE            // size of buffer
    );
    return unicodeString;
}

//converts a unicode string (16 bits per character)
//into an ansi string (8 bits per character)
//using the code page provided by constant CODE_PAGE
//Note: do not delete or free the returning pointer!
char* UnicodeToAnsi(LPCWSTR unicodeString)
{
    static char ansiString[MAX_SIZE];
    WideCharToMultiByte(
        CODE_PAGE,      // code page
        0,              // performance and mapping flags
        unicodeString,  // address of wide-character string
        -1,             // -1: input string is null-terminated
        ansiString,     // address of buffer for new string
        MAX_SIZE,       // size of buffer
        NULL,           // address of default for unmappable characters
        NULL            // address of flag set when default
    );
    return ansiString;
}

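//test
void test()
{
     FILE *fp;
     char str[100];
     fp = _tfopen(_T("D:\\myfile.csv"), _T("rt"));
     if (fp == NULL)
          return; //could not open the file
     while (fgets(str, 100, fp))
     {
          //convert the string (lines longer than 99 chars arrive in pieces)
          WCHAR* wstr = AnsiToUnicode(str);
 
          //do something......
     }
     fclose(fp);
}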



Or you may use cleaner versions of these functions:

int AnsiToUnicode(LPCSTR ansiString, LPWSTR unicodeString, int maxSize)
{
    return MultiByteToWideChar(
        CODE_PAGE,          // code page
        MB_PRECOMPOSED,     // character-type options
        ansiString,         // address of string to map
        -1,                 // -1: input string is null-terminated
        unicodeString,      // address of wide-character buffer
        maxSize             // size of buffer
    );
}

int UnicodeToAnsi(LPCWSTR unicodeString, char* ansiString, int maxSize)
{
    return WideCharToMultiByte(
        CODE_PAGE,      // code page
        0,              // performance and mapping flags
        unicodeString,  // address of wide-character string
        -1,             // -1: input string is null-terminated
        ansiString,     // address of buffer for new string
        maxSize,        // size of buffer
        NULL,           // address of default for unmappable characters
        NULL            // address of flag set when default
    );
}
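Whichever conversion variant is used, each line returned by fgets still ends with its line terminator and still has to be split into fields. The following is a rough, portable sketch of that step. The helper names stripLineEnd and splitCsvLine are hypothetical, and the split is deliberately naive: it does not handle quoted fields that contain commas. Splitting on the comma byte before converting is safe for UTF-8 and for single-byte code pages, because the byte 0x2C never occurs inside an accented character in those encodings.

```cpp
#include <string>
#include <vector>

// Drop the trailing "\n" or "\r\n" that a line-oriented read leaves behind.
// (Hypothetical helper for illustration.)
std::string stripLineEnd(std::string line)
{
    while (!line.empty() &&
           (line[line.size() - 1] == '\n' || line[line.size() - 1] == '\r'))
        line.erase(line.size() - 1);
    return line;
}

// Naive split on commas. Does NOT handle quoted fields containing commas.
std::vector<std::string> splitCsvLine(const std::string& line)
{
    std::vector<std::string> fields;
    std::string::size_type start = 0;
    for (;;) {
        std::string::size_type comma = line.find(',', start);
        if (comma == std::string::npos) {
            fields.push_back(line.substr(start));  // last field (may be empty)
            break;
        }
        fields.push_back(line.substr(start, comma - start));
        start = comma + 1;
    }
    return fields;
}
```

Inside the read loop, each field produced by splitCsvLine(stripLineEnd(str)) could then be passed to AnsiToUnicode individually.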



And don't forget to enable Unicode string display in Visual Studio:
To set your debugger options to display Unicode strings, click the Tools menu, click Options, click Debug, then check the Display Unicode Strings check box.


This function should work properly (well, it does for me!).
Maybe you are trying to read a "French" file on a non-French version of Windows?

Try changing the locale before reading the file:

//required for the locale function
#include <locale.h>

void yourFunction()
{
    //change the C runtime locale to French (by default this affects the whole program)
    setlocale(LC_ALL, "French");
    //then open and read your text file
    //...
}
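Note that setlocale returns a null pointer when the requested locale is not available, so it is worth checking the result before going on. A minimal sketch, where trySetLocale is a hypothetical wrapper name; the name "French" is understood by the Microsoft C runtime, whereas other platforms typically expect something like "fr_FR.UTF-8":

```cpp
#include <clocale>
#include <cstddef>

// Try to switch the C runtime locale. Returns false, leaving the current
// locale unchanged, if the requested name is not available on this system.
// (Hypothetical helper for illustration.)
bool trySetLocale(const char* name)
{
    return std::setlocale(LC_ALL, name) != NULL;
}
```

A caller could then fall back to explicit conversion with a code page when trySetLocale("French") fails.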

