Reading the first line of a file gives me a "\357\273\277" prefix in the first row

Question

When I call the function readTheNRow with row = 0 (that is, I read the first row), I find that the first three chars are \357, \273 and \277. I found that this prefix is somehow related to UTF-8 files, but some files have it and some don't :( . How do I ignore all such prefixes in the files I want to read from?

#include <fcntl.h>   // open, O_RDONLY
#include <stdio.h>   // printf
#include <stdlib.h>  // exit
#include <unistd.h>  // read, write, close

// Reads row number `row` (0-based) of my_file.txt into buff, without the
// trailing newline. Returns 1 on success, or -1 if the end of the file is
// hit before that row is reached. buff must be large enough for the row.
int readTheNRow(char buff[], int row) {
    int file = open("my_file.txt", O_RDONLY);
    if (file < 0) {
        write(2, "opening file was unsuccessful\n", 30);
        exit(-1);
    }

    // function's variables
    int i = 0;
    char ch;   // a temp variable to read with
    int check; // return value of read()

    // read till we reach the needed row
    while (i != row) {
        // read one char
        check = read(file, &ch, 1);
        if (check < 0) {
            // write an error message to the user
            write(2, "error occurred in reading\n", 26);
            exit(-1);
        }
        if (check == 0) {
            // end of file: couldn't reach row N (the file has fewer rows)
            close(file);
            return -1;
        }
        printf("%c", ch);
        // check whether the char is a \n
        if (ch == '\n') {
            i++;
        }
    }

    // read the row into the received buffer
    i = 0;
    do {
        // read one char
        check = read(file, buff + i, 1);
        if (check < 0) {
            write(2, "error occurred in reading\n", 26);
            exit(-1);
        }
        // if we reached the end of file
        if (check == 0) {
            break;
        }
        i++;
    } while (buff[i - 1] != '\n');

    // drop the trailing \n (if there is one) and terminate the string
    if (i > 0 && buff[i - 1] == '\n') {
        i--;
    }
    buff[i] = '\0';

    // try to close the file before returning
    if (close(file) < 0) {
        write(2, "closing file was unsuccessful\n", 30);
        exit(-1);
    }
    return 1; // reading was successful
}

Recommended answer

You seem to be trying to read a file carrying a so-called BOM (byte order mark).

Test for such a prefix and, if one is present, use the information it carries, then go on and read the rest of the file, interpreting it as the BOM indicates.
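As a minimal sketch of that test for the UTF-8 case, the helper below peeks at the first three bytes of a descriptor and rewinds if they are not a BOM. The name skip_utf8_bom is hypothetical (it is not part of the original post), and the sketch assumes the descriptor is seekable and positioned at the start of the file.

```c
#include <string.h>    // memcmp
#include <sys/types.h> // off_t, ssize_t
#include <unistd.h>    // read, lseek

// Hypothetical helper, not from the original post.
// Consumes a leading UTF-8 BOM on fd if one is present.
// Returns 1 if a BOM was skipped, 0 if there was none, -1 on error.
// Assumption: fd is seekable and positioned at offset 0.
int skip_utf8_bom(int fd) {
    static const unsigned char bom[3] = {0xEF, 0xBB, 0xBF};
    unsigned char head[3];
    size_t got = 0;

    // read() may return fewer bytes than requested, so loop up to 3 bytes
    while (got < sizeof head) {
        ssize_t n = read(fd, head + got, sizeof head - got);
        if (n < 0) {
            return -1;
        }
        if (n == 0) {
            break; // file is shorter than 3 bytes
        }
        got += (size_t)n;
    }

    if (got == sizeof bom && memcmp(head, bom, sizeof bom) == 0) {
        return 1; // BOM consumed; fd now points at the first real byte
    }

    // no BOM: rewind so the caller sees the file from the beginning
    if (lseek(fd, 0, SEEK_SET) == (off_t)-1) {
        return -1;
    }
    return 0;
}
```

In the question's code this could be called as skip_utf8_bom(file) right after open succeeds. If the descriptor is a FIFO or pipe, lseek fails, so in that case the peeked bytes would have to be kept in a buffer instead of rewinding.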

The sequence \357 \273 \277 (octal for the bytes 0xEF 0xBB 0xBF) indicates that UTF-8 follows. UTF-8 does not need to take byte ordering into account, as the byte is the unit for such files.
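When the row has already been read into a buffer, the same three bytes can be checked in memory. This is only a sketch: utf8_bom_length is a hypothetical name, and it uses memcmp so the comparison is done on unsigned bytes even where plain char is signed.

```c
#include <stddef.h> // size_t
#include <string.h> // memcmp

// Hypothetical helper, not from the original post.
// Returns how many leading bytes of buf form a UTF-8 BOM: 3 or 0.
size_t utf8_bom_length(const char *buf, size_t len) {
    // the octal escapes \357 \273 \277 are the bytes 0xEF 0xBB 0xBF
    static const char bom[] = "\357\273\277";

    if (len >= 3 && memcmp(buf, bom, 3) == 0) {
        return 3;
    }
    return 0;
}
```

A caller could then use buff + utf8_bom_length(buff, strlen(buff)) as the start of the actual text of the first row.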

More on the various existing BOMs here: http://en.wikipedia.org/wiki/Byte_order_mark#Representations_of_byte_order_marks_by_encoding
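For files that may carry one of the other common BOMs, a rough sketch of a classifier over the first bytes could look like this. The names bom_kind and detect_bom are hypothetical, and only the UTF-8, UTF-16 and UTF-32 marks are covered. Note that the bytes FF FE 00 00 are reported as UTF-32 LE here, although they could also be a UTF-16 LE BOM followed by a NUL character.

```c
#include <stddef.h> // size_t
#include <string.h> // memcmp

// Hypothetical names, not from the original post.
typedef enum {
    BOM_NONE,
    BOM_UTF8,
    BOM_UTF16_BE,
    BOM_UTF16_LE,
    BOM_UTF32_BE,
    BOM_UTF32_LE
} bom_kind;

// Classifies the BOM at the start of buf (if any) and stores its
// length in bytes in *bom_len (0 when no BOM is found).
bom_kind detect_bom(const unsigned char *buf, size_t len, size_t *bom_len) {
    // Longest marks first: the UTF-32 LE mark begins with the UTF-16 LE mark
    static const struct {
        bom_kind kind;
        size_t len;
        unsigned char bytes[4];
    } table[] = {
        {BOM_UTF32_BE, 4, {0x00, 0x00, 0xFE, 0xFF}},
        {BOM_UTF32_LE, 4, {0xFF, 0xFE, 0x00, 0x00}},
        {BOM_UTF8,     3, {0xEF, 0xBB, 0xBF}},
        {BOM_UTF16_BE, 2, {0xFE, 0xFF}},
        {BOM_UTF16_LE, 2, {0xFF, 0xFE}},
    };

    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++) {
        if (len >= table[i].len &&
            memcmp(buf, table[i].bytes, table[i].len) == 0) {
            *bom_len = table[i].len;
            return table[i].kind;
        }
    }

    *bom_len = 0;
    return BOM_NONE;
}
```

The returned length tells the reader how many bytes to skip, and the kind tells it how to interpret the rest of the file.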
