Create customized header (metadata) for files


Problem description

Here I want to create a header which contains other file details like metadata of other files.

This code works fine if I use static values for struct file_header. If I use malloc for struct file_header, I get a problem in this code. Specifically, I'm getting a problem in fread; fwrite may have worked perfectly. The code is here:

#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <dirent.h>
#include <string.h>

char path[1024] = "/home/test/main/Integration/testing/package_DIR";

//int count = 5;

struct files {

    char *file_name;
    int file_size;
};

typedef struct file_header {

    int file_count;
    struct files file[5];
} metadata;


metadata *create_header();

int main() {
    FILE *file = fopen("/home/test/main/Integration/testing/file.txt", "w");
    metadata *header;
    header = create_header();
    if(header != NULL)
    {
        printf("size of Header is %d\n",sizeof(header));
    }

    if (file != NULL) {

        if (fwrite(&header, sizeof(header), 1, file) < 1) {
            puts("short count on fwrite");
        }
        fclose(file);
    }
    file = fopen("/home/test/main/Integration/testing/file.txt", "rb");
    if (file != NULL) {
        metadata header = { 0 };
        if (fread(&header, sizeof(header), 1, file) < 1) {
            puts("short count on fread");
        }
        fclose(file);
        printf("File Name = %s\n", header.file[0].file_name);
        printf("File count = %d\n", header.file_count);
        printf("File Size = %d\n", header.file[0].file_size);
    }
    return 0;
}

metadata *create_header()
{
    int file_count = 0;
    DIR * dirp;
    struct dirent * entry;
    dirp = opendir(path);
    metadata *header = (metadata *)malloc(sizeof(metadata));
    while ((entry = readdir(dirp)) != NULL) {
        if (entry->d_type == DT_REG) { /* If the entry is a regular file */

            header->file[file_count].file_name = (char *)malloc(sizeof(char)*strlen(entry->d_name));
            strcpy(header->file[file_count].file_name,entry->d_name);
            //Put static but i have logic for this i will apply later.
            header->file[file_count].file_size = 10;
            file_count++;

        }
    }
    header->file_count = file_count;
    closedir(dirp);
    //printf("File Count : %d\n", file_count);
    return header;
}

Output:

size of Header is 8
short count on fread
File Name = (null)
File count = 21918336
File Size = 0

Can anybody please help me solve this issue?

Recommended answer

You're working on a 64-bit machine, as shown by your pointers being 8 bytes long.

You're trying to write data out to a file and then read it back in. You're running into problems because pointers don't write very well. (More precisely: pointers can be written without any problems, but the pointers only have meaning in the current running program, and are seldom suitable for writing to disk and even more seldom suitable for reading back from disk.)

This part of your code illustrates the problem:

struct files {
    char *file_name;
    int file_size;
};

typedef struct file_header {
    int file_count;
    struct files file[5];
} metadata;


metadata *create_header();

int main() {
    FILE *file = fopen("/home/test/main/Integration/testing/file.txt", "w");
    metadata *header;
    header = create_header();
    if(header != NULL)
    {
        printf("size of Header is %d\n",sizeof(header));
    }

Side comments:

  • Make the file name into an argument to main(), or, at least, into a variable. Writing the name out twice makes it hard to change.
  • It is good that you are doing some error detection. However, I'm not going to critique it, though there is considerable room for improvement in it.

Main comments:

  • You see size of Header is 8 in the output because header is a pointer. The sizeof(metadata) (the type that header points to) is much larger, probably 88 bytes on a typical 64-bit system (an int padded to 8 bytes, plus five 16-byte entries), but it depends on how your compiler aligns and packs data in the structure.

if (file != NULL) {    
    if (fwrite(&header, sizeof(header), 1, file) < 1) {
        puts("short count on fwrite");
    }
    fclose(file);
}

This code writes 8 bytes of data to the file. What it writes is the address stored in your header variable (the pointer's value). It does not write any of the data that the pointer points at.
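
To make the difference concrete, here is a minimal sketch that reports both sizes. The helper names pointer_size(), pointee_size() and report_sizes() are made up for this illustration and are not part of the original program. It also uses %zu, the printf conversion meant for size_t values.

```c
#include <stddef.h>
#include <stdio.h>

struct files {
    char *file_name;
    int file_size;
};

typedef struct file_header {
    int file_count;
    struct files file[5];
} metadata;

/* Size of the pointer variable itself: the byte count that
   fwrite(&header, sizeof(header), 1, file) asks for. */
static size_t pointer_size(const metadata *header)
{
    return sizeof(header);
}

/* Size of the structure the pointer refers to. */
static size_t pointee_size(const metadata *header)
{
    return sizeof(*header);
}

/* Print both sizes; %zu is the conversion for size_t. */
static void report_sizes(const metadata *header)
{
    printf("sizeof(header)  = %zu\n", pointer_size(header));
    printf("sizeof(*header) = %zu\n", pointee_size(header));
}
```

Calling report_sizes(header) right after create_header() shows how few of the structure's bytes the original fwrite() call covers. The exact numbers depend on the platform and compiler.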

What would get closer to what you are after (but is still not going to work) is:

        if (fwrite(header, sizeof(*header), 1, file) < 1) {
            puts("short count on fwrite");
        }

This will write the whole structure (88 bytes or thereabouts on a typical 64-bit system) out to the file. However, your file won't contain the file names; it will merely contain the pointers to where the file names were stored at the time when the file was written. Be very careful here. If you read this file, you might even see it appearing to work because the names may not have been erased from memory yet.
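
One way to convince yourself of this is to write a record and then search the file for the name. The following is only a sketch: write_raw() and stream_contains() are hypothetical helpers invented for this demonstration, and the search assumes the file fits in a 4 KiB buffer.

```c
#include <stdio.h>
#include <string.h>

struct files {
    char *file_name;   /* pointer: only the address gets written */
    int file_size;
};

/* Write one record with a single fwrite(); the name's characters are not included. */
static int write_raw(FILE *fp, const struct files *rec)
{
    return fwrite(rec, sizeof(*rec), 1, fp) == 1 ? 0 : -1;
}

/* Return 1 if the bytes of text occur in the first 4 KiB of the stream, else 0. */
static int stream_contains(FILE *fp, const char *text)
{
    unsigned char buf[4096];
    size_t len = strlen(text);
    rewind(fp);
    size_t n = fread(buf, 1, sizeof(buf), fp);
    for (size_t i = 0; len > 0 && i + len <= n; i++)
    {
        if (memcmp(buf + i, text, len) == 0)
            return 1;
    }
    return 0;
}
```

If you write a record whose file_name points at "args.c" and then call stream_contains(fp, "args.c"), the name should not be found. Reading the record back in the same process returns the same pointer value, which is why the broken code can appear to work.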

To get the file names into the file, you will have to handle each one separately. You'll have to decide on a convention. For example, you might decide that the name will be prefixed by a 2-byte unsigned short which contains the length of the file name, L, followed by L+1 bytes of data containing the file name and its terminal NUL '\0'. Then you'd write the other (fixed size) parts of the per-file data. And you'd repeat this process for each of the files. The converse operation, reading the file, requires understanding of the written structure. At the point where you expect a file name, you'll read the two-byte length, and you can use that length to allocate the space for the file name. Then you read the L+1 bytes into the newly allocated file name. Then you read the other fixed-length data for the file, then move onto the next file.

If you want to be able to do it all in a single fwrite() and then fread(), you are going to have to revise your data structure:

struct files {
    char  file_name[MAX_PERMITTED_FILENAME_LENGTH];
    int   file_size;
};

You get to decide what the maximum permitted filename length is, but it is fixed. If your names are short, you don't use all the space; if your names are long, they may be truncated. Your metadata structure size now increases dramatically (at least if MAX_PERMITTED_FILENAME_LENGTH is a reasonable size, say between 32 and 1024 bytes). But you can read and write the entire metadata structure in a single operation with this.
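
A minimal sketch of that fixed-size approach might look like the following. The limit of 64 bytes, and the helper names add_file(), save_header() and load_header(), are assumptions chosen for illustration. The header should be zero-initialized (metadata h = {0};) before entries are added, so that unused bytes in the file are predictable.

```c
#include <stdio.h>
#include <string.h>

/* Both limits are assumptions for this sketch. */
enum { MAX_FILES = 5, MAX_PERMITTED_FILENAME_LENGTH = 64 };

struct files {
    char file_name[MAX_PERMITTED_FILENAME_LENGTH];
    int  file_size;
};

typedef struct file_header {
    int file_count;
    struct files file[MAX_FILES];
} metadata;

/* Append one entry; over-long names are truncated. Returns 0, or -1 if full. */
static int add_file(metadata *header, const char *name, int size)
{
    if (header->file_count >= MAX_FILES)
        return -1;
    struct files *slot = &header->file[header->file_count++];
    snprintf(slot->file_name, sizeof(slot->file_name), "%s", name);
    slot->file_size = size;
    return 0;
}

/* The whole structure, names included, goes out in one fwrite(). */
static int save_header(FILE *fp, const metadata *header)
{
    return fwrite(header, sizeof(*header), 1, fp) == 1 ? 0 : -1;
}

/* Read it back in one fread(), then sanity-check what was read. */
static int load_header(FILE *fp, metadata *header)
{
    if (fread(header, sizeof(*header), 1, fp) != 1)
        return -1;
    if (header->file_count < 0 || header->file_count > MAX_FILES)
        return -1;
    /* Guard against a corrupt file with unterminated names. */
    for (int i = 0; i < header->file_count; i++)
        header->file[i].file_name[MAX_PERMITTED_FILENAME_LENGTH - 1] = '\0';
    return 0;
}
```

The trade-off is the one described above: every entry occupies the full name buffer on disk, however short the name is.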

Thanks for your reply, but I am new to C, so how can I achieve this?

Eventually, you'll be able to code it somewhat like this.

#include <dirent.h>
#include <errno.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

enum { MAX_FILES = 5 };

struct files
{
    char *file_name;
    int file_size;
};

typedef struct file_header
{
    int file_count;
    struct files file[MAX_FILES];
} metadata;

static void err_exit(const char *format, ...);
static metadata *create_header(const char *directory);
static void release_header(metadata *header);
static void write_header(FILE *fp, const metadata *header);
static metadata *read_header(FILE *fp);
static void dump_header(FILE *fp, const char *tag, const metadata *header);

int main(int argc, char **argv)
{
    if (argc != 3)
        err_exit("Usage: %s file directory\n", argv[0]);

    const char *name = argv[1];
    const char *path = argv[2];
    FILE *fp = fopen(name, "wb");

    if (fp == 0)
        err_exit("Failed to open file %s for writing (%d: %s)\n", name, errno, strerror(errno));

    metadata *header = create_header(path);
    dump_header(stdout, "Data to be written", header);
    write_header(fp, header);
    fclose(fp);                     // Ignore error on close
    release_header(header);

    if ((fp = fopen(name, "rb")) == 0)
        err_exit("Failed to open file %s for reading (%d: %s)\n", name, errno, strerror(errno));

    metadata *read_info = read_header(fp);
    dump_header(stdout, "Data as read", read_info);
    release_header(read_info);

    fclose(fp);                     // Ignore error on close
    return 0;
}

static metadata *create_header(const char *path)
{
    int file_count = 0;
    DIR * dirp = opendir(path);
    struct dirent * entry;
    if (dirp == 0)
        err_exit("Failed to open directory %s (%d: %s)\n", path, errno, strerror(errno));
    metadata *header = (metadata *)malloc(sizeof(metadata));
    if (header == 0)
        err_exit("Failed to malloc space for header (%d: %s)\n", errno, strerror(errno));

    header->file_count = 0;
    while ((entry = readdir(dirp)) != NULL && file_count < MAX_FILES)
    {
        // d_type is not portable - POSIX says you can only rely on d_name and d_ino
        if (entry->d_type == DT_REG)
        {   /* If the entry is a regular file */
            // Avoid off-by-one under-allocation by using strdup()
            header->file[file_count].file_name = strdup(entry->d_name);
            if (header->file[file_count].file_name == 0)
                err_exit("Failed to strdup() file %s (%d: %s)\n", entry->d_name, errno, strerror(errno));
            //Put static but i have logic for this i will apply later.
            header->file[file_count].file_size = 10;
            file_count++;
        }
    }
    header->file_count = file_count;
    closedir(dirp);
    //printf("File Count : %d\n", file_count);
    return header;
}

static void write_header(FILE *fp, const metadata *header)
{
    if (fwrite(&header->file_count, sizeof(header->file_count), 1, fp) != 1)
        err_exit("Write error on file count (%d: %s)\n", errno, strerror(errno));
    const struct files *files = header->file;
    for (int i = 0; i < header->file_count; i++)
    {
        unsigned short name_len = strlen(files[i].file_name) + 1;
        if (fwrite(&name_len, sizeof(name_len), 1, fp) != 1)
            err_exit("Write error on file name length (%d: %s)\n", errno, strerror(errno));
        if (fwrite(files[i].file_name, name_len, 1, fp) != 1)
            err_exit("Write error on file name (%d: %s)\n", errno, strerror(errno));
        if (fwrite(&files[i].file_size, sizeof(files[i].file_size), 1, fp) != 1)
            err_exit("Write error on file size (%d: %s)\n", errno, strerror(errno));
    }
}

static metadata *read_header(FILE *fp)
{
    metadata *header = malloc(sizeof(*header));
    if (header == 0)
        err_exit("Failed to malloc space for header (%d:%s)\n", errno, strerror(errno));
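    if (fread(&header->file_count, sizeof(header->file_count), 1, fp) != 1)
        err_exit("Read error on file count (%d: %s)\n", errno, strerror(errno));
    // Reject counts that would overrun the fixed-size file[] array
    if (header->file_count < 0 || header->file_count > MAX_FILES)
        err_exit("Invalid file count %d in header\n", header->file_count);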
    struct files *files = header->file;
    for (int i = 0; i < header->file_count; i++)
    {
        unsigned short name_len;
        if (fread(&name_len, sizeof(name_len), 1, fp) != 1)
            err_exit("Read error on file name length (%d: %s)\n", errno, strerror(errno));
        files[i].file_name = malloc(name_len);
        if (files[i].file_name == 0)
            err_exit("Failed to malloc space for file name (%d:%s)\n", errno, strerror(errno));
        if (fread(files[i].file_name, name_len, 1, fp) != 1)
            err_exit("Read error on file name (%d: %s)\n", errno, strerror(errno));
        if (fread(&files[i].file_size, sizeof(files[i].file_size), 1, fp) != 1)
            err_exit("Read error on file size (%d: %s)\n", errno, strerror(errno));
    }
    return(header);
}

static void dump_header(FILE *fp, const char *tag, const metadata *header)
{
    const struct files *files = header->file;
    fprintf(fp, "Metadata: %s\n", tag);
    fprintf(fp, "File count: %d\n", header->file_count);
    for (int i = 0; i < header->file_count; i++)
        fprintf(fp, "File %d: size %5d, name %s\n", i, files[i].file_size, files[i].file_name);
}

static void release_header(metadata *header)
{
    for (int i = 0; i < header->file_count; i++)
    {
        /* Zap file name, and pointer to file name */
        memset(header->file[i].file_name, 0xDD, strlen(header->file[i].file_name)+1);
        free(header->file[i].file_name);
        memset(&header->file[i].file_name, 0xEE, sizeof(header->file[i].file_name));
    }
    free(header);
}

static void err_exit(const char *format, ...)
{
    va_list args;
    va_start(args, format);
    vfprintf(stderr, format, args);
    va_end(args);
    exit(EXIT_FAILURE);
}

I compiled it as dump_file, and ran it as shown:

$ dump_file xyz .
Metadata: Data to be written
File count: 5
File 0: size    10, name .gitignore
File 1: size    10, name args.c
File 2: size    10, name atob.c
File 3: size    10, name bp.pl
File 4: size    10, name btwoc.c
Metadata: Data as read
File count: 5
File 0: size    10, name .gitignore
File 1: size    10, name args.c
File 2: size    10, name atob.c
File 3: size    10, name bp.pl
File 4: size    10, name btwoc.c
$ odx xyz
0x0000: 05 00 00 00 0B 00 2E 67 69 74 69 67 6E 6F 72 65   .......gitignore
0x0010: 00 0A 00 00 00 07 00 61 72 67 73 2E 63 00 0A 00   .......args.c...
0x0020: 00 00 07 00 61 74 6F 62 2E 63 00 0A 00 00 00 06   ....atob.c......
0x0030: 00 62 70 2E 70 6C 00 0A 00 00 00 08 00 62 74 77   .bp.pl.......btw
0x0040: 6F 63 2E 63 00 0A 00 00 00                        oc.c.....
0x0049:
$

I should probably have renamed err_exit() as err_sysexit() and revised the error handling so that errno and the corresponding string were handled inside that function, rather than repeatedly adding errno and strerror(errno) to the calls to err_exit().
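
A rough sketch of what that revision could look like is below. The formatting helpers vformat_sys_error() and format_sys_error() are hypothetical names, split out only so the message format can be checked without exiting the program.

```c
#include <errno.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Build "<message> (<errnum>: <description>)\n" in buf. */
void vformat_sys_error(char *buf, size_t size, int errnum,
                       const char *format, va_list args)
{
    int n = vsnprintf(buf, size, format, args);
    if (n < 0)
        n = 0;
    if ((size_t)n < size)
        snprintf(buf + n, size - (size_t)n, " (%d: %s)\n", errnum, strerror(errnum));
}

/* Variadic convenience wrapper around vformat_sys_error(). */
void format_sys_error(char *buf, size_t size, int errnum, const char *format, ...)
{
    va_list args;
    va_start(args, format);
    vformat_sys_error(buf, size, errnum, format, args);
    va_end(args);
}

/* Report the message plus the current errno details, then exit. */
void err_sysexit(const char *format, ...)
{
    int errnum = errno;   /* capture before any library call can change it */
    char buf[1024];
    va_list args;
    va_start(args, format);
    vformat_sys_error(buf, sizeof(buf), errnum, format, args);
    va_end(args);
    fputs(buf, stderr);
    exit(EXIT_FAILURE);
}
```

With this in place, a call site shrinks to something like err_sysexit("Failed to open file %s for writing", name); and the errno number and text are appended automatically.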

Transferring some of the rather extensive commentary into the question:

I tried above code, but getting segmentation fault after File : 4, which means that the data write is working properly but I'm having some problem with data read. Nimit

I tried above code and I'm getting a segmentation fault while I am reading data from file. user1089679

Oops: valgrind is giving me warnings about an invalid write in release_header(). That would screw things up. It isn't hard to resolve, though — it is the second memset() in release_header() that's causing the mischief; I accidentally omitted the ampersand:

memset( header->file[i].file_name, 0xEE, sizeof(header->file[i].file_name));  // Broken
memset(&header->file[i].file_name, 0xEE, sizeof(header->file[i].file_name));  // Correct

This is fixed in the code. Note that both memset() operations are in the code to ensure that if memory is reused, it does not contain previous valid data, which was a risk given that the code was originally writing pointers out to disk and then reading them back again. The memset() calls would not be present in normal production code.

Note that odx is a home-brew hex dump program (Mac OS X does not have an hd program by default). Your system may already have hd for hex dump, or you could use your own Google Fu to find alternatives.

Just want to ask: I want to run this program cross-platform, so is there any problem with low-bit machines? Nimit

There is no problem with this code on either big-endian or little-endian machines; there'd be problems if you take data from a little-endian (Intel) machine to a big-endian (SPARC, PPC, ...) machine or vice versa. The code is probably also sensitive to 32-bit vs 64-bit builds; I didn't define field sizes as n-bits but as convenient types like int which can change between systems. If you want portable data, decide on the field sizes (1, 2, 4, 8 bytes, mostly, at least for the non-string data), and then write it in a standard way - MSB first (big-endian) or perhaps LSB first (little-endian).
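
As an illustration of writing a field "in a standard way", here is a sketch that stores a 4-byte value MSB first regardless of the host's byte order. The choice of uint32_t for the field, and the names write_u32_be() and read_u32_be(), are assumptions for this example and do not appear in the code above.

```c
#include <stdint.h>
#include <stdio.h>

/* Write a 32-bit value MSB first (big-endian), whatever the host byte order. */
static int write_u32_be(FILE *fp, uint32_t value)
{
    unsigned char bytes[4] = {
        (unsigned char)(value >> 24),
        (unsigned char)(value >> 16),
        (unsigned char)(value >> 8),
        (unsigned char)(value),
    };
    return fwrite(bytes, sizeof(bytes), 1, fp) == 1 ? 0 : -1;
}

/* Read a 32-bit value that was written MSB first. */
static int read_u32_be(FILE *fp, uint32_t *value)
{
    unsigned char bytes[4];
    if (fread(bytes, sizeof(bytes), 1, fp) != 1)
        return -1;
    *value = ((uint32_t)bytes[0] << 24) | ((uint32_t)bytes[1] << 16) |
             ((uint32_t)bytes[2] << 8)  |  (uint32_t)bytes[3];
    return 0;
}
```

Because the bytes are assembled with shifts rather than copied from memory, a file written on a little-endian machine reads back correctly on a big-endian one, and the field is 4 bytes on disk even where int has a different size.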
