Optimizing disk IO

Problem description

I have a piece of code that analyzes streams of data from very large (10-100GB) binary files. It works well, so it's time to start optimizing, and currently disk IO is the biggest bottleneck.

There are two types of files in use. The first type of file consists of a stream of 16-bit integers, which must be scaled after I/O to convert to a physically meaningful floating point value. I read the file in chunks, and within each chunk I read one 16-bit code at a time, perform the required scaling, and store the result in an array. Code is below:

int64_t read_current_chimera(FILE *input, double *current,
                             int64_t position, int64_t length, chimera *daqsetup)
{
    int64_t test;
    uint16_t iv;

    int64_t i;
    int64_t read = 0;

    /* Seek to the requested sample offset (each sample is a 16-bit code). */
    if (fseeko64(input, (off64_t)position * sizeof(uint16_t), SEEK_SET))
    {
        return 0;
    }

    for (i = 0; i < length; i++)
    {
        /* Read one 16-bit code at a time, scale it, and store the result. */
        test = fread(&iv, sizeof(uint16_t), 1, input);
        if (test == 1)
        {
            read++;
            current[i] = chimera_gain(iv, daqsetup);
        }
        else
        {
            perror("End of file reached");
            break;
        }
    }
    return read;
}

The chimera_gain function just takes a 16-bit integer, scales it, and returns the resulting double for storage.
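
For context, a scaling function of that shape might look roughly like the sketch below. The question does not show chimera_gain itself, so the name example_gain and the offset/range parameters are hypothetical stand-ins, not the actual implementation (which presumably reads its constants from the chimera struct).

#include <stdint.h>

/* Hypothetical stand-in for a chimera_gain-style conversion: map a raw
 * 16-bit code onto a physically meaningful range. The offset and range
 * parameters are assumptions made for this sketch. */
static inline double example_gain(uint16_t code, double offset, double range)
{
    return offset + range * ((double)code / 65535.0);
}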

The second file type contains 64-bit doubles, but it contains two columns, of which I only need the first. To do this I fread pairs of doubles and discard the second one. The double must also be endian-swapped before use. The code I use to do this is below:

int64_t read_current_double(FILE *input, double *current, int64_t position, int64_t length)
{
    int64_t test;
    double iv[2];

    int64_t i;
    int64_t read = 0;

    /* Seek to the requested row; each row is a pair of doubles. */
    if (fseeko64(input, (off64_t)position * 2 * sizeof(double), SEEK_SET))
    {
        return 0;
    }

    for (i = 0; i < length; i++)
    {
        /* Read one pair of doubles; keep the first column, discard the second. */
        test = fread(iv, sizeof(double), 2, input);
        if (test == 2)
        {
            read++;
            swapByteOrder((int64_t *)&iv[0]);
            current[i] = iv[0];
        }
        else
        {
            perror("End of file reached: ");
            break;
        }
    }
    return read;
}
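
The swapByteOrder helper is not shown in the question; a 64-bit byte swap along the lines of the sketch below would be one plausible implementation (on GCC and Clang, __builtin_bswap64 does the same job).

#include <stdint.h>
#include <string.h>

/* One plausible implementation of the swapByteOrder helper used above;
 * the original is not shown in the question, so this is only a sketch. */
static void swap_byte_order_64(int64_t *value)
{
    uint64_t v;
    memcpy(&v, value, sizeof v); /* copy through memcpy to avoid aliasing issues */
    v = ((v & 0x00000000000000FFULL) << 56) |
        ((v & 0x000000000000FF00ULL) << 40) |
        ((v & 0x0000000000FF0000ULL) << 24) |
        ((v & 0x00000000FF000000ULL) <<  8) |
        ((v & 0x000000FF00000000ULL) >>  8) |
        ((v & 0x0000FF0000000000ULL) >> 24) |
        ((v & 0x00FF000000000000ULL) >> 40) |
        ((v & 0xFF00000000000000ULL) >> 56);
    memcpy(value, &v, sizeof v);
}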

Can anyone suggest a method of reading these file types that would be significantly faster than what I am currently doing?

Answer

First off, it would be useful to use a profiler to identify the hot spots in your program. Based on your description of the problem, you have a lot of overhead caused by the sheer number of freads. Since the files are large, there will be a big benefit to increasing the amount of data you read per IO call.

Convince yourself of this by putting together 2 small programs that read the stream.

1) read it as you are in the example above, 2 doubles at a time.

2) read it the same way, but 10,000 doubles at a time.

Time both runs a few times, and odds are you will observe that #2 runs much faster.
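
As a concrete illustration of that suggestion, here is a minimal sketch of a block-reading variant of read_current_double. The 10,000-pair block size, the function name, and the extern declaration of swapByteOrder are choices made for this sketch, not part of the original answer; adjust them to taste.

#define _LARGEFILE64_SOURCE /* for fseeko64/off64_t on glibc, as in the original code */
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>

/* The question's byte-swap helper, assumed to be defined elsewhere. */
void swapByteOrder(int64_t *value);

#define BLOCK_PAIRS 10000 /* pairs of doubles read per fread call */

int64_t read_current_double_blocked(FILE *input, double *current,
                                    int64_t position, int64_t length)
{
    double *buf = malloc(2 * BLOCK_PAIRS * sizeof(double));
    int64_t total = 0;

    if (buf == NULL)
        return 0;

    if (fseeko64(input, (off64_t)position * 2 * sizeof(double), SEEK_SET))
    {
        free(buf);
        return 0;
    }

    while (total < length)
    {
        int64_t want = length - total;
        if (want > BLOCK_PAIRS)
            want = BLOCK_PAIRS;

        /* One large read replaces thousands of 16-byte freads. */
        int64_t got = (int64_t)fread(buf, 2 * sizeof(double), (size_t)want, input);
        if (got == 0)
            break;

        for (int64_t i = 0; i < got; i++)
        {
            swapByteOrder((int64_t *)&buf[2 * i]); /* keep the first column only */
            current[total + i] = buf[2 * i];
        }
        total += got;
    }

    free(buf);
    return total;
}

Timing this against the original pair-at-a-time loop should show the effect described above.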

Good luck.
