Boost restrict twice after reading some data gave wrong result

Problem description

The case is quite complicated to describe, so I will just show some code snippets:

ifstream ifs = xxx; // total length 1200 bytes
// do something that read 200 bytes from ifs;
filtering_istream limitedIn(restrict(ifs, 0/*off*/, 1000/*len*/));
// do something that read 700 bytes from limitedIn; we still got 300 bytes to read
filtering_istream limitedIn2(restrict(limitedIn, 0/*off*/, 300/*len*/)); 
// but now I can only read 100 bytes from limitedIn2: restricting limitedIn
// calls limitedIn.seek(0, cur), which in turn calls ifs.seek(0, cur); that
// returns 900 and updates limitedIn's _pos to 900

Is there any simple way to avoid this problem? If I could get a stream wrapper around the ifstream that returns 0 for seek even when the ifstream returns 200, things would get a lot simpler.

    ofstream out(R"(C:\Users\nick\AppData\Local\Temp\bugTest)", ios_base::binary);
    char* content = new char[1200];
    out.write(content, 1200);
    out.flush();
    out.close();
    ifstream in(R"(C:\Users\nick\AppData\Local\Temp\bugTest)", ios_base::binary);
    in.read(content, 200);
    filtering_istream limitedIn(restrict(in, 0, 1000));
    limitedIn.read(content, 700);
    filtering_istream limitedIn2(restrict(limitedIn, 0, 300));
    std::cout << limitedIn2.rdbuf()->sgetn(content, 300); // print 100
    delete []content;

Recommended answer

I created a more verbose self-contained example that shows better what is happening:

Live On Godbolt

#include <boost/iostreams/filtering_stream.hpp>
#include <boost/iostreams/restrict.hpp>
#include <array>
#include <fstream>
#include <fmt/ranges.h>
#include <span>

int main() {
    constexpr int capacity = 12;
    std::array<char, capacity> content{'a','b','c','d','e','f','g','h','i','j','k','l'};
    {
        std::ofstream out(R"(bugTest)", std::ios::binary);
        out.write(content.data(), content.size());
        out.flush();
    }
    using boost::iostreams::restrict;
    using boost::iostreams::filtering_istream;

    auto read_n = [](auto&& is, size_t n) {
        std::array<char, capacity> content{0};
        bool ok(is.read(content.data(), n));
        fmt::print("{} read {} bytes out of {}: {}\n",
                ok?"successful":"failed",
                is.gcount(), n,
                std::span(content.data(), is.gcount()));
    };

    std::ifstream in("bugTest", std::ios::binary);
    read_n(in, 2);

    filtering_istream limitedIn(restrict(in, 0, 10));
    read_n(limitedIn, 7);
    read_n(filtering_istream(restrict(limitedIn, 0, 3)), 3);
}

Prints

successful read 2 bytes out of 2: {'a', 'b'}
successful read 7 bytes out of 7: {'c', 'd', 'e', 'f', 'g', 'h', 'i'}
failed read 1 bytes out of 3: {'j'}

Do Devices Behave "Better"?

The conceptual problem seems to be repeated restriction of a stream. Perhaps you can use a device:

Live On Godbolt

#include <boost/iostreams/filtering_stream.hpp>
#include <boost/iostreams/device/file.hpp>
#include <boost/iostreams/restrict.hpp>
#include <array>
#include <fstream>
#include <fmt/ranges.h>
#include <span>
#include <type_traits>

int main() {
    constexpr int capacity = 12;
    std::array<char, capacity> content{'a','b','c','d','e','f','g','h','i','j','k','l'};
    {
        std::ofstream out(R"(bugTest)", std::ios::binary);
        out.write(content.data(), content.size());
        out.flush();
    }
    using boost::iostreams::restrict;
    using boost::iostreams::filtering_istream;
    using boost::iostreams::is_device;

    auto read_n = [](auto&& str_or_dev, size_t n) {
        std::array<char, capacity> content{0};
        bool ok = true;
        size_t count = 0;
        if constexpr (is_device<std::decay_t<decltype(str_or_dev)>>::value) {
            if (auto actual = str_or_dev.read(content.data(), n); actual != -1) {
                ok = true;
                count = actual;
            } else {
                ok = false;
                count = 0;
            }
        } else {
            // stream branch: perform the read before inspecting the state
            str_or_dev.read(content.data(), n);
            ok = str_or_dev.good();
            count = str_or_dev.gcount();
        }

        fmt::print("{} read {} bytes out of {}: {}\n",
                ok?"successful":"failed", count, n,
                std::span(content.data(), count));
    };

    boost::iostreams::file_source in("bugTest", std::ios::binary);
    read_n(in, 2);

    auto limitedIn(restrict(in, 0, 10));
    read_n(limitedIn, 7);
    read_n(restrict(limitedIn, 0, 3), 3);
}

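This does indeed:

  • simplify the code
  • likely improve performance
  • make a difference (because the final read is now considered "successful")
  • make no difference (in that the window-calculation anomaly is still present)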

By now, it's looking like we're in plain-and-simple library bug/undocumented limitation territory.

I'll have you know that I consider the output surprising anyway: I'd expect the second read to start from 'a' again, especially in the Device formulation. That's because offset = 0 implies an absolute seek to me.

Let's detect the actual stream position relative to the restriction at each read:

auto pos = [](auto&& str_or_dev) {
    return str_or_dev.seek(0, std::ios::cur);
};

Then:

auto read_n = [pos](auto&& str_or_dev, size_t n) {
    fmt::print("read at #: {}\n", pos(str_or_dev));
    // ...

This leads to more insight! It prints

read at #: 0    
successful read 2 bytes out of 2: {'a', 'b'}    
read at #: 2    
successful read 7 bytes out of 7: {'c', 'd', 'e', 'f', 'g', 'h', 'i'}    
read at #: 9    
failed read 0 bytes out of 3: {}

Wow. We learn that:

  • indeed, we don't start at "9", even relative to the restriction. So my expectation that the second read should logically commence at 'a' seems to hold
  • just observing the position prior to the read leads the last read to return -1, which in a way makes more sense than returning 1 character

Indeed, looking at the relevant constructor, we can deduce that restriction makes the hard assumption that the underlying source is at origin:

template<typename Device>
restricted_indirect_device<Device>::restricted_indirect_device
    (param_type dev, stream_offset off, stream_offset len)
    : device_adapter<Device>(dev), beg_(off), pos_(off), 
      end_(len != -1 ? off + len : -1)
{
    if (len < -1 || off < 0)
        boost::throw_exception(BOOST_IOSTREAMS_FAILURE("bad offset"));
    iostreams::skip(this->component(), off);
}

The skip is unconditional and doesn't observe the stream-position of dev. Now, in your case you'd think it doesn't matter because off is always 0, but the stream position is not.
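To make the bookkeeping concrete, here is a toy model of the position arithmetic. It is not Boost's code: `toy_restriction`, `resync` and `replay` are hypothetical names, the fields only mirror the constructor quoted above, and the `resync` step assumes the behaviour described in the question (a `seek(0, cur)` overwriting the stored position with the one reported by the underlying source).

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Toy bookkeeping only, NOT Boost's implementation.
struct toy_restriction {
    std::int64_t beg_, pos_, end_;

    // Like the quoted constructor: pos_ starts at `off`, no matter where the
    // underlying source currently is.
    toy_restriction(std::int64_t off, std::int64_t len)
        : beg_(off), pos_(off), end_(off + len) {}

    // Bytes the window believes it can still deliver.
    std::int64_t remaining() const {
        return std::max<std::int64_t>(0, end_ - pos_);
    }

    // Reading advances the believed position, capped at the window end.
    std::int64_t read(std::int64_t n) {
        const std::int64_t got = std::min(n, remaining());
        pos_ += got;
        return got;
    }

    // Assumption taken from the question: seek(0, cur) replaces pos_ with the
    // position reported by the underlying source.
    void resync(std::int64_t underlying_pos) { pos_ = underlying_pos; }
};

// Replays the scenario: read `pre_read` bytes from the file, restrict to
// `outer_len`, read `outer_read` bytes, then restrict again to `inner_len`.
// Returns how many bytes the inner restriction can still deliver.
inline std::int64_t replay(std::int64_t pre_read, std::int64_t outer_len,
                           std::int64_t outer_read, std::int64_t inner_len) {
    std::int64_t file_pos = pre_read;     // real position of the file
    toy_restriction outer(0, outer_len);  // believes it is at 0
    file_pos += outer.read(outer_read);   // both positions advance
    outer.resync(file_pos);               // believed position jumps to the real one
    return std::min(inner_len, outer.remaining());
}
```

With the question's numbers, `replay(200, 1000, 700, 300)` gives 100, and with the 12-byte example `replay(2, 10, 7, 3)` gives 1, which matches the single 'j' in the first output. Without the initial read, `replay(0, 10, 7, 3)` gives the full 3, because the real and believed positions never diverge.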

Seeking the underlying device to the start before every read does remove all unexpected effects:

in.seek(0, std::ios::beg);
read_n(in, 2);

auto limitedIn(restrict(in, 0, 10));

in.seek(0, std::ios::beg);
read_n(limitedIn, 7);

in.seek(0, std::ios::beg);
read_n(restrict(limitedIn, 0, 3), 3);

Prints:

read at #: 0    
successful read 2 bytes out of 2: {'a', 'b'}
read at #: 0    
successful read 7 bytes out of 7: {'a', 'b', 'c', 'd', 'e', 'f', 'g'}
read at #: 0
successful read 3 bytes out of 3: {'a', 'b', 'c'}

Of course that's not what you want, but it helps to understand what it does, so you know what to say to get what we do want.

Now, by instead seeking to the restricted begin, we're going to home in on the problem: https://godbolt.org/z/r11zo8. Now the last read_n triggers a bad_seek but, surprisingly, in the second line only:

str_or_dev.seek(0, std::ios::beg);
str_or_dev.seek(0, std::ios::cur);

What's even worse, seeking to start of the restriction twice suddenly gets the same behaviour as seeking the underlying device to 0?!

str_or_dev.seek(0, std::ios::beg);
str_or_dev.seek(0, std::ios::beg);

See it Live On Godbolt

successful read 2 bytes out of 2: {'a', 'b'}
successful read 7 bytes out of 7: {'a', 'b', 'c', 'd', 'e', 'f', 'g'}
successful read 3 bytes out of 3: {'a', 'b', 'c'}

Bug Report Time

Sadly this means it's probably time to file a bug report. Of course you can avoid the use of nested restrictions yourself, but this might not be as easy to do in your actual code base (which is likely to include generic streams).
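One way to sidestep nested restrictions, sketched here with the standard library only, is a stream buffer that hands out at most a fixed number of bytes from wherever its source currently is and never seeks. `limited_streambuf` and `read_some` are hypothetical names, not part of Boost or the standard library, and the sketch assumes read-only, non-seekable use and fetches one character at a time, so it is slow.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <streambuf>
#include <string>

// Hypothetical helper: exposes at most `limit` bytes of `src`, starting at the
// current position of `src`. It never seeks, so the absolute position of the
// underlying buffer plays no role.
class limited_streambuf : public std::streambuf {
public:
    limited_streambuf(std::streambuf* src, std::streamsize limit)
        : src_(src), remaining_(limit) {}

protected:
    int_type underflow() override {
        if (remaining_ <= 0) return traits_type::eof();  // window exhausted
        const int_type c = src_->sbumpc();               // consume one byte from the source
        if (traits_type::eq_int_type(c, traits_type::eof())) return c;
        --remaining_;
        ch_ = traits_type::to_char_type(c);
        setg(&ch_, &ch_, &ch_ + 1);                      // one-character get area
        return c;
    }

private:
    std::streambuf* src_;
    std::streamsize remaining_;
    char ch_ = 0;
};

// Reads up to n bytes and returns whatever was actually delivered.
inline std::string read_some(std::istream& is, std::size_t n) {
    std::string out(n, '\0');
    is.read(&out[0], static_cast<std::streamsize>(n));
    out.resize(static_cast<std::size_t>(is.gcount()));
    return out;
}
```

Used on a 12-byte source "abcdefghijkl": after reading 2 bytes directly, a `limited_streambuf` of 10 over the source's `rdbuf()` delivers "cdefghi" for a 7-byte read, and a second `limited_streambuf` of 3 layered over the first one's stream buffer delivers "jkl". Layering on the outer buffer rather than on the file keeps both limits in effect.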

There are a number of quick and dirty hacks that can make the restriction "realize" its position from the underlying component (as in the above epiphany, where we found that "just observing the position prior to read leads the last read to return -1").

I'm not going to suggest any of those, because they, too, would break down if you had more than 2 levels of restrictions - perhaps hidden by some more adaptive layers.

This is a thing that needs to be fixed in the library. At the very least the documentation might state that the underlying component is assumed to be at start position on creation of the restricted device or filter.
