Text File Binary Search

Question

I have a test file that looks like this:

Ampersand           Gregorina           5465874526370945
Anderson            Bob                 4235838387422002
Anderson            Petunia             4235473838457294
Aphid               Bumbellina          8392489357392473
Armstrong-Jones     Mike                8238742438632892

The code is as follows:

#include <iostream>
#include <string>
#include <fstream>
#include <limits>   // std::numeric_limits, used in Find()

class CardSearch
{
protected:
    std::ifstream cardNumbers;

public:
    CardSearch(std::string fileName)
    {
        cardNumbers.open(fileName, std::ios::in);

        if (!cardNumbers.is_open())
        {
            std::cout << "Unable to open: " << fileName;
        }
        return;
    }

    std::string Find(std::string lastName, std::string firstName)
    {
        // Creating string variables to hold first and last name
        // as well as card number. Also creating bools to decide whether
        // or not the person has been found or if the last name is the only
        // identifier for a found person
        std::string lN;
        std::string fN;
        std::string creditNumber;
        bool foundPerson = false;

        // By using the seekg and tellg functions, we can find our place
        // in the file and also calculate the amount of lines within the file
        cardNumbers.seekg(0, std::ios::beg);
        cardNumbers.clear();
        std::streamsize first = cardNumbers.tellg();
        cardNumbers.ignore(std::numeric_limits<std::streamsize>::max());
        cardNumbers.clear();
        std::streamsize last = cardNumbers.tellg();
        cardNumbers.seekg(0, std::ios::beg);
        std::streamsize lineNumbers = (last / 57);
        std::streamsize middle;

        while (first <= lineNumbers)
        {
            middle = (first + lineNumbers) / 2;
            // middle * 57 takes us to the beginning of the correct line
            cardNumbers.seekg(middle * 57, std::ios::beg);
            cardNumbers.ignore(std::numeric_limits<std::streamsize>::max(), '\n');

            cardNumbers >> lN >> fN;

            if (lN < lastName)
            {
                first = middle + 1;
            }
            else if (lN > lastName)
            {
                lineNumbers = middle - 1;
            }
            else
            {
                if (fN < firstName)
                {
                    first = middle + 1;
                }
                else if (fN > firstName)
                {
                    lineNumbers = middle - 1;
                }
                else if (fN == firstName)
                {
                    foundPerson = true;
                    break;
                }
            }
        }

        if (foundPerson)
        {
            // When a person is found, we seek to the correct line position and 
            // offset by another 40 characters to receive the card number
            cardNumbers.seekg((middle * 57) + 40, std::ios::beg);
            std::cout << lN << ", " << fN << " ";
            cardNumbers >> creditNumber;
            return creditNumber;
        }
        return "Unable to find person.\n";
    }
};

int main()
{
    CardSearch CS("C:/Users/Rafael/Desktop/StolenNumbers.txt");
    std::string S = CS.Find("Ampersand", "Gregorina");
    std::cout << S;

    std::cin.ignore();
    std::cin.get();

    return 0;
}

I am able to retrieve every record in the list except the first. It seems that seekg is seeking to the correct position, but cardNumbers is not reading the correct information. When 'middle' is 0, seekg should seek to the 0th line (middle * 57), read in Ampersand Gregorina, and make the comparison. Instead, it still reads Anderson Bob.

Any ideas as to why this may be happening?

Thanks

Recommended answer

When using positioning functions such as seekg, it is best to open the file in binary mode rather than in text mode, which is what your code does now. In other words, you should be doing this:

cardNumbers.open(fileName, std::ios::in | std::ios::binary);

The reason is that opening a file in text mode allows end-of-line translation to take place. With translation in effect, offsets computed by hand no longer reliably correspond to positions in the file, so seekg, tellg, and related functions are at best fragile (working only by luck) and at worst useless for this kind of text processing.

When a file is opened in binary mode, seekg and the related functions work as expected, because no end-of-line translation is performed. You seek to exactly the byte offset you specify, and the result is not thrown off by translated line endings.
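As a rough illustration of seeking by byte offset, the sketch below reads one fixed-width record from a stream that was opened with std::ios::binary. The helper name readRecord and its parameters are hypothetical and not part of the original code; it assumes every record has the same on-disk length, terminator included.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Hypothetical helper: reads the record at zero-based `index` from a file of
// fixed-width records. `recordLength` is the full on-disk length of one line,
// including its terminator (the caller is assumed to supply the right value).
// The stream is assumed to have been opened with std::ios::binary.
// Returns an empty string if no record can be read at that position.
std::string readRecord(std::ifstream& in, std::streamoff index,
                       std::streamoff recordLength)
{
    assert(index >= 0 && recordLength > 0);

    in.clear();                                     // drop any earlier eof/fail state
    in.seekg(index * recordLength, std::ios::beg);  // exact byte offset in binary mode

    std::string line;
    if (!std::getline(in, line))
    {
        return {};
    }
    // In binary mode a Windows-style "\r\n" leaves the '\r' in the string.
    if (!line.empty() && line.back() == '\r')
    {
        line.pop_back();
    }
    return line;
}
```

With a stream opened as in the line above, readRecord(cardNumbers, 0, recordLength) would return the first line of the file, provided recordLength matches the real line length on disk.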

Also, once you do this, the length of each line includes not only the visible text but also the invisible characters that make up the end-of-line sequence. So your hand-calculated 57 will not be correct in binary mode; it should be 58 or 59, depending on whether the file uses Linux/Unix line endings (one byte, LF) or Windows line endings (two bytes, CR LF), respectively. In general, the value to use is the visible width of a line plus the size of its terminator, so it is worth checking against the actual file.
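Rather than hard-coding the record length, one option is to measure it from the first line of the file. The sketch below is only an illustration under stated assumptions: the stream is opened in binary mode, every line has the same width, the file is sorted by last name and then first name, and each line holds three whitespace-separated fields. The names measureRecordLength and findCard are hypothetical. Note that it reads the record starting exactly at the seek offset.

```cpp
#include <cassert>
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical helper: on-disk length of one record, terminator included,
// measured from the first line. Returns 0 if the file is empty or unreadable.
std::streamoff measureRecordLength(std::ifstream& in)
{
    in.clear();
    in.seekg(0, std::ios::beg);
    std::string firstLine;
    if (!std::getline(in, firstLine))
    {
        return 0;
    }
    // In binary mode any '\r' stays in firstLine; add 1 for the '\n' itself.
    return static_cast<std::streamoff>(firstLine.size()) + 1;
}

// Hypothetical helper: binary search over fixed-width records. On success,
// stores the third field in `cardOut` and returns true.
bool findCard(std::ifstream& in, const std::string& lastName,
              const std::string& firstName, std::string& cardOut)
{
    const std::streamoff recordLength = measureRecordLength(in);
    if (recordLength <= 0)
    {
        return false;
    }

    in.clear();
    in.seekg(0, std::ios::end);
    const std::streamoff fileSize = in.tellg();

    // Half-open range [low, high); a final line without a terminator still counts.
    std::streamoff low = 0;
    std::streamoff high = fileSize / recordLength
                        + ((fileSize % recordLength != 0) ? 1 : 0);

    while (low < high)
    {
        const std::streamoff mid = low + (high - low) / 2;
        in.clear();
        in.seekg(mid * recordLength, std::ios::beg);

        std::string line;
        if (!std::getline(in, line))
        {
            return false;
        }
        std::istringstream fields(line);
        std::string last, first, card;
        if (!(fields >> last >> first >> card))
        {
            return false;   // malformed record
        }

        if (last == lastName && first == firstName)
        {
            cardOut = card;
            return true;
        }
        if (last < lastName || (last == lastName && first < firstName))
        {
            low = mid + 1;
        }
        else
        {
            high = mid;
        }
    }
    return false;
}
```

Because the length is measured from the file itself, the same code should handle both LF and CR LF files, as long as the line endings are consistent throughout.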
