Text File Binary Search
Question
I have a test file that looks like this:
Ampersand Gregorina 5465874526370945
Anderson Bob 4235838387422002
Anderson Petunia 4235473838457294
Aphid Bumbellina 8392489357392473
Armstrong-Jones Mike 8238742438632892
Here is my code:
#include <iostream>
#include <string>
#include <fstream>
#include <limits>   // needed for std::numeric_limits
class CardSearch
{
protected:
    std::ifstream cardNumbers;
public:
    CardSearch(std::string fileName)
    {
        cardNumbers.open(fileName, std::ios::in);
        if (!cardNumbers.is_open())
        {
            std::cout << "Unable to open: " << fileName;
        }
        return;
    }
    std::string Find(std::string lastName, std::string firstName)
    {
        // Creating string variables to hold first and last name
        // as well as card number. Also creating bools to decide whether
        // or not the person has been found or if the last name is the only
        // identifier for a found person
        std::string lN;
        std::string fN;
        std::string creditNumber;
        bool foundPerson = false;
        // By using the seekg and tellg functions, we can find our place
        // in the file and also calculate the amount of lines within the file
        cardNumbers.seekg(0, std::ios::beg);
        cardNumbers.clear();
        std::streamsize first = cardNumbers.tellg();
        cardNumbers.ignore(std::numeric_limits<std::streamsize>::max());
        cardNumbers.clear();
        std::streamsize last = cardNumbers.tellg();
        cardNumbers.seekg(0, std::ios::beg);
        std::streamsize lineNumbers = (last / 57);
        std::streamsize middle;
        while (first <= lineNumbers)
        {
            middle = (first + lineNumbers) / 2;
            // middle * 57 takes us to the beginning of the correct line
            cardNumbers.seekg(middle * 57, std::ios::beg);
            cardNumbers.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
            cardNumbers >> lN >> fN;
            if (lN < lastName)
            {
                first = middle + 1;
            }
            else if (lN > lastName)
            {
                lineNumbers = middle - 1;
            }
            else
            {
                if (fN < firstName)
                {
                    first = middle + 1;
                }
                else if (fN > firstName)
                {
                    lineNumbers = middle - 1;
                }
                else if (fN == firstName)
                {
                    foundPerson = true;
                    break;
                }
            }
        }
        if (foundPerson)
        {
            // When a person is found, we seek to the correct line position and
            // offset by another 40 characters to receive the card number
            cardNumbers.seekg((middle * 57) + 40, std::ios::beg);
            std::cout << lN << ", " << fN << " ";
            cardNumbers >> creditNumber;
            return creditNumber;
        }
        return "Unable to find person.\n";
    }
};
int main()
{
    CardSearch CS("C:/Users/Rafael/Desktop/StolenNumbers.txt");
    std::string S = CS.Find("Ampersand", "Gregorina");
    std::cout << S;
    std::cin.ignore();
    std::cin.get();
    return 0;
}
I am able to retrieve all but the first record in the list. It seems as though seekg is seeking to the correct position, but cardNumbers is not reading the correct information. When 'middle' is set to 0, seekg should seek to the 0th line (middle * 57), read in Ampersand Gregorina and make a comparison. Instead, it still reads Anderson Bob.
Any ideas as to why this may be happening?
Thanks
Recommended Answer
When using functions such as seekg, it is always best to open the file in binary mode, not text mode as your code is doing now. In other words, you should be doing this:
cardNumbers.open(fileName, std::ios::in | std::ios::binary);
The reason is that opening a file in text mode allows end-of-line translations to be done. This makes functions such as seekg, tellg, etc. unreliable at best (they may happen to work) and, in the worst case, useless for text processing.
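To see concretely what "no translation" means, here is a small sketch. The function readFirstLineBinary, the struct LineReadResult, and the two sample lines are illustrative names and data, not from the question, and the function overwrites whatever file name it is given. It writes Windows-style "\r\n" lines and reads the first one back in binary mode: the '\r' stays in the string returned by std::getline, and tellg() reports the raw byte offset where the next line begins.

```cpp
#include <fstream>
#include <string>

// Illustrative result type: the line as std::getline returned it, plus the
// byte offset that tellg() reports immediately afterwards.
struct LineReadResult
{
    std::string line;          // includes the trailing '\r' in binary mode
    std::streamoff nextOffset; // byte position of the following line
};

// Writes two "\r\n"-terminated lines (overwriting fileName), then reads the
// first one back in binary mode. Nothing is translated in either direction,
// so the offset counts the visible text plus both line-ending bytes.
LineReadResult readFirstLineBinary(const std::string& fileName)
{
    {
        std::ofstream out(fileName, std::ios::out | std::ios::binary);
        out << "Anderson Bob\r\n" << "Anderson Petunia\r\n";
    }
    std::ifstream in(fileName, std::ios::in | std::ios::binary);
    LineReadResult result;
    std::getline(in, result.line); // stops after extracting '\n'
    result.nextOffset = in.tellg();
    return result;
}
```

Here "Anderson Bob" is 12 visible characters, so the returned line is "Anderson Bob\r" (13 characters) and the next line starts at byte 14.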
When a file is opened in binary mode, seekg and the related functions work as expected, since no end-of-line translations are done. You will actually seek to the byte offset in the file that you specify, and not be thrown off by end-of-line translations.
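A sketch of seeking straight to a record by byte offset in binary mode. The layout is a hypothetical one, inferred only from the question's constants 57 and 40: a name field padded with spaces to 40 bytes, a 16-digit card number, then a line ending. makeRecord, readNameAt, and kNameWidth are illustrative names.

```cpp
#include <cstddef>
#include <fstream>
#include <string>

// Hypothetical layout for this sketch: name padded to 40 bytes,
// then a 16-digit card number, then a line ending.
const std::size_t kNameWidth = 40;

// Builds one fixed-width record (names longer than 40 bytes are truncated).
std::string makeRecord(const std::string& name, const std::string& card,
                       const std::string& lineEnd)
{
    std::string record = name;
    record.resize(kNameWidth, ' ');
    return record + card + lineEnd;
}

// Reads the "last first" name of record `index` from a stream opened in
// binary mode. index * recordLength is the first byte of that record, so
// extraction starts at the record itself: no ignore() to the next '\n' is
// needed, and record 0 is reachable like any other record.
std::string readNameAt(std::ifstream& in, std::streamoff recordLength,
                       std::streamoff index)
{
    in.clear(); // clear eof/fail from any previous read before seeking
    in.seekg(index * recordLength, std::ios::beg);
    std::string lastName, firstName;
    in >> lastName >> firstName;
    return lastName + " " + firstName;
}
```

With "\r\n" endings each record in this layout is 40 + 16 + 2 = 58 bytes, so record 0 starts at byte 0 and record 1 at byte 58.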
Also, once you do this, the length of each line includes not only the visible text but also the invisible characters that make up the end-of-line sequence: one byte ("\n") on Linux/Unix, two bytes ("\r\n") on Windows. So your hand calculation of 57 is not going to be correct in binary mode; if 57 is the visible text, it should be 58 or 59 on Linux/Unix or Windows, respectively.
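Putting the pieces together, here is a hedged sketch of the lookup done entirely in binary mode. It is not the asker's CardSearch class: FixedWidthSearch and its members are hypothetical names. It assumes every record, including the last, has the same length and ends with a line ending, and that the file is sorted by last name, then first name. Rather than hard-coding 57, 58, or 59, it measures the record length as the byte offset after the first line, so the same code handles both "\n" and "\r\n" files.

```cpp
#include <fstream>
#include <ios>
#include <string>

// Hypothetical class for illustration; not the asker's CardSearch.
// Assumes fixed-length records sorted by last name, then first name,
// each terminated by a line ending ("\n" or "\r\n").
class FixedWidthSearch
{
public:
    explicit FixedWidthSearch(const std::string& fileName)
        : in_(fileName, std::ios::in | std::ios::binary)
    {
        std::string firstLine;
        std::getline(in_, firstLine);
        if (!in_.good())
        {
            return; // unopened file or no complete first line: nothing to search
        }
        // In binary mode this offset is the first line's visible text plus
        // every byte of its line ending, i.e. the true record length.
        recordLength_ = in_.tellg();
        in_.seekg(0, std::ios::end);
        std::streamoff fileSize = in_.tellg();
        recordCount_ = fileSize / recordLength_;
    }

    // Returns the card number, or an empty string if the name is not found.
    std::string find(const std::string& lastName, const std::string& firstName)
    {
        std::streamoff low = 0;
        std::streamoff high = recordCount_ - 1; // inclusive record indexes
        while (low <= high)
        {
            std::streamoff middle = low + (high - low) / 2;
            in_.clear(); // reset eof/fail left over from an earlier read
            in_.seekg(middle * recordLength_, std::ios::beg);
            std::string lN, fN, card;
            in_ >> lN >> fN >> card;
            if (lN < lastName || (lN == lastName && fN < firstName))
            {
                low = middle + 1;
            }
            else if (lN > lastName || (lN == lastName && fN > firstName))
            {
                high = middle - 1;
            }
            else
            {
                return card;
            }
        }
        return "";
    }

private:
    std::ifstream in_;
    std::streamoff recordLength_ = 0;
    std::streamoff recordCount_ = 0;
};
```

Usage would look like FixedWidthSearch search(fileName); followed by search.find("Ampersand", "Gregorina"). Because each seek lands on the first byte of a record, record 0 is compared like any other, and both low and high are inclusive record indexes rather than byte offsets.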