Want to read more than 50,000 txt files and save them in a linked list in C++

Problem description

#include<iostream>
#include<windows.h>
#include<string>
#include<fstream>
#include<cctype>    // isupper, tolower, isalpha
#include<cstring>   // strcpy
using namespace std;
class linklist     //linked list class
{
    struct main_node;
    struct sub_node;

    struct main_node   // main node that only have head pointers in it
    {
        sub_node *head;
        main_node()
        {   head=NULL;  }
    };
    main_node array[26];
    struct sub_node
    {
        double frequency;
        string word;
        sub_node *next;
        sub_node()
        {   frequency=1;    word="";    next=NULL;  }
    };

public:
    void add_node(string phrase)
    {
        char alphabat1=phrase[0];
        if(isupper(alphabat1))
        {
            alphabat1=tolower(alphabat1);
        }
        if(!isalpha(alphabat1))
            return;

        sub_node*temp = new sub_node;
        temp->word = phrase;

        sub_node*current = array[alphabat1-97].head;

        if(current == NULL)
        array[alphabat1-97].head = temp;

        else
        {
            while(current -> next != NULL && phrase != current-> word)
            {   current= current->next; }

            if(current->word == phrase)
                current->frequency++;
            else
                current->next  = temp; //adding words to linklist
        }
    }

    void display()
    {
        for(int i=0;i<26;i++)
        {
        sub_node *temp=array[i].head;
        cout<<char(i+97)<<" -> ";
        while(temp!=NULL)
        {
            cout<<temp->word<<" ("<<temp->frequency<<")  ";
            temp=temp->next;
        }
        cout<<"\n";
        }
    }
void parsing_documents(char *path)
{
    char token[100];
    ifstream read;
    read.open(path);
    do
    {
        read>>token;    // parsing words
        add_node(token);    //sending words to linked list

    }
    while(!read.eof());
        read.clear();
        read.close();
}
void reading_directory()
{
 // code to read multiple files

   HANDLE          hFile;                // Handle to file
   WIN32_FIND_DATA FileInformation;      // File information
   char tempPattern[90];
   strcpy(tempPattern,"*.txt");
   hFile = ::FindFirstFile(tempPattern, &FileInformation);
   long count=0;
   if(hFile != INVALID_HANDLE_VALUE)
   {
        do
        {
            count++;
            cout<<"."<<count;
            this->parsing_documents( FileInformation.cFileName);
        }
        while(TRUE == ::FindNextFile(hFile, &FileInformation));
   } 
    ::FindClose(hFile);

}
};
int main()
{
    linklist member;
    member.reading_directory();
    member.display();
}

I am working on a project in which I have to read more than 50,000 text files, parse their words, and save them in a linked list in a sorted manner. I have written the code in C++. It works quite efficiently, but I have one problem: it does not read the files correctly (sometimes 3000, sometimes 4000). I have searched a lot, but I couldn't find my mistake. Here is my code in C++; if anybody can help me with this, I would be very thankful.

Recommended answer

!read.eof() only checks for end of file, not for errors while reading the file, such as a network-mounted file system not being ready, a disk error, or lack of permission to read the file. You should check for all failures with while(read), which uses the stream's overloaded conversion operator to check every error state for you. That way, if the file fails, you stop trying to read from it. You should also check the stream's state before trying to read from the file, so while(read) { ... } is preferable to the do/while loop. After the loop, you might issue a warning or error to the user if you did not reach the end of file (!read.eof()), so they can investigate that specific file.
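
A minimal sketch of that advice, assuming C++11 or later. It keeps the question's function name parsing_documents but takes a std::string path, and it collects words into a std::vector<std::string>, which is only a hypothetical stand-in for the question's add_node. It also shows one way the original loop can stall: if read.open(path) fails, every later read >> token fails without ever setting eofbit, so do { ... } while (!read.eof()) never terminates.

```cpp
#include <fstream>
#include <iostream>
#include <string>
#include <vector>

// Sketch of the answer's advice. `words` is a hypothetical stand-in for the
// linked list: in the question's class each token would go to add_node.
// Returns false (after printing a warning) if the file could not be opened
// or if reading stopped for some reason other than reaching end of file.
bool parsing_documents(const std::string& path, std::vector<std::string>& words)
{
    std::ifstream read(path);
    if (!read) {                                  // check state before reading
        std::cerr << "warning: cannot open " << path << '\n';
        return false;
    }

    std::string token;                            // grows to fit any word
    // `read >> token` returns the stream, which converts to false as soon as
    // an extraction fails (end of file, read error, ...). The body therefore
    // only runs for tokens that were actually read.
    while (read >> token)
        words.push_back(token);

    // The loop ends on any failure; eof() tells whether it was the normal one.
    if (read.bad() || !read.eof()) {
        std::cerr << "warning: did not reach end of " << path << '\n';
        return false;
    }
    return true;
}
```

Putting the extraction itself in the loop condition, rather than testing the stream at the top of the loop and reading inside it, means a failed final read is never passed on to the list. In reading_directory, the return value could be used to count or list the files that failed instead of silently skipping them.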

Try to avoid char * and char [] as much as possible, as they are highly error prone. You have a char[100]. What happens if a word has 100 or more characters (99 characters plus the terminating null is all that fits)? read >> token may then overwrite the stack, for example damaging the ifstream read.
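
The sketch below, assuming C++11 or later, contrasts the two options: reading into std::string, which grows to fit any word, and, if a char array must be kept, bounding the extraction with std::setw so that at most N - 1 characters plus the terminating null are stored. The function names words_as_strings and words_with_buffer are hypothetical; both take the whole text as a string so the behaviour is easy to try out.

```cpp
#include <cstddef>
#include <iomanip>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: every word is read into a std::string, which grows to
// fit the word, so there is no buffer to overrun whatever the input.
std::vector<std::string> words_as_strings(const std::string& text)
{
    std::istringstream in(text);
    std::vector<std::string> words;
    std::string word;
    while (in >> word)
        words.push_back(word);
    return words;
}

// Hypothetical helper: keeps a fixed char buffer, but std::setw(N) limits each
// extraction to N - 1 characters plus the terminating '\0'. A longer word is
// not written past the buffer; it is split across successive reads instead.
template <std::size_t N>
std::vector<std::string> words_with_buffer(const std::string& text)
{
    std::istringstream in(text);
    std::vector<std::string> words;
    char token[N];
    while (in >> std::setw(static_cast<int>(N)) >> token)
        words.push_back(token);
    return words;
}
```

With words_with_buffer<100>, a 150-character word comes back as two pieces of 99 and 51 characters, so even the bounded buffer changes the result for a word index; std::string is the simpler fix.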

Consider using std::list<sub_node> to avoid having to reinvent and re-debug the wheel. You would no longer need the next pointer, as std::list already handles that for you, which leaves far less code to debug.
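
A minimal sketch of that suggestion, assuming C++11 or later and ASCII letters. The class name WordIndex and the bucket accessor are hypothetical; sub_node, word, frequency and add_node follow the question's names. Each of the 26 buckets is a std::list<sub_node>, so there is no next pointer and no new/delete. This also removes a leak in the question's add_node, which allocates temp with new even when the word is already in the list and then never frees it.

```cpp
#include <array>
#include <cctype>
#include <list>
#include <string>

class WordIndex
{
public:
    struct sub_node
    {
        std::string word;
        double frequency;   // double kept from the question; an integer type also works
    };

    // Adds a word to the bucket for its first letter, or bumps its count.
    // Words that do not start with a letter are ignored, as in the question.
    void add_node(const std::string& phrase)
    {
        if (phrase.empty())
            return;
        int letter = index_of(phrase[0]);
        if (letter < 0)
            return;
        std::list<sub_node>& entries = buckets_[letter];
        for (sub_node& node : entries) {
            if (node.word == phrase) {   // case-sensitive, like the original
                ++node.frequency;
                return;
            }
        }
        entries.push_back(sub_node{phrase, 1.0});
    }

    // Read-only view of one bucket; `first` must be a letter a-z or A-Z.
    const std::list<sub_node>& bucket(char first) const
    {
        return buckets_[index_of(first)];
    }

private:
    // Maps 'a'/'A' .. 'z'/'Z' to 0..25 and anything else to -1.
    // The cast to unsigned char keeps std::tolower well defined for negative chars.
    static int index_of(char c)
    {
        int lower = std::tolower(static_cast<unsigned char>(c));
        return (lower >= 'a' && lower <= 'z') ? lower - 'a' : -1;
    }

    std::array<std::list<sub_node>, 26> buckets_;
};
```

A display function would then just loop over the 26 buckets with a range-based for instead of following next pointers by hand.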
