Word counting program using multithreading: large file size

Problem description

I am trying to write a program that counts the words in a large file using multiple threads. The program crashes with a segmentation fault and I am stuck. Any advice would be appreciated. The code is given below:

Input: file name. Output: Segmentation fault

The code is:

#include <stdio.h>
#include <pthread.h>
#include <stdlib.h>


struct thread_data{
    FILE *fp;
    long int offset;
    int start;
    int blockSize;
};

int words=0;  

void *countFrequency(void* data){

    struct thread_data* td=data;
    char *buffer = malloc(td->blockSize);

    int i,c;
    i=0;c=0;
    enum states { WHITESPACE, WORD };
    int state = WHITESPACE;

    fseek(td->fp, td->offset, td->start);

    char last = ' ';
    while ((fread(buffer, td->blockSize, 1, td->fp))==1){

        if ( buffer[0]== ' ' || buffer[0] == '\t'  ){
            state = WHITESPACE;
        }
        else if (buffer[0]=='\n'){
            //newLine++;
            state = WHITESPACE;
        }
        else {
            if ( state == WHITESPACE ){
                words++;
            }
            state = WORD;
        }
        last = buffer[0];
    }
    free(buffer);

    pthread_exit(NULL);

    return NULL;
}

int main(int argc, char **argv){

    int nthreads, x, id, blockSize,len;
    //void *state;
    FILE *fp;
    pthread_t *threads;

    struct thread_data data[nthreads];

    if (argc < 2){
        fprintf(stderr, "Usage: ./a.out <file_path>");
        exit(-1);
    }

    if((fp=fopen(argv[1],"r"))==NULL){
        printf("Error opening file");
        exit(-1);
    }  

    printf("Enter the number of threads: ");
    scanf("%d",&nthreads);
    threads = malloc(nthreads*sizeof(pthread_t));

    fseek(fp, 0, SEEK_END);
    len = ftell(fp);  
    printf("len= %d\n",len);

    blockSize=(len+nthreads-1)/nthreads;
    printf("size= %d\n",blockSize);

    for(id = 0; id < nthreads; id++){

        data[id].fp=fp;
        data[id].offset = blockSize;
        data[id].start = id*blockSize+1;

    }
    //LAST THREAD
    data[nthreads-1].start=(nthreads-1)*blockSize+1;

    for(id = 0; id < nthreads; id++)
        pthread_create(&threads[id], NULL, &countFrequency,&data[id]);

    for(id = 0; id < nthreads; id++)
        pthread_join(threads[id],NULL);

    fclose(fp);
    //free(threads);

    //pthread_exit(NULL);

    printf("%d\n",words); 
    return 0;  
}

Recommended answer

Typecasting does not fix wrong code - it only disguises it or makes it even more wrong. Let's look at those errors:

struct thread_data* td=(struct thread_data)data; /* wrong */

You can't cast a struct thread_data * to a struct thread_data, nor can you assign a struct thread_data to a struct thread_data *. The incorrect and unnecessary cast is the sole cause of the error.

x = pthread_create(&threads[id], NULL, &countFrequency, (void *)data); /* wrong */

Secondly, you can't cast a struct thread_data to a void * either - you need an actual pointer, such as the address of data:

x = pthread_create(&threads[id], NULL, &countFrequency, &data);

No cast is needed here either, because pointers to object types convert to void * implicitly. Of course, since there is only one copy of data, all the threads will share it and will all work on whichever values were written to it last. That is not going to go well - you'll want one struct thread_data per thread.
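
As a minimal sketch of the "one struct per thread" idea, the following gives every thread its own element of an array and passes that element's address to pthread_create without a cast. The names struct thread_arg, worker and run_workers are hypothetical and chosen only for illustration; the placeholder computation stands in for real work.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical per-thread argument block; field names are illustrative. */
struct thread_arg {
    int id;       /* set by the parent before the thread starts */
    long result;  /* written by the worker, read by the parent after join */
};

static void *worker(void *data)
{
    struct thread_arg *arg = data;   /* void * converts implicitly in C */
    assert(arg != NULL);
    arg->result = (long)arg->id * 10; /* placeholder for real work */
    return NULL;
}

/* Starts nthreads workers, each with its own struct thread_arg.
 * Returns the sum of all results, or -1 on failure. */
long run_workers(int nthreads)
{
    if (nthreads <= 0)
        return -1;

    pthread_t *threads = malloc(nthreads * sizeof *threads);
    struct thread_arg *args = malloc(nthreads * sizeof *args);
    if (threads == NULL || args == NULL) {
        free(threads);
        free(args);
        return -1;
    }

    int started = 0;
    for (int id = 0; id < nthreads; id++) {
        args[id].id = id;
        args[id].result = 0;
        /* &args[id] is a distinct object for every thread; no cast needed. */
        if (pthread_create(&threads[id], NULL, worker, &args[id]) != 0)
            break;
        started++;
    }

    long sum = 0;
    for (int id = 0; id < started; id++) {
        pthread_join(threads[id], NULL);
        sum += args[id].result;  /* safe: the worker has finished */
    }

    free(threads);
    free(args);
    return started == nthreads ? sum : -1;
}
```

Because each worker writes only to its own struct and the parent reads it only after pthread_join, no mutex is needed in this sketch.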

Thirdly, those warnings are telling you your thread function has the wrong signature:

void *countFrequency(struct thread_data *data) /* wrong */

Combined with the first point, get all the types correct and yet again no casts are needed.

void *countFrequency(void *data) {
    struct thread_data* td = data;
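
Beyond the casts, the code as posted has several other problems that can be read directly off the listing: data[nthreads] is declared before nthreads has been read, the third argument of fseek must be a whence constant such as SEEK_SET rather than a position, the blockSize field is never filled in, all threads share one FILE *, and words++ is unsynchronized. The following is one possible sketch of a corrected counter, not the original author's solution. The names struct chunk, count_chunk and count_words_in_file are hypothetical, it uses isspace (slightly broader than the three characters tested in the question), and it assumes a POSIX system where seeking to the end of a binary stream yields the file size.

```c
#include <assert.h>
#include <ctype.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical per-thread work description (names are illustrative). */
struct chunk {
    const char *path;  /* each thread opens its own FILE * */
    long start;        /* first byte of this thread's range */
    long length;       /* number of bytes in the range */
    long words;        /* result; -1 signals a failure in the worker */
};

static void *count_chunk(void *data)
{
    struct chunk *ck = data;
    assert(ck != NULL);
    ck->words = -1;

    FILE *fp = fopen(ck->path, "rb");
    if (fp == NULL)
        return NULL;

    /* Look at the byte just before the range, so that a word straddling
     * the boundary is counted only by the thread in which it begins. */
    int in_word = 0;
    if (ck->start > 0) {
        if (fseek(fp, ck->start - 1, SEEK_SET) != 0) {
            fclose(fp);
            return NULL;
        }
        int prev = fgetc(fp);  /* leaves the position at ck->start */
        in_word = (prev != EOF && !isspace(prev));
    }

    long words = 0;
    for (long i = 0; i < ck->length; i++) {
        int c = fgetc(fp);
        if (c == EOF)
            break;
        if (isspace(c)) {
            in_word = 0;
        } else {
            if (!in_word)
                words++;  /* whitespace-to-word transition */
            in_word = 1;
        }
    }
    fclose(fp);
    ck->words = words;
    return NULL;
}

/* Counts whitespace-separated words in path using nthreads threads.
 * Returns the count, or -1 on any failure. */
long count_words_in_file(const char *path, int nthreads)
{
    if (nthreads <= 0)
        return -1;
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;
    long len = (fseek(fp, 0, SEEK_END) == 0) ? ftell(fp) : -1;
    fclose(fp);
    if (len < 0)
        return -1;

    pthread_t *threads = malloc(nthreads * sizeof *threads);
    struct chunk *chunks = malloc(nthreads * sizeof *chunks);
    if (threads == NULL || chunks == NULL) {
        free(threads);
        free(chunks);
        return -1;
    }

    long block = (len + nthreads - 1) / nthreads;  /* ceiling division */
    int started = 0;
    for (int id = 0; id < nthreads; id++) {
        long start = id * block;
        if (start > len)
            start = len;
        long length = (len - start < block) ? len - start : block;
        chunks[id] = (struct chunk){ path, start, length, 0 };
        if (pthread_create(&threads[id], NULL, count_chunk, &chunks[id]) != 0)
            break;
        started++;
    }

    long total = 0;
    int ok = (started == nthreads);
    for (int id = 0; id < started; id++) {
        pthread_join(threads[id], NULL);
        if (chunks[id].words < 0)
            ok = 0;
        else
            total += chunks[id].words;
    }
    free(threads);
    free(chunks);
    return ok ? total : -1;
}
```

A caller would read the thread count first and then invoke count_words_in_file(argv[1], nthreads). Summing the per-thread results after the joins avoids the shared global counter altogether, and reading byte by byte through stdio keeps the sketch short at the cost of speed on very large files.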
