Splitting a text file into words in C


Problem description

I have two types of text files that I want to split into words.

The first type of text file is just words separated by newlines.

Milk
Work
Chair
...

The second type of text file is text from a book that contains only words and whitespace (no commas, question marks, etc.).

And then she tried to run 
but she was stunned by the view of 
...

Do you know the best way to do this?

I tried the following two approaches, but I seem to be getting segmentation faults.

For the first type of text I use:

while(fgets(line,sizeof(line),wordlist) != NULL)
{
    /* Checks Words |
    printf("%s",line);*/
    InsertWord(W,line);/*Function that inserts the word to a tree*/
}

And for the second type of text I use:

while(fgets(line,sizeof(line),out) != NULL)
{
    bp = line ;
    while(1)
    {
        cp = strtok(bp," ");
        bp = NULL ;

        if(cp == NULL)
            break;

        /*printf("Word by Word : %s \n",cp);*/
        CheckWord(Words, cp);/*Function that checks if the word from the book is the same with one in a tree */
    }
}

Can you suggest anything better, or correct me if I am wrong on these?

InsertWord is a function that inserts words into a tree. When I used this code:

for (i = 0 ; i <=2 ; i++)
{
    if (i==0)
        InsertWord(W,"A");
    if (i==1)
        InsertWord(W,"B");
    if (i==2)
        InsertWord(W,"c");
}

The tree inserts the words just fine and prints them too, which means my tree and its functions work (they were given to us by our teachers). But when I try to do the same thing like this:

char this_word[15];
while (fscanf(wordlist, "%14s", this_word) == 1) 
{
    printf("Latest word that was read: '%s'\n", this_word);
    InsertWord(W,this_word);
}

I get errors from the tree, so I guessed it was some kind of segmentation fault. Any ideas?

Recommended answer

You want to read from a file, so fgets() may come to mind.
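
One detail worth keeping in mind with fgets(): it stores the line's trailing newline in the buffer, so a word read from the first kind of file arrives as "Milk\n" rather than "Milk". A minimal sketch of stripping it follows; strip_newline is a hypothetical helper name and not part of the original code.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: cuts the string at the first '\r' or '\n', if any,
   so a line read with fgets() is left holding just the word.
   Returns its argument for convenience. */
char *strip_newline(char *s)
{
    assert(s != NULL);
    /* strcspn() returns the length of the prefix containing no '\r' or '\n';
       if neither is present, that is the index of the existing terminator. */
    s[strcspn(s, "\r\n")] = '\0';
    return s;
}
```

It would be called on the buffer right after a successful fgets() and before the word is handed to something like InsertWord.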

You want to split into tokens by a delimiter (whitespace), so strtok() should come to mind.

So, you could do it like this:

#include <stdio.h>
#include <string.h>

int main(void)
{
   FILE * pFile;
   char mystring [100];
   char* pch;

   pFile = fopen ("text_newlines.txt" , "r");
   if (pFile == NULL) perror ("Error opening file");
   else {
     while ( fgets (mystring , 100 , pFile) != NULL )
       printf ("%s", mystring);
     fclose (pFile);
   }

   pFile = fopen ("text_wspaces.txt" , "r");
   if (pFile == NULL) perror ("Error opening file");
   else {
     while ( fgets (mystring , 100 , pFile) != NULL ) {
       printf ("%s", mystring);
       pch = strtok (mystring," ");
       while (pch != NULL)
       {
         printf ("%s\n",pch);
         pch = strtok (NULL, " ");
       }
     }
     fclose (pFile);
   }

   return 0;
}

Output:

linux25:/home/users/grad1459>./a.out
Milk
Work
Chair
And then she tried to run 
And
then
she
tried
to
run


but she was stunned by the view of
but
she
was
stunned
by
the
view
of
//newline here as well
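
The blank lines in this output come from the newline that fgets() keeps at the end of each line: with only a space as the delimiter, the last token is either a lone "\n" or a word with "\n" attached, such as "of\n". One way to avoid that, sketched here under the assumption that tabs and line endings should also count as separators, is to pass them to strtok() as delimiters too. The function name split_words and its parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: splits `line` in place on spaces, tabs and line
   endings, storing pointers to at most `max_words` tokens in `words`.
   Returns the number of tokens stored. The pointers refer into `line`,
   so they are only valid until the buffer is overwritten. */
size_t split_words(char *line, char *words[], size_t max_words)
{
    static const char delims[] = " \t\r\n";
    size_t count = 0;
    char *tok;

    assert(line != NULL && words != NULL);

    for (tok = strtok(line, delims);
         tok != NULL && count < max_words;
         tok = strtok(NULL, delims)) {
        words[count++] = tok;
    }
    return count;
}
```

Because the tokens point into the line buffer, which the next fgets() call overwrites, any code that needs to keep a word beyond the current iteration would have to copy it first.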
