Reading a text file word by word

Question

I have a text file containing only lowercase letters and no punctuation except spaces. I would like to know the best way to read the file char by char, so that when the next char is a space, it marks the end of one word and the start of a new one. That is, each character read is appended to a string; when the next char is a space, the word is passed to another method and the string is reset, until the reader reaches the end of the file.

I'm trying to do this with a StringReader, something like this:

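// Requires: using System.IO; using System.Text;
public string GetNextWord(StringReader reader)
{
    var word = new StringBuilder();
    int next;
    // Read() returns -1 at the end of the input, so check the int before casting to char.
    while ((next = reader.Read()) != -1)
    {
        char c = (char)next;
        if (c == ' ')
        {
            if (word.Length > 0) break; // a space ends the current word
            continue;                   // skip leading or repeated spaces
        }
        word.Append(c);
    }
    return word.ToString(); // empty string once the reader is exhausted
}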

and put the GetNextWord method in a while loop till the end of the file. Does this approach make sense or are there better ways of achieving this?

Recommended answer

There is a much better way of doing this: string.Split(): if you read the entire string in, C# can automatically split it on every space:

string[] words = reader.ReadToEnd().Split(' ');



The words array now contains all of the words in the file and you can do whatever you want with them.
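One caveat with splitting on a single space: consecutive, leading, or trailing spaces produce empty strings in the result. A minimal sketch of one way to drop them, assuming that is the behavior you want; the class name, helper name, and sample text are made up for illustration:

```csharp
using System;
using System.IO;

class SplitExample
{
    // Hypothetical helper: splits on spaces and discards empty entries.
    public static string[] SplitWords(string text)
    {
        return text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
    }

    static void Main()
    {
        // Sample input with a double space and a trailing space.
        using (var reader = new StringReader("the quick  brown fox "))
        {
            string[] words = SplitWords(reader.ReadToEnd());
            foreach (string word in words)
            {
                Console.WriteLine(word);
            }
        }
    }
}
```

Without StringSplitOptions.RemoveEmptyEntries, the same input would also yield empty entries for the double space and the trailing space.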

Additionally, you may want to investigate the File.ReadAllText method in the System.IO namespace - it may make your life much easier for file imports to text.
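As a rough sketch of how File.ReadAllText could fit in: the helper below reads a whole file and splits it into words. The class and helper names are hypothetical, and the temporary file only stands in for your real file. Splitting on line breaks as well as spaces is an assumption, added in case the file ends with a newline.

```csharp
using System;
using System.IO;

class ReadAllTextExample
{
    // Hypothetical helper: loads the whole file into memory, then splits it.
    public static string[] ReadWords(string path)
    {
        string text = File.ReadAllText(path);
        // Assumption: treat line breaks as separators too, and drop empty entries.
        return text.Split(new[] { ' ', '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);
    }

    static void Main()
    {
        // A temporary file stands in for the real input file.
        string path = Path.GetTempFileName();
        try
        {
            File.WriteAllText(path, "one two three\n");
            foreach (string word in ReadWords(path))
            {
                Console.WriteLine(word);
            }
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```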

I guess this assumes that your file is not abhorrently large; as long as the entire thing can be reasonably read into memory, this will work most easily. If you have gigabytes of data to read in, you'll probably want to shy away from this. I'd suggest using this approach though, if possible: it makes better use of the framework that you have at your disposal.
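If the file really is too large to hold in memory, one possible alternative is to read it character by character and hand out each word as soon as it is complete. This is only a sketch of that idea, not part of the original answer; the class and method names are hypothetical, and treating any whitespace as a separator is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

class StreamingWords
{
    // Hypothetical helper: lazily yields words from any TextReader.
    public static IEnumerable<string> ReadWords(TextReader reader)
    {
        var word = new StringBuilder();
        int next;
        // Read() returns -1 once the input is exhausted.
        while ((next = reader.Read()) != -1)
        {
            char c = (char)next;
            if (char.IsWhiteSpace(c))
            {
                // Whitespace ends the current word, if one is in progress.
                if (word.Length > 0)
                {
                    yield return word.ToString();
                    word.Clear();
                }
            }
            else
            {
                word.Append(c);
            }
        }
        // Emit the final word when the input does not end with whitespace.
        if (word.Length > 0)
        {
            yield return word.ToString();
        }
    }

    static void Main()
    {
        // A StringReader keeps the sample self-contained.
        using (var reader = new StringReader("alpha beta  gamma"))
        {
            foreach (string w in ReadWords(reader))
            {
                Console.WriteLine(w);
            }
        }
    }
}
```

For a real file, a StreamReader opened on the file path could be passed in place of the StringReader, so only the current word is held in memory at any time.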
