How can I ignore the "end of line" or "new line" character when reading text files word by word?


Problem description

I am reading a text file word by word and saving each word as an element in an array. I then print out this array, word by word. I know this could be done more efficiently, but this is for an assignment and I have to use an array.

I'm doing more with the array, such as counting repeated elements, removing certain elements, etc. I have also successfully converted the files to be entirely lowercase and without punctuation.

I have a text file that looks like this:

beginning of file




more lines with some bizzare     spacing
some lines next to each other
while

others are farther apart
eof


Here is some of my code, with itemsInArray initialized to 0 and an array of words referred to as wordArray[ (appropriate length for my file) ]:

ifstream infile;
infile.open(fileExample);

while (!infile.eof()) {

    string temp;
    getline(infile, temp, ' ');  // Successfully reads words separated by a single space

    if ((temp != "") && (temp != " ") && (temp != "\n") && (temp != "\0")) {
        wordArray[itemsInArray] = temp;
        itemsInArray++;
    }
}


The Problem:

My code is saving the end of line character as an item in my array. In my if statement, I've listed all of the ways I have tried to exclude the end of line character, but I've had no luck.

How can I prevent the end of line character from being saved as an item in my array?

I've tried a few other methods I found in threads similar to this one, including something with a const char* that I couldn't make work, as well as iterating through and deleting the newline characters. I've been working on this for hours and have tried many methods, and I don't want to repost the same issue.

Recommended answer

The standard >> operator overloaded for std::string already uses whitespace (spaces, tabs and newlines) as the word boundary, so your program can be simplified a lot.

#include <iostream>
#include <string>
#include <vector>

int
main()
{
  std::vector<std::string> words {};
  {
    std::string tmp {};
    while (std::cin >> tmp)
      words.push_back(tmp);
  }
  for (const auto& word : words)
    std::cout << "'" << word << "'" << std::endl;
}

For the input you are showing, this will output:

'beginning'
'of'
'file'
'more'
'lines'
'with'
'some'
'bizzare'
'spacing'
'some'
'lines'
'next'
'to'
'each'
'other'
'while'
'others'
'are'
'farther'
'apart'
'eof'
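
To illustrate why the output above contains no empty or newline entries, here is a minimal sketch that contrasts the two ways of splitting on an in-memory string. The helper names splitOnSpaceOnly and splitOnWhitespace are made up for this illustration and do not come from the question or the answer.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits on a single space only, as in the question.
// '\n' is treated as an ordinary character, so it stays inside the tokens,
// and two consecutive spaces produce an empty token.
std::vector<std::string> splitOnSpaceOnly(const std::string& text)
{
  std::istringstream in(text);
  std::vector<std::string> tokens;
  std::string tmp;
  while (std::getline(in, tmp, ' '))
    tokens.push_back(tmp);
  return tokens;
}

// Hypothetical helper: uses operator>>, which skips any run of whitespace
// (spaces, tabs, newlines) before reading each word.
std::vector<std::string> splitOnWhitespace(const std::string& text)
{
  std::istringstream in(text);
  std::vector<std::string> tokens;
  std::string tmp;
  while (in >> tmp)
    tokens.push_back(tmp);
  return tokens;
}
```

With the space-only version, a token such as "file\n\nmore" is never equal to "\n", which is likely why the comparisons in the question's if statement did not filter anything out.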

Isn't this what you want?
