Multiple delimiters for getline function, C++

Question

I want a simple way to read a text word by word while skipping any non-alphanumeric characters. Having "evolved" from text that contains only white space and '\n', I now need to handle text that also contains characters such as ',' and '.'. The first case was easily solved by using getline with the delimiter ' '. I wondered whether there is a way to use getline with multiple delimiters, or even with some kind of regular expression (for example '.'|' '|','|'\n').

As far as I know, getline reads characters from the input stream until it reaches the delimiter character, which is '\n' unless another delimiter is passed. My first guess was that it would be quite simple to give it multiple delimiters, but I found out that it isn't.

Just as a clarification: a C-style solution (strtok, for example, which in my opinion is very ugly) or an algorithmic solution is not what I'm looking for. It is fairly easy to come up with a simple algorithm for this problem and implement it. I'm looking for a more elegant solution, or at least an explanation of why we can't handle it with the getline function, since, unless I've completely misunderstood, it should somehow be able to accept more than one delimiter.

Recommended answer

There's good news and bad news. The good news is that you can do this.

The bad news is that doing it is fairly roundabout, and some people find it downright ugly and nasty.

To do it, you start by observing two facts:

  1. The normal string extractor (`>>`) uses white space to separate "words".
  2. What counts as white space is defined by the stream's locale.

Putting those together, the answer becomes fairly obvious (if circuitous): to define multiple delimiters, we define a locale that allows us to specify what characters should be treated as delimiters (i.e., white space):

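#include <locale>
#include <string>
#include <vector>

// A ctype facet that classifies exactly the characters in `delims` as white space.
struct word_reader : std::ctype<char> {
    word_reader(std::string const &delims) : std::ctype<char>(get_table(delims)) {}
    static std::ctype_base::mask const* get_table(std::string const &delims) {
        // The table must outlive the facet, so it is static. Caveat: every
        // word_reader shares (and adds its delimiters to) this one table.
        static std::vector<std::ctype_base::mask> rc(table_size, std::ctype_base::mask());

        // Every other character gets an empty mask (not even alpha or digit).
        for (char ch : delims)
            // Cast first: a plain char above 127 may be negative.
            rc[static_cast<unsigned char>(ch)] = std::ctype_base::space;
        return rc.data();
    }
};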

Then we need to tell the stream to use that locale (well, a locale with that ctype facet), passing the characters we want used as delimiters, and then extract words from the stream:

#include <iostream>
#include <sstream>

int main() {
    std::istringstream in("word1, word2. word3,word4");

    // create a ctype facet specifying delimiters, and tell the stream to use it
    // (the locale takes ownership of the facet, so there is no matching delete):
    in.imbue(std::locale(std::locale(), new word_reader(" ,.\n")));
    std::string word;

    // read words from the stream. Note we just use `>>`, not `std::getline`:
    while (in >> word)
        std::cout << word << "\n";
}

The result is what (I hope) you want: extracting each word without the punctuation we said was "white space".

word1
word2
word3
word4
