How to include C++ input stream delimiters in the resulting tokens
Question
The C++ standard library supports a few ways to introduce custom delimiters for input streams. As I understand it, the recommended way is to imbue the stream with a new locale that carries a custom ctype facet:
First way (inheriting from a ctype specialization):
// std::ctype<char> is table-based and has no virtual do_is, so overriding
// do_is only works with the generic facet, e.g. std::ctype<wchar_t>.
struct csv_whitespace : std::ctype<wchar_t>
{
    bool do_is(mask m, char_type c) const override
    {
        if ((m & space) && c == L' ') {
            return false; // space will NOT be classified as whitespace
        }
        if ((m & space) && c == L',') {
            return true; // comma will be classified as whitespace
        }
        return ctype::do_is(m, c); // leave the rest to the parent class
    }
};
// for the wcin stream:
std::wcin.imbue(std::locale(std::wcin.getloc(), new csv_whitespace));
Second way (table-based ctype<char> specialization):
// get the classic table used by the ctype<char> specialization
const auto temp = std::ctype<char>::classic_table();
// copy the table into a vector
std::vector<std::ctype<char>::mask> new_table_vector(temp, temp + std::ctype<char>::table_size);
// add/remove stream separators using bitwise arithmetic;
// a char can serve as the index because the table is indexed by character code
new_table_vector[' '] ^= std::ctype_base::space;
new_table_vector['\t'] &= ~(std::ctype_base::space | std::ctype_base::cntrl);
new_table_vector[':'] |= std::ctype_base::space;
// A ctype initialized with new_table_vector delimits on '\n' and ':' but not on ' ' or '\t'.
// ....
// usage of the table above; the facet does not copy the table,
// so new_table_vector must outlive the imbued locale
std::cin.imbue(std::locale(std::cin.getloc(), new std::ctype<char>(new_table_vector.data())));
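To make the behaviour of the table-based approach concrete, here is a minimal sketch. The names colon_is_space and split_with_facet are hypothetical and not part of the question; the facet owns its table through a function-local static, which sidesteps the lifetime concern. It shows that operator>> splits on ':' but drops the delimiter from every token.

```cpp
#include <cstddef>
#include <locale>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical facet: ':' counts as whitespace, ' ' does not.
struct colon_is_space : std::ctype<char> {
    static const mask* make_table() {
        // Copy the classic table once; the static keeps it alive for the facet.
        static std::vector<mask> table(classic_table(), classic_table() + table_size);
        table[':'] |= space;   // ':' now separates tokens
        table[' '] &= ~space;  // ' ' no longer separates tokens
        return table.data();
    }
    explicit colon_is_space(std::size_t refs = 0) : ctype(make_table(), false, refs) {}
};

// Hypothetical helper: split text with operator>> under the custom facet.
std::vector<std::string> split_with_facet(const std::string& text) {
    std::istringstream in(text);
    in.imbue(std::locale(in.getloc(), new colon_is_space));
    std::vector<std::string> tokens;
    for (std::string tok; in >> tok;) {
        tokens.push_back(tok);  // the ':' itself never reaches tok
    }
    return tokens;
}
```

For an input such as "aaa:bbb ccc:ddd", the tokens keep the embedded space but never contain a ':'.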
But is there a way to include the delimiters in the resulting tokens? For example, given the input
aaa&bbb*ccc%ddd&eee
where
& * %
are the delimiters defined using one of the methods above, the resulting strings would be:
aaa
&bbb
*ccc
%ddd
&eee
As you can see, the delimiters are included in the resulting strings. So the question is: how can an input stream be configured to do this, and is it possible at all?
Thanks.
Recommended answer
The short answer is no: istreams do not provide an innate method for extracting and retaining separators. istreams provide the following extraction methods:
- operator>>: discards the delimiter
- get: does not extract the delimiter at all
- getline: extracts and discards the delimiter
- read: doesn't respect delimiters
- readsome: doesn't respect delimiters
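The difference between get and getline in the list above can be checked with a small sketch. The helper names peek_after_getline and peek_after_get are hypothetical; each reads up to '&' and then reports which character is next in the stream.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: read up to '&' with std::getline, then peek.
char peek_after_getline(const std::string& text) {
    std::istringstream in(text);
    std::string tok;
    std::getline(in, tok, '&');  // extracts the token and consumes the '&'
    return static_cast<char>(in.peek());
}

// Hypothetical helper: read up to '&' with istream::get, then peek.
char peek_after_get(const std::string& text) {
    std::istringstream in(text);
    char buf[64];
    in.get(buf, sizeof buf, '&');  // extracts the token, leaves '&' in the stream
    return static_cast<char>(in.peek());
}
```

In neither case does the delimiter end up in the extracted token; with get it merely stays in the stream for the next read.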
However, assuming you have slurped your istream into a string foo, you could tokenize it with a regex like this:
((?:^|[&*%])[^&*%]*)
This can be used with a regex_token_iterator like this:
const regex re{ "((?:^|[&*%])[^&*%]*)" };
const vector<string> bar{ sregex_token_iterator(cbegin(foo), cend(foo), re, 1), sregex_token_iterator() };
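Putting the pieces together, a self-contained sketch might look like the following. The function name tokenize_keep_delims is hypothetical, and the slurping step via istreambuf_iterator is one assumed way of reading the whole stream into foo; the regex is the one from the answer.

```cpp
#include <istream>
#include <iterator>
#include <regex>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: read the whole stream, then split it so that each
// token after the first starts with the delimiter that introduced it.
std::vector<std::string> tokenize_keep_delims(std::istream& in) {
    // Slurp the stream into a string (assumed approach, not from the answer).
    std::istreambuf_iterator<char> first(in), last;
    const std::string foo(first, last);

    // Either the start of the input or one delimiter, followed by a run
    // of non-delimiter characters.
    static const std::regex re{"((?:^|[&*%])[^&*%]*)"};

    return std::vector<std::string>(
        std::sregex_token_iterator(foo.cbegin(), foo.cend(), re, 1),
        std::sregex_token_iterator());
}
```

Called on a stream holding the question's sample input, this should yield the five tokens listed in the question, with every delimiter kept at the front of its token.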