Split a line using std::regex and discard empty elements
Question
I need to split a line based on two separators: ' ' and ';'.
Example:
input : " abc ; def hij klm "
output: {"abc","def","hij","klm"}
How can I fix the function below to discard the first empty element?
std::vector<std::string> Split(std::string const& line) {
    std::regex seps("[ ;]+");
    std::sregex_token_iterator rit(line.begin(), line.end(), seps, -1);
    return std::vector<std::string>(rit, std::sregex_token_iterator());
}
// input : " abc ; def hij klm "
// output: {"","abc","def","hij","klm"}
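The leading "" comes from how submatch index -1 works: std::sregex_token_iterator then yields the text between separator matches, and that includes the empty prefix in front of a separator at position 0 (an empty trailing suffix is skipped, which is why no "" appears at the end). One alternative, not taken from the answer below, is to match the tokens instead of the separators, using the pattern [^ ;]+ with submatch index 0, so that only non-empty runs can ever be produced. A minimal sketch, assuming ' ' and ';' are the only separators; SplitByTokens is a hypothetical name:

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collect every maximal run of characters that is
// neither ' ' nor ';'. The regex matches the tokens themselves (submatch 0)
// rather than the separators (submatch -1), so no empty strings appear.
std::vector<std::string> SplitByTokens(std::string const& line) {
    std::regex const token("[^ ;]+");
    std::sregex_token_iterator first(line.begin(), line.end(), token, 0);
    return std::vector<std::string>(first, std::sregex_token_iterator());
}
```

Because nothing is ever emitted between two separators, no filtering step is needed; the trade-off is that the separator set has to be written as a negated character class.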
Below is a complete sample that compiles:
#include <iostream>
#include <string>
#include <vector>
#include <regex>

std::vector<std::string> Split(std::string const& line) {
    std::regex seps("[ ;]+");
    std::sregex_token_iterator rit(line.begin(), line.end(), seps, -1);
    return std::vector<std::string>(rit, std::sregex_token_iterator());
}

int main()
{
    std::string line = " abc ; def hij klm ";
    std::cout << "input: \"" << line << "\"" << std::endl;
    auto collection = Split(line);
    std::cout << "output: {";
    auto bComma = false;
    for (auto oneField : collection)
    {
        std::cout << (bComma ? "," : "") << "\"" << oneField << "\"";
        bComma = true;
    }
    std::cout << "} " << std::endl;
}
Answer
I can see a couple of possibilities beyond what's been mentioned in the other answers so far. The first would be to use std::remove_copy_if when building your vector:
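#include <algorithm>   // std::remove_copy_if
#include <iterator>    // std::back_inserter
// regex stuff here: seps and rit built exactly as in Split() above
std::vector<std::string> tokens;
// Copy every token except the empty ones into tokens.
std::remove_copy_if(rit, std::sregex_token_iterator(),
                    std::back_inserter(tokens),
                    [](std::string const &s) { return s.empty(); });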
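Put together with the question's Split(), the remove_copy_if approach might look like the following sketch. The regex and the iterator are the question's own; the only change is that tokens pass through a predicate that drops empty strings instead of going straight into the vector constructor:

```cpp
#include <algorithm>
#include <iterator>
#include <regex>
#include <string>
#include <vector>

// The question's Split(), with the std::remove_copy_if step added:
// tokens are still produced by splitting on runs of ' ' or ';' (submatch -1),
// but any empty token, such as the one before a leading separator,
// is dropped while copying into the result.
std::vector<std::string> Split(std::string const& line) {
    std::regex const seps("[ ;]+");
    std::sregex_token_iterator rit(line.begin(), line.end(), seps, -1);
    std::vector<std::string> tokens;
    std::remove_copy_if(rit, std::sregex_token_iterator(),
                        std::back_inserter(tokens),
                        [](std::string const& s) { return s.empty(); });
    return tokens;
}
```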
Another possibility would be to create a locale whose ctype facet classifies ' ' and ';' as whitespace, and then just read words from a stream that uses it:
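#include <locale>
#include <vector>

// A ctype facet whose table treats only ' ', ';' and '\n' as whitespace.
struct reader: std::ctype<char> {
    reader(): std::ctype<char>(get_table()) {}
    static std::ctype_base::mask const* get_table() {
        // Every character starts out classified as "not space"...
        static std::vector<std::ctype_base::mask> rc(table_size, std::ctype_base::mask());
        // ...then the separators are marked as space.
        rc[' '] = std::ctype_base::space;
        rc[';'] = std::ctype_base::space;
        // at a guess, newlines are probably still separators too:
        rc['\n'] = std::ctype_base::space;
        return &rc[0];
    }
};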
Once we have this, we tell the stream to use that locale when reading from (or writing to) it:
#include <sstream>
std::stringstream input(" abc ; def hij klm ");
// The locale takes ownership of the new reader facet and deletes it later.
input.imbue(std::locale(std::locale(), new reader));
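A self-contained sketch of the locale approach, using the same reader table as above but collecting the words into a vector instead of printing them. SplitWithLocale is a hypothetical helper name, not part of the answer:

```cpp
#include <iterator>
#include <locale>
#include <sstream>
#include <string>
#include <vector>

// Same classification table as the answer's reader:
// ' ', ';' and '\n' count as whitespace, every other character does not.
struct reader : std::ctype<char> {
    reader() : std::ctype<char>(get_table()) {}
    static std::ctype_base::mask const* get_table() {
        static std::vector<std::ctype_base::mask> rc(table_size, std::ctype_base::mask());
        rc[' '] = std::ctype_base::space;
        rc[';'] = std::ctype_base::space;
        rc['\n'] = std::ctype_base::space;
        return &rc[0];
    }
};

// Hypothetical helper: read words from a stream imbued with the reader
// locale. operator>> skips every run of "space" characters before a word
// and stops at the next one, so empty tokens are never produced.
std::vector<std::string> SplitWithLocale(std::string const& line) {
    std::istringstream input(line);
    // The locale takes ownership of the facet and deletes it when done.
    input.imbue(std::locale(std::locale(), new reader));
    return std::vector<std::string>(std::istream_iterator<std::string>(input),
                                    std::istream_iterator<std::string>());
}
```

Only characters given an entry in the table count as separators: a tab, for instance, is not classified as space here, so "a\tb" comes back as a single token.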
Then we probably want to clean up the code for inserting commas only between tokens, rather than after every token. Fortunately, I wrote some code to handle that fairly neatly some time ago. Using it, we can copy tokens from the input above to standard output fairly simply:
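#include <algorithm>
#include <iterator>
// infix_ostream_iterator is the author's own helper, not part of the standard library.
std::cout << "{ ";
std::copy(std::istream_iterator<std::string>(input), {},
          infix_ostream_iterator<std::string>(std::cout, ", "));
std::cout << " }";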
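The answer relies on infix_ostream_iterator, which the author wrote earlier but does not reproduce here. The following is a hypothetical minimal stand-in, not the author's implementation: an output iterator that writes the delimiter before every element except the first, so the output never ends with a stray ", ". JoinTokens is likewise a hypothetical helper showing it in use:

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical minimal stand-in for the answer's infix_ostream_iterator.
// It writes the delimiter *before* each element except the first one.
// delim must point to a valid null-terminated string.
template <class T, class CharT = char, class Traits = std::char_traits<CharT>>
class infix_ostream_iterator {
public:
    using iterator_category = std::output_iterator_tag;
    using value_type = void;
    using difference_type = std::ptrdiff_t;
    using pointer = void;
    using reference = void;

    infix_ostream_iterator(std::basic_ostream<CharT, Traits>& os, CharT const* delim)
        : os_(&os), delim_(delim) {}

    infix_ostream_iterator& operator=(T const& item) {
        if (!first_) *os_ << delim_;  // separator only between elements
        *os_ << item;
        first_ = false;
        return *this;
    }
    // Dereference and increment are no-ops, as for std::ostream_iterator.
    infix_ostream_iterator& operator*() { return *this; }
    infix_ostream_iterator& operator++() { return *this; }
    infix_ostream_iterator& operator++(int) { return *this; }

private:
    std::basic_ostream<CharT, Traits>* os_;
    CharT const* delim_;
    bool first_ = true;
};

// Hypothetical usage helper: join tokens with ", " between them.
std::string JoinTokens(std::vector<std::string> const& tokens) {
    std::ostringstream out;
    std::copy(tokens.begin(), tokens.end(),
              infix_ostream_iterator<std::string>(out, ", "));
    return out.str();
}
```

std::copy takes the output iterator by value and uses that one copy throughout, so the first_ flag is tracked correctly within a single call; passing the same iterator object to two separate std::copy calls would therefore omit the delimiter between the two batches.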
Result: "{ abc, def, hij, klm }", exactly as you'd expect (or hope for), without any extra kludges to make up for starting out doing the wrong thing.