Tokenizing a string of data into a vector of structs?
Question
So I have the following string of data, which is being received through a TCP winsock connection, and would like to do an advanced tokenization, into a vector of structs, where each struct represents one record.
std::string buf = "44:william:adama:commander:stuff\n33:luara:roslin:president:data\n";
struct table_t
{
    std::string key;
    std::string first;
    std::string last;
    std::string rank;
    std::string additional;
};
Each record in the string is delimited by a newline ("\n"). Here is my attempt at splitting up the records; it does not yet split up the fields:
// records is taken by reference so the caller actually receives the tokens.
void tokenize(const std::string& str, std::vector<std::string>& records)
{
    // Skip delimiters at beginning.
    std::string::size_type lastPos = str.find_first_not_of("\n", 0);
    // Find the first delimiter after the start of the token.
    std::string::size_type pos = str.find_first_of("\n", lastPos);
    while (std::string::npos != pos || std::string::npos != lastPos)
    {
        // Found a token, add it to the vector.
        records.push_back(str.substr(lastPos, pos - lastPos));
        // Skip delimiters. Note the "not_of".
        lastPos = str.find_first_not_of("\n", pos);
        // Find the next delimiter.
        pos = str.find_first_of("\n", lastPos);
    }
}
It seems totally unnecessary to repeat all of that code again to further tokenize each record via the colon (internal field separator) into the struct and push each struct into a vector. I'm sure there is a better way of doing this, or perhaps the design is in itself wrong.
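One way to avoid repeating the loop might be to make the delimiter a parameter, so the same helper splits the buffer on "\n" and then each record on ":". The following is only a sketch: split is a hypothetical helper name that does not appear in the original code, and unlike the loop above it keeps empty fields instead of skipping runs of delimiters.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: split `text` on a single delimiter character.
// Empty tokens between adjacent delimiters are kept; a trailing
// delimiter does not produce a final empty token.
std::vector<std::string> split(const std::string& text, char delim)
{
    std::vector<std::string> tokens;
    std::string::size_type start = 0;
    while (start < text.size()) {
        std::string::size_type end = text.find(delim, start);
        if (end == std::string::npos) {
            end = text.size();  // last token runs to the end of the text
        }
        tokens.push_back(text.substr(start, end - start));
        start = end + 1;        // continue just past the delimiter
    }
    return tokens;
}
```

Calling split(buf, '\n') would yield the records, and split(record, ':') the fields of one record, which can then be copied into a table_t and pushed into the vector.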
Thanks for your help.
Recommended answer
For breaking the string up into records, I'd use istringstream, if only because that will simplify the changes later when I want to read from a file. For tokenizing, the most obvious solution is boost::regex, so:
std::vector<table_t> parse( std::istream& input )
{
    std::vector<table_t> retval;
    std::string line;
    while ( std::getline( input, line ) ) {
        // Five colon-separated fields, each in its own capture group.
        // (boost::regex uses Perl syntax by default, so the grouping
        // parentheses are not escaped.)
        static boost::regex const pattern(
            "([^:]*):([^:]*):([^:]*):([^:]*):([^:]*)" );
        boost::smatch matched;
        if ( !regex_match( line, matched, pattern ) ) {
            // Error handling...
        } else {
            retval.push_back(
                table_t( matched[1], matched[2], matched[3],
                         matched[4], matched[5] ) );
        }
    }
    return retval;
}
(I've assumed the logical constructor for table_t. Also: there's a very long tradition in C that names ending in _t are typedefs, so you're probably better off finding some other convention.)
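If Boost is not available, roughly the same approach might be written with the C++11 std::regex library. This is a sketch under stated assumptions, not the answer's own code: Record and parse_records are hypothetical names, Record is a plain aggregate filled member by member rather than a class with the constructor assumed above, and lines that do not match are silently skipped where the original leaves error handling open.

```cpp
#include <istream>
#include <regex>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record type, named without the _t suffix.
struct Record
{
    std::string key;
    std::string first;
    std::string last;
    std::string rank;
    std::string additional;
};

// Read one record per line from any input stream.
std::vector<Record> parse_records(std::istream& input)
{
    // Five colon-separated fields, each captured in its own group.
    static const std::regex pattern(
        "([^:]*):([^:]*):([^:]*):([^:]*):([^:]*)");

    std::vector<Record> result;
    std::string line;
    while (std::getline(input, line)) {
        std::smatch matched;
        if (std::regex_match(line, matched, pattern)) {
            Record rec;
            rec.key        = matched[1].str();
            rec.first      = matched[2].str();
            rec.last       = matched[3].str();
            rec.rank       = matched[4].str();
            rec.additional = matched[5].str();
            result.push_back(rec);
        }
        // Assumption: lines that do not match are skipped.
    }
    return result;
}

// Convenience overload for data that is already held in a string,
// such as a buffer received from a socket.
std::vector<Record> parse_records(const std::string& buf)
{
    std::istringstream input(buf);
    return parse_records(input);
}
```

Because the core function takes a std::istream&, the same code should work unchanged on a std::ifstream, which is the flexibility the answer points to when it recommends istringstream.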