Tokenizing a string of data into a vector of structs?


Question

I have the following string of data, received over a TCP Winsock connection, and would like to tokenize it into a vector of structs, where each struct represents one record.

std::string buf = "44:william:adama:commander:stuff\n33:luara:roslin:president:data\n";

struct table_t
{
    std::string key;
    std::string first;
    std::string last;
    std::string rank;
    std::string additional;
};

Each record in the string is terminated by a newline ("\n"). Here is my attempt at splitting up the records; it does not yet split up the fields:

void tokenize(const std::string& str, std::vector<std::string>& records)
{
    // Skip delimiters at the beginning.
    std::string::size_type lastPos = str.find_first_not_of("\n", 0);
    // Find the first delimiter after the token start.
    std::string::size_type pos     = str.find_first_of("\n", lastPos);
    while (std::string::npos != pos || std::string::npos != lastPos)
    {
        // Found a token, add it to the vector.
        records.push_back(str.substr(lastPos, pos - lastPos));
        // Skip delimiters.  Note the "not_of".
        lastPos = str.find_first_not_of("\n", pos);
        // Find the next delimiter.
        pos = str.find_first_of("\n", lastPos);
    }
}

It seems totally unnecessary to repeat all of that code to further tokenize each record on the colon (the internal field separator) into the struct and push each struct into a vector. I'm sure there is a better way of doing this, or perhaps the design itself is wrong.

Thanks for your help.

Recommended answer

For breaking the string up into records, I'd use std::istringstream, if only because that will simplify the changes later when I want to read from a file. For tokenizing the fields, the most obvious solution is boost::regex, so:

#include <istream>
#include <string>
#include <vector>
#include <boost/regex.hpp>

std::vector<table_t> parse( std::istream& input )
{
    std::vector<table_t> retval;
    std::string line;
    while ( std::getline( input, line ) ) {
        // Five colon-separated fields. boost::regex defaults to Perl
        // syntax, where an unescaped '(' opens a capture group.
        static boost::regex const pattern(
            "([^:]*):([^:]*):([^:]*):([^:]*):([^:]*)" );
        boost::smatch matched;
        if ( !boost::regex_match( line, matched, pattern ) ) {
            //  Error handling...
        } else {
            retval.push_back(
                table_t( matched[1], matched[2], matched[3],
                         matched[4], matched[5] ) );
        }
    }
    return retval;
}
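If pulling in Boost is not an option, the same line-by-line reading can be paired with a small hand-written field splitter. The following is only a sketch under a few assumptions: the names Record, splitFields and parseRecords are hypothetical (they do not appear in the question), lines that do not have exactly five fields are silently skipped, and empty fields are kept. It swaps the regular expression for std::string::find, so it is an alternative to the approach above rather than a restatement of it.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record type mirroring the fields of the question's table_t.
struct Record
{
    std::string key;
    std::string first;
    std::string last;
    std::string rank;
    std::string additional;
};

// Split one line on a single-character delimiter, keeping empty fields.
std::vector<std::string> splitFields( std::string const& line, char delim )
{
    std::vector<std::string> fields;
    std::string::size_type start = 0;
    for ( ;; ) {
        std::string::size_type end = line.find( delim, start );
        if ( end == std::string::npos ) {
            fields.push_back( line.substr( start ) );   // last field
            break;
        }
        fields.push_back( line.substr( start, end - start ) );
        start = end + 1;
    }
    return fields;
}

// Read newline-delimited records from any input stream.
std::vector<Record> parseRecords( std::istream& input )
{
    std::vector<Record> records;
    std::string line;
    while ( std::getline( input, line ) ) {
        std::vector<std::string> f = splitFields( line, ':' );
        if ( f.size() != 5 ) {
            continue;   // Assumed policy: ignore malformed lines.
        }
        Record r;
        r.key        = f[0];
        r.first      = f[1];
        r.last       = f[2];
        r.rank       = f[3];
        r.additional = f[4];
        records.push_back( r );
    }
    return records;
}
```

To use it on the buffer from the question, wrap the string in a std::istringstream and pass that to parseRecords; the same function would then accept a std::ifstream unchanged if the data later comes from a file.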

(I've assumed the obvious constructor for table_t. Also, there's a very long tradition in C that names ending in _t are typedefs, so you're probably better off finding some other naming convention.)
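As a rough illustration of what such a constructor might look like, here is a sketch that takes the five fields in order and uses a name without the _t suffix. It is an assumption about the intended shape, not the answerer's actual code: Record and appendRecord are hypothetical names, and std::regex (C++11) stands in for boost::regex so that the example depends only on the standard library. A sub-match converts implicitly to std::string, which is what lets the match results be passed straight to the constructor.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Hypothetical replacement for table_t, with a five-field constructor.
struct Record
{
    std::string key;
    std::string first;
    std::string last;
    std::string rank;
    std::string additional;

    Record( std::string const& k, std::string const& f,
            std::string const& l, std::string const& r,
            std::string const& a )
        : key( k ), first( f ), last( l ), rank( r ), additional( a )
    {
    }
};

// Parse one line and append it to `out`; returns false if the line
// does not consist of exactly five colon-separated fields.
bool appendRecord( std::string const& line, std::vector<Record>& out )
{
    static std::regex const pattern(
        "([^:]*):([^:]*):([^:]*):([^:]*):([^:]*)" );
    std::smatch matched;
    if ( !std::regex_match( line, matched, pattern ) ) {
        return false;
    }
    // Each sub-match converts implicitly to std::string.
    out.push_back( Record( matched[1], matched[2], matched[3],
                           matched[4], matched[5] ) );
    return true;
}
```

Returning a bool leaves the error-handling decision to the caller, who can log, count, or abort on a malformed line as the application requires.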
