How to read a TSV file?


Question





Hi,

I have a tab-separated value table like this:

header1  header2  header3
13.455   55.3     A string
4.55     5.66     Another string

I want to load this guy into a vector of vectors, since I do not know how
long it may be. I think I have to have a vector of vectors of strings, and
then extract the doubles later(?):

    std::vector<std::vector<std::string> > m_data_vec;

I started off with this skeletal function, but I'm not sure how to parse the
line for tabs and newlines, and stuff the elements into the vector. Is it
better to read in the whole line and then parse it? Can I parse it on the fly?
How?

void MyClass::ReadTSV(const char* filename)
{
    using namespace std;

    ifstream infile(filename);
    if (!infile) {
        cout << "unable to load file" << endl;
    }

    // Now what?
}

Thanks,
Bryan

Solution

"BCC" <a@b.c> wrote...

> Is it better to read in the whole line then parse it?

Oh, so much better...

> Can I parse it on the fly?

I don't know. Can you?

> How?
>
> void MyClass::ReadTSV(const* filename)
> {
> [...]
> // Now what?

If you know how many fields to expect, you could use get( ... , '\t') N-1
times and then get( ... , '\n'), and then again and again.

Easier still to get the characters one by one and watch for '\t' and '\n'. But
I would still do the "get the whole line and then parse it" thing.



V
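
The character-by-character idea mentioned above can be sketched roughly as follows. This is only an illustration under a few assumptions: the function name parse_tsv_stream is hypothetical (it does not appear in the thread), fields never contain embedded tabs or newlines, and a '\r' from Windows line endings is simply dropped. One thing to watch with the get( ... , '\t') variant: istream::get with a delimiter leaves the delimiter in the stream, whereas std::getline(in, field, '\t') consumes it, so getline is usually the easier call for that approach.

```cpp
#include <istream>
#include <sstream>   // only needed for the std::istringstream usage noted below
#include <string>
#include <vector>

// Hypothetical helper (not from the thread): parses tab-separated text
// one character at a time, without buffering whole lines first.
std::vector<std::vector<std::string> > parse_tsv_stream(std::istream& in)
{
    std::vector<std::vector<std::string> > rows;
    std::vector<std::string> row;
    std::string field;
    bool pending = false;  // true once the current line has seen any character

    char c;
    while (in.get(c)) {
        if (c == '\t') {          // a tab ends the current field
            row.push_back(field);
            field.clear();
            pending = true;
        } else if (c == '\n') {   // a newline ends the field and the row
            row.push_back(field);
            rows.push_back(row);
            field.clear();
            row.clear();
            pending = false;
        } else if (c != '\r') {   // drop CR from Windows line endings
            field += c;
            pending = true;
        }
    }
    if (pending) {                // last line had no trailing newline
        row.push_back(field);
        rows.push_back(row);
    }
    return rows;
}
```

The function accepts any std::istream, so it can be fed an std::ifstream for a real file or an std::istringstream for a quick check. An empty line produces a row holding a single empty field, which matches what a getline-based reader would give.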



"BCC" <a@b.c> wrote in message
news:p1****************@newssvr29.news.prodigy.com ...

> I started off with this skeletal function, but I'm not sure how to parse the
> line for tabs and newlines, and stuff the elements into the vector. Is it
> better to read in the whole line then parse it? Can I parse it on the fly?
> How?
> [...]
> // Now what?


Maybe this gives you the basic idea.
I haven't tested it. Also, there are no checks for errors etc.

<UNTESTED CODE>

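#include <fstream>
#include <iostream>
#include <string>
#include <vector>
using namespace std;

// Reads a tab-separated file into a vector of rows,
// where each row is a vector of string fields.
vector<vector<string> > ReadTSV(const char* filename)
{
    vector<vector<string> > vvStr;

    ifstream infile(filename);
    if (!infile) {
        cout << "unable to load file" << endl;
        return vvStr;  // nothing to read
    }

    string str;
    while (getline(infile, str))
    {
        vector<string> vStr;  // fields of the current line
        string::size_type pos1 = 0, pos2;
        // Search from pos1 so that each tab is found only once.
        while ((pos2 = str.find('\t', pos1)) != string::npos)
        {
            // substr takes (start, length), not (start, end).
            vStr.push_back(str.substr(pos1, pos2 - pos1));
            pos1 = pos2 + 1;  // continue just past the tab
        }
        vStr.push_back(str.substr(pos1));  // last field on the line
        vvStr.push_back(vStr);
    }
    return vvStr;
}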

</UNTESTED CODE>

Best wishes,
Sharad
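
The original question also asks about extracting the doubles later, once everything has been loaded as strings. A minimal sketch of that step might look like the following. The helper names to_double and column_as_doubles are hypothetical, and the sketch assumes the first row is a header and that rows where a field does not parse should simply be skipped.

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical helper: converts one field to double with std::strtod.
// Returns false if the field is empty or has trailing non-numeric characters.
bool to_double(const std::string& field, double& out)
{
    if (field.empty()) return false;
    char* end = 0;
    const double value = std::strtod(field.c_str(), &end);
    if (end != field.c_str() + field.size()) return false;  // not fully consumed
    out = value;
    return true;
}

// Hypothetical helper: pulls one numeric column out of the loaded rows,
// skipping the header row and any row where the field does not parse.
std::vector<double> column_as_doubles(
    const std::vector<std::vector<std::string> >& rows, std::size_t col)
{
    std::vector<double> values;
    for (std::size_t r = 1; r < rows.size(); ++r) {  // row 0 is assumed to be the header
        double v;
        if (col < rows[r].size() && to_double(rows[r][col], v))
            values.push_back(v);
    }
    return values;
}
```

Checking that strtod consumed the whole field is what separates "13.455" from something like "A string". Silently skipping bad rows is just one possible policy; reporting the row number would be a reasonable alternative.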


"BCC" <a@b.c> wrote in message
news:p1****************@newssvr29.news.prodigy.com ...

> I started off with this skeletal function, but I'm not sure how to parse the
> line for tabs and newlines, and stuff the elements into the vector. Is it
> better to read in the whole line then parse it? Can I parse it on the fly?
> How?



Here's some code I wrote some time ago for splitting sequences of
characters and adding them to lists. I have used it a lot with Visual
C++. I don't guarantee its portability or efficiency, but it looks
generally okay.

Usage:

struct is_tab {
    bool operator()(char c) const { return c == '\t'; }
};

// Split s using tab as a separator character,
// adding segments to the end of a vector.
string s;
vector<string> vec;
Utility::split(s.begin(), s.end(), back_inserter(vec), is_tab(), false);

Here you could use any input iterators for the first and second
arguments; in particular, you should be able to use istream_iterators
or istreambuf_iterators.

Jonathan
---------------------
//
// File name: split.h
//
// Description: Contains template functions for splitting a string into
//              a list.
//
// Author: Jonathan Turkanis
//
// Copyright: Jonathan Turkanis, July 29, 2002. See Readme.txt for
//            license information.
//

#ifndef UT_SPLIT_H_INCLUDED
#define UT_SPLIT_H_INCLUDED

#include <iterator>
#include <locale>
#include <string>
#include <boost/bind.hpp>
#include <boost/ref.hpp>

namespace Utility {

//
// Function name: split.
//
// Description: Splits the given string into components.
//
// Template parameters:
//    InIt - An input iterator type with any value type Elem.
//    OutIt - An output iterator type with value type equal to
//            std::basic_string<Elem>.
//    Pred - A predicate with argument type Elem.
// Parameters:
//    first - The beginning of the input sequence.
//    last - The end of the input sequence.
//    dest - Receives the terms in the generated list.
//    sep - Determines where to split the input sequence.
//    coalesce - true if sequences of consecutive elements satisfying sep
//               should be treated as one. Defaults to true.
//
template<class InIt, class OutIt, class Pred>
void split(InIt first, InIt last, OutIt dest, Pred sep,
           bool coalesce = true);

//
// Function name: split_by_whitespace.
//
// Description: Splits the given string into components at whitespace.
//
// Template parameters:
//    InIt - An input iterator type with any value type Elem.
//    OutIt - An output iterator type with value type equal to
//            std::basic_string<Elem>.
// Parameters:
//    first - The beginning of the input sequence.
//    last - The end of the input sequence.
//    dest - Receives the terms in the generated list.
//
template<class InIt, class OutIt>
void split_by_whitespace(InIt first, InIt last, OutIt dest)
{
    using namespace std;
    // typename is required here because value_type is a dependent name.
    typedef typename iterator_traits<InIt>::value_type char_type;
    locale loc;
    split(first, last, dest,
          boost::bind(isspace<char_type>, _1, boost::ref(loc)));
}

template<class InIt, class OutIt, class Pred>
void split(InIt first, InIt last, OutIt dest, Pred sep, bool coalesce)
{
    using namespace std;
    typedef typename iterator_traits<InIt>::value_type char_type;
    typedef basic_string<char_type> string_type;

    bool prev = true; // True if prev char was a separator.
    string_type term;
    while (first != last) {
        char_type c = *first++;
        bool is_sep = sep(c);
        // Emit a term at each separator; when coalescing, only at the
        // first separator of a run.
        if (is_sep && (!coalesce || !prev)) {
            *dest++ = term;
            term.clear();
        }
        if (!is_sep)
            term += c;
        prev = is_sep;
    }
    // Emit the final term (a trailing empty term is not emitted).
    if ((!term.empty() && !coalesce) || (coalesce && !prev))
        *dest++ = term;
}

} // namespace Utility

#endif // #ifndef UT_SPLIT_H_INCLUDED
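
The remark above about stream iterators can be illustrated with a small self-contained sketch. It does not use the header as posted: split_fields is a simplified, Boost-free stand-in written for this example (no coalescing, and it keeps a trailing empty term), and split_stream is likewise a hypothetical wrapper. The sketch uses istreambuf_iterator rather than istream_iterator because istream_iterator<char> reads through operator>>, which skips whitespace (tabs included) by default.

```cpp
#include <istream>
#include <iterator>
#include <sstream>   // only needed for the std::istringstream usage noted below
#include <string>
#include <vector>

// Predicate that recognises the separator, as in the usage example above.
struct is_tab {
    bool operator()(char c) const { return c == '\t'; }
};

// Simplified stand-in for Utility::split (an assumption, not the posted
// header): splits at every separator and always emits the final term.
template<class InIt, class OutIt, class Pred>
void split_fields(InIt first, InIt last, OutIt dest, Pred sep)
{
    std::string term;
    for (; first != last; ++first) {
        const char c = *first;
        if (sep(c)) {
            *dest++ = term;   // separator closes the current term
            term.clear();
        } else {
            term += c;
        }
    }
    *dest++ = term;           // whatever follows the last separator
}

// Hypothetical wrapper: splits everything remaining in a stream at tabs,
// reading raw characters through istreambuf_iterator.
std::vector<std::string> split_stream(std::istream& in)
{
    std::vector<std::string> vec;
    split_fields(std::istreambuf_iterator<char>(in),
                 std::istreambuf_iterator<char>(),
                 std::back_inserter(vec), is_tab());
    return vec;
}
```

Because split_stream consumes the whole stream, it suits a single line wrapped in an std::istringstream; for a full file, newlines would also need to be treated as separators or handled line by line.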

