Converting a 900 MB .csv into a ROOT (CERN) TTree

Problem description

I am new to programming and ROOT (CERN), so go easy on me. Simply, I want to convert a ~900 MB (11M lines x 10 columns) .csv file into a nicely organized .root TTree. Could someone provide the best way to go about this?

Here is an example line of data with headers (it's 2010 US census block population and population density data):

"Census County Code","Census Tract Code","Census Block Code","County/State","Block Centroid Latitude (degrees)","Block Centroid W Longitude (degrees)","Block Land Area (sq mi)","Block Land Area (sq km)","Block Population","Block Population Density (people/sq km)"

1001,201,1000,Autauga AL,32.469683,-86.480959,0.186343,0.482626154,61,126.3918241

I've pasted what I've written so far below.

I particularly can’t figure out this error when running: "C:41:1: error: unknown type name ‘UScsvToRoot’".

This may be really really stupid, but how do you read in strings in ROOT (for reading in the County/State name)? Like what is the data type? Do I just have to use char’s? I’m blanking.

#include "Riostream.h"
#include "TString.h"
#include "TFile.h"
#include "TNtuple.h"
#include "TSystem.h"

void UScsvToRoot() {

   TString dir = gSystem->UnixPathName(__FILE__);
   dir.ReplaceAll("UScsvToRoot.C","");
   dir.ReplaceAll("/./","/");
   ifstream in;
   in.open(Form("%sUSPopDens.csv",dir.Data()));

   Int_t countyCode,tractCode,blockCode;
   // how to import County/State string?
   Float_t lat,long,areaMi,areaKm,pop,popDens;
   Int_t nlines = 0;
   TFile *f = new TFile("USPopDens.root","RECREATE");
   TNtuple *ntuple = new TNtuple("ntuple","data from csv file","countyCode:tractCode:blockCode:countyState:lat:long:areaMi:areaKm:pop:popDens");

   while (1) {
      in >> countyCode >> tractCode >> blockCode >> countyState >> lat >> long >> areaMi >> areaKm >> pop >> popDens;
      if (!in.good()) break;
      ntuple->Fill(countyCode,tractCode,blockCode,countyState,lat,long,areaMi,areaKm,pop,popDens);
      nlines++;
   }

   in.close();

   f->Write();
}

Recommended answer

Ok, so I am going to give this a shot, but a few comments up front:

For questions on ROOT, you should strongly consider going to the ROOT homepage and then to the forum. While stackoverflow is an excellent source of information, specific questions on the ROOT framework are better suited to the ROOT homepage.

If you are new to ROOT, you should take a look at the tutorial page; it has many examples on how to use the various features of ROOT.

You should also make use of the ROOT reference guide, which has documentation on all ROOT classes.

To your code: if you look at the documentation for the class TNtuple that you are using, you will see that the description plainly says:

A simple tree restricted to a list of float variables only.

so trying to store any string into a TNtuple will not work. You need to use the more general class TTree for that.

To read your file and store the information in a tree you have two options: either you manually define the branches and then fill the tree as you loop over the file:

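#include "Riostream.h"
#include "TString.h"
#include "TFile.h"
#include "TTree.h"
#include "TSystem.h"
#include <cstdio>
#include <string>

void UScsvToRoot() {
   // Look for the .csv file in the same directory as this macro
   TString dir = gSystem->UnixPathName(__FILE__);
   dir.ReplaceAll("UScsvToRoot.C","");
   dir.ReplaceAll("/./","/");
   std::ifstream in;
   in.open(Form("%sUSPopDens.csv",dir.Data()));

   Int_t countyCode,tractCode,blockCode;
   char countyState[1024];
   Float_t lat,lon,areaMi,areaKm,pop,popDens;
   Int_t nlines = 0;
   TFile *f = new TFile("USPopDens.root","RECREATE");
   TTree *tree = new TTree("ntuple","data from csv file");

   // One branch per column: name, address of the variable, leaf format
   tree->Branch("countyCode",&countyCode,"countyCode/I");
   tree->Branch("tractCode",&tractCode,"tractCode/I");
   tree->Branch("blockCode",&blockCode,"blockCode/I");
   tree->Branch("countyState",countyState,"countyState/C");
   tree->Branch("lat",&lat,"lat/F");
   tree->Branch("lon",&lon,"lon/F");
   tree->Branch("areaMi",&areaMi,"areaMi/F");
   tree->Branch("areaKm",&areaKm,"areaKm/F");
   tree->Branch("pop",&pop,"pop/F");
   tree->Branch("popDens",&popDens,"popDens/F");

   std::string line;
   while (std::getline(in, line)) {
      // The fields are comma-separated and the county/state text contains a
      // space, so split on commas; %1023[^,] reads up to the next comma.
      // Lines that do not yield all ten fields (e.g. the header) are skipped.
      if (sscanf(line.c_str(),"%d,%d,%d,%1023[^,],%f,%f,%f,%f,%f,%f",
                 &countyCode,&tractCode,&blockCode,countyState,
                 &lat,&lon,&areaMi,&areaKm,&pop,&popDens) != 10) continue;
      tree->Fill();
      nlines++;
   }

   in.close();

   f->Write();
}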
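
Note that the sample row is comma-separated and its County/State field contains a space, so each row has to be split on commas rather than on whitespace. A minimal sketch of that parsing step in plain C++ (no ROOT needed), assuming no field contains embedded commas or quotes; CensusRow and parseRow are hypothetical names used only for illustration:

```cpp
#include <cstdio>
#include <string>

// Hypothetical record type mirroring the ten CSV columns
struct CensusRow {
   int countyCode, tractCode, blockCode;
   char countyState[1024];
   float lat, lon, areaMi, areaKm, pop, popDens;
};

// Hypothetical helper: fills `row` from one comma-separated line.
// Returns true only if all ten fields were converted.
bool parseRow(const std::string &line, CensusRow &row) {
   // %1023[^,] copies everything up to the next comma (spaces included)
   // and leaves room for the terminating 0 in the 1024-byte buffer
   int n = std::sscanf(line.c_str(), "%d,%d,%d,%1023[^,],%f,%f,%f,%f,%f,%f",
                       &row.countyCode, &row.tractCode, &row.blockCode,
                       row.countyState,
                       &row.lat, &row.lon, &row.areaMi, &row.areaKm,
                       &row.pop, &row.popDens);
   return n == 10;
}
```

A line for which parseRow returns false, such as the quoted header line, can simply be skipped before filling the tree.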

The command TTree::Branch basically tells ROOT

  • the name of your branch
  • the address of the variable from which ROOT will read the information
  • the format of the branch

The TBranch that contains the string information is of type C, which, according to the TTree documentation, means

  • C : a character string terminated by the 0 character

N.B. I gave the character array an arbitrary size; you should check for yourself what size is appropriate for your data.
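
One way to pick that size is to scan the file once for the longest County/State value. The following is only a sketch in plain C++, assuming fields contain no embedded commas or quotes (as in the sample row); longestField is a hypothetical helper, not part of ROOT:

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: returns the length of the longest value found in the
// given zero-based column of a comma-separated stream.
std::size_t longestField(std::istream &in, std::size_t column, bool skipHeader = true) {
   std::string line;
   if (skipHeader) std::getline(in, line);   // discard the header row
   std::size_t longest = 0;
   while (std::getline(in, line)) {
      std::istringstream fields(line);
      std::string field;
      // Walk to the requested column, one comma-separated field at a time
      for (std::size_t i = 0; i <= column; ++i) {
         if (!std::getline(fields, field, ',')) { field.clear(); break; }
      }
      if (field.size() > longest) longest = field.size();
   }
   return longest;
}
```

For the real file this would be called with a std::ifstream opened on USPopDens.csv and column 3; the char array then needs at least the returned length plus one byte for the terminating 0.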

The other possibility is to do away with the ifstream and simply make use of the ReadFile method of the TTree, which you would employ like this:

#include "Riostream.h"
#include "TString.h"
#include "TFile.h"
#include "TTree.h"
#include "TSystem.h"

void UScsvToRoot() {

   TString dir = gSystem->UnixPathName(__FILE__);
   dir.ReplaceAll("UScsvToRoot.C","");
   dir.ReplaceAll("/./","/");

   TFile *f = new TFile("USPopDens.root","RECREATE");
   TTree *tree = new TTree("ntuple","data from csv file");
   // Build the path from dir, as in the ifstream version
   tree->ReadFile(Form("%sUSPopDens.csv",dir.Data()),"countyCode/I:tractCode/I:blockCode/I:countyState/C:lat/F:lon/F:areaMi/F:areaKm/F:pop/F:popDens/F",',');
   f->Write();
}

You can read the relevant documentation section for more information; among other things, it has an example that uses TTree::ReadFile.

Let me know if this helps.
