How do I read the HITRAN2012 database into MATLAB?


Question

The HITRAN database is a listing of molecular rotational-vibrational transitions. It is given in a text file where each line is 160 characters, with fixed width fields defining molecule, isotope, etc. The format is well documented, and there is even a program on the MathWorks File Exchange that will read in the database and simulate a portion of the spectrum. However, I need to read in a specific portion of the spectrum and then use it to do some fitting to a measured spectrum, so I need something much more custom.

As given in the comment section of that function, as well as elsewhere, the following line should read each line in properly:

database = which('HITRAN2012.par');
fid = fopen(database);
hitran = textscan(fid,'%2u%1u%12f%10f%10f%5f%5f%10f%4f%8f%15c%15c%15c%15c%6c%12c%1c%7f%7f','delimiter','','whitespace','');
fclose(fid);

The first two fields denote the molecule code, which runs from 1-47, and the isotope code which runs from 1-9.

Unfortunately, molecules 1-9 do not have a leading zero, and no matter what I do, it seems to silently confuse MATLAB. If I load in the entire database and then type

unique(hitran{1})

I do not get the numbers 1-47, but I get 10-92 with a few numbers missing. As far as I can figure, when MATLAB encounters a leading space, it shifts the line over and then pads the end, so that ' 12' becomes '12', but I'm not exactly sure. I have also tried

hitran = textscan(fid,'%160c','delimiter','\n','whitespace','');

and then tried to parse the resulting strings, but that also sometimes gets confused by the first space.

For instance, the first water line looks like

exampleHitranLine = ' 14    0.007002 1.165E-32 2.071E-14.05870.305  818.00670.590.000000          0 0 0          0 0 0  7  5  2        7  5  3      005540 02227 5 2 0    90.0   90.0';

The first bit of code comes across this line and returns '14' instead of ' 1' and '4'. If I just read in a subset that only contains molecule 1 (as in this example), then the second method of reading works fine. If I try to read in the entire database, however, the lines with molecules 1-9 are shifted to the left, which messes up all the other fields.

I should note that I've tried reading the numerical fields both as floats and as integers, and neither gives satisfactory results. The entire database in text form is nearly 700 MB, and so I need something that works as efficiently as possible.

What am I doing wrong?

Recommended answer

I don't have an answer as to why this is happening, but I do have a solution. If anyone has an answer as to why, I'd be happy to accept it.

It is the leading space that is screwing things up. MATLAB is being a little too clever, and when textscan encounters a leading space, it decides that it's extra and discards it and moves on to the next two characters. To get it to properly read in the file, I had to go line by line and test whether the first character is a space and then replace it with a leading zero, like this:

database = which('HITRAN2012_First100Lines.par');

% Number of records, assuming 160 characters plus a two-byte CR/LF per line.
fileParams = dir(database);
K = fileParams.bytes/162;
hitran = cell(K,19);

fid = fopen(database);
for k = 1:K
    hitranTemp = fgetl(fid);
    % Replace a leading space with '0' so textscan keeps the columns aligned.
    if hitranTemp(1) == ' '
        hitranTemp(1) = '0';
    end
    hitran(k,:) = textscan(hitranTemp,'%2u%1u%12f%10f%10f%5f%5f%10f%4f%8f%15c%15c%15c%15c%6c%12c%1c%7f%7f','delimiter','','whitespace','');
end
fclose(fid);
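
Because the record layout is fixed, the leading-space problem can also be avoided by cutting each line at known column positions rather than relying on textscan's numeric specifiers. What follows is only a minimal sketch of that idea, not the method used above. The widths are copied from the format string; the names recordLine, pieces, and fields are illustrative, and the sample record is rebuilt field by field to mirror the water line quoted in the question.

```matlab
% Field widths of one 160-character HITRAN record (from the format string).
widths = [2 1 12 10 10 5 5 10 4 8 15 15 15 15 6 12 1 7 7];
stops  = cumsum(widths);        % last column of each field
starts = stops - widths + 1;    % first column of each field

% Illustrative record, assembled field by field so every width is explicit.
pieces = {' 1', '4', ...
          sprintf('%12s', '0.007002'), ...
          sprintf('%10s', '1.165E-32'), sprintf('%10s', '2.071E-14'), ...
          '.0587', '0.305', sprintf('%10s', '818.0067'), ...
          '0.59', '0.000000', ...
          sprintf('%15s', '0 0 0'), sprintf('%15s', '0 0 0'), ...
          sprintf('%-15s', '  7  5  2'), sprintf('%-15s', '  7  5  3'), ...
          '005540', sprintf('%12s', '02227 5 2 0'), ' ', ...
          sprintf('%7s', '90.0'), sprintf('%7s', '90.0')};
recordLine = [pieces{:}];

% Slice the record by column index; leading spaces stay inside their field.
fields = arrayfun(@(a, b) recordLine(a:b), starts, stops, ...
                  'UniformOutput', false);

% str2double tolerates leading blanks, so ' 1' converts to 1.
moleculeNumber     = str2double(fields{1});
isotopologueNumber = str2double(fields{2});
vacuumWavenumber   = str2double(fields{3});
```

Slicing by index never reinterprets whitespace, so the first field of a molecule 1-9 record stays two characters wide and the remaining fields keep their positions.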

I'm working in MATLAB 2013a. Should I consider this to be a bug and report it? Is there some reason that the leading space should be gobbled up like this?

Update:

My workaround above was slow, but it worked. Then I had to process the HITEMP database, which is several times larger, so I finally did submit a support ticket to MathWorks. The workaround suggested by MathWorks technical support is to read everything in as text and then convert. This needs far fewer disk reads than the line-by-line loop, and it parses the fields correctly.

fileParams = dir(database);

fid = fopen(database);

hitran = textscan(fid,'%2c%1c%12c%10c%10c%5c%5c%10c%4c%8c%15c%15c%15c%15c%6c%12c%1c%7c%7c','delimiter','','whitespace','');

fclose(fid);

moleculeNumber = uint8(str2num(hitran{1}));
isotopologueNumber = uint8(str2num(hitran{2}));
vacuumWavenumber = str2num(hitran{3});
...
etc.
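
For the short integer fields, the text-to-number step can also be done with plain arithmetic on the character codes, which avoids calling str2num on a large char matrix. This is a hedged sketch rather than part of the MathWorks suggestion. It assumes the field holds only digits and spaces and is right-justified, and the variable codes is a made-up stand-in for hitran{1}.

```matlab
% Stand-in for hitran{1}: an N-by-2 char matrix of molecule codes.
codes = [' 1'; ' 9'; '10'; '47'];

% Turn each character into its digit value; a blank counts as zero.
digitValues = double(codes) - double('0');
digitValues(codes == ' ') = 0;

% Combine the tens and ones columns into one number per row.
moleculeNumber = uint8(digitValues * [10; 1]);
```

The same pattern would extend to wider integer fields by using more powers of ten in the weight vector. Floating-point fields still need a real parser such as str2double.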

Depending on the application, for larger databases one would probably want to do this in chunks rather than all at once.
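
One way to process the file in chunks is sketched below. It swaps textscan for fread with a fixed record size, so it is an alternative under stated assumptions rather than the approach support suggested. Every record is assumed to be exactly 160 characters plus a single LF (use 162 bytes for CR/LF files), and the demo file, makeRecord, and the chunk size are all made up for illustration.

```matlab
% Write a small hypothetical demo file: three 160-character records, LF endings.
demoFile   = [tempname '.par'];
makeRecord = @(mol, iso, nu) ...
    [sprintf('%2d%1d%12.6f', mol, iso, nu), repmat(' ', 1, 145)];
fid = fopen(demoFile, 'w');
fprintf(fid, '%s\n', makeRecord(1, 4, 0.007002));
fprintf(fid, '%s\n', makeRecord(12, 1, 1234.5));
fprintf(fid, '%s\n', makeRecord(47, 2, 25000.25));
fclose(fid);

recordBytes  = 161;   % assumption: 160 characters + 1 LF per record
chunkRecords = 2;     % tiny for the demo; far larger in practice
moleculeNumber   = zeros(0, 1, 'uint8');
vacuumWavenumber = zeros(0, 1);

fid = fopen(demoFile, 'r');
while true
    % fread fills one record per column; transpose to one record per row.
    block = fread(fid, [recordBytes, chunkRecords], 'uint8=>char')';
    if isempty(block)
        break   % nothing left to read
    end
    % Columns 1-2 hold the molecule code, columns 4-15 the wavenumber.
    moleculeNumber   = [moleculeNumber; ...
                        uint8(str2double(cellstr(block(:, 1:2))))];
    vacuumWavenumber = [vacuumWavenumber; ...
                        str2double(cellstr(block(:, 4:15)))];
end
fclose(fid);
delete(demoFile);
```

Each pass holds only chunkRecords rows in memory. A real run would normally filter each chunk (for example by wavenumber range) before appending, so the accumulated arrays stay small.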

The support engineer also said he would forward the behavior to the development team for consideration in a future update.
