Searching a very large ARPA file in a very short time in Java

Question

I have an ARPA file that is almost 1 GB in size, and I have to search it in less than 1 minute. I have searched a lot but have not found a suitable answer yet. I don't think I need to read the whole file; I just need to jump to a specific line and read that whole line. The lines of the ARPA file do not all have the same length. I should mention that ARPA files have a specific format.

File Format

\data\
ngram 1=19
ngram 2=234
ngram 3=1013

\1-grams:
-1.7132 puluh -3.8008
-1.9782 satu -3.8368

\2-grams:
-1.5403 dalam dua -1.0560
-3.1626 dalam ini 0.0000

\3-grams:
-1.8726 itu dan tiga
-1.9654 itu dan untuk

\end\

As you can see in the sample file, I have 19 lines of 1-grams, 234 lines of 2-grams and 1013 lines of 3-grams. I give the program the string part of a line and get back the numbers to the left and right of that string. The input string tells me which section of the file I have to search. I have to find a way to avoid reading the file completely, because it is very big and reading all of it takes a lot of time. I think a good approach would be to jump to the specific line in the file, without using an index file, and read the whole line.

It would be great if you could help me with my assignment.
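To make the line layout concrete, here is a minimal sketch of splitting one n-gram line into the number on the left, the string, and the optional number on the right. It assumes the fields are separated by whitespace, as in the sample above, and that the n-gram order is known from the section the line sits in. The class name `ArpaLine` and its fields are made up for illustration.

```java
import java.util.Arrays;

/** One parsed n-gram line. Hypothetical helper, not part of any library. */
final class ArpaLine {
    final double logProb;   // number to the left of the string
    final String ngram;     // the words, joined by single spaces
    final Double backoff;   // number to the right, or null when the line has none

    private ArpaLine(double logProb, String ngram, Double backoff) {
        this.logProb = logProb;
        this.ngram = ngram;
        this.backoff = backoff;
    }

    /**
     * Parses a line such as "-1.5403 dalam dua -1.0560".
     * order is the n of the enclosing "\n-grams:" section.
     */
    static ArpaLine parse(String line, int order) {
        String[] tokens = line.trim().split("\\s+");
        // Expect: one number, then `order` words, then at most one more number.
        if (tokens.length < order + 1 || tokens.length > order + 2) {
            throw new IllegalArgumentException("Not a " + order + "-gram line: " + line);
        }
        double logProb = Double.parseDouble(tokens[0]);
        String ngram = String.join(" ", Arrays.copyOfRange(tokens, 1, order + 1));
        Double backoff = tokens.length == order + 2
                ? Double.valueOf(tokens[order + 1])
                : null;
        return new ArpaLine(logProb, ngram, backoff);
    }
}
```

For example, `ArpaLine.parse("-1.8726 itu dan tiga", 3)` yields the n-gram `itu dan tiga` with no right-hand number, since the 3-gram lines in the sample carry only the left value.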

Answer

I don't know what an ARPA file is. I'm assuming it's some sort of file containing text.

What you want to do is first index the file so you can associate strings with line numbers in the file.

That's a big file, so you'd probably store your index in a separate file.

First, before the user searches, you'd build the index. Then you'd search the index for the line number where the string the user is looking for appears.
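The indexing idea could be sketched roughly as follows. This is an illustration under several assumptions, not a definitive implementation: it records byte offsets instead of line numbers (lines vary in length, so an offset is what lets `RandomAccessFile.seek` jump straight to a line), it keeps the index in an in-memory `HashMap` rather than a separate file, and it assumes the file is UTF-8 with whitespace-separated fields. The class name `ArpaOffsetIndex` is made up. For a 1 GB file the in-memory map may itself be large, so persisting it would likely be needed in practice.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical index from n-gram string to the byte offset of its line. */
final class ArpaOffsetIndex {
    private final Path file;
    private final Map<String, Long> offsets = new HashMap<>();

    private ArpaOffsetIndex(Path file) { this.file = file; }

    /** One sequential pass over the file, done once before any search. */
    static ArpaOffsetIndex build(Path file) {
        ArpaOffsetIndex index = new ArpaOffsetIndex(file);
        try (InputStream in = new BufferedInputStream(Files.newInputStream(file), 1 << 16)) {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            long pos = 0;        // bytes consumed so far
            long lineStart = 0;  // offset of the first byte of the current line
            int order = 0;       // 0 outside any "\n-grams:" section
            int b;
            while ((b = in.read()) != -1) {
                pos++;
                if (b != '\n') { buf.write(b); continue; }
                order = index.record(decode(buf), lineStart, order);
                buf.reset();
                lineStart = pos;
            }
            index.record(decode(buf), lineStart, order); // last line without a newline
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return index;
    }

    private static String decode(ByteArrayOutputStream buf) {
        return new String(buf.toByteArray(), StandardCharsets.UTF_8);
    }

    /** Records one line and returns the n-gram order that applies to the next line. */
    private int record(String raw, long lineStart, int order) {
        String line = raw.trim();
        if (line.isEmpty()) return order;
        if (line.startsWith("\\")) {
            // "\2-grams:" starts a section; "\data\" and "\end\" reset the order to 0.
            int dash = line.indexOf('-');
            return line.endsWith("-grams:") ? Integer.parseInt(line.substring(1, dash)) : 0;
        }
        if (order == 0) return 0; // count lines such as "ngram 1=19"
        String[] tokens = line.split("\\s+");
        if (tokens.length > order) {
            String key = String.join(" ", Arrays.copyOfRange(tokens, 1, order + 1));
            offsets.put(key, lineStart);
        }
        return order;
    }

    /** Seeks to the stored offset and reads just that line; null if the n-gram is unknown. */
    String lookup(String ngram) {
        Long offset = offsets.get(ngram);
        if (offset == null) return null;
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            raf.seek(offset);
            // readLine() maps each byte to one char, so re-decode the bytes as UTF-8.
            String latin1 = raf.readLine();
            return new String(latin1.getBytes(StandardCharsets.ISO_8859_1),
                    StandardCharsets.UTF_8).trim();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Usage would be along the lines of `ArpaOffsetIndex.build(Paths.get("model.arpa")).lookup("dalam ini")`, where `model.arpa` is a placeholder path. Building the index still reads the whole file once, but each later search reads only a single line.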
