Loading large amount of data into memory - most efficient way to do this?


Question


I have a web-based documentation searching/viewing system that I'm developing for a client. Part of this system is a search system that allows the client to search for a term[s] contained in the documentation. I've got the necessary search data files created, but there's a lot of data that needs to be loaded, and it takes anywhere from 8-20 seconds to load all the data. The data is broken into 40-100 files, depending on what documentation needs to be searched. Each file is anywhere from 40-350kb.

Also, this application must be able to run on the local file system, as well as through a webserver.

When the webpage loads up, I can generate a list of what search data files I need to load. This entire list must be loaded before the webpage can be considered functional.

With that preface out of the way, let's look at how I'm doing it now.

After I know that the entire webpage is loaded, I call a loadData() function

function loadData(){
    var d = new Date();
    var curr_min = d.getMinutes();
    var curr_sec = d.getSeconds();
    var curr_mil = d.getMilliseconds();
    console.log("test.js started background loading, time is: " + curr_min + ":" + curr_sec + ":" + curr_mil);
    recursiveCall();
}

function recursiveCall(){
    if(file_array.length > 0){
        var string = file_array.pop();
        setTimeout(function(){$.getScript(string,recursiveCall);},1);
    }
    else{
        var d = new Date();
        var curr_min = d.getMinutes();
        var curr_sec = d.getSeconds();
        var curr_mil = d.getMilliseconds();
        console.log("test.js stopped background loading, time is: " + curr_min + ":" + curr_sec + ":" + curr_mil);
    }
}

This processes an array of files sequentially, taking a 1 ms break between files. It helps prevent the browser from being completely locked up during the loading process, but the browser still tends to get bogged down loading the data. Each of the files that I'm loading looks like this:

AddToBookData(0,[0,1,2,3,4,5,6,7,8]);
AddToBookData(1,[0,1,2,3,4,5,6,7,8]);
AddToBookData(2,[0,1,2,3,4,5,6,7,8]);

Where each line is a function call that is adding data to an array. The "AddToBookData" function simply does the following:

    function AddToBookData(index1,value1){
         BookData[BookIndex].push([index1,value1]);
    }

This is the existing system. After loading all the data, "AddToBookData" can get called 100,000+ times.

I figured that was pretty inefficient, so I wrote a script that takes the test.js file containing all the function calls above and converts it into a giant array equal to the data structure that BookData creates. Instead of making all the function calls that the old system did, I simply do the following:

var test_array = [..........(data structure I need).......];
BookData[BookIndex] = test_array;

I was expecting to see a performance increase because I was removing all the function calls above, although this method takes slightly more time to create the exact data structure. I should note that "test_array" holds slightly over 90,000 elements in my real world test.

It seems that both methods of loading data have roughly the same CPU utilization. I was surprised to find this, since I was expecting the second method to require little CPU time, as the data structure is created beforehand.

Please advise?

Solution

Looks like there are two basic areas for optimising the data loading that can be considered and tackled separately:

  1. Downloading the data from the server. Rather than one large file, you should gain wins from parallel loads of multiple smaller files. Experiment with the number of simultaneous loads, bearing in mind browser limits and the diminishing returns of having too many parallel connections. See my parallel vs sequential experiments on jsfiddle, but bear in mind that the results will vary due to the vagaries of pulling the test data from github - you're best off testing with your own data under more tightly controlled conditions.
  2. Building your data structure as efficiently as possible. Your result looks like a multi-dimensional array; this interesting article on JavaScript array performance may give you some ideas for experimentation in this area.
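
Point 1 could be sketched roughly like this: a small callback-based pool that keeps a fixed number of downloads in flight at once. `loadInParallel`, `loadOne` and `maxParallel` are illustrative names, not part of the original code, and the pool size is something to tune by experiment.

```javascript
// Minimal sketch of concurrency-limited parallel loading.
function loadInParallel(files, loadOne, maxParallel, onDone) {
  var queue = files.slice();          // copy so the caller's array survives
  var active = 0, finished = 0, total = files.length;
  if (total === 0) { onDone(); return; }

  function next() {
    // Start new loads until the pool is full or the queue is empty.
    while (active < maxParallel && queue.length > 0) {
      active++;
      loadOne(queue.shift(), function () {
        active--;
        finished++;
        if (finished === total) { onDone(); } else { next(); }
      });
    }
  }
  next();
}

// Wired up with jQuery it might look like:
// loadInParallel(file_array, function (url, cb) { $.getScript(url, cb); }, 4,
//                function () { console.log("all search data loaded"); });
```

The pool restarts itself from each completion callback, so it degrades gracefully whether the individual loads are fast or slow.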

But I'm not sure how far you'll really be able to go with optimising the data loading alone. To solve the actual problem with your application (the browser locking up for too long), have you considered options such as the following?

Using Web Workers

Web Workers might not be supported by all your target browsers, but should prevent the main browser thread from locking up while it processes the data.
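
One way this might look (a sketch, assuming the files are fetched as plain text rather than executed with $.getScript, so the heavy parsing can happen off the main thread; `parseBookFile` and `worker.js` are made-up names, not part of the original system):

```javascript
// Hypothetical helper that turns file lines such as
//   AddToBookData(0,[0,1,2]);
// into the [index, values] pairs that BookData expects, without eval.
function parseBookFile(text) {
  var pairs = [];
  var re = /AddToBookData\((\d+)\s*,\s*(\[[^\]]*\])\)/g;
  var m;
  while ((m = re.exec(text)) !== null) {
    pairs.push([parseInt(m[1], 10), JSON.parse(m[2])]);
  }
  return pairs;
}

// Inside a worker (worker.js), the main thread would only receive the result:
// self.onmessage = function (e) {
//   self.postMessage(parseBookFile(e.data));  // heavy parsing happens here
// };
//
// And on the main thread:
// var w = new Worker("worker.js");
// w.onmessage = function (e) { BookData[BookIndex] = e.data; };
// w.postMessage(rawFileText);
```

Note that workers can't run when the page is opened from the local file system in some browsers, which matters given the requirement above, so a fallback path is still needed.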

For browsers without workers, you could consider increasing the setTimeout interval slightly, to give the browser time to service the user as well as your JS. This will actually make things slightly slower, but may increase user happiness when combined with the next point.

Providing feedback of progress

For both worker-capable and worker-deficient browsers, take some time to update the DOM with a progress bar. You know how many files you have left to load, so progress should be fairly consistent, and although things may actually be slightly slower, users will feel better if they get feedback and don't think the browser has locked up on them.
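
A minimal sketch of how that progress value could be computed from the counts the loader already has (the `load-progress` element id and `total_files` variable are assumptions for illustration):

```javascript
// Percentage complete, given the total file count and how many remain.
function progressPercent(total, remaining) {
  if (total === 0) { return 100; }          // nothing to load counts as done
  return Math.round(((total - remaining) / total) * 100);
}

// Called from recursiveCall() after each file finishes, e.g.:
// document.getElementById("load-progress").textContent =
//     progressPercent(total_files, file_array.length) + "% loaded";
```

Since file sizes vary (40-350 kb), a file-count-based bar won't move perfectly smoothly, but it is usually close enough to reassure users.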

Lazy Loading

As suggested by jira in his comment: if Google Instant can search the entire web as we type, is it really not possible to have the server return a file with all locations of the search keyword within the current book? This file should be much smaller and faster to load than the locations of all words within the book, which is what I assume you are currently trying to load as quickly as you can.
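
That idea might look roughly like this: fetch one small per-keyword file on demand and cache it, instead of preloading everything up front. The `/search-index/` URL scheme, `keyword_cache` and the injectable `fetchJson` are all assumptions for illustration, not part of the original system.

```javascript
var keyword_cache = {};    // keyword -> array of locations already fetched

function getKeywordLocations(keyword, fetchJson, onResult) {
  if (keyword_cache.hasOwnProperty(keyword)) {
    onResult(keyword_cache[keyword]);       // cache hit: no network round trip
    return;
  }
  fetchJson("/search-index/" + encodeURIComponent(keyword) + ".json",
    function (locations) {
      keyword_cache[keyword] = locations;   // remember for repeat searches
      onResult(locations);
    });
}

// In the page, fetchJson could simply wrap $.getJSON:
// getKeywordLocations(term, function (url, cb) { $.getJSON(url, cb); },
//                     showSearchResults);
```

This trades the one-time up-front load for a tiny fetch per search, which only works in the webserver deployment; the local-file-system case would still need one of the preloading strategies above.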
