Reading a text file from the client and on the client that exceeds the maximum size of a single string in javascript


Problem description

I'd like to reverse the following steps performed on the client in javascript but am having trouble with the blob.

In an indexedDB database, over an open cursor on an object store index:

  1. Extract a data object from the database.
  2. Convert the object to a string with JSON.stringify.
  3. Make a new blob of { type: 'text/csv' } from the JSON string.
  4. Write the blob to an array.
  5. Move the cursor down by one and repeat from step 1.

After the transaction completed successfully, a new blob of the same type was made from the array of blobs.
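For anyone following along, a minimal sketch of this export loop might look like the following. The database, store, and index names ('myDB', 'qst', 'by_id') are placeholders I've invented, not names from the original code:

// Minimal sketch of the export steps described above.
// 'myDB', 'qst' and 'by_id' are invented placeholder names.
function exportToBlob() {
  return new Promise((resolve, reject) => {
    const open = indexedDB.open('myDB');
    open.onerror = () => reject(open.error);
    open.onsuccess = () => {
      const db = open.result;
      const blobs = [];                                    // step 4: one blob per record
      const tx = db.transaction('qst', 'readonly');
      const index = tx.objectStore('qst').index('by_id');
      index.openCursor().onsuccess = e => {
        const cursor = e.target.result;                    // step 1: next data object
        if (cursor) {
          const s = JSON.stringify(cursor.value);          // step 2: object -> string
          blobs.push(new Blob([s], { type: 'text/csv' })); // steps 3-4: blob into array
          cursor.continue();                               // step 5: advance the cursor
        }
      };
      // After the transaction completes, merge the per-record blobs into one.
      tx.oncomplete = () => resolve(new Blob(blobs, { type: 'text/csv' }));
    };
  });
}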

The reason for doing it this way is that the concatenation of the JSON strings exceeded the maximum permitted size for a single string; so, I couldn't concatenate first and make one blob of that large string. However, the array of blobs could be made into a single blob of greater size, approximately 350MB, and downloaded to the client disk.
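The download itself isn't the question here, but for completeness, a merged blob of that size can be handed to the browser through an object URL. This is a generic sketch, not the original code, and the file name is made up:

// Generic sketch of downloading the merged blob (file name is made up).
function downloadBlob(blob, filename) {
  const url = URL.createObjectURL(blob);
  const a = document.createElement('a');
  a.href = url;
  a.download = filename;
  document.body.appendChild(a);
  a.click();
  a.remove();
  URL.revokeObjectURL(url); // release the object URL once the download starts
}

// downloadBlob(mergedBlob, 'export.csv');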

To reverse this process, I thought I could read the blob in and then slice it into the component blobs, and then read each blob as a string; but I can't figure out how to do it.

If the file is read as text with a FileReader, the result is one large block of text that cannot be written to a single variable because it exceeds the maximum string size and throws an allocation size overflow error.

It appeared that reading the file as an array buffer would be an approach allowing for slicing the blob into pieces, but there seems to be an encoding issue of some kind.

Is there a way to reverse the original process as is, or an encoding step that can be added that will allow the array buffer to be converted back to the original strings?
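As an aside, if the array-buffer route were pursued, TextDecoder is the usual way to turn bytes back into a string. The sketch below assumes the recorded lengths are byte offsets that fall on whole UTF-8 characters, which holds when the JSON text is plain ASCII:

// Sketch only: decode a byte range of an ArrayBuffer back into a string.
// Assumes start/length are byte offsets that fall on UTF-8 character
// boundaries (always true for ASCII-only JSON).
function decodeSlice(arrayBuffer, start, length) {
  const bytes = new Uint8Array(arrayBuffer, start, length);
  return new TextDecoder('utf-8').decode(bytes);
}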

I tried reading over some questions that appeared to be related but, at this point, I don't understand the encoding issues they were discussing. It seems that it is rather complicated to recover a string.

Thank you for any guidance you can provide.

Additional information after adopting the accepted answer

There's certainly nothing special about my code posted below, but I figured I'd share it for those who may be as new to this as me. It is the accepted answer integrated into the async function used to read the blobs, parse them, and write them to the database.

This method uses very little memory. It is too bad there isn't a way to do the same for writing the data to disk. In writing the database to disk, memory usage increases as the large blob is generated and is then released shortly after the download completes. Using this method to upload the file from the local disk appears to work without ever loading the entire blob into memory before slicing. It is as if the file is read from the disk in slices. So, it is very efficient in terms of memory usage.

In my specific case, there is still work to be done because using this to write the 50,000 JSON strings totalling 350MB back to the database is rather slow and takes about 7:30 to complete.

Right now each individual string is separately sliced, read as text, and written to the database in a single transaction. Whether slicing the blob into larger pieces comprised of a set of JSON strings, reading them as text in one block, and then writing them to the database in a single transaction will perform more quickly while still not using a large amount of memory is something I will need to experiment with, and a topic for a separate question.

If I use the alternative loop that determines the number of JSON strings needed to fill the size const c, and then slice a blob of that size, read it as text, and split it up to parse each individual JSON string, the time to complete is about 1:30 for c = 250,000 through 1,000,000. It appears that parsing a large number of JSON strings still slows things down regardless. Large blob slices don't translate into large amounts of text being parsed as a single block; each of the 50,000 strings needs to be parsed individually.
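A note on the read_file helper used throughout the code below: it wraps a FileReader in a promise and resolves with an object whose value property holds the text. The original helper was not posted, so the following is only a plausible sketch consistent with how it is called:

// Sketch of a promise wrapper around FileReader, consistent with how
// read_file is called below (result.value holds the text). Not the
// original helper, which was not posted.
function read_file( blob )
  {
    return new Promise( ( resolve, reject ) =>
      {
        const reader = new FileReader();
        reader.onload = () => resolve( { value: reader.result } );
        reader.onerror = () => reject( reader.error );
        reader.readAsText( blob );
      } );
  }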

   try
     {
       let i, l, b, result, map, p;
       const c = 1000000;

       // First get the file map from front of blob/file.
       // Read first ten characters to get length of map JSON string.
       b = new Blob( [ f.slice(0,10) ], { type: 'text/csv' } );
       result = await read_file( b );
       l = parseInt(result.value);

       // Read the map string and parse to array of objects.
       b = new Blob( [ f.slice( 10, 10 + l) ], { type: 'text/csv' } );
       result = await read_file( b );
       map = JSON.parse(result.value);

       l = map.length;
       p = 10 + result.value.length;

       // Using this loop takes about 7:30 to complete.
       for ( i = 1; i < l; i++ )
         {
           b = new Blob( [ f.slice( p, p + map[i].l ) ], { type: 'text/csv' } );
           result = await read_file( b ); // FileReader wrapped in a promise.
           result = await write_qst( JSON.parse( result.value ) ); // Database transaction wrapped in a promise.
           p = p + map[i].l;
           $("#msg").text( result );
         }; // next i

       $("#msg").text( "Successfully wrote all data to the database." );

       i = l = b = result = map = p = null;
     }
   catch(e)
     {
       alert( "error " + e );
     }
   finally
     {
       f = null;
     }



/*
  // Alternative loop that completes in about 1:30 versus 7:30 for the above loop.

       for ( i = 1; i < l; i++ )
         {
           let status = false,
               k, j, n = 0, x = 0,
               L = map[i].l,
               a_parse = [];

           if ( L < c ) status = true;
           while ( status )
             {
               if ( i+1 < l && L + map[i+1].l <= c )
                 {
                   L = L + map[i+1].l;
                   i = i + 1;
                   n = n + 1;
                 }
               else
                 {
                   status = false;
                 };
             }; // loop while

           b = new Blob( [ f.slice( p, p + L ) ], { type: 'text/csv' } );
           result = await read_file( b );
           j = i - n;
           for ( k = j; k <= i; k++ )
             {
                a_parse.push( JSON.parse( result.value.substring( x, x + map[k].l ) ) );
                x = x + map[k].l;
             }; // next k
           result = await write_qst_grp( a_parse, i + ' of ' + l );
           p = p + L;
           $("#msg").text( result );
         }; // next i
*/



/*
// Was using this loop when I thought the concern may be that the JSON strings were too large,
// but then realized the issue in my case is the opposite one of having 50,000 JSON strings of smaller size.

       for ( i = 1; i < l; i++ )
         {
           let x,
               m = map[i].l,
               str = [];

           while ( m > 0 )
             {
               x = Math.min( m, c );
               m = m - c;
               b = new Blob( [ f.slice( p, p + x ) ], { type: 'text/csv' } );
               result = await read_file( b );
               str.push( result.value );
               p = p + x;
             }; // loop while

           result = await write_qst( JSON.parse( str.join("") ) );
           $("#msg").text( result );
           str = null;
         }; // next i
*/

Accepted answer

Funnily enough, you already said in your question what should be done:

Slice your Blob.

The Blob interface does have a .slice() method.
But to use it, you should keep track of the positions where your merging occurred (this could be in another field of your db, or even in a header of your file):

function readChunks({blob, chunk_size}) {
  console.log('full Blob size', blob.size);
  const strings = [];  
  const reader = new FileReader();
  var cursor = 0;
  reader.onload = onsingleprocessed;
  
  readNext();
  
  function readNext() {
    // here is the magic
    const nextChunk = blob.slice(cursor, (cursor + chunk_size));
    cursor += chunk_size;
    reader.readAsText(nextChunk);
  }
  function onsingleprocessed() {
    strings.push(reader.result);
    if(cursor < blob.size) readNext();
    else {
      console.log('read %s chunks', strings.length);
      console.log('excerpt content of the first chunk',
        strings[0].substring(0, 30));
    }
  }
}



// we will do the demo in a Worker to not kill visitors page
function worker_script() {
  self.onmessage = e => {
    const blobs = [];
    const chunk_size = 1024*1024; // 1MB per chunk
    for(let i=0; i<500; i++) {
      let arr = new Uint8Array(chunk_size);
      arr.fill(97); // only 'a'
      blobs.push(new Blob([arr], {type:'text/plain'}));
    }
    const merged = new Blob(blobs, {type: 'text/plain'});
    self.postMessage({blob: merged, chunk_size: chunk_size});
  }
}
const worker_url = URL.createObjectURL(
  new Blob([`(${worker_script.toString()})()`],
    {type: 'application/javascript'}
  )
);
const worker = new Worker(worker_url);
worker.onmessage = e => readChunks(e.data);
worker.postMessage('do it');
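For the header variant mentioned above (recording each piece's length at the front of the file), a simplified sketch of the write side could look like the following. It illustrates the general idea only and does not reproduce the exact map layout used in the question's code; it also assumes ASCII-only JSON so that string lengths equal byte counts:

// Simplified illustration of the header idea: a 10-digit, zero-padded length,
// then a JSON map of piece lengths, then the pieces themselves.
// Assumes ASCII-only JSON (string length == byte length). Names are illustrative.
function buildFileWithHeader(jsonStrings) {
  const map = jsonStrings.map(s => ({ l: s.length })); // length of each piece
  const mapString = JSON.stringify(map);
  const header = String(mapString.length).padStart(10, '0') + mapString;
  return new Blob([header, ...jsonStrings], { type: 'text/csv' });
}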
