How to inject JavaScript into an existing HTML response with Node.js and Cloudflare Workers

Problem description

I have a vanity URL pointing to a GitBook. GitBook doesn't support inserting arbitrary JavaScript snippets; at the moment it only has four "integrations".

I could route the traffic through my own VM server to accomplish this, but I have CloudFlare and want to try out Workers (JavaScript running at the CDN edge).

The CloudFlare Workers environment makes header injection very easy, but there is no obvious way to inject content into the HTML body itself.

Recommended answer

It's important to process the body with a TransformStream so that processing is async and doesn't require buffering the whole response in memory (for scalability and to minimise GC); there's only a 5ms CPU time budget.
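
As a minimal sketch of what chunk-at-a-time processing looks like, the example below passes a stream through a TransformStream whose transform() callback sees each chunk as it arrives and forwards it immediately. It assumes a runtime whose TransformStream accepts a transformer object (Node 18+ does); the answer itself uses an identity TransformStream driven by a manual reader/writer loop, likely because early Workers runtimes only offered the identity form. tapChunks, streamFromChunks and demo are hypothetical names for illustration.

```javascript
// Minimal sketch: every chunk passes through transform() as it arrives and is forwarded
// immediately, so the full body is never buffered. Assumes standard web streams (Node 18+).
function tapChunks(readable, onChunk) {
  const tap = new TransformStream({
    transform(chunk, controller) {
      onChunk(chunk);            // inspect (or rewrite) just this chunk
      controller.enqueue(chunk); // pass it on right away
    },
  });
  return readable.pipeThrough(tap);
}

// Hypothetical helper for the example: a ReadableStream that yields the given chunks.
function streamFromChunks(chunks) {
  return new ReadableStream({
    start(controller) {
      for (const chunk of chunks) controller.enqueue(chunk);
      controller.close();
    },
  });
}

// Usage: push three chunks through and record the size of each one on the way.
async function demo() {
  const enc = new TextEncoder();
  const sizes = [];
  const out = tapChunks(
    streamFromChunks(['<html>', '<head>', '</head></html>'].map((s) => enc.encode(s))),
    (chunk) => sizes.push(chunk.length),
  );
  const reader = out.getReader();
  const dec = new TextDecoder();
  let text = '';
  for (;;) {
    const { value, done } = await reader.read();
    if (done) break;
    text += dec.decode(value, { stream: true });
  }
  text += dec.decode();
  return { text, sizes };
}
```

Only one chunk is held by the transform at any moment, which is the property the answer relies on to stay within the CPU and memory budget.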

Overview:

  • To use this yourself, change the strings forHeadStart, forHeadEnd, and forBodyEnd.
  • The deferredInjection approach is the recommended one because it minimises the worker's CPU time: it only needs to parse the very start of the HTML. The other approach requires parsing the whole head section for headInjection, and if you use bodyInjection it practically needs to parse the whole HTML response.
  • The deferredInjection approach works by injecting the content at the start of the head tag; then, on the client side at runtime, the injected helper script appends the rest of your HTML content (forHeadEnd and forBodyEnd) to the desired places.
  • You can inject directly if needed using headInjection and/or bodyInjection. To do so, uncomment the related code (including the calls in injectScripts) and set the strings for tagBytes that will be encoded.
  • This solution only parses responses with an HTML content type.
  • This solution works directly on bytes (not strings) for better efficiency: it searches for the bytes of the end-tag strings.
  • You could target more end tags, but usually you don't need more than these two.
  • Data is processed as a stream (the whole HTML string is never held in memory). This lowers peak memory usage and speeds up time to first byte.
  • It handles a rare edge case where the closing tag straddles a chunk (read) boundary. I believe a boundary might occur every ~1000 bytes (TCP packets are 1000-1500 bytes each), and this can vary due to gzip compression.
  • The injection-parsing code is kept separate from the code that simply forwards the rest of the stream, for clarity.
  • If you don't need the body-tag injector (bodyInjection), keep it commented out; skipping it speeds up processing.
  • I have tested this code myself and it works. There may be remaining bugs (depending on the location of the closing tag, and on whether your server replies with partial HTML templates, i.e. body only). I may have fixed one today, 2019-06-28.
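
For comparison, here is a compact sketch of the same overall idea: stream the body, find the tag bytes even when they straddle a chunk boundary, and inject right after the first occurrence. It assumes a standard TransformStream that accepts a transformer object, and it swaps in a different boundary technique from the answer's counting approach: it holds back the last tagBytes.length - 1 bytes of each chunk and prepends them to the next chunk. injectAfterTag, indexOfBytes, streamFromChunks and readText are hypothetical names, not part of the original code.

```javascript
// Sketch (a different boundary technique from the answer's counting approach):
// inject `snippet` right after the first occurrence of `tag` in a streamed body.
// A tail of (tag length - 1) bytes is held back so a tag split across chunks is still found.
function injectAfterTag(readable, tag, snippet) {
  const enc = new TextEncoder();
  const tagBytes = enc.encode(tag); // assumes a non-empty, ASCII tag such as "<head>"
  const snippetBytes = enc.encode(snippet);
  let carry = new Uint8Array(0);    // held-back bytes that might begin the tag
  let injected = false;

  return readable.pipeThrough(new TransformStream({
    transform(chunk, controller) {
      if (injected) { controller.enqueue(chunk); return; } // after injection: forward as-is
      const buf = new Uint8Array(carry.length + chunk.length);
      buf.set(carry, 0);
      buf.set(chunk, carry.length);
      const at = indexOfBytes(buf, tagBytes);
      if (at !== -1) {
        const end = at + tagBytes.length;
        controller.enqueue(buf.subarray(0, end)); // everything up to and including the tag
        controller.enqueue(snippetBytes);
        controller.enqueue(buf.subarray(end));
        carry = new Uint8Array(0);
        injected = true;
        return;
      }
      // No match yet: send all but the last bytes that could still start the tag.
      const keep = Math.min(tagBytes.length - 1, buf.length);
      controller.enqueue(buf.subarray(0, buf.length - keep));
      carry = buf.slice(buf.length - keep);
    },
    flush(controller) {
      if (carry.length > 0) controller.enqueue(carry); // tag never found: release the tail
    },
  }));
}

// Plain byte search; good enough for short tags.
function indexOfBytes(haystack, needle) {
  outer: for (let i = 0; i + needle.length <= haystack.length; i++) {
    for (let j = 0; j < needle.length; j++) {
      if (haystack[i + j] !== needle[j]) continue outer;
    }
    return i;
  }
  return -1;
}

// Hypothetical helpers for trying it out.
function streamFromChunks(chunks) {
  return new ReadableStream({
    start(controller) {
      for (const chunk of chunks) controller.enqueue(chunk);
      controller.close();
    },
  });
}

async function readText(readable) {
  const reader = readable.getReader();
  const dec = new TextDecoder();
  let text = '';
  for (;;) {
    const { value, done } = await reader.read();
    if (done) return text + dec.decode();
    text += dec.decode(value, { stream: true });
  }
}
```

The hold-back costs one small copy per chunk while the tag is still unfound; once the snippet has been injected, chunks are forwarded untouched, which keeps the per-request CPU cost close to the answer's approach.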

Code

addEventListener('fetch', event => {
  event.passThroughOnException();
  event.respondWith(handleRequest(event.request))
})

/**
 * Fetch the original response and, for HTML, stream it through the injector
 * @param {Request} request
 */
async function handleRequest(request) {
  const response = await fetch(request);

  const ctype = response.headers.get('content-type');
  if (!ctype || !ctype.startsWith('text/html') || !response.body)
    return response; //Only parse html bodies (the header can be missing, and the body can be null)

  const { readable, writable } = new TransformStream();
  injectScripts(response.body, writable); //Not awaited: injection runs while the response streams out
  return new Response(readable, response);
}

let encoder = new TextEncoder(); //TextEncoder takes no encoding argument; it always produces UTF-8

let deferredInjection = function() {
    let forHeadStart = `<script>var test = 1; //Start of head section</script>`;
    let forHeadEnd = `<script>var test = 2; //End of head section</script>`;
    let forBodyEnd = `<script>var test = 3; //End of body section</script><button>click</button>`;

    let helper = `
    ${forHeadStart}
    <script>
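        function appendHtmlTo(element, htmlContent) {
            var temp = document.createElement('div');
            temp.innerHTML = htmlContent;
            while (temp.firstChild) {
                var node = temp.firstChild;
                temp.removeChild(node);
                if (node.nodeName === 'SCRIPT') {
                    //Top-level scripts parsed via innerHTML never execute, so copy them into a fresh script element
                    var script = document.createElement('script');
                    for (var i = 0; i < node.attributes.length; i++)
                        script.setAttribute(node.attributes[i].name, node.attributes[i].value);
                    script.text = node.text;
                    node = script;
                }
                element.appendChild(node);
            }
        }

        let forHeadEnd = "${ btoa(forHeadEnd) }";
        let forBodyEnd = "${ btoa(forBodyEnd) }";

        if (forHeadEnd.length > 0) appendHtmlTo(document.head, atob(forHeadEnd));
        //addEventListener, so a page that assigns window.onload later doesn't remove this handler
        if (forBodyEnd.length > 0) window.addEventListener('load', function() {
            appendHtmlTo(document.body, atob(forBodyEnd));
        });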

    </script>
    `;
    return {
        forInjection: encoder.encode(helper),
        tagBytes: encoder.encode("<head>"),
        insertAfterTag: true
    };

}();

// let headInjection = {
    // forInjection: encoder.encode("<script>var test = 1;</script>"),
    // tagBytes: encoder.encode("</head>"), //case sensitive
    // insertAfterTag: false
// };
// let bodyInjection = {
    // forInjection: encoder.encode("<script>var test = 1;</script>"),
    // tagBytes: encoder.encode("</body>"), //case sensitive
    // insertAfterTag: false
// }

//console.log(bodyTagBytes);
encoder = null;

async function injectScripts(readable, writable) {
  let processingState = {
    readStream: readable,
    writeStream: writable,
    reader: readable.getReader(),
    writer: writable.getWriter(),
    leftOvers: null, //data left over after a closing tag is found
    inputDone: false,
    result: {charactersFound: 0, foundIndex: -1, afterHeadTag: -1} //Reused object for the duration of the request
  };


  await parseForInjection(processingState, deferredInjection);

  //await parseForInjection(processingState, headInjection);

  //await parseForInjection(processingState, bodyInjection);

  await forwardTheRest(processingState);      
}



///Return object will have foundIndex: -1, if there is no match, and no partial match at the end of the array
///If there is an exact match, return object will have charactersFound:(tagBytes.Length)
///If there is a partial match at the end of the array, return object charactersFound will be < (tagBytes.Length)
///The result object needs to be passed in to reduce Garbage Collection - we can reuse the object
function searchByteArrayChunkForClosingTag(chunk, tagBytes, result)
{
  let searchStart = 0;

  for (;;) {
    result.charactersFound = 0;
    result.foundIndex = -1;
    result.afterHeadTag = -1;

    let sweepIndex = chunk.indexOf(tagBytes[0], searchStart);
    if (sweepIndex === -1)
      return; //Definitely not found

    result.foundIndex = sweepIndex;
    sweepIndex++;
    searchStart = sweepIndex; //Where we start searching from next
    result.charactersFound++;
    result.afterHeadTag = sweepIndex;

    for (let i = 1; i < tagBytes.length; i++)
    {
      if (sweepIndex === chunk.length) return; //Partial match
      if (chunk[sweepIndex++] !== tagBytes[i]) { result.charactersFound = 0; result.afterHeadTag = -1; break; } //Failed to match (even partially to boundary)
      result.charactersFound++;
      result.afterHeadTag = sweepIndex; //Because we work around the actual found tag in case it's across a boundary
    }

    if (result.charactersFound === tagBytes.length)
      return; //Found
  }
}

function continueSearchByteArrayChunkForClosingTag(chunk, tagBytes, lastSplitResult, result)
{
  //Finish a match that was split at the end of the previous chunk
  result.charactersFound = lastSplitResult.charactersFound; //We'll be building on the progress from the lastSplitResult
  result.foundIndex = (-1 * result.charactersFound); //A negative value indicates that the match spans chunks
  let sweepIndex = 0;
  result.afterHeadTag = 0;
  for (let i = lastSplitResult.charactersFound; i < tagBytes.length; i++) //Zero-based
  {
    if (sweepIndex === chunk.length) return; //Supports a chunk that's smaller than the rest of the tag
    if (chunk[sweepIndex++] !== tagBytes[i]) { result.charactersFound = 0; result.afterHeadTag = -1; break; }
    result.charactersFound++;
    result.afterHeadTag = sweepIndex;
  }
}

function continueOrNewSearch(chunk, tagBytes, lastSplitResult, result)
{
  if (lastSplitResult == null)
  {
    searchByteArrayChunkForClosingTag(chunk, tagBytes, result);
    return;
  }
  continueSearchByteArrayChunkForClosingTag(chunk, tagBytes, lastSplitResult, result);
  if (result.charactersFound > 0)
    return; //The split tag was completed, or this (tiny) chunk was used up while still matching
  searchByteArrayChunkForClosingTag(chunk, tagBytes, result); //The split match failed, so search this chunk from the start
}

async function parseForInjection(processingState, injectionJob)
{
  if (processingState.inputDone) return; //Input already ended in an earlier pass (tag never found)
  if (!injectionJob || !injectionJob.tagBytes || !injectionJob.forInjection) return;

  let reader = processingState.reader;
  let writer = processingState.writer;
  let result = processingState.result;
  let tagBytes = injectionJob.tagBytes;

  let lastSplitResult = null; //Set when the previous chunk ended with the start of the tag
  let chunk = null;
  for (;;) {
    if (processingState.leftOvers)
    {
      chunk = processingState.leftOvers;
      processingState.leftOvers = null;
    }
    else
    {
      let readerResult = await reader.read();
      chunk = readerResult.value;
      processingState.inputDone = readerResult.done;
    }

    if (processingState.inputDone) {
      if (lastSplitResult !== null) //Very edge case: the body ended with a partial tag, so write the withheld bytes
        await writer.write(tagBytes.slice(0, lastSplitResult.charactersFound));
      await writer.close();
      return;
    }

    continueOrNewSearch(chunk, tagBytes, lastSplitResult, result);

    //A negative foundIndex with matched characters means the match continued from the previous chunk
    let continuedMatch = lastSplitResult !== null && result.foundIndex < 0 && result.charactersFound > 0;
    if (lastSplitResult !== null && !continuedMatch)
    {
      //The split match failed, so write the partial match withheld from before (maybe `<` or `</`)
      await writer.write(tagBytes.slice(0, lastSplitResult.charactersFound));
    }
    lastSplitResult = null;

    if (result.charactersFound === tagBytes.length) //Complete match: inject
    {
      if (result.foundIndex > 0)
        await writer.write(chunk.slice(0, result.foundIndex)); //Bytes before the tag
      if (injectionJob.insertAfterTag)
      {
        await writer.write(tagBytes);
        await writer.write(injectionJob.forInjection);
      }
      else
      {
        await writer.write(injectionJob.forInjection);
        await writer.write(tagBytes);
      }
      processingState.leftOvers = chunk.slice(result.afterHeadTag); //Everything after the tag
      return;
    }

    if (result.charactersFound === 0) //Not found in this chunk
    {
      await writer.write(chunk);
      continue;
    }

    //Partial match at the end of the chunk: write what comes before it and withhold the rest
    lastSplitResult = { charactersFound: result.charactersFound }; //A copy, because result is reused
    if (result.foundIndex > 0)
      await writer.write(chunk.slice(0, result.foundIndex));
  }
}

async function forwardTheRest(processingState)
{
  try
  {
    if (processingState.inputDone) return; //The writer was already closed by parseForInjection

    if (processingState.leftOvers && processingState.leftOvers.length > 0)
    {
      const chunk = processingState.leftOvers;
      processingState.leftOvers = null;
      await processingState.writer.write(chunk);
    }

    processingState.reader.releaseLock();
    processingState.writer.releaseLock();

    //pipeTo() closes the writable side once the readable side is done (unless preventClose is set),
    //so no explicit close is needed here
    await processingState.readStream.pipeTo(processingState.writeStream);
  }
  catch (e)
  {
    console.log(e);
  }
}


Further notes on working directly with (UTF-8) bytes:

  • Only byte values are compared. This works at least when searching for the first byte of a character, which in UTF-8 is always distinctive: it is either below 128 (ASCII) or a lead byte of 192 or above, while continuation bytes are always in the range 128-191, so an ASCII byte can never appear inside a multi-byte character. In this case we're searching for </head>, which consists only of bytes below 128, so it is very easy to work with.
  • Given that UTF-8 is the trickiest encoding to search, this should also work with ['utf-8', 'utf8', 'iso-8859-1', 'us-ascii']. You will need to change the snippet encoder to match (note that TextEncoder only produces UTF-8, which matches these encodings byte for byte only for ASCII snippets).
  • This isn't thoroughly tested. The boundary case didn't trigger for me. Ideally, we would have a testing rig for the core functions.
  • Thanks to Kenton Varda for challenging me.
  • Please let me know if there's a CloudFlare Workers way to do pipeTo in the forwardTheRest function.
  • You might find continueOrNewSearch and its two sub-functions an interesting approach to finding multi-byte sequences across a chunk boundary. Up to the boundary we just count how many bytes matched; there's no need to keep those bytes, because we know what they are. On the next chunk we continue where we left off. We always cut the array buffer around the tag, and make sure we write the tag bytes ourselves (using tagBytes).
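
The counting idea behind continueOrNewSearch can be reduced to a small stateful matcher, sketched below under the assumption that the tag's first byte occurs only once in the tag (true for <head> and </head>); that assumption is what lets a mismatch restart without backtracking. Because ASCII bytes never occur inside a multi-byte UTF-8 character, the byte-level search cannot produce a false match inside non-ASCII text. createSplitMatcher and findTagEnd are hypothetical names, and this is not the answer's code, just one way a testing rig for the core idea might start.

```javascript
// Sketch of the counting idea: a tag that straddles chunks is matched by remembering
// only how many leading tag bytes have matched so far, never the bytes themselves.
// Assumes the tag's first byte occurs only once in the tag (true for "<head>" and "</head>"),
// so a mismatch can restart without backtracking.
function createSplitMatcher(tagBytes) {
  let matched = 0; // tag bytes matched so far, possibly spanning earlier chunks
  let offset = 0;  // bytes consumed in earlier chunks
  let found = false;
  return function feed(chunk) {
    if (found) return -1; // only the first occurrence is reported
    for (let i = 0; i < chunk.length; i++) {
      if (chunk[i] === tagBytes[matched]) {
        matched++;
      } else {
        matched = chunk[i] === tagBytes[0] ? 1 : 0; // this byte may start a new match
      }
      if (matched === tagBytes.length) {
        found = true;
        return offset + i + 1; // absolute stream offset just past the tag
      }
    }
    offset += chunk.length;
    return -1;
  };
}

// Hypothetical usage helper: feed chunks in order and return where the tag ends (or -1).
function findTagEnd(chunks, tag) {
  const feed = createSplitMatcher(new TextEncoder().encode(tag));
  for (const chunk of chunks) {
    const end = feed(chunk);
    if (end !== -1) return end;
  }
  return -1;
}
```

Trying every split position of a page that contains non-ASCII text is a cheap way to exercise the boundary case that, as noted above, rarely triggers with real traffic.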
