ChromeDriver --print-to-pdf after page load

Question

According to the docs, Chrome can be started in headless mode with --print-to-pdf in order to export a PDF of a web page. This works well for pages accessible with a GET request.

I am trying to find a print-to-pdf solution that would allow me to export a PDF after executing multiple navigation requests from within Chrome. Example: open google.com, input a search query, click the first result link, export to PDF.

Looking at the [very limited amount of available] docs and samples, I failed to find a way to instruct Chrome to export a PDF, after a page loads. I'm using the Java chrome-driver.

One possible solution not involving Chrome is to use a tool like wkhtmltopdf. Going down this path would force me, before sending the HTML to the tool, to do the following:

  • Save the HTML in a local file
  • Traverse the DOM and download all linked files (images, JS, CSS, etc.)

I would rather avoid this path, as it would require a lot of tinkering [I assume] on my part to get the downloaded files' paths right so that wkhtmltopdf reads them correctly.

Is there a way to instruct Chrome to print to PDF, but only after a page loads?

Recommended answer

As there are no answers, I will explain my workaround. Instead of trying to find out how to ask Chrome to print the current page, I went down another route.

For this example we will try to download the results page from Google on the query 'example':

  1. Navigate with driver.get("https://google.com"), input the query 'example', click 'Google Search'
  2. Wait for the results page to load
  3. Retrieve the page source with driver.getPageSource()
  4. Parse the source with e.g. Jsoup in order to remap all relative links to point to an endpoint defined for this purpose (explained below), e.g. localhost:8080. The link './style.css' would become 'http://localhost:8080/style.css'
  5. Save the HTML to a file, e.g. named 'query-example'
  6. Run chrome --headless --print-to-pdf "http://localhost:8080/search/html?id=query-example"
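
The link remapping in step 4 can be sketched with the standard library alone. This is a minimal illustration, not the answer's actual code: the class name LinkRemapper, the LOCAL_BASE constant, and the sample links are all assumptions, and a real implementation would still need an HTML parser such as Jsoup to find the href and src attributes and would have to handle malformed links.

```java
import java.net.URI;
import java.util.List;

public class LinkRemapper {
    // Assumed local endpoint that serves the saved HTML and redirects everything else
    public static final URI LOCAL_BASE = URI.create("http://localhost:8080/");

    /**
     * Rewrites a single link. Relative links are resolved against the local
     * endpoint; links that already name a scheme or a host are returned unchanged.
     */
    public static String remap(String link) {
        URI uri = URI.create(link); // throws IllegalArgumentException on malformed input
        if (uri.isAbsolute() || link.startsWith("//")) {
            return link;
        }
        return LOCAL_BASE.resolve(uri).toString();
    }

    public static void main(String[] args) {
        // Hypothetical sample links, one per case
        for (String link : List.of("./style.css", "/images/logo.png", "https://example.com/a.js")) {
            System.out.println(link + " -> " + remap(link));
        }
    }
}
```

Leaving absolute links alone means Chrome fetches them directly, so only requests for relative resources reach the local controller.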

Chrome will request the HTML from our controller. For resources referenced in the HTML we return, it will also go to our controller, since we remapped the relative links, and the controller will in turn redirect that request to the real location of the resource, google.com. Below is an example Spring controller; note that the example is incomplete and is here only as guidance.

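import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;
import javax.servlet.http.HttpServletRequest; // jakarta.servlet.http on Spring 6+
import javax.servlet.http.HttpServletResponse;
import org.apache.commons.io.IOUtils;
import org.springframework.web.bind.annotation.*;

@RestController
@RequestMapping
public class InternationalOffloadRestController {
  private static final String HTML_ENCODING = "UTF-8"; // assumed; not defined in the original

  // Serves the saved HTML file identified by id
  @RequestMapping(method = RequestMethod.GET, value = "/search/html")
  public String getHtml(@RequestParam("id") String id) throws IOException {
    File file = new File("location of the HTML file", id);
    try (FileInputStream input = new FileInputStream(file)) {
      return IOUtils.toString(input, HTML_ENCODING);
    }
  }

  // Redirects all remapped links to google.com (the original elides any further parameters)
  @RequestMapping("/**")
  public void forward(HttpServletRequest request, HttpServletResponse httpServletResponse)
      throws URISyntaxException {
    URI uri = new URI("https", null, "google.com", -1,
      request.getRequestURI(), request.getQueryString(), null);
    httpServletResponse.setHeader("Location", uri.toString());
    httpServletResponse.setStatus(HttpServletResponse.SC_MOVED_PERMANENTLY);
  }
}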
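
Once the controller is running, the final step of launching headless Chrome could also be driven from Java instead of a shell. The sketch below is only an illustration under assumptions: the class name PrintToPdf, the binary name google-chrome, the output file name, and the URL are all hypothetical and will differ per platform and setup.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.List;

public class PrintToPdf {
    /** Builds the headless Chrome command line that prints the given URL to a PDF file. */
    public static List<String> command(String chromeBinary, Path pdf, String url) {
        return List.of(
            chromeBinary,
            "--headless",
            "--print-to-pdf=" + pdf.toAbsolutePath(),
            url);
    }

    /** Launches Chrome, waits for it to finish, and returns its exit code. */
    public static int print(String chromeBinary, Path pdf, String url)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command(chromeBinary, pdf, url))
            .inheritIO() // show Chrome's own output in this process's console
            .start();
        return process.waitFor();
    }

    public static void main(String[] args) {
        // Only prints the command; call print(...) instead once the local server is up.
        List<String> cmd = command("google-chrome", Path.of("query-example.pdf"),
            "http://localhost:8080/search/html?id=query-example");
        System.out.println(String.join(" ", cmd));
    }
}
```

Passing the arguments as a list to ProcessBuilder avoids shell quoting problems with the '?' in the URL.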
