Reading pdfs with RMagick from https gives a not authorized error

Question

I am trying to read a PDF and save its first page as an image. This works when the URL is http, but fails when it is https.

require 'rmagick'

# Reading the first page ("[0]") over http works:
url = "http://www.ke.tu-darmstadt.de/publications/reports/tud-ke-2008-07.pdf"
image = Magick::Image.read(url + "[0]")
# => [http://www.ke.tu-darmstadt.de/publications/reports/tud-ke-2008-07.pdf[0]=>tud-ke-2008-07.pdf PDF 595x842 595x842+0+0 DirectClass 16-bit 27kb]

# The same call over https raises an error:
url = "https://www.cs.purdue.edu/homes/dgleich/publications/Gleich%202003%20-%20Machine%20Learning%20in%20Computer%20Chess.pdf"
image = Magick::Image.read(url + "[0]")
# Magick::ImageMagickError: not authorized `//www.cs.purdue.edu/homes/dgleich/publications/Gleich%202003%20-%20Machine%20Learning%20in%20Computer%20Chess.pdf' @ error/constitute.c/ReadImage/454

The policy.xml file looks like this; I have not edited it:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE policymap [
<!ELEMENT policymap (policy)+>
<!ELEMENT policy (#PCDATA)>
<!ATTLIST policy domain (delegate|coder|filter|path|resource) #IMPLIED>
<!ATTLIST policy name CDATA #IMPLIED>
<!ATTLIST policy rights CDATA #IMPLIED>
<!ATTLIST policy pattern CDATA #IMPLIED>
<!ATTLIST policy value CDATA #IMPLIED>
]>
<!--
  Configure ImageMagick policies.

  Domains include system, delegate, coder, filter, path, or resource.

  Rights include none, read, write, and execute.  Use | to combine them,
  for example: "read | write" to permit read from, or write to, a path.

  Use a glob expression as a pattern.

  Suppose we do not want users to process MPEG video images:

    <policy domain="delegate" rights="none" pattern="mpeg:decode" />

  Here we do not want users reading images from HTTP:

    <policy domain="coder" rights="none" pattern="HTTP" />

  Lets prevent users from executing any image filters:

    <policy domain="filter" rights="none" pattern="*" />

  The /repository file system is restricted to read only.  We use a glob
  expression to match all paths that start with /repository:

    <policy domain="path" rights="read" pattern="/repository/*" />

  Any large image is cached to disk rather than memory:

  Define arguments for the memory, map, area, and disk resources with
  SI prefixes (.e.g 100MB).  In addition, resource policies are maximums for
  each instance of ImageMagick (e.g. policy memory limit 1GB, -limit 2GB
  exceeds policy maximum so memory limit is 1GB).
-->
<policymap>
  <!-- <policy domain="system" name="precision" value="6"/> -->
  <!-- <policy domain="resource" name="temporary-path" value="/tmp"/> -->
  <!-- <policy domain="resource" name="memory" value="2GiB"/> -->
  <!-- <policy domain="resource" name="map" value="4GiB"/> -->
  <!-- <policy domain="resource" name="area" value="1GB"/> -->
  <!-- <policy domain="resource" name="disk" value="16EB"/> -->
  <!-- <policy domain="resource" name="file" value="768"/> -->
  <!-- <policy domain="resource" name="thread" value="4"/> -->
  <!-- <policy domain="resource" name="throttle" value="0"/> -->
  <!-- <policy domain="resource" name="time" value="3600"/> -->
  <policy domain="coder" rights="none" pattern="EPHEMERAL" />
  <policy domain="coder" rights="none" pattern="URL" />
  <policy domain="coder" rights="none" pattern="HTTPS" />
  <policy domain="coder" rights="none" pattern="MVG" />
  <policy domain="coder" rights="none" pattern="MSL" />
  <policy domain="coder" rights="none" pattern="TEXT" />
  <policy domain="coder" rights="none" pattern="SHOW" />
  <policy domain="coder" rights="none" pattern="WIN" />
  <policy domain="coder" rights="none" pattern="PLT" />
  <policy domain="path" rights="none" pattern="@*" />
</policymap>


Recommended answer

It sounds like your ImageMagick policy file doesn't allow access to https. This is done with a directive that looks like:

<policy domain="coder" rights="none" pattern="HTTPS" />

This directive was part of the policy.xml recommended after a recent round of ImageMagick security vulnerabilities.

You could of course edit policy.xml to remove this directive (I don't know off the top of my head whether ImageMagick will complain if the file is missing entirely). However, this may leave you open to those vulnerabilities if your hosting provider relied on these mitigations.

Another option is to download the file yourself and then ask RMagick to read the local copy. The policy only restricts ImageMagick from performing the https access itself.
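The download-then-read approach might look like the sketch below. It is only an illustration, not code from the answer: the helper names (copy_to_tempfile, download_pdf, first_page) and the output filename are made up here, and it assumes Ruby's standard open-uri library is acceptable for the download. Because Ruby fetches the bytes, ImageMagick only ever sees a local path and its HTTPS coder is never used.

```ruby
require 'open-uri'
require 'stringio'
require 'tempfile'

# Copy any readable IO into a binary Tempfile and return the Tempfile.
# (Hypothetical helper, named for this sketch only.)
def copy_to_tempfile(io, suffix = '.pdf')
  file = Tempfile.new(['rmagick-download', suffix])
  file.binmode
  IO.copy_stream(io, file)
  file.flush
  file
end

# Fetch the PDF with Ruby's own HTTP client, so ImageMagick's HTTPS coder
# (the one blocked by policy.xml) is never invoked.
def download_pdf(url)
  URI.parse(url).open { |remote| copy_to_tempfile(remote) }
end

# Read only the first page of a local PDF; "[0]" selects the first frame,
# exactly as in the question's code.
def first_page(path)
  require 'rmagick' # loaded lazily so the download helpers work without it
  Magick::Image.read("#{path}[0]").first
end

# Usage (needs network access plus RMagick and a PDF delegate installed):
#   pdf = download_pdf("https://www.cs.purdue.edu/homes/dgleich/publications/Gleich%202003%20-%20Machine%20Learning%20in%20Computer%20Chess.pdf")
#   first_page(pdf.path).write("first_page.png")
```

Keep a reference to the returned Tempfile until RMagick has finished reading it, since Ruby deletes the file once the object is garbage collected.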
