How can I tell that a file is SVG without using a magic number?
Problem description
An SVG file is basically an XML file, so I could use the string <?xml (or its hex representation, '3c 3f 78 6d 6c') as a magic number. There are reasons not to do that, though: for example, extra whitespace in front of it would break the check.
The other images I need to check are all binary formats and have magic numbers. How can I quickly check whether a file is SVG without relying on its extension, preferably using Python?
Recommended answer
XML is not required to start with the <?xml preamble, so testing for that prefix is not a good detection technique, not to mention that it would identify every XML file as SVG. A decent detection method, and one that is really easy to implement, is to use a real XML parser to test that the file is well-formed XML whose top-level element is svg:
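    # cElementTree was removed in Python 3.9; ElementTree uses the C accelerator when available
    import xml.etree.ElementTree as et

    def is_svg(filename):
        tag = None
        # Binary mode: binary image files would otherwise fail to decode as text,
        # and the parser can honor the encoding declared in the document
        with open(filename, "rb") as f:
            try:
                # The first 'start' event belongs to the root element; stop there
                for event, el in et.iterparse(f, ('start',)):
                    tag = el.tag
                    break
            except et.ParseError:
                # Not well-formed XML, so not SVG
                pass
        return tag == '{http://www.w3.org/2000/svg}svg'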
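As a quick sanity check of the cases raised in the question, here is a minimal sketch that applies the same technique to in-memory bytes instead of a file. The helper name is_svg_bytes and the sample documents are made up for illustration; they are not part of the original answer.

```python
import io
import xml.etree.ElementTree as et

SVG_TAG = '{http://www.w3.org/2000/svg}svg'

def is_svg_bytes(data):
    """Hypothetical in-memory variant: True if data is XML whose root is <svg>."""
    tag = None
    try:
        # Only the first 'start' event (the root element) is needed
        for event, el in et.iterparse(io.BytesIO(data), ('start',)):
            tag = el.tag
            break
    except et.ParseError:
        # Not well-formed XML
        pass
    return tag == SVG_TAG

# Illustrative inputs (assumptions, not taken from the original post)
samples = {
    "no preamble": b'<svg xmlns="http://www.w3.org/2000/svg"/>',
    "preamble": b'<?xml version="1.0"?>\n<svg xmlns="http://www.w3.org/2000/svg"/>',
    "leading whitespace": b'  \n<svg xmlns="http://www.w3.org/2000/svg"/>',
    "other XML": b'<?xml version="1.0"?><html/>',
    "PNG signature": b'\x89PNG\r\n\x1a\n',
}

for name, data in samples.items():
    print(name, is_svg_bytes(data))
```

The first three samples should be reported as SVG even though only one of them starts with <?xml, while the non-SVG XML document and the binary data should not.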
Using cElementTree ensures that the detection is efficient through the use of expat (on Python 3.3 and later, xml.etree.ElementTree uses the same C accelerator automatically, and cElementTree was removed in Python 3.9); timeit shows that an SVG file was detected as such in ~200μs, and a non-SVG file in 35μs. The iterparse API lets the parser forgo building the whole element tree (module name notwithstanding) and read only the initial portion of the document, regardless of total file size.
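To get a feel for such timings on your own machine, something like the following sketch could be used. The helper root_tag and both test inputs are hypothetical, and the numbers it prints depend on hardware and Python version, so they will not necessarily match the figures quoted above.

```python
import io
import timeit
import xml.etree.ElementTree as et

def root_tag(data):
    """Return the root element's tag, or None if data is not well-formed XML."""
    try:
        for event, el in et.iterparse(io.BytesIO(data), ('start',)):
            # Returning here stops parsing right after the root's start tag
            return el.tag
    except et.ParseError:
        pass
    return None

# Hypothetical inputs: a large SVG-like document and a large non-XML blob
big_svg = (b'<svg xmlns="http://www.w3.org/2000/svg">'
           + b'<rect width="1" height="1"/>' * 100_000
           + b'</svg>')
not_xml = b'\x89PNG\r\n\x1a\n' + b'\x00' * 1_000_000

for label, data in (("svg", big_svg), ("non-svg", not_xml)):
    # Average over 100 calls to smooth out noise
    seconds = timeit.timeit(lambda: root_tag(data), number=100) / 100
    print(f"{label}: {seconds * 1e6:.0f} microseconds per call")
```

Because iterparse pulls the input in chunks and the loop exits at the first event, making either input larger should have little effect on the time per call.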