Determine data type from file_get_contents()

Problem description

I'm writing a command line application in PHP that accepts a path to a local input file as an argument. The input file will contain one of the following things:

  • A JSON-encoded associative array
  • A serialize()d version of the associative array
  • A base64-encoded version of the serialize()d associative array
  • A base64-encoded, JSON-encoded associative array
  • A plain old PHP associative array
  • Rubbish

In short, several dissimilar programs that I have no control over will be writing to this file. Once I actually figure out which format a file is in, I can handle it in a uniform way, and once I know how to ingest the data, I can just run with it.

What I'm considering:

  • If the first byte of the file is {, try json_decode() and see if it fails.
  • If the first byte of the file is < or $, try include() and see if it fails.
  • If the first three bytes of the file match a:[0-9], try unserialize().
  • If none of the first three apply, try base64_decode() and see if it fails. If not:
    • Check the first bytes of the decoded data again.
    • If all of that fails, it's rubbish.
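The plan above could be sketched in PHP roughly as follows. This is a minimal, hypothetical sketch: the function name detectFormat() and its return labels are made up for illustration, PHP source is only classified rather than include()d, and unserialize() is called with allowed_classes => false because the input comes from programs outside your control.

```php
<?php
// Hypothetical helper sketching the question's plan: classify the raw file
// contents by their leading bytes, then confirm with the matching decoder.
function detectFormat(string $data, bool $allowBase64 = true): string
{
    if ($data === '') {
        return 'rubbish';
    }
    $first = $data[0];

    if ($first === '{') {
        json_decode($data, true);
        return json_last_error() === JSON_ERROR_NONE ? 'json' : 'rubbish';
    }
    if ($first === '<' || $first === '$') {
        // PHP source: the question would include() it; here it is only classified.
        return 'php';
    }
    if (preg_match('/\Aa:[0-9]/', $data) === 1) {
        // allowed_classes => false stops untrusted input from instantiating objects.
        $value = @unserialize($data, ['allowed_classes' => false]);
        return is_array($value) ? 'serialize' : 'rubbish';
    }
    if ($allowBase64) {
        // Strict mode returns false on characters outside the base64 alphabet.
        $decoded = base64_decode($data, true);
        if ($decoded !== false && $decoded !== '') {
            // Check the first bytes of the decoded data again, one level deep only.
            $inner = detectFormat($decoded, false);
            if ($inner === 'json' || $inner === 'serialize') {
                return 'base64+' . $inner;
            }
        }
    }
    return 'rubbish';
}
```

Checking the plain formats first is safe in this ordering: the base64 alphabet contains none of {, <, $ or :, so an encoded payload can never trip one of the earlier branches.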

That just seems quite expensive for quite a simple task. Could I be doing it in a better way? If so, how?

Recommended answer

There isn't much to optimize here. The magic-bytes approach is already the way to go. You can, however, avoid calling the actual deserialization functions: it's feasible to use a validation regex for each format instead (which, despite the meme, is often faster than having PHP actually unpack a nested array).

base64 is easy to probe.
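A cheap base64 probe might look like the sketch below. It assumes standard, padded (not URL-safe) base64 that may be wrapped across lines, and looksBase64() is a hypothetical name. Short runs of letters such as "abcd" are also valid base64, so a match only tells you to decode and re-check the leading bytes.

```php
<?php
// Hypothetical helper: cheap structural check for standard, padded base64 text,
// without decoding it. Line breaks and other whitespace are ignored.
function looksBase64(string $data): bool
{
    $compact = preg_replace('/\s+/', '', $data);
    if ($compact === null || $compact === '' || strlen($compact) % 4 !== 0) {
        return false;  // padded base64 always comes in 4-character groups
    }
    // Alphabet characters, then at most two '=' padding characters at the end.
    return preg_match('~\A[A-Za-z0-9+/]+={0,2}\z~', $compact) === 1;
}
```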

JSON can be checked with a regex. The one referenced in "Fastest way to check if a string is JSON in PHP?" is the RFC version intended for securing JSON in JS. But it would be feasible to write a complete recursive JSON match rule with (?R).
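As an illustration of the recursive approach, here is a hedged sketch of a JSON validator built from PCRE's (?(DEFINE)...) block and named subroutine calls (?&name), the named counterpart of (?R). It approximates the RFC 8259 grammar; the names JSON_OBJECT_RE and looksLikeJsonObject() are hypothetical, the top level is restricted to an object to match the question's { check, and very large or deeply nested inputs may hit PCRE's backtracking or stack limits, in which case preg_match() returns false.

```php
<?php
// Hypothetical pattern: checks that a string is a JSON object (approximating
// the RFC 8259 grammar) without decoding it. The 'x' flag ignores whitespace
// outside character classes, so the rules can be laid out one per line.
const JSON_OBJECT_RE = <<<'RE'
~
(?(DEFINE)
  (?<ws>   [\x20\t\n\r]* )
  (?<str>  " (?: [^"\\\x00-\x1f] | \\ ["\\/bfnrt] | \\u[0-9A-Fa-f]{4} )* " )
  (?<num>  -? (?: 0 | [1-9][0-9]* ) (?: \.[0-9]+ )? (?: [eE][+-]?[0-9]+ )? )
  (?<arr>  \[ (?&ws) (?: (?&val) (?: (?&ws) , (?&ws) (?&val) )* (?&ws) )? \] )
  (?<pair> (?&str) (?&ws) : (?&ws) (?&val) )
  (?<obj>  \{ (?&ws) (?: (?&pair) (?: (?&ws) , (?&ws) (?&pair) )* (?&ws) )? \} )
  (?<val>  (?&obj) | (?&arr) | (?&str) | (?&num) | true | false | null )
)
\A (?&ws) (?&obj) (?&ws) \z
~x
RE;

function looksLikeJsonObject(string $data): bool
{
    // preg_match() returns false (not 0) if a PCRE limit is hit.
    return preg_match(JSON_OBJECT_RE, $data) === 1;
}
```

On a match you still need json_decode() to get at the data; the regex only replaces the trial decode used for format detection.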

serialize() output is a bit harder to verify without a proper unpacking function. But with some heuristics you can already assert that it's a serialize blob.
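One possible set of heuristics, sketched under the assumption that the payload is a plain array with string or integer keys: check the a:<count>:{ header, that the first element starts with a key marker (s: or i:) or the array is empty, and that the blob ends with }. The function name looksSerializedArray() is hypothetical, and this is a plausibility check rather than a validator.

```php
<?php
// Hypothetical heuristic: does this look like serialize() output for an array?
// It inspects only the header and the last byte; it does not parse the body.
function looksSerializedArray(string $data): bool
{
    $data = rtrim($data);  // tolerate a trailing newline added by the writer
    // "a:<count>:{" followed by an s:/i: key marker, or "}" for an empty array.
    if (preg_match('/\Aa:[0-9]+:\{(?:[si]:|\})/', $data) !== 1) {
        return false;
    }
    return substr($data, -1) === '}';
}
```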

PHP array scripts can be probed a bit faster with token_get_all(). Or, if the format and data are constrained enough, again with a regex.
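A token_get_all() probe could, for example, accept a file only if every token belongs to a small allowlist covering array literals, so the format is recognised without executing anything. This is a hypothetical sketch: looksLikePhpArray() and the allowlist are illustrative choices rather than a complete safety check, and it assumes that a file starting with $ simply lacks the opening <?php tag.

```php
<?php
// Hypothetical probe: accept PHP source only if every token belongs to a small
// allowlist that covers array literals, optionally assigned or returned.
// A heuristic for recognising the format, not a sandbox for include().
function looksLikePhpArray(string $source): bool
{
    // Assumption: a file starting with '$' has no open tag, so add one.
    if ($source === '' || $source[0] !== '<') {
        $source = '<?php ' . $source;
    }
    $allowedIds = [
        T_OPEN_TAG, T_CLOSE_TAG, T_WHITESPACE, T_COMMENT, T_DOC_COMMENT,
        T_RETURN, T_VARIABLE, T_ARRAY, T_DOUBLE_ARROW,
        T_CONSTANT_ENCAPSED_STRING, T_LNUMBER, T_DNUMBER,
    ];
    $allowedChars = ['[', ']', '(', ')', ',', ';', '=', '-'];

    $tokens = token_get_all($source);
    if (!is_array($tokens[0]) || $tokens[0][0] !== T_OPEN_TAG) {
        return false;  // e.g. leading HTML or other inline text
    }
    $sawArray = false;
    foreach ($tokens as $token) {
        if (is_string($token)) {  // single-character tokens such as '[' or ';'
            if (!in_array($token, $allowedChars, true)) {
                return false;
            }
            $sawArray = $sawArray || $token === '[';
            continue;
        }
        [$id, $text] = $token;
        if ($id === T_STRING && in_array(strtolower($text), ['true', 'false', 'null'], true)) {
            continue;  // bare constants used as values
        }
        if (!in_array($id, $allowedIds, true)) {
            return false;  // function calls, includes, etc. are rejected
        }
        $sawArray = $sawArray || $id === T_ARRAY;
    }
    return $sawArray;
}
```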

The more important question here is: do you need reliability, or simplicity and speed?
