How to parse a zipped file completely from RAM?

Problem description

Background

I need to parse zip files of various types (reading the content of some inner files for one purpose or another, including getting their names).

Some of the files are not reachable via a file path, because Android exposes them through a Uri, and sometimes the zip file is inside another zip file. With the push to use SAF, a file path is even less likely to be available in some cases.

For this, we have two main options: the ZipFile class and the ZipInputStream class.

The problem

When we have a file-path, ZipFile is a perfect solution. It's also very efficient in terms of speed.

However, in the remaining cases ZipInputStream can run into problems, such as this one, where a problematic zip file causes the following exception:

  java.util.zip.ZipException: only DEFLATED entries can have EXT descriptor
        at java.util.zip.ZipInputStream.readLOC(ZipInputStream.java:321)
        at java.util.zip.ZipInputStream.getNextEntry(ZipInputStream.java:124)

What I've tried

The only solution that always works would be to copy the file somewhere else where it can be parsed with ZipFile, but this is inefficient, requires free storage, and means you have to remove the file when you are done with it.

So, what I've found is that Apache has a nice, pure Java library (here) for parsing Zip files, and for some reason its InputStream solution (called "ZipArchiveInputStream") seems even more efficient than the native ZipInputStream class.

Compared with what we have in the native framework, the library offers a bit more flexibility. I could, for example, load the entire zip file into a byte array and let the library handle it as usual, and this works even for the problematic Zip files I've mentioned:

org.apache.commons.compress.archivers.zip.ZipFile(SeekableInMemoryByteChannel(byteArray)).use { zipFile ->
    for (entry in zipFile.entries) {
        val name = entry.name
        // ... use the zipFile like you do with the native framework
    }
}

gradle dependency:

// http://commons.apache.org/proper/commons-compress/ https://mvnrepository.com/artifact/org.apache.commons/commons-compress
implementation 'org.apache.commons:commons-compress:1.20'

Sadly, this isn't always possible, because it requires the heap to hold the entire zip file, and on Android the heap can be relatively small (it could be 100 MB while the file is 200 MB). Unlike a PC, where a huge heap can be configured, Android is not flexible at all in this respect.

So, I searched for a JNI-based solution instead, one that loads the entire ZIP file into a byte array on the native side without going through the heap (at least not entirely). This could be a nicer workaround: if the ZIP fits in the device's RAM rather than in the heap, I would avoid OOM without needing an extra file.

I found a library called "larray" that seems promising, but sadly it crashed when I tried it, because it requires a full JVM, which makes it unsuitable for Android.

EDIT: since I can't find any library or built-in class for this, I tried to use JNI myself. Sadly I'm very rusty with it, so I looked at an old repository I made a long time ago to perform some operations on Bitmaps (here). This is what I came up with:

native-lib.cpp

#include <jni.h>
#include <android/log.h>
#include <cstdio>
#include <android/bitmap.h>
#include <cstring>
#include <unistd.h>

class JniBytesArray {
public:
    uint32_t *_storedData;

    JniBytesArray() {
        _storedData = NULL;
    }
};

extern "C" {
JNIEXPORT jobject JNICALL Java_com_lb_myapplication_JniByteArrayHolder_allocate(
        JNIEnv *env, jobject obj, jlong size) {
    auto *jniBytesArray = new JniBytesArray();
    auto *array = new uint32_t[size];
    for (int i = 0; i < size; ++i)
        array[i] = 0;
    jniBytesArray->_storedData = array;
    return env->NewDirectByteBuffer(jniBytesArray, 0);
}
}

JniByteArrayHolder.kt

class JniByteArrayHolder {
    external fun allocate(size: Long): ByteBuffer

    companion object {
        init {
            System.loadLibrary("native-lib")
        }
    }
}

MainActivity.kt

class MainActivity : AppCompatActivity() {
    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        setContentView(R.layout.activity_main)
        thread {
            printMemStats()
            val jniByteArrayHolder = JniByteArrayHolder()
            val byteBuffer = jniByteArrayHolder.allocate(1L * 1024L)
            printMemStats()
        }
    }

    fun printMemStats() {
        val memoryInfo = ActivityManager.MemoryInfo()
        (getSystemService(Context.ACTIVITY_SERVICE) as ActivityManager).getMemoryInfo(memoryInfo)
        val nativeHeapSize = memoryInfo.totalMem
        val nativeHeapFreeSize = memoryInfo.availMem
        val usedMemInBytes = nativeHeapSize - nativeHeapFreeSize
        val usedMemInPercentage = usedMemInBytes * 100 / nativeHeapSize
        Log.d("AppLog", "total:${Formatter.formatFileSize(this, nativeHeapSize)} " +
                "free:${Formatter.formatFileSize(this, nativeHeapFreeSize)} " +
                "used:${Formatter.formatFileSize(this, usedMemInBytes)} ($usedMemInPercentage%)")
    }
}

This doesn't seem right, because if I try to create a 1 GB byte array using jniByteArrayHolder.allocate(1L * 1024L * 1024L * 1024L), it crashes without any exception or error logs.

The questions

  1. Is it possible to use JNI with Apache's library, so that it handles ZIP file content that lives within JNI's "world"?

  2. If so, how can I do it? Is there any sample of how to do it? Is there a class for it? Or do I have to implement it myself? If so, can you please show how it's done in JNI?

  3. If it's not possible, what other way is there to do it? Maybe an alternative to what Apache has?

  4. For the JNI solution, why doesn't it work well? How could I efficiently copy the bytes from the stream into the JNI byte array (my guess is that it will be via a buffer)?

Solution

I took a look at the JNI code you posted and made a couple of changes. Mostly it is defining the size argument for NewDirectByteBuffer and using malloc().
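To illustrate why those two changes matter: the question's allocate() requested `size` elements of uint32_t rather than `size` bytes, and it passed the holder object with a capacity of 0 to NewDirectByteBuffer. The following is only a JNI-free sketch of the size arithmetic; the helper names are hypothetical and are not part of the sample project.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

static_assert(sizeof(std::uint32_t) == 4, "uint32_t is expected to be 4 bytes");

// Hypothetical helper: bytes requested by `new uint32_t[count]`,
// which is what the question's allocate() did.
constexpr std::uint64_t bytesForUint32Array(std::uint64_t count) {
    return count * sizeof(std::uint32_t);
}

// Hypothetical helper: bytes requested by `malloc(count)`,
// which is what the revised allocate() does.
constexpr std::uint64_t bytesForMalloc(std::uint64_t count) {
    return count;
}
```

Under this reading, the 1 GB request in the question actually asked for 4 GiB of native memory, and the resulting ByteBuffer could not have exposed any of it because its capacity was 0.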

Here is the output of the log after allocating 800 MB:

D/AppLog: total:1.57 GB free:1.03 GB used:541 MB (34%)
D/AppLog: total:1.57 GB free:247 MB used:1.32 GB (84%)

After the allocation, the debugger reports a buffer limit of 800 MB, which is what we expect.

My C is very rusty, so I am sure that there is some work to be done. I have updated the code to be a little more robust and to allow for the freeing of memory.

native-lib.cpp

#include <jni.h>
#include <android/log.h>
#include <climits>
#include <cstdlib>
#include <cstring>

extern "C" {
static jbyteArray *_holdBuffer = NULL;
static jobject _directBuffer = NULL;
/*
    This routine is not re-entrant and can handle only one buffer at a time. If a buffer is
    allocated then it must be released before the next one is allocated.
 */
JNIEXPORT
jobject JNICALL Java_com_example_zipfileinmemoryjni_JniByteArrayHolder_allocate(
        JNIEnv *env, jobject obj, jlong size) {
    if (_holdBuffer != NULL || _directBuffer != NULL) {
        __android_log_print(ANDROID_LOG_ERROR, "JNI Routine",
                            "Call to JNI allocate() before freeBuffer()");
        return NULL;
    }

    // Max size for a direct buffer is the max of a jint even though NewDirectByteBuffer takes a
    // long. Clamp max size as follows:
    if (size > SIZE_T_MAX || size > INT_MAX || size <= 0) {
        jlong maxSize = SIZE_T_MAX < INT_MAX ? SIZE_T_MAX : INT_MAX;
        __android_log_print(ANDROID_LOG_ERROR, "JNI Routine",
                            "Native memory allocation request must be >0 and <= %lld but was %lld.\n",
                            maxSize, size);
        return NULL;
    }

    jbyteArray *array = (jbyteArray *) malloc(static_cast<size_t>(size));
    if (array == NULL) {
        __android_log_print(ANDROID_LOG_ERROR, "JNI Routine",
                            "Failed to allocate %lld bytes of native memory.\n",
                            size);
        return NULL;
    }

    jobject directBuffer = env->NewDirectByteBuffer(array, size);
    if (directBuffer == NULL) {
        free(array);
        __android_log_print(ANDROID_LOG_ERROR, "JNI Routine",
                            "Failed to create direct buffer of size %lld.\n",
                            size);
        return NULL;
    }
    // memset() is not really needed but we call it here to force Android to count
    // the consumed memory in the stats since it only seems to "count" dirty pages. (?)
    memset(array, 0xFF, static_cast<size_t>(size));
    _holdBuffer = array;

    // Get a global reference to the direct buffer so Java isn't tempted to GC it.
    _directBuffer = env->NewGlobalRef(directBuffer);
    return directBuffer;
}

JNIEXPORT void JNICALL Java_com_example_zipfileinmemoryjni_JniByteArrayHolder_freeBuffer(
        JNIEnv *env, jobject obj, jobject directBuffer) {

    if (_directBuffer == NULL || _holdBuffer == NULL) {
        __android_log_print(ANDROID_LOG_ERROR, "JNI Routine",
                            "Attempt to free unallocated buffer.");
        return;
    }

    jbyteArray *bufferLoc = (jbyteArray *) env->GetDirectBufferAddress(directBuffer);
    if (bufferLoc == NULL) {
        __android_log_print(ANDROID_LOG_ERROR, "JNI Routine",
                            "Failed to retrieve direct buffer location associated with ByteBuffer.");
        return;
    }

    if (bufferLoc != _holdBuffer) {
        __android_log_print(ANDROID_LOG_ERROR, "JNI Routine",
                            "DirectBuffer does not match that allocated.");
        return;
    }

    // Free the malloc'ed buffer and the global reference. Java can not GC the direct buffer.
    free(bufferLoc);
    env->DeleteGlobalRef(_directBuffer);
    _holdBuffer = NULL;
    _directBuffer = NULL;
}
}

I also updated the array holder:

class JniByteArrayHolder {
    external fun allocate(size: Long): ByteBuffer
    external fun freeBuffer(byteBuffer: ByteBuffer)

    companion object {
        init {
            System.loadLibrary("native-lib")
        }
    }
}

I can confirm that this code along with the ByteBufferChannel class provided by Botje here works for Android versions before API 24. The SeekableByteChannel interface was introduced in API 24 and is needed by the ZipFile utility.

The maximum buffer size that can be allocated is the maximum value of a jint (2^31 - 1 bytes, just under 2 GB), which is due to a limitation of JNI. Larger data can be accommodated if the memory is available, but that would require multiple buffers and a way to handle them.
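As a rough idea of what "multiple buffers" could mean, the sketch below only plans the chunk sizes for a payload larger than one direct buffer can hold. It is an assumption about one possible approach, not something the sample project implements; planChunks is a hypothetical name, and a real solution would still need a channel that can seek across the chunks.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>
#include <vector>

// Hypothetical helper: split totalSize bytes into chunk sizes that each fit
// in a jint-sized direct buffer (at most maxChunk bytes per chunk).
std::vector<std::int32_t> planChunks(std::int64_t totalSize,
                                     std::int32_t maxChunk = INT_MAX) {
    assert(maxChunk > 0);
    std::vector<std::int32_t> chunks;
    while (totalSize > 0) {
        // Take a full chunk while more than maxChunk bytes remain,
        // otherwise take whatever is left.
        const std::int32_t n = totalSize > maxChunk
                                   ? maxChunk
                                   : static_cast<std::int32_t>(totalSize);
        chunks.push_back(n);
        totalSize -= n;
    }
    return chunks;  // empty when totalSize <= 0
}
```

Each planned chunk could then be backed by its own malloc() and NewDirectByteBuffer() call, at the cost of dropping the single-buffer restriction in the native code above.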

Here is the main activity for the sample app. An earlier version assumed that the InputStream read buffer was always filled and errored out when trying to put it into the ByteBuffer. This has been fixed.

MainActivity.kt

class MainActivity : AppCompatActivity() {
    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        setContentView(R.layout.activity_main)
    }

    fun onClick(view: View) {
        button.isEnabled = false
        status.text = getString(R.string.running)

        thread {
            printMemStats("Before buffer allocation:")
            var bufferSize = 0L
            // testzipfile.zip is not part of the project but any zip can be uploaded through the
            // device file manager or adb to test.
            val fileToRead = "$filesDir/testzipfile.zip"
            val inStream =
                if (File(fileToRead).exists()) {
                    FileInputStream(fileToRead).apply {
                        bufferSize = getFileSize(this)
                        close()
                    }
                    FileInputStream(fileToRead)
                } else {
                    // If testzipfile.zip doesn't exist, we will just look at this one which
                    // is part of the APK.
                    resources.openRawResource(R.raw.appapk).apply {
                        bufferSize = getFileSize(this)
                        close()
                    }
                    resources.openRawResource(R.raw.appapk)
                }
            // Allocate the buffer in native memory (off-heap).
            val jniByteArrayHolder = JniByteArrayHolder()
            val byteBuffer =
                if (bufferSize != 0L) {
                    jniByteArrayHolder.allocate(bufferSize)?.apply {
                        printMemStats("After buffer allocation")
                    }
                } else {
                    null
                }

            if (byteBuffer == null) {
                Log.d("Applog", "Failed to allocate $bufferSize bytes of native memory.")
            } else {
                Log.d("Applog", "Allocated ${Formatter.formatFileSize(this, bufferSize)} buffer.")
                val inBytes = ByteArray(4096)
                Log.d("Applog", "Starting buffered read...")
                while (inStream.available() > 0) {
                    byteBuffer.put(inBytes, 0, inStream.read(inBytes))
                }
                inStream.close()
                byteBuffer.flip()
                ZipFile(ByteBufferChannel(byteBuffer)).use {
                    Log.d("Applog", "Starting Zip file name dump...")
                    for (entry in it.entries) {
                        Log.d("Applog", "Zip name: ${entry.name}")
                        val zis = it.getInputStream(entry)
                        while (zis.available() > 0) {
                            zis.read(inBytes)
                        }
                    }
                }
                printMemStats("Before buffer release:")
                jniByteArrayHolder.freeBuffer(byteBuffer)
                printMemStats("After buffer release:")
            }
            runOnUiThread {
                status.text = getString(R.string.idle)
                button.isEnabled = true
                Log.d("Applog", "Done!")
            }
        }
    }

    /*
        This function is a little misleading since it does not reflect the true status of memory.
        After native buffer allocation, it waits until the memory is used before counting it as
        used. After release, it doesn't seem to count the memory as released until garbage
        collection. (My observations only.) Also, see the comment for memset() in native-lib.cpp
        which is a member of this project.
    */
    private fun printMemStats(desc: String? = null) {
        val memoryInfo = ActivityManager.MemoryInfo()
        (getSystemService(Context.ACTIVITY_SERVICE) as ActivityManager).getMemoryInfo(memoryInfo)
        val nativeHeapSize = memoryInfo.totalMem
        val nativeHeapFreeSize = memoryInfo.availMem
        val usedMemInBytes = nativeHeapSize - nativeHeapFreeSize
        val usedMemInPercentage = usedMemInBytes * 100 / nativeHeapSize
        val sDesc = desc?.run { "$this:\n" }
        Log.d(
            "AppLog", "$sDesc total:${Formatter.formatFileSize(this, nativeHeapSize)} " +
                    "free:${Formatter.formatFileSize(this, nativeHeapFreeSize)} " +
                    "used:${Formatter.formatFileSize(this, usedMemInBytes)} ($usedMemInPercentage%)"
        )
    }

    // Not a great way to do this but not the object of the demo.
    private fun getFileSize(inStream: InputStream): Long {
        var bufferSize = 0L
        while (inStream.available() > 0) {
            val toSkip = inStream.available().toLong()
            inStream.skip(toSkip)
            bufferSize += toSkip
        }
        return bufferSize
    }
}
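The read loop in onClick() copies only as many bytes as read() actually returned, which is the fix mentioned before the listing. The same principle is sketched here in C++ with std::istream as an analogue, not as code from the sample project; readAll and readAllFromString are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Copies everything from `in` into a byte vector, chunkSize bytes at a time.
// Only the bytes actually delivered by each read (gcount) are appended, so a
// short final read does not copy stale data from the chunk buffer.
std::vector<char> readAll(std::istream &in, std::size_t chunkSize = 4096) {
    assert(chunkSize > 0);
    std::vector<char> chunk(chunkSize);
    std::vector<char> out;
    while (in) {
        in.read(chunk.data(), static_cast<std::streamsize>(chunk.size()));
        const std::streamsize got = in.gcount();  // may be less than chunkSize
        if (got <= 0) break;
        out.insert(out.end(), chunk.begin(), chunk.begin() + got);
    }
    return out;
}

// Convenience wrapper for trying the loop on in-memory data.
std::vector<char> readAllFromString(const std::string &data) {
    std::istringstream in(data);
    return readAll(in);
}
```

With a 4096-byte chunk and 10000 bytes of input, the last read delivers only 1808 bytes, and only those are appended.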

A sample GitHub repository is here.
