How to parse a zipped file completely from RAM?

Problem description

I need to parse zip files of various types, reading the content of some inner files for one purpose or another, including getting their names.

Some of the files are not reachable via a file path: Android exposes them through a Uri, and sometimes the zip file sits inside another zip file. With the push to use SAF (Storage Access Framework), using a file path is not even possible in some cases.

For this, there are two main options: the ZipFile class and the ZipInputStream class.

When we have a file-path, ZipFile is a perfect solution. It's also very efficient in terms of speed.

However, in the remaining cases ZipInputStream can run into issues, such as this one, where a problematic zip file causes this exception:

  java.util.zip.ZipException: only DEFLATED entries can have EXT descriptor
        at java.util.zip.ZipInputStream.readLOC(ZipInputStream.java:321)
        at java.util.zip.ZipInputStream.getNextEntry(ZipInputStream.java:124)
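
For context, here is a minimal sketch of streaming enumeration with the JDK's ZipInputStream. It walks the archive's local file headers in order and never consults the central directory, so a single entry with an unexpected header (as in the trace above, thrown from readLOC) aborts the whole walk. The function name listEntryNames is hypothetical.

```kotlin
import java.io.InputStream
import java.util.zip.ZipInputStream

// Hypothetical helper: collects entry names by streaming through the archive.
fun listEntryNames(input: InputStream): List<String> {
    val names = mutableListOf<String>()
    ZipInputStream(input).use { zip ->
        while (true) {
            // nextEntry parses the next local file header; null means no more entries.
            val entry = zip.nextEntry ?: break
            names += entry.name
        }
    }
    return names
}
```

On Android the stream would typically come from contentResolver.openInputStream(uri), which is why this route is tempting when no file path is available.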

What I've tried

The only solution that always works is to copy the file somewhere else and parse the copy with ZipFile, but this is inefficient, requires free storage, and means you have to delete the copy when you are done with it.
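
A rough sketch of that copy-then-parse workaround, using only JDK classes, might look like this. The function name listEntryNamesViaTempFile and the tempDir parameter are assumptions for illustration; on Android the directory would usually be something like the app's cache directory.

```kotlin
import java.io.File
import java.io.InputStream
import java.util.zip.ZipFile

// Hypothetical helper: spills the stream to a temp file so java.util.zip.ZipFile can read it.
fun listEntryNamesViaTempFile(input: InputStream, tempDir: File): List<String> {
    val temp = File.createTempFile("archive", ".zip", tempDir)
    return try {
        // Costs extra storage equal to the size of the archive.
        input.use { source -> temp.outputStream().use { sink -> source.copyTo(sink) } }
        ZipFile(temp).use { zip ->
            zip.entries().asSequence().map { it.name }.toList()
        }
    } finally {
        // The copy has to be cleaned up whether or not parsing succeeded.
        temp.delete()
    }
}
```

The try/finally shows the drawbacks mentioned above in code form: the full copy is written to storage first, and the caller is responsible for deleting it afterwards.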

So, what I've found is that Apache has a nice, pure-Java library (here) for parsing zip files, and for some reason its InputStream solution (called "ZipArchiveInputStream") seems even more efficient than the native ZipInputStream class.

Compared with the native framework, the library offers a bit more flexibility. I could, for example, load the entire zip file into a byte array and let the library handle it as usual, and this works even for the problematic zip files I've mentioned:

org.apache.commons.compress.archivers.zip.ZipFile(SeekableInMemoryByteChannel(byteArray)).use { zipFile ->
    for (entry in zipFile.entries) {
        val name = entry.name
        // ... use the zipFile as you would with the native framework
    }
}

Gradle dependency:

// http://commons.apache.org/proper/commons-compress/ https://mvnrepository.com/artifact/org.apache.commons/commons-compress
implementation 'org.apache.commons:commons-compress:1.20'

Sadly, this isn't always possible, because it depends on the heap being able to hold the entire zip file. On Android this is even more limiting, because the heap can be relatively small (the heap could be 100MB while the file is 200MB). Unlike a PC, where a huge heap can be configured, Android offers no such flexibility.

So I searched for a solution that uses JNI instead, to have the entire ZIP file loaded into a byte array there rather than on the heap (at least not entirely). This could be a nicer workaround: if the ZIP fits in the device's RAM rather than in the heap, I avoid an OOM without needing an extra file.

I found this library called "larray", which seemed promising, but sadly it crashed when I tried using it, because its requirements include a full JVM, meaning it is not suitable for Android.

Seeing that I couldn't find any library or built-in class, I tried to use JNI myself. Sadly I'm very rusty with it, so I looked at an old repository I made a long time ago to perform some operations on Bitmaps (here). This is what I came up with:

native-lib.cpp

#include <jni.h>
#include <android/log.h>
#include <cstdio>
#include <android/bitmap.h>
#include <cstring>
#include <unistd.h>

class JniBytesArray {
public:
    uint32_t *_storedData;

    JniBytesArray() {
        _storedData = NULL;
    }
};

extern "C" {
JNIEXPORT jobject JNICALL Java_com_lb_myapplication_JniByteArrayHolder_allocate(
        JNIEnv *env, jobject obj, jlong size) {
    auto *jniBytesArray = new JniBytesArray();
    auto *array = new uint32_t[size];
    for (int i = 0; i < size; ++i)
        array[i] = 0;
    jniBytesArray->_storedData = array;
    return env->NewDirectByteBuffer(jniBytesArray, 0);
}
}

JniByteArrayHolder.kt

class JniByteArrayHolder {
    external fun allocate(size: Long): ByteBuffer

    companion object {
        init {
            System.loadLibrary("native-lib")
        }
    }
}

class MainActivity : AppCompatActivity() {
    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        setContentView(R.layout.activity_main)
        thread {
            printMemStats()
            val jniByteArrayHolder = JniByteArrayHolder()
            val byteBuffer = jniByteArrayHolder.allocate(1L * 1024L)
            printMemStats()
        }
    }

    fun printMemStats() {
        val memoryInfo = ActivityManager.MemoryInfo()
        (getSystemService(Context.ACTIVITY_SERVICE) as ActivityManager).getMemoryInfo(memoryInfo)
        val nativeHeapSize = memoryInfo.totalMem
        val nativeHeapFreeSize = memoryInfo.availMem
        val usedMemInBytes = nativeHeapSize - nativeHeapFreeSize
        val usedMemInPercentage = usedMemInBytes * 100 / nativeHeapSize
        Log.d("AppLog", "total:${Formatter.formatFileSize(this, nativeHeapSize)} " +
                "free:${Formatter.formatFileSize(this, nativeHeapFreeSize)} " +
                "used:${Formatter.formatFileSize(this, usedMemInBytes)} ($usedMemInPercentage%)")
    }
}

This doesn't seem right, because if I try to create a 1GB byte array using jniByteArrayHolder.allocate(1L * 1024L * 1024L * 1024L), it crashes without any exception or error logs.

  1. Is it possible to use JNI with Apache's library, so that it handles ZIP file content held within JNI's "world"?

If so, how can I do it? Is there a sample showing how? Is there a class for it, or do I have to implement it myself? If so, can you please show how it's done in JNI?

If it's not possible, what other way is there to do it? Maybe an alternative to what Apache has?

  2. As for the JNI solution, why doesn't it work well? How could I efficiently copy the bytes from the stream into the JNI byte array (my guess is that it would be via a buffer)?

Recommended answer

I took a look at the JNI code you posted and made a couple of changes. Mostly, they consist of passing the size argument to NewDirectByteBuffer and using malloc().

Here is the output of the log after allocating 800mb:

D/AppLog: total:1.57 GB free:1.03 GB used:541 MB (34%)
D/AppLog: total:1.57 GB free:247 MB used:1.32 GB (84%)

And the following is what the buffer looks like after the allocation. As you can see, the debugger is reporting a limit of 800mb which is what we expect.

My C is very rusty, so I am sure that there is some work to be done. I have updated the code to be a little more robust and to allow for the freeing of memory.

native-lib.cpp

#include <jni.h>
#include <android/log.h>
#include <climits>
#include <cstdlib>
#include <cstring>

extern "C" {
static jbyteArray *_holdBuffer = NULL;
static jobject _directBuffer = NULL;
/*
    This routine is not re-entrant and can handle only one buffer at a time. If a buffer is
    allocated then it must be released before the next one is allocated.
 */
JNIEXPORT
jobject JNICALL Java_com_example_zipfileinmemoryjni_JniByteArrayHolder_allocate(
        JNIEnv *env, jobject obj, jlong size) {
    if (_holdBuffer != NULL || _directBuffer != NULL) {
        __android_log_print(ANDROID_LOG_ERROR, "JNI Routine",
                            "Call to JNI allocate() before freeBuffer()");
        return NULL;
    }

    // Max size for a direct buffer is the max of a jint even though NewDirectByteBuffer takes a
    // long. Clamp max size as follows:
    if (size > SIZE_T_MAX || size > INT_MAX || size <= 0) {
        jlong maxSize = SIZE_T_MAX < INT_MAX ? SIZE_T_MAX : INT_MAX;
        __android_log_print(ANDROID_LOG_ERROR, "JNI Routine",
                            "Native memory allocation request must be >0 and <= %lld but was %lld.\n",
                            maxSize, size);
        return NULL;
    }

    jbyteArray *array = (jbyteArray *) malloc(static_cast<size_t>(size));
    if (array == NULL) {
        __android_log_print(ANDROID_LOG_ERROR, "JNI Routine",
                            "Failed to allocate %lld bytes of native memory.\n",
                            size);
        return NULL;
    }

    jobject directBuffer = env->NewDirectByteBuffer(array, size);
    if (directBuffer == NULL) {
        free(array);
        __android_log_print(ANDROID_LOG_ERROR, "JNI Routine",
                            "Failed to create direct buffer of size %lld.\n",
                            size);
        return NULL;
    }
    // memset() is not really needed but we call it here to force Android to count
    // the consumed memory in the stats since it only seems to "count" dirty pages. (?)
    memset(array, 0xFF, static_cast<size_t>(size));
    _holdBuffer = array;

    // Get a global reference to the direct buffer so Java isn't tempted to GC it.
    _directBuffer = env->NewGlobalRef(directBuffer);
    return directBuffer;
}

JNIEXPORT void JNICALL Java_com_example_zipfileinmemoryjni_JniByteArrayHolder_freeBuffer(
        JNIEnv *env, jobject obj, jobject directBuffer) {

    if (_directBuffer == NULL || _holdBuffer == NULL) {
        __android_log_print(ANDROID_LOG_ERROR, "JNI Routine",
                            "Attempt to free unallocated buffer.");
        return;
    }

    jbyteArray *bufferLoc = (jbyteArray *) env->GetDirectBufferAddress(directBuffer);
    if (bufferLoc == NULL) {
        __android_log_print(ANDROID_LOG_ERROR, "JNI Routine",
                            "Failed to retrieve direct buffer location associated with ByteBuffer.");
        return;
    }

    if (bufferLoc != _holdBuffer) {
        __android_log_print(ANDROID_LOG_ERROR, "JNI Routine",
                            "DirectBuffer does not match that allocated.");
        return;
    }

    // Free the malloc'ed buffer and the global reference. Java can not GC the direct buffer.
    free(bufferLoc);
    env->DeleteGlobalRef(_directBuffer);
    _holdBuffer = NULL;
    _directBuffer = NULL;
}
}

I also updated the array holder:

class JniByteArrayHolder {
    external fun allocate(size: Long): ByteBuffer
    external fun freeBuffer(byteBuffer: ByteBuffer)

    companion object {
        init {
            System.loadLibrary("native-lib")
        }
    }
}

I can confirm that this code along with the ByteBufferChannel class provided by Botje here works for Android versions before API 24. The SeekableByteChannel interface was introduced in API 24 and is needed by the ZipFile utility.
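
The ByteBufferChannel class linked above is not reproduced in this post. As a rough illustration only, a read-only SeekableByteChannel over a ByteBuffer could look like the sketch below. The class name ReadOnlyByteBufferChannel and all of its details are assumptions, not Botje's actual code. It treats the bytes from index 0 up to the buffer's limit as the channel content, so it expects a buffer that has already been flipped, and it is not thread-safe.

```kotlin
import java.nio.ByteBuffer
import java.nio.channels.ClosedChannelException
import java.nio.channels.NonWritableChannelException
import java.nio.channels.SeekableByteChannel

// Hypothetical sketch, not the linked class: exposes bytes [0, limit) of a buffer read-only.
class ReadOnlyByteBufferChannel(source: ByteBuffer) : SeekableByteChannel {
    // duplicate() shares the underlying memory but has its own position and limit.
    private val data: ByteBuffer = source.duplicate()
    private val length: Long = data.limit().toLong()
    private var pos: Long = 0
    private var open = true

    override fun read(dst: ByteBuffer): Int {
        ensureOpen()
        if (pos >= length) return -1 // end of channel
        val count = minOf(dst.remaining().toLong(), length - pos).toInt()
        val window = data.duplicate()
        window.position(pos.toInt())
        window.limit(pos.toInt() + count)
        dst.put(window) // bulk copy of exactly `count` bytes
        pos += count
        return count
    }

    override fun write(src: ByteBuffer): Int = throw NonWritableChannelException()

    override fun truncate(size: Long): SeekableByteChannel = throw NonWritableChannelException()

    override fun position(): Long {
        ensureOpen()
        return pos
    }

    override fun position(newPosition: Long): SeekableByteChannel {
        ensureOpen()
        require(newPosition >= 0) { "negative position" }
        pos = newPosition // positions past the end are allowed; reads there return -1
        return this
    }

    override fun size(): Long {
        ensureOpen()
        return length
    }

    override fun isOpen(): Boolean = open

    override fun close() {
        open = false
    }

    private fun ensureOpen() {
        if (!open) throw ClosedChannelException()
    }
}
```

Because the channel only duplicates the buffer, closing it does not release the native memory; that still has to happen through freeBuffer().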

The maximum buffer size that can be allocated is the maximum value of a jint, due to a limitation of JNI. Larger data can be accommodated (if memory is available) but would require multiple buffers and a way to handle them.

Here is the main activity for the sample app. An earlier version assumed that the InputStream read buffer was always filled and errored out when trying to put it into the ByteBuffer. This has been fixed.
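
As a side note on the partial-read issue just mentioned, one common way to drain a stream into a ByteBuffer is to loop on the return value of read() rather than on available(). The sketch below is an illustration under that assumption, not the code from the sample app; the function name copyStreamInto and the chunkSize parameter are hypothetical.

```kotlin
import java.io.InputStream
import java.nio.ByteBuffer

// Hypothetical helper: drains the stream into target and returns the number of bytes copied.
fun copyStreamInto(input: InputStream, target: ByteBuffer, chunkSize: Int = 4096): Int {
    val chunk = ByteArray(chunkSize)
    var total = 0
    while (true) {
        // read() may return fewer bytes than requested; -1 signals end of stream.
        val read = input.read(chunk)
        if (read < 0) break
        // Throws BufferOverflowException if target is smaller than the stream.
        target.put(chunk, 0, read)
        total += read
    }
    return total
}
```

After the copy, the caller would flip() the buffer before handing it to a channel, as the activity below does.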

MainActivity.kt

class MainActivity : AppCompatActivity() {
    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        setContentView(R.layout.activity_main)
    }

    fun onClick(view: View) {
        button.isEnabled = false
        status.text = getString(R.string.running)

        thread {
            printMemStats("Before buffer allocation:")
            var bufferSize = 0L
            // testzipfile.zip is not part of the project but any zip can be uploaded through the
            // device file manager or adb to test.
            val fileToRead = "$filesDir/testzipfile.zip"
            val inStream =
                if (File(fileToRead).exists()) {
                    FileInputStream(fileToRead).apply {
                        bufferSize = getFileSize(this)
                        close()
                    }
                    FileInputStream(fileToRead)
                } else {
                    // If testzipfile.zip doesn't exist, we will just look at this one which
                    // is part of the APK.
                    resources.openRawResource(R.raw.appapk).apply {
                        bufferSize = getFileSize(this)
                        close()
                    }
                    resources.openRawResource(R.raw.appapk)
                }
            // Allocate the buffer in native memory (off-heap).
            val jniByteArrayHolder = JniByteArrayHolder()
            val byteBuffer =
                if (bufferSize != 0L) {
                    jniByteArrayHolder.allocate(bufferSize)?.apply {
                        printMemStats("After buffer allocation")
                    }
                } else {
                    null
                }

            if (byteBuffer == null) {
                Log.d("Applog", "Failed to allocate $bufferSize bytes of native memory.")
            } else {
                Log.d("Applog", "Allocated ${Formatter.formatFileSize(this, bufferSize)} buffer.")
                val inBytes = ByteArray(4096)
                Log.d("Applog", "Starting buffered read...")
                // read() returns -1 at end of stream and may fill only part of the array.
                while (true) {
                    val read = inStream.read(inBytes)
                    if (read < 0) break
                    byteBuffer.put(inBytes, 0, read)
                }
                inStream.close()
                byteBuffer.flip()
                ZipFile(ByteBufferChannel(byteBuffer)).use {
                    Log.d("Applog", "Starting Zip file name dump...")
                    for (entry in it.entries) {
                        Log.d("Applog", "Zip name: ${entry.name}")
                        val zis = it.getInputStream(entry)
                        while (zis.available() > 0) {
                            zis.read(inBytes)
                        }
                    }
                }
                printMemStats("Before buffer release:")
                jniByteArrayHolder.freeBuffer(byteBuffer)
                printMemStats("After buffer release:")
            }
            runOnUiThread {
                status.text = getString(R.string.idle)
                button.isEnabled = true
                Log.d("Applog", "Done!")
            }
        }
    }

    /*
        This function is a little misleading since it does not reflect the true status of memory.
        After native buffer allocation, it waits until the memory is used before counting is as
        used. After release, it doesn't seem to count the memory as released until garbage
        collection. (My observations only.) Also, see the comment for memset() in native-lib.cpp
        which is a member of this project.
    */
    private fun printMemStats(desc: String? = null) {
        val memoryInfo = ActivityManager.MemoryInfo()
        (getSystemService(Context.ACTIVITY_SERVICE) as ActivityManager).getMemoryInfo(memoryInfo)
        val nativeHeapSize = memoryInfo.totalMem
        val nativeHeapFreeSize = memoryInfo.availMem
        val usedMemInBytes = nativeHeapSize - nativeHeapFreeSize
        val usedMemInPercentage = usedMemInBytes * 100 / nativeHeapSize
        val sDesc = desc?.run { "$this:\n" }
        Log.d(
            "AppLog", "$sDesc total:${Formatter.formatFileSize(this, nativeHeapSize)} " +
                    "free:${Formatter.formatFileSize(this, nativeHeapFreeSize)} " +
                    "used:${Formatter.formatFileSize(this, usedMemInBytes)} ($usedMemInPercentage%)"
        )
    }

    // Not a great way to do this but not the object of the demo.
    private fun getFileSize(inStream: InputStream): Long {
        var bufferSize = 0L
        while (inStream.available() > 0) {
            val toSkip = inStream.available().toLong()
            inStream.skip(toSkip)
            bufferSize += toSkip
        }
        return bufferSize
    }
}

A sample GitHub repository is here.
