Usage of U2Net Model in Android

Problem Description

I converted the original U2Net model weight file u2net.pth to TensorFlow Lite by following these instructions, and it converted successfully.

However, I'm having trouble using it in Android with TensorFlow Lite. I was not able to add the image segmenter metadata to this model with the tflite-support script, so I changed the model to return only one output, d0 (which is a combination of all the others, i.e. d1, d2, ..., d7). The metadata was then added successfully and I was able to run the model, but it's not giving any output and just returns the same image.

So any help would be much appreciated in letting me know where I messed up, and how I can use this U2Net model properly in TensorFlow Lite on Android. Thanks in advance.

Answer

I will write a long answer here. Digging into the GitHub repo of U2Net leaves you with the effort of examining the pre- and post-processing steps, so you can apply the same ones inside the Android project.

First of all, preprocessing: in the u2net_test.py file you can see at this line that all the images are preprocessed with the function ToTensorLab(flag=0). Navigating to that function, you see that with flag=0 the preprocessing is this:

else: # with rgb color (flag = 0)
    tmpImg = np.zeros((image.shape[0], image.shape[1], 3))
    image = image / np.max(image)
    if image.shape[2] == 1:
        # grayscale input: replicate the single channel across R, G, B
        tmpImg[:,:,0] = (image[:,:,0] - 0.485) / 0.229
        tmpImg[:,:,1] = (image[:,:,0] - 0.485) / 0.229
        tmpImg[:,:,2] = (image[:,:,0] - 0.485) / 0.229
    else:
        # RGB input: per-channel mean/std normalization
        tmpImg[:,:,0] = (image[:,:,0] - 0.485) / 0.229
        tmpImg[:,:,1] = (image[:,:,1] - 0.456) / 0.224
        tmpImg[:,:,2] = (image[:,:,2] - 0.406) / 0.225

Note the two steps.

First, every color pixel value is divided by the maximum of all color pixel values:

image = image/np.max(image)

Second, mean/std normalization is applied to every color pixel value:

tmpImg[:,:,0] = (image[:,:,0]-0.485)/0.229
tmpImg[:,:,1] = (image[:,:,1]-0.456)/0.224
tmpImg[:,:,2] = (image[:,:,2]-0.406)/0.225

So basically in Kotlin if you have a bitmap you have to do something like:

fun bitmapToFloatArray(bitmap: Bitmap): Array<Array<Array<FloatArray>>> {
    // Assumes the bitmap has already been scaled to the model's square 320x320 input size
    val width: Int = bitmap.width
    val height: Int = bitmap.height
    val intValues = IntArray(width * height)
    bitmap.getPixels(intValues, 0, width, 0, 0, width, height)

    // Create an array holding the raw RGB values so we can find the maximum value
    val fourDimensionalArray = Array(1) {
        Array(320) {
            Array(320) {
                FloatArray(3)
            }
        }
    }
    // https://github.com/xuebinqin/U-2-Net/blob/f2b8e4ac1c4fbe90daba8707bca051a0ec830bf6/data_loader.py#L204
    for (i in 0 until width) {
        for (j in 0 until height) {
            val pixelValue: Int = intValues[i * width + j]
            fourDimensionalArray[0][i][j][0] = Color.red(pixelValue).toFloat()
            fourDimensionalArray[0][i][j][1] = Color.green(pixelValue).toFloat()
            fourDimensionalArray[0][i][j][2] = Color.blue(pixelValue).toFloat()
        }
    }
    // Flatten the multidimensional array to find the maximum pixel value
    val oneDFloatArray = ArrayList<Float>()
    for (m in fourDimensionalArray[0].indices) {
        for (x in fourDimensionalArray[0][0].indices) {
            for (y in fourDimensionalArray[0][0][0].indices) {
                oneDFloatArray.add(fourDimensionalArray[0][m][x][y])
            }
        }
    }
    val maxValue: Float = oneDFloatArray.maxOrNull() ?: 1f // fallback of 1f avoids division by zero
    //val minValue: Float = oneDFloatArray.minOrNull() ?: 0f

    // Final array that is going to be used with the interpreter:
    // divide by the max, then apply the per-channel mean/std as in ToTensorLab(flag=0)
    val finalFourDimensionalArray = Array(1) {
        Array(320) {
            Array(320) {
                FloatArray(3)
            }
        }
    }
    for (i in 0 until width) {
        for (j in 0 until height) {
            val pixelValue: Int = intValues[i * width + j]
            finalFourDimensionalArray[0][i][j][0] =
                ((Color.red(pixelValue).toFloat() / maxValue) - 0.485f) / 0.229f
            finalFourDimensionalArray[0][i][j][1] =
                ((Color.green(pixelValue).toFloat() / maxValue) - 0.456f) / 0.224f
            finalFourDimensionalArray[0][i][j][2] =
                ((Color.blue(pixelValue).toFloat() / maxValue) - 0.406f) / 0.225f
        }
    }

    return finalFourDimensionalArray
}
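
Note that bitmapToFloatArray above assumes a square 320x320 bitmap (the input size U2Net works with), so scale the source image first. A minimal call-site sketch, where sourceBitmap is a hypothetical name for your original image:

// Sketch (sourceBitmap is an assumed name): scale the input to the model's
// 320x320 resolution; this is the loadedBitmap used in the snippet below.
val loadedBitmap = Bitmap.createScaledBitmap(sourceBitmap, 320, 320, true)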

Then this array is fed into the interpreter, and as your model has multiple outputs we are using runForMultipleInputsOutputs:

// Convert Bitmap to float array
val inputStyle = ImageUtils.bitmapToFloatArray(loadedBitmap)

// Create output arrays of shape 1 x 320 x 320 x 1, one per model output
// (the count must match the number of outputs of your converted model)
val output1 = Array(1) { Array(CONTENT_IMAGE_SIZE) { Array(CONTENT_IMAGE_SIZE) { FloatArray(1) } } }
val output2 = Array(1) { Array(CONTENT_IMAGE_SIZE) { Array(CONTENT_IMAGE_SIZE) { FloatArray(1) } } }
val output3 = Array(1) { Array(CONTENT_IMAGE_SIZE) { Array(CONTENT_IMAGE_SIZE) { FloatArray(1) } } }
val output4 = Array(1) { Array(CONTENT_IMAGE_SIZE) { Array(CONTENT_IMAGE_SIZE) { FloatArray(1) } } }
val output5 = Array(1) { Array(CONTENT_IMAGE_SIZE) { Array(CONTENT_IMAGE_SIZE) { FloatArray(1) } } }
val output6 = Array(1) { Array(CONTENT_IMAGE_SIZE) { Array(CONTENT_IMAGE_SIZE) { FloatArray(1) } } }

val outputs: MutableMap<Int, Any> = HashMap()
outputs[0] = output1
outputs[1] = output2
outputs[2] = output3
outputs[3] = output4
outputs[4] = output5
outputs[5] = output6

// Runs model inference and gets the results
val array = arrayOf(inputStyle)
interpreterDepth.runForMultipleInputsOutputs(array, outputs)
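
For completeness, interpreterDepth above is just a plain TensorFlow Lite Interpreter. A minimal creation sketch, assuming the converted model is bundled in the app's assets under a name like u2net.tflite (the file name is an assumption):

import android.content.Context
import org.tensorflow.lite.Interpreter
import org.tensorflow.lite.support.common.FileUtil

// Sketch (u2net.tflite is an assumed asset name): load the converted model
// from assets and create the interpreter used above as interpreterDepth.
fun createInterpreter(context: Context): Interpreter {
    val modelBuffer = FileUtil.loadMappedFile(context, "u2net.tflite")
    val options = Interpreter.Options().apply { setNumThreads(4) }
    return Interpreter(modelBuffer, options)
}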

Then we use the first output of the interpreter, as you can see in the u2net_test.py file. (I have also printed the results of line 112, the min/max rescaling of the prediction, but it seems to have no effect. You are free to try that with the min and max of the color pixel values.) So we have the post-processing like you can see at the save_output function:

// Convert the first output array to a Bitmap
val finalBitmapGrey = ImageUtils.convertArrayToBitmapTensorFlow(
    output1, CONTENT_IMAGE_SIZE, CONTENT_IMAGE_SIZE
)

The function above will look something like:

fun convertArrayToBitmapTensorFlow(
    imageArray: Array<Array<Array<FloatArray>>>,
    imageWidth: Int,
    imageHeight: Int
): Bitmap {
    val conf = Bitmap.Config.ARGB_8888 // see other conf types
    val grayToneImage = Bitmap.createBitmap(imageWidth, imageHeight, conf)

    for (x in imageArray[0].indices) {
        for (y in imageArray[0][0].indices) {
            // The output is a saliency probability in [0, 1]; map it to a grey value
            // (coerceIn guards against values slightly outside that range)
            val grey = ((imageArray[0][x][y][0]) * 255f).toInt().coerceIn(0, 255)
            val color = Color.rgb(grey, grey, grey)

            // the y, x order here mirrors the transposed read in bitmapToFloatArray
            grayToneImage.setPixel(y, x, color)
        }
    }
    return grayToneImage
}
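
If you want to experiment with the min/max rescaling mentioned above (line 112 of u2net_test.py), a minimal Kotlin sketch of it (my own translation of the idea, not code from the repo; call it on output1 before converting to a bitmap) would be:

// Sketch: min/max-normalize the raw output map to [0, 1] in place,
// mirroring the rescaling that u2net_test.py applies to the prediction.
fun normalizeOutput(imageArray: Array<Array<Array<FloatArray>>>) {
    var minVal = Float.POSITIVE_INFINITY
    var maxVal = Float.NEGATIVE_INFINITY
    for (row in imageArray[0]) for (cell in row) {
        minVal = minOf(minVal, cell[0])
        maxVal = maxOf(maxVal, cell[0])
    }
    val range = (maxVal - minVal).takeIf { it > 0f } ?: 1f
    for (row in imageArray[0]) for (cell in row) {
        cell[0] = (cell[0] - minVal) / range
    }
}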

Then you can use this grayscale image however you want.
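
For example, a common use (my own sketch, not something the U2Net repo prescribes) is to treat the grayscale map as a per-pixel alpha mask and cut the salient object out of the input image, assuming both bitmaps are the same 320x320 size:

// Sketch (assumed use case): use the grayscale saliency mask as the
// alpha channel of the scaled input bitmap to extract the foreground.
fun applyMask(original: Bitmap, mask: Bitmap): Bitmap {
    val result = Bitmap.createBitmap(original.width, original.height, Bitmap.Config.ARGB_8888)
    for (x in 0 until original.width) {
        for (y in 0 until original.height) {
            val alpha = Color.red(mask.getPixel(x, y)) // mask is grayscale, any channel works
            val pixel = original.getPixel(x, y)
            result.setPixel(
                x, y,
                Color.argb(alpha, Color.red(pixel), Color.green(pixel), Color.blue(pixel))
            )
        }
    }
    return result
}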

Due to the multiple preprocessing steps, I used the interpreter directly, with no additional libraries. I will try later in the week to see if you can insert metadata covering all of the steps, but I doubt it.

If you need some clarifications please do not hesitate to ask me.

Colab notebook link

Happy coding!
