Saving an Image file using binary Files - pyspark
Question
Hello all,
How can I save image files (JPG format) to my local system? I used binaryFiles to load the pictures into Spark, converted them into arrays and processed them. Below is the code:
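from io import BytesIO
from PIL import Image
import numpy as np
import math
images = sc.binaryFiles("path/car*")
# Decode each file's bytes into a NumPy array; the key stays the file path
imagerdd = images.map(lambda kv: (kv[0], np.asarray(Image.open(BytesIO(kv[1])))))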
I did some image processing, and now each key holds the path and each value holds the image array:
imageOutuint = imagelapRDD.mapValues(lambda arr: arr.astype(np.uint8))
imageOutIMG = imageOutuint.mapValues(Image.fromarray)
How can I save the images to the local file system or HDFS? I don't see any option for it.
Recommended answer
If you want to save the data to the local file system, collect it as a local iterator and use standard tools to save the files record by record:
# The values must be PIL images here (e.g. the imageOutIMG RDD from the question)
for x, img in imagerdd.toLocalIterator():
    path = ... # Some path .jpg (based on x?)
    img.save(path)
Just be sure to cache imagerdd to avoid recomputation.
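The loop above leaves the output path open. One possible way to fill it in is sketched below, assuming the RDD keys are URIs such as file:/data/car1.jpg (the form binaryFiles usually produces) and the values are PIL images. The helper names local_jpg_path and save_locally and the output directory are hypothetical, not part of the original answer.

```python
import os
from urllib.parse import urlparse


def local_jpg_path(key, out_dir):
    """Map a source key such as 'file:/data/car1.png' to '<out_dir>/car1.jpg'."""
    name = os.path.basename(urlparse(key).path)
    stem, _ext = os.path.splitext(name)
    return os.path.join(out_dir, stem + ".jpg")


def save_locally(pairs, out_dir):
    """Save every (key, image) pair under out_dir and return the written paths.

    Each image only needs a .save(path) method, which PIL images provide.
    """
    os.makedirs(out_dir, exist_ok=True)
    saved = []
    for key, img in pairs:
        path = local_jpg_path(key, out_dir)
        img.save(path)  # PIL infers the JPEG format from the .jpg extension
        saved.append(path)
    return saved


# Hypothetical usage on the driver, with imageOutIMG from the question:
# imageOutIMG.cache()
# save_locally(imageOutIMG.toLocalIterator(), "/tmp/processed_cars")
```

This keeps only the file name of each source, so two inputs with the same name in different directories would overwrite each other. Images in a mode that JPEG does not support (for example RGBA) would need converting before saving.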