How to classify images using Spark and Caffe


Question

I am using Caffe to do image classification. I am on Mac OS X, using Python.

Right now I know how to classify a list of images using Caffe with plain Python, but I would like to use Spark to make it faster.

Therefore, I tried to apply the image classification to each element of an RDD created from a list of image paths. However, Spark does not allow me to do so.

Here is my code.

This is the code for image classification:

# display image name, class number, predicted label
def classify_image(image_path, transformer, net):
    image = caffe.io.load_image(image_path)
    transformed_image = transformer.preprocess('data', image)
    net.blobs['data'].data[...] = transformed_image
    output = net.forward()
    output_prob = output['prob'][0]
    pred = output_prob.argmax()

    labels_file = caffe_root + 'data/ilsvrc12/synset_words.txt'
    labels = np.loadtxt(labels_file, str, delimiter='\t')
    lb = labels[pred]

    image_name = image_path.split(images_folder_path)[1]

    result_str = 'image: '+image_name+'  prediction: '+str(pred)+'  label: '+lb
    return result_str

This code sets up the Caffe parameters and applies the classify_image method to each element of the RDD:

def main():
    sys.path.insert(0, caffe_root + 'python')
    caffe.set_mode_cpu()
    model_def = caffe_root + 'models/bvlc_reference_caffenet/deploy.prototxt'
    model_weights = caffe_root + 'models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel'

    net = caffe.Net(model_def,
                model_weights,
                caffe.TEST)

    mu = np.load(caffe_root + 'python/caffe/imagenet/ilsvrc_2012_mean.npy')
    mu = mu.mean(1).mean(1)

    transformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})

    transformer.set_transpose('data', (2,0,1))
    transformer.set_mean('data', mu)
    transformer.set_raw_scale('data', 255)
    transformer.set_channel_swap('data', (2,1,0))

    net.blobs['data'].reshape(50,
                          3,
                          227, 227)

    image_list= []
    for image_path in glob.glob(images_folder_path+'*.jpg'):
        image_list.append(image_path)

    images_rdd = sc.parallelize(image_list)
    transformer_bc = sc.broadcast(transformer)
    net_bc = sc.broadcast(net)
    image_predictions = images_rdd.map(lambda image_path: classify_image(image_path, transformer_bc, net_bc))
    print image_predictions

if __name__ == '__main__':
    main()

As you can see, I tried to broadcast the Caffe objects with transformer_bc = sc.broadcast(transformer) and net_bc = sc.broadcast(net). The error is:

RuntimeError: Pickling of "caffe._caffe.Net" instances is not enabled

Before I added the broadcast, the error was:

Driver stacktrace.... Caused by: org.apache.spark.api.python.PythonException: Traceback (most recent call last):....

So, is there any way I can classify images using Caffe while also taking advantage of Spark?

Recommended answer

When you work with complex, non-native objects, initialization has to be moved directly to the workers, for example with a singleton module:

net_builder.py :

import caffe

net = None

def build_net(*args, **kwargs):
    ...  # Initialize net here
    return net

def get_net(*args, **kwargs):
    global net
    if net is None:
        net = build_net(*args, **kwargs)
    return net

main.py :

import net_builder

sc.addPyFile("net_builder.py")

def classify_image(image_path, transformer, *args, **kwargs):
    net = net_builder.get_net(*args, **kwargs)

This means you'll have to distribute all required files as well. This can be done either manually or using the SparkFiles mechanism.
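
As a rough sketch of the SparkFiles route: files registered on the driver with `sc.addFile` can be looked up on a worker by file name with `SparkFiles.get`. The helper `resolve_model_files` and its `resolve` parameter are hypothetical, added only so the lookup can be tried without a Spark installation. The file names are placeholders taken from the question.

```python
import os


def resolve_model_files(names, resolve=None):
    """Map shipped file names to local paths on the current worker.

    `resolve` defaults to pyspark's SparkFiles.get; passing another callable
    is an assumption of this sketch that makes it testable without Spark.
    """
    if resolve is None:
        from pyspark import SparkFiles  # imported lazily, only needed under Spark
        resolve = SparkFiles.get
    return {name: resolve(name) for name in names}


# Driver side (requires a SparkContext `sc`; paths are placeholders):
#   sc.addFile(caffe_root + 'models/bvlc_reference_caffenet/deploy.prototxt')
#   sc.addFile(caffe_root + 'models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel')
#
# Worker side, inside the function that builds the net:
#   paths = resolve_model_files(['deploy.prototxt', 'bvlc_reference_caffenet.caffemodel'])

# Local dry run with a fake resolver instead of SparkFiles.get:
demo = resolve_model_files(
    ["deploy.prototxt"],
    resolve=lambda name: os.path.join("workdir", name),
)
```

The resolved paths would then be passed to whatever builds the net on the worker, so the model definition and weights are read from the worker's local copy.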

On a side note, you should take a look at the SparkNet package.
