加速numpy中的矢量化眼动追踪算法 [英] Speeding up vectorized eye-tracking algorithm in numpy

查看:157
本文介绍了加速numpy中的矢量化眼动追踪算法的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在尝试实施Fabian Timm的眼动跟踪算法[ http://www.inb.uni-luebeck.de/publikationen/pdfs/TiBa11b.pdf] (位于此处:[

I'm trying to implement Fabian Timm's eye-tracking algorithm [http://www.inb.uni-luebeck.de/publikationen/pdfs/TiBa11b.pdf] (found here: [http://thume.ca/projects/2012/11/04/simple-accurate-eye-center-tracking-in-opencv/]) in numpy and OpenCV and I've hit a snag. I think I've vectorized my implementation decently enough, but it's still not fast enough to run in real time and it doesn't detect pupils with as much accuracy as I had hoped. This is my first time using numpy, so I'm not sure what I've done wrong.

def find_pupil(eye):
    eye_len = np.arange(eye.shape[0])
    xx,yy = np.meshgrid(eye_len,eye_len) #coordinates
    XX,YY = np.meshgrid(xx.ravel(),yy.ravel()) #all distance vectors
    Dx,Dy = [YY-XX, YY-XX] #y2-y1, x2-x1 -- simpler this way because YY = XXT
    Dlen = np.sqrt(Dx**2+Dy**2)
    Dx,Dy = [Dx/Dlen, Dy/Dlen] #normalized

    Gx,Gy = np.gradient(eye)
    Gmagn = np.sqrt(Gx**2+Gy**2)

    Gx,Gy = [Gx/Gmagn,Gy/Gmagn] #normalized
    GX,GY = np.meshgrid(Gx.ravel(),Gy.ravel())

    X = (GX*Dx+GY*Dy)**2
    eye = cv2.bitwise_not(cv2.GaussianBlur(eye,(5,5),0.005*eye.shape[1])) #inverting and blurring eye for use as w
    eyem = np.repeat(eye.ravel()[np.newaxis,:],eye.size,0)
    C = (np.nansum(eyem*X, axis=0)/eye.size).reshape(eye.shape)

    return np.unravel_index(C.argmax(), C.shape)

和其余代码:

def find_eyes(face):
    left_x, left_y = [int(floor(0.5 * face.shape[0])), int(floor(0.2 * face.shape[1]))]
    right_x, right_y = [int(floor(0.1 * face.shape[0])), int(floor(0.2 * face.shape[1]))]
    area = int(floor(0.2 * face.shape[0]))
    left_eye = (left_x, left_y, area, area)
    right_eye = (right_x, right_y, area, area)

    return [left_eye,right_eye]



faceCascade = cv2.CascadeClassifier("haarcascade_frontalface_default.xml")
video_capture = cv2.VideoCapture(0)

while True:
    # Capture frame-by-frame
    ret, frame = video_capture.read()

    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    faces = faceCascade.detectMultiScale(
        gray,
        scaleFactor=1.1,
        minNeighbors=5,
        minSize=(30, 30),
        flags=cv2.CASCADE_SCALE_IMAGE
    )

    # Draw a rectangle around the faces
    for (x, y, w, h) in faces:
        cv2.rectangle(frame, (x, y), (x+w, y+h), (0, 255, 0), 2)
        roi_gray = gray[y:y+h, x:x+w]
        roi_color = frame[y:y+h, x:x+w]
        eyes = find_eyes(roi_gray)
        for (ex,ey,ew,eh) in eyes:
            eye_gray = roi_gray[ey:ey+eh,ex:ex+ew]
            eye_color = roi_color[ey:ey+eh,ex:ex+ew]
            cv2.rectangle(roi_color,(ex,ey),(ex+ew,ey+eh),(255,0,0),2)
            px,py = find_pupil(eye_gray)
            cv2.rectangle(eye_color,(px,py),(px+1,py+1),(255,0,0),2)

    # Display the resulting frame
    cv2.imshow('Video', frame)

    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# When everything is done, release the capture
video_capture.release()
cv2.destroyAllWindows()

推荐答案

您可以执行许多保存复制元素的操作,然后在创建允许 NumPy broadcasting .因此,有两个好处-动态操作可以节省工作空间内存并提高性能.同样,最后,我们可以用简化版本代替nansum计算.因此,在牢记所有这些哲学的前提下,这是一种改良的方法-

You can perform many of those operations that save replicated elements and then perform some mathematical opertaions by directly performing the mathematical operatrions after creating singleton dimensions that would allow NumPy broadcasting. Thus, there would be two benefits - On the fly operations to save workspace memory and performance boost. Also, at the end, we can replace the nansum calculation with a simplified version. Thus, with all of that philosophy in mind, here's one modified approach -

def find_pupil_v2(face, x, y, w, h):    
    eye = face[x:x+w,y:y+h]
    eye_len = np.arange(eye.shape[0])

    N = eye_len.size**2
    eye_len_diff = eye_len[:,None] - eye_len
    Dlen = np.sqrt(2*((eye_len_diff)**2))
    Dxy0 = eye_len_diff/Dlen 

    Gx0,Gy0 = np.gradient(eye)
    Gmagn = np.sqrt(Gx0**2+Gy0**2)
    Gx,Gy = [Gx0/Gmagn,Gy0/Gmagn] #normalized

    B0 = Gy[:,:,None]*Dxy0[:,None,:]
    C0 = Gx[:,None,:]*Dxy0
    X = ((C0.transpose(1,0,2)[:,None,:,:]+B0[:,:,None,:]).reshape(N,N))**2

    eye1 = cv2.bitwise_not(cv2.GaussianBlur(eye,(5,5),0.005*eye.shape[1]))
    C = (np.nansum(X,0)*eye1.ravel()/eye1.size).reshape(eye1.shape)

    return np.unravel_index(C.argmax(), C.shape)

Dxy上还剩下一个repeat.可能可以避免该步骤,而Dxy0可以直接输入到使用Dxy给我们X的步骤中,但是我没有完成.一切都转换为broadcasting

There's one repeat still left in it at Dxy. It might be possible to avoid that step and Dxy0 could be fed directly into the step that uses Dxy to give us X, but I haven't worked through it. Everything's converted to broadcasting based!

运行时测试和输出验证-

Runtime test and output verification -

In [539]: # Inputs with random elements
     ...: face = np.random.randint(0,10,(256,256)).astype('uint8')
     ...: x = 40
     ...: y = 60
     ...: w = 64
     ...: h = 64
     ...: 

In [540]: find_pupil(face,x,y,w,h)
Out[540]: (32, 63)

In [541]: find_pupil_v2(face,x,y,w,h)
Out[541]: (32, 63)

In [542]: %timeit find_pupil(face,x,y,w,h)
1 loops, best of 3: 4.15 s per loop

In [543]: %timeit find_pupil_v2(face,x,y,w,h)
1 loops, best of 3: 529 ms per loop

看来我们正在接近 8x 加速!

It seems we are getting close to 8x speedup!

这篇关于加速numpy中的矢量化眼动追踪算法的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆