加速numpy中的矢量化眼动追踪算法 [英] Speeding up vectorized eye-tracking algorithm in numpy
问题描述
我正在尝试实施Fabian Timm的眼动跟踪算法[ http://www.inb.uni-luebeck.de/publikationen/pdfs/TiBa11b.pdf] (位于此处:[
I'm trying to implement Fabian Timm's eye-tracking algorithm [http://www.inb.uni-luebeck.de/publikationen/pdfs/TiBa11b.pdf] (found here: [http://thume.ca/projects/2012/11/04/simple-accurate-eye-center-tracking-in-opencv/]) in numpy and OpenCV and I've hit a snag. I think I've vectorized my implementation decently enough, but it's still not fast enough to run in real time and it doesn't detect pupils with as much accuracy as I had hoped. This is my first time using numpy, so I'm not sure what I've done wrong.
def find_pupil(eye):
eye_len = np.arange(eye.shape[0])
xx,yy = np.meshgrid(eye_len,eye_len) #coordinates
XX,YY = np.meshgrid(xx.ravel(),yy.ravel()) #all distance vectors
Dx,Dy = [YY-XX, YY-XX] #y2-y1, x2-x1 -- simpler this way because YY = XXT
Dlen = np.sqrt(Dx**2+Dy**2)
Dx,Dy = [Dx/Dlen, Dy/Dlen] #normalized
Gx,Gy = np.gradient(eye)
Gmagn = np.sqrt(Gx**2+Gy**2)
Gx,Gy = [Gx/Gmagn,Gy/Gmagn] #normalized
GX,GY = np.meshgrid(Gx.ravel(),Gy.ravel())
X = (GX*Dx+GY*Dy)**2
eye = cv2.bitwise_not(cv2.GaussianBlur(eye,(5,5),0.005*eye.shape[1])) #inverting and blurring eye for use as w
eyem = np.repeat(eye.ravel()[np.newaxis,:],eye.size,0)
C = (np.nansum(eyem*X, axis=0)/eye.size).reshape(eye.shape)
return np.unravel_index(C.argmax(), C.shape)
和其余代码:
def find_eyes(face):
left_x, left_y = [int(floor(0.5 * face.shape[0])), int(floor(0.2 * face.shape[1]))]
right_x, right_y = [int(floor(0.1 * face.shape[0])), int(floor(0.2 * face.shape[1]))]
area = int(floor(0.2 * face.shape[0]))
left_eye = (left_x, left_y, area, area)
right_eye = (right_x, right_y, area, area)
return [left_eye,right_eye]
faceCascade = cv2.CascadeClassifier("haarcascade_frontalface_default.xml")
video_capture = cv2.VideoCapture(0)
while True:
# Capture frame-by-frame
ret, frame = video_capture.read()
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
faces = faceCascade.detectMultiScale(
gray,
scaleFactor=1.1,
minNeighbors=5,
minSize=(30, 30),
flags=cv2.CASCADE_SCALE_IMAGE
)
# Draw a rectangle around the faces
for (x, y, w, h) in faces:
cv2.rectangle(frame, (x, y), (x+w, y+h), (0, 255, 0), 2)
roi_gray = gray[y:y+h, x:x+w]
roi_color = frame[y:y+h, x:x+w]
eyes = find_eyes(roi_gray)
for (ex,ey,ew,eh) in eyes:
eye_gray = roi_gray[ey:ey+eh,ex:ex+ew]
eye_color = roi_color[ey:ey+eh,ex:ex+ew]
cv2.rectangle(roi_color,(ex,ey),(ex+ew,ey+eh),(255,0,0),2)
px,py = find_pupil(eye_gray)
cv2.rectangle(eye_color,(px,py),(px+1,py+1),(255,0,0),2)
# Display the resulting frame
cv2.imshow('Video', frame)
if cv2.waitKey(1) & 0xFF == ord('q'):
break
# When everything is done, release the capture
video_capture.release()
cv2.destroyAllWindows()
推荐答案
您可以执行许多保存复制元素的操作,然后在创建允许 NumPy broadcasting
.因此,有两个好处-动态操作可以节省工作空间内存并提高性能.同样,最后,我们可以用简化版本代替nansum
计算.因此,在牢记所有这些哲学的前提下,这是一种改良的方法-
You can perform many of those operations that save replicated elements and then perform some mathematical opertaions by directly performing the mathematical operatrions after creating singleton dimensions that would allow NumPy broadcasting
. Thus, there would be two benefits - On the fly operations to save workspace memory and performance boost. Also, at the end, we can replace the nansum
calculation with a simplified version. Thus, with all of that philosophy in mind, here's one modified approach -
def find_pupil_v2(face, x, y, w, h):
eye = face[x:x+w,y:y+h]
eye_len = np.arange(eye.shape[0])
N = eye_len.size**2
eye_len_diff = eye_len[:,None] - eye_len
Dlen = np.sqrt(2*((eye_len_diff)**2))
Dxy0 = eye_len_diff/Dlen
Gx0,Gy0 = np.gradient(eye)
Gmagn = np.sqrt(Gx0**2+Gy0**2)
Gx,Gy = [Gx0/Gmagn,Gy0/Gmagn] #normalized
B0 = Gy[:,:,None]*Dxy0[:,None,:]
C0 = Gx[:,None,:]*Dxy0
X = ((C0.transpose(1,0,2)[:,None,:,:]+B0[:,:,None,:]).reshape(N,N))**2
eye1 = cv2.bitwise_not(cv2.GaussianBlur(eye,(5,5),0.005*eye.shape[1]))
C = (np.nansum(X,0)*eye1.ravel()/eye1.size).reshape(eye1.shape)
return np.unravel_index(C.argmax(), C.shape)
在一切都转换为Dxy
上还剩下一个repeat
.可能可以避免该步骤,而Dxy0
可以直接输入到使用Dxy
给我们X
的步骤中,但是我没有完成.broadcasting
There's one Everything's converted to repeat
still left in it at Dxy
. It might be possible to avoid that step and Dxy0
could be fed directly into the step that uses Dxy
to give us X
, but I haven't worked through it.broadcasting
based!
运行时测试和输出验证-
Runtime test and output verification -
In [539]: # Inputs with random elements
...: face = np.random.randint(0,10,(256,256)).astype('uint8')
...: x = 40
...: y = 60
...: w = 64
...: h = 64
...:
In [540]: find_pupil(face,x,y,w,h)
Out[540]: (32, 63)
In [541]: find_pupil_v2(face,x,y,w,h)
Out[541]: (32, 63)
In [542]: %timeit find_pupil(face,x,y,w,h)
1 loops, best of 3: 4.15 s per loop
In [543]: %timeit find_pupil_v2(face,x,y,w,h)
1 loops, best of 3: 529 ms per loop
看来我们正在接近 8x
加速!
It seems we are getting close to 8x
speedup!
这篇关于加速numpy中的矢量化眼动追踪算法的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!