Quantify how much a slide has been filled with handwriting


Problem description

I have a video of a slideshow, where the presenter handwrites notes onto the slide:

I would like to create a program that detects whether a slide is being filled in (with handwritten notes, for example) or whether a new slide has appeared.

One method I was thinking of is OCR of the text, but this is not suitable, since the only text that changes here is either handwritten or mathematical notation.

What I have done so far: I go through the video and always compare the previous frame with the current frame. I extract the bounding box coordinates of all elements that have been added with respect to the previous frame, and I store the highest y-coordinate. The highest y-coordinate belongs to the element furthest down the image (as seen from the top of the image). In theory, this should give me an indication of whether the slide is filling up...
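To make the coordinate convention concrete, here is a minimal sketch of the idea on synthetic data. It is not the code from the question: it assumes a plain absolute pixel difference between two greyscale arrays instead of SSIM and contours, and the helper name `lowest_changed_row` and the `min_abs_diff` threshold are made up for illustration.

```python
import numpy as np

def lowest_changed_row(prev_frame, frame, min_abs_diff=30):
    """Return the largest row index at which two greyscale frames differ.

    Row 0 is the top of the image, so a larger index means "further down".
    Returns 0 when no pixel differs by more than min_abs_diff.
    """
    # Use a signed type so the subtraction cannot wrap around in uint8
    diff = np.abs(frame.astype(np.int16) - prev_frame.astype(np.int16))
    # Rows that contain at least one clearly changed pixel
    changed_rows = np.where((diff > min_abs_diff).any(axis=1))[0]
    return int(changed_rows.max()) if changed_rows.size else 0

# Synthetic example: a white slide, then two dark "handwritten" strokes added
prev_frame = np.full((100, 200), 255, dtype=np.uint8)
frame = prev_frame.copy()
frame[20:25, 10:60] = 0   # stroke near the top
frame[70:75, 10:60] = 0   # stroke further down

print(lowest_changed_row(prev_frame, frame))  # 74
```

Dividing the result by the frame height would give a rough 0-to-1 measure of how far down the slide the writing has reached, under the assumption that the presenter writes from top to bottom.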

In practice, I cannot really make use of this data:

The video in question can be downloaded here: http://www.filedropper.com/00_6

Here is my code:

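# compare_ssim was removed from skimage.measure in newer scikit-image releases;
# structural_similarity in skimage.metrics is its replacement
from skimage.metrics import structural_similarity as compare_ssim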
import cv2
import numpy as np

# Packages for live plot visualisation 
import pyqtgraph as pg
from pyqtgraph.Qt import QtGui, QtCore
from tqdm import tqdm

def get_y_corrd_of_lowest_added_element(prev_frame, frame):
    """
    Given two images, detect the bounding boxes of all elements that
    differ between them and return the y coordinate of the lowest
    added element (as seen from the top of the image).

    Parameters
    ----------
    prev_frame : numpy array 
        original image.
    frame : numpy array
        new image, based on original image.

    Returns
    -------
    int
        y coordinate of the top edge of the bounding box that lies
        furthest down the image (0 if nothing was added).

    """
    # Compute SSIM between two images
    (score, diff) = compare_ssim(prev_frame, frame, full=True)

    # The diff image contains the actual image differences between the two images
    # and is represented as a floating point data type in the range [0,1] 
    # so we must convert the array to 8-bit unsigned integers in the range
    # [0,255] before we can use it with OpenCV
    diff = (diff * 255).astype("uint8")

    # Threshold the difference image, followed by finding contours to
    # obtain the regions of the two input images that differ
    thresh = cv2.threshold(diff, 0, 255, cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)[1]
    contours = cv2.findContours(thresh.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    contours = contours[0] if len(contours) == 2 else contours[1]

    # Initialize a list that will hold all y coordinates of all bounding boxes
    # of all elements that were added to the frame when compared to the 
    # previous frame
    y_list = [0]
    
    for c in contours:
        
        area = cv2.contourArea(c)
        if area > 40:
        
            x,y,w,h = cv2.boundingRect(c)
            # Append to y coordinate list
            y_list.append(y)
             
    y_list.sort()
    
    return y_list[-1]


def transform(frame):
    # convert to greyscale
    frame =  cv2.cvtColor(frame,cv2.COLOR_BGR2GRAY)
    # make smaller
    small = cv2.resize(frame, (0,0), fx=0.5, fy=0.5) 
    return small

vidcap = cv2.VideoCapture("ADD PATH TO VIDEO HERE")  # placeholder: replace with the path to the video
success,prev_frame = vidcap.read()
prev_frame = transform(prev_frame)

# For real-time plotting
#Source: http://www.pyqtgraph.org/downloads/0.10.0/pyqtgraph-0.10.0-deb/pyqtgraph-0.10.0/examples/PlotSpeedTest.py
app = QtGui.QApplication([])
win = pg.GraphicsWindow()
win.resize(800, 800)
p = win.addPlot()
p.setTitle('Lowest Y')
plot = p.plot([])

# Store lowest y coordinates of added elements
y_lowest_list = []
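while success:
  success,frame = vidcap.read()
  if not success:
    # no more frames: frame is None here, so stop before transforming it
    break
  
  # convert
  frame = transform(frame)
  
  # show frame
  cv2.imshow("frame", frame)
  cv2.waitKey(1)
  
  # extract lowest y coordinate
  y = get_y_corrd_of_lowest_added_element(prev_frame, frame)
  y_lowest_list.append(y)
  # Real-time plot
  plot.setData(y_lowest_list)
  # let Qt process events so the plot is actually redrawn
  app.processEvents()
  # the current frame becomes the reference for the next comparison
  prev_frame = frame
  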
  
# close real-time plot
win.close()

Does anyone have an idea?

Solution

You can try this code; see the comments:

import cv2
import numpy as np

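def get_bg_and_ink_level(frame):
    # work in HSV so that brightness and hue can be thresholded separately
    frame = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    # background: pixels whose V (brightness) channel is above 245
    background = cv2.threshold(frame[:,:,2], 245, 255, cv2.THRESH_BINARY)[1]
    background_level = cv2.mean(background) # for future use if you need to select frames without hands.
    # HSV range of the blue ink
    ink_color_low = (117,60,150)
    ink_color_high = (130,207,225)
    only_ink = cv2.inRange(frame, ink_color_low, ink_color_high)
    # mean of the 0/255 mask, i.e. proportional to the share of ink pixels
    ink_level = cv2.mean(only_ink)
    return background_level[0], ink_level[0]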

vidcap = cv2.VideoCapture('0_0.mp4')
success,frame = vidcap.read()
bg = []
ink = []
while success:
    bg_level, ink_level = get_bg_and_ink_level(frame)
    bg.append(bg_level)
    ink.append(ink_level)
    success, frame = vidcap.read()
   
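# search for frames where the blue ink is removed from the picture,
# i.e. where the ink level drops sharply from one frame to the next
d_ink = np.diff(ink)
d_ink[-1] = -2.0 # force a drop at the end so the final slide is saved too
idx = np.where(d_ink < -1.0)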

#save frames
for i in idx[0]:
    vidcap.set(cv2.CAP_PROP_POS_FRAMES, i)
    flag, frame = vidcap.read()
    out_name='frame'+str(i)+'.jpg'
    cv2.imwrite(out_name, frame)

Result: frame 15708.
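The detection step in the code above can be tried without the video. The following is a minimal sketch on made-up ink levels; the function name `find_slide_changes` and the sample numbers are assumptions for illustration, and only the `np.diff` / `np.where` logic and the -1.0 threshold are taken from the answer.

```python
import numpy as np

def find_slide_changes(ink_levels, drop_threshold=-1.0):
    """Return indices of frames after which the ink level drops sharply.

    Index i is returned when ink_levels[i + 1] - ink_levels[i] is below
    drop_threshold, so frame i is the last (fullest) frame of a slide.
    """
    d_ink = np.diff(np.asarray(ink_levels, dtype=float))
    return np.where(d_ink < drop_threshold)[0]

# Made-up ink levels: ink accumulates on one slide (frames 0 to 3),
# the slide changes, and ink accumulates again (frames 4 to 6)
ink_levels = [0.0, 2.0, 4.0, 6.0, 0.5, 1.5, 3.0]

print(find_slide_changes(ink_levels))  # [3]
```

The last slide never shows a drop, which is presumably why the answer overwrites the final difference with -2.0. If a fill measure is needed rather than just the slide boundaries, one option is to divide each frame's ink level by the level at the next detected boundary.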
