Multi-scale template matching vs. text detection


Question

I'm trying to automate navigating a website to grab data and download files, using PyAutoGUI to detect images and buttons, but I'm having trouble getting this to work on other people's computers. It seems to me that matching images of text is the biggest obstacle here.

I suspected the issue was scaling and resolution, so I tried multi-scale template matching, but a template I had upscaled wouldn't produce a match at all. A template I had downscaled didn't help either: it either found no matches or found the wrong match, even with the confidence restricted to the narrow 0.8-0.9 range.

Here's the original image at 74x17.

Here's the upscaled image at 348x80 (Windows Photo wouldn't let me pick a smaller upscaled size for some reason).

Here's the downscaled image at 40x8.

Currently, with the downscaled image, PyAutoGUI is confusing the above image with this image:

Here's the code I wrote (and some I borrowed from someone).

The multi-scale code I borrowed:

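# Imports needed by the snippets below
import time

import cv2
import imutils
import numpy as np
import pyautogui
import pyscreeze

# Functions to search for resized versions of images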
def template_match_with_scaling(image,gs=True,confidence=0.8):

# Locate an image and return a pyscreeze box surrounding it. 
# Template matching is done by default in grayscale (gs=True)
# Detect image if normalized correlation coefficient is > confidence (0.8 is default)

    templateim = pyscreeze._load_cv2(image,grayscale=gs)        # loads the image
    (tH, tW)   = templateim.shape[:2]       # template height and width
    screenim_color = pyautogui.screenshot()     # screenshot of image
    screenim_color = cv2.cvtColor(np.array(screenim_color),cv2.COLOR_RGB2BGR)

    # Checking if the locateOnScreen() is utilized with grayscale=True or not
    if gs is True:
       screenim = cv2.cvtColor(np.array(screenim_color),cv2.COLOR_BGR2GRAY)
    else:
       screenim = screenim_color

    #try different scaling parameters and see which one matches best
    found = None #bookkeeping variable for the maximum correlation coefficient, position and scale
    scalingrange = np.linspace(0.25,5,num=150)

    for scale in scalingrange:
        print("Trying another scale")
        resizedtemplate = imutils.resize(templateim,  width = int(templateim.shape[1]*scale) ) # resizing with  imutils maintains the aspect ratio
        r = float(resizedtemplate.shape[1])/templateim.shape[1] # recompute scaling factor
        result = cv2.matchTemplate(screenim, resizedtemplate, cv2.TM_CCOEFF_NORMED) # template matching using the correlation coefficient
        (_, maxVal, _, maxLoc) = cv2.minMaxLoc(result) #returns a 4-tuple which includes the minimum correlation value, the maximum correlation value, the (x, y)-coordinate of the minimum value, and the (x, y)-coordinate of the maximum value
        if found is None or maxVal > found[0]:
           found = (maxVal, maxLoc, r)
           
    (maxVal, maxLoc, r) = found
    if maxVal > confidence:
       box = pyscreeze.Box(int(maxLoc[0]), int(maxLoc[1]), int(tW*r), int(tH*r) )
       return box
    else:
       return None

def locate_center_with_scaling(image,gs=True):
    loc = template_match_with_scaling(image,gs=gs) 
    if loc:
       return pyautogui.center(loc)
    else:
       raise Exception("Image not found")

My code to match and click on a textbox next to its identifier:

while SKUnoCounter <= len(listOfSKUs):

    while pyautogui.locateOnScreen('DescriptionBox-RESIZEDsmall.png', grayscale=True, confidence=0.8 ) is None:
        print("Looking for Description Box.")

        if locate_center_with_scaling('DescriptionBox-RESIZEDsmall.png') is not None:
            print("Found a resized version of Description Box. ")

            #Calling to function
            DB_x, DB_y = locate_center_with_scaling('DescriptionBox-RESIZEDsmall.png')
            
            #Clicking on Description text box
            pyautogui.click( DB_x + 417,  DB_y +12,  button='left')
            
            break
        time.sleep(0.5) 

Is it worthwhile to try to improve the accuracy of the multi-scale template matching if my goal is to use this across all kinds of computers? Would it be better to detect the text with OCR instead of matching images? My other idea is to use PyTesseract to locate the text I'm searching for and then use those coordinates to click on things. Selenium does not work here because I need to work with an existing IE browser window.
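
The PyTesseract idea mentioned above could be sketched roughly as follows. This is only an illustration under stated assumptions, not a tested replacement for the template matching: `find_word_center` and `locate_text_on_screen` are hypothetical helper names, the 60% confidence cut-off is an arbitrary starting value, and only single-word labels are handled. It relies on `pytesseract.image_to_data` with `Output.DICT`, which returns parallel lists of recognized words and their bounding boxes, and it needs the Tesseract binary installed.

```python
from typing import Dict, List, Optional, Tuple


def find_word_center(ocr: Dict[str, List], target: str,
                     min_conf: float = 60.0) -> Optional[Tuple[int, int]]:
    """Return the (x, y) center of the first OCR word equal to `target`.

    `ocr` is expected to have the parallel lists produced by
    pytesseract.image_to_data(..., output_type=Output.DICT):
    'text', 'conf', 'left', 'top', 'width', 'height'.
    """
    for i, word in enumerate(ocr["text"]):
        if word.strip().lower() != target.lower():
            continue
        # conf may be an int, a float, or a string depending on the version
        if float(ocr["conf"][i]) < min_conf:
            continue
        x = ocr["left"][i] + ocr["width"][i] // 2
        y = ocr["top"][i] + ocr["height"][i] // 2
        return (x, y)
    return None


def locate_text_on_screen(target: str) -> Optional[Tuple[int, int]]:
    """Screenshot the screen, run OCR, and look for `target` (hypothetical helper)."""
    # Imported lazily so find_word_center stays usable without OCR installed.
    import pyautogui
    import pytesseract
    from pytesseract import Output

    screenshot = pyautogui.screenshot()
    ocr = pytesseract.image_to_data(screenshot, output_type=Output.DICT)
    return find_word_center(ocr, target)
```

A caller would then click relative to the returned center, for example `pyautogui.click(x + 417, y + 12)` with the offsets used in the loop above. Because OCR works on the text itself rather than on pixel patterns, it is less sensitive to display scaling, but small UI fonts may need the screenshot enlarged before recognition.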

Any input here is greatly appreciated!

Recommended answer

Following my comment above, this is what the modified function could look like:

# Functions to search for resized versions of images
import cv2
import numpy as np
import pyautogui
import pyscreeze

def template_match_with_scaling(image, gs=True, confidence=0.8, scalingrange=None):
    # Locate an image and return a pyscreeze Box surrounding it.
    # Template matching is done by default in grayscale (gs=True).
    # Detect image if normalized correlation coefficient is > confidence (0.8 is default).
    if scalingrange is None:
        # Fallback so the default argument is iterable; the bounds come from the
        # question's code, the step count is an arbitrary choice to tune.
        scalingrange = np.linspace(0.25, 5, num=20)

    templateim = pyscreeze._load_cv2(image, grayscale=gs)    # loads the template image
    (tH, tW) = templateim.shape[:2]                          # template height and width
    screenim_color = pyautogui.screenshot()                  # screenshot of the screen
    screenim_color = cv2.cvtColor(np.array(screenim_color), cv2.COLOR_RGB2BGR)

    # Match in grayscale or in color, consistent with how the template was loaded
    if gs:
        screenim = cv2.cvtColor(screenim_color, cv2.COLOR_BGR2GRAY)
    else:
        screenim = screenim_color
    (sH, sW) = screenim.shape[:2]                            # screenshot height and width

    # try different scaling parameters and see which pair matches best
    found = None  # bookkeeping: (max correlation coefficient, position, x scale, y scale)

    for scalex in scalingrange:
        width = int(tW * scalex)
        for scaley in scalingrange:
            height = int(tH * scaley)
            # skip sizes that cv2.resize or cv2.matchTemplate cannot handle
            if width < 1 or height < 1 or width > sW or height > sH:
                continue

            # resize the template; width and height are scaled independently
            resizedtemplate = cv2.resize(templateim, (width, height))
            rx = float(width) / tW    # actual scaling factor along x (width)
            ry = float(height) / tH   # actual scaling factor along y (height)
            # template matching using the normalized correlation coefficient
            result = cv2.matchTemplate(screenim, resizedtemplate, cv2.TM_CCOEFF_NORMED)
            # minMaxLoc returns (min value, max value, min location, max location)
            (_, maxVal, _, maxLoc) = cv2.minMaxLoc(result)
            if found is None or maxVal > found[0]:
                found = (maxVal, maxLoc, rx, ry)

    if found is None:
        return None
    (maxVal, maxLoc, rx, ry) = found
    print('maxVal= ', maxVal)
    if maxVal > confidence:
        return pyscreeze.Box(int(maxLoc[0]), int(maxLoc[1]), int(round(tW * rx)), int(round(tH * ry)))
    return None

def locate_center_with_scaling(image,gs=True,**kwargs):
    loc = template_match_with_scaling(image,gs=gs,**kwargs) 
    if loc:
       return pyautogui.center(loc)
    else:
       raise Exception("Image not found")

im = 'DescriptionBox.png' # we will try to detect the small description box, whose width and height are scaled down by 0.54 and 0.47
unscaledLocation = pyautogui.locateOnScreen(im, grayscale=True, confidence=0.8)
srange = np.linspace(0.4, 0.6, num=20) # scale width and height in this range
if unscaledLocation is None:
    print("Looking for Description Box.")
    try:
        # locate_center_with_scaling raises when nothing is found, so catch that here
        scaledLocation = locate_center_with_scaling(im, scalingrange=srange)
    except Exception:
        scaledLocation = None
    if scaledLocation is not None:
        print(f'Found a resized version of Description Box at ({scaledLocation[0]},{scaledLocation[1]})')
        pyautogui.moveTo(scaledLocation[0], scaledLocation[1])
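
As a quick check of where the 0.54 and 0.47 factors in the comment above come from, the sizes quoted in the question (74x17 original, 40x8 downscaled, 348x80 upscaled) can be plugged into a few lines of arithmetic. The helper name `scale_factors` is made up for this illustration.

```python
original = (74, 17)      # width, height of the original template
downscaled = (40, 8)     # width, height of the downscaled template
upscaled = (348, 80)     # width, height of the upscaled template


def scale_factors(src, dst):
    """Return (width_ratio, height_ratio) for resizing src to dst."""
    return (dst[0] / src[0], dst[1] / src[1])


sx, sy = scale_factors(original, downscaled)
print(f"downscaled: width ratio = {sx:.2f}, height ratio = {sy:.2f}")
# prints: downscaled: width ratio = 0.54, height ratio = 0.47

ux, uy = scale_factors(original, upscaled)
print(f"upscaled:   width ratio = {ux:.2f}, height ratio = {uy:.2f}")
# prints: upscaled:   width ratio = 4.70, height ratio = 4.71
```

The downscaled template was not resized uniformly, which is why the function searches over width and height scales independently, and why `srange` brackets both 0.47 and 0.54.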

We need to take note of two things:

  • template_match_with_scaling now executes a double loop, one over each dimension, so detecting the template image takes some time. To amortize the detection time, we should save the scale parameters for width and height after the first detection, and scale all template images by these parameters for subsequent detections.
  • To detect the template efficiently, we need to set the scalingrange input of template_match_with_scaling to an appropriate range of values. If the range is too narrow or does not contain enough values, we will not be able to detect the template. If it is too wide, detection will take a long time.
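
The first point, saving the scale parameters after the first detection, might be sketched as below. `ScaleCache` is a hypothetical helper and not part of PyAutoGUI or OpenCV; it assumes every template was captured on the same reference machine, so a single (width, height) scale pair applies to all of them.

```python
class ScaleCache:
    """Remember the width/height scale pair from the first successful match."""

    def __init__(self):
        self.scales = None  # (rx, ry) once a match has been found

    def remember(self, template_size, matched_size):
        """Store the scales implied by a template and the box that matched it.

        Both sizes are (width, height) tuples in pixels.
        """
        (tw, th), (bw, bh) = template_size, matched_size
        self.scales = (bw / tw, bh / th)

    def target_size(self, template_size):
        """Return the (width, height) to resize a template to, or None if unknown."""
        if self.scales is None:
            return None
        (tw, th), (rx, ry) = template_size, self.scales
        # never return a zero-sized dimension, which cv2.resize would reject
        return (max(1, round(tw * rx)), max(1, round(th * ry)))


cache = ScaleCache()
cache.remember((74, 17), (40, 8))     # sizes from the first detection
print(cache.target_size((148, 34)))   # size to use for another template
```

After the first detection, the width and height of the returned Box can be passed to `remember`, and later templates can be resized once with `cv2.resize(template, cache.target_size((tW, tH)))` followed by a single `cv2.matchTemplate` call, falling back to the full double loop only when that match falls below the confidence threshold.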
