OCR - Getting text from image using tesseract 3.0 and imagemagick 6.6.5

Views: 146  Posted: 2018/7/30 13:42:28  Tags: linux imagemagick tesseract


Problem description

I am trying to build a shell script that searches for text in an image. Given the search text, the script tries its best to extract the text from the image. I would like your input, because the script seems to work with most images, but not with images where the font color of the text is similar to its immediate surroundings.

#!/bin/bash
#
# imt-ocr.sh is an ImageMagick/Tesseract OCR tool for finding text in an image
#
# Arguments:
# 1     -- image filename (with path)
# 2     -- text to search in image      (default to '')
# 3     -- occurrence of text           (default to 1)
# Usage:
# imt-ocr.sh [image_filename] [text_to_search] [occurence]
#

image=$1
txt=$2
occurence=$3    # Default to 1
if [ "$occurence" == "" ]
then
        occurence=1
fi

get_major_color ()
# Returns the major color of an image with its hex value
#       Parameter:      Image filename (with path)
#       Return format:  Returns a string "hex_val_of_color major_color_name"
{
convert "$1" -format %c histogram:info: > x.txt
awk '{print $1}' x.txt > x1.txt                 # pixel counts, one per histogram line
h=$(sort -n x1.txt | tail -1)                   # largest pixel count
color_info=$(grep "$h" x.txt | cut -d '#' -f2)  # "hex_val color_name" of that line
rm -f x.txt x1.txt
echo "$color_info"
}


invert_color()
# Inverts the color hex value
#       Parameter:      Hex value to be inverted
#       Return format:  Returns in hex
{
input_color_hex=$1                                              # Input color's hex value
white_color_hex=FFFFFF                                          # White color's hex value
inv_color_hex=$(printf '%06X' $((0x$white_color_hex - 0x$input_color_hex)))
echo "$inv_color_hex"
}


start_scale=100
end_scale=300
increment_scale=100
tmp_img=dst.tif
attempt=1
for ((scale=$start_scale, attempt=$attempt; scale <= $end_scale ; scale=scale+$increment_scale, attempt++))
        do
                echo "IMT-OCR-LOG: Scaling image to $scale% in attempt #$attempt"
                convert $image -type Grayscale -scale $scale% $tmp_img
                tesseract $tmp_img OUT
                found_oc=$(grep -o "$txt" OUT.txt | wc -l)
                echo "IMT-OCR-LOG: Found $found_oc occurence(s) of text '$txt' in attempt #$attempt"
                if [ $occurence -le $found_oc ] && [ $found_oc -ne 0 ]
                then
                        echo "IMT-OCR-LOG: Printing out the last text found on image"
                        echo "IMT-OCR-LOG: ======================================================"
                        cat OUT.txt
                        echo "IMT-OCR-LOG: ======================================================"
                        rm -rf $tmp_img OUT.txt
                        exit 0          # text found: report success
                else
                        echo "IMT-OCR-LOG: Getting major color of image in attempt #$attempt"
                        color_info=`get_major_color $image`
                        true_color=$(echo $color_info | awk '{print $2}')
                        true_val=$(echo $color_info | awk '{print $1}')
                        echo "IMT-OCR-LOG: Major color of image is '$true_color' with hex value of $true_val in attempt #$attempt"

                        # Blur the image
                        echo "IMT-OCR-LOG: Bluring image in attempt #$attempt"
                        convert $tmp_img -blur 1x65535 $tmp_img

                        # Flip the color
                        inverted_val=`invert_color $true_val`
                        echo "IMT-OCR-LOG: Inverting the major color of image from 0x$true_val to 0x$inverted_val in attempt #$attempt"
                        convert $tmp_img -fill \#$inverted_val -opaque \#$true_val $tmp_img

                        # Sharpen the image
                        echo "IMT-OCR-LOG: Sharpening image in attempt #$attempt"
                        convert $tmp_img -sharpen 1x65535 $tmp_img

                        # Find text
                        tesseract $tmp_img OUT
                        found_oc=$(grep -o "$txt" OUT.txt | wc -l)
                        echo "IMT-OCR-LOG: Found $found_oc occurence(s) of text '$txt' in attempt #$attempt"
                        if [ "$found_oc" != "0" ]
                        then
                                if [ $occurence -le $found_oc ]
                                then
                                        echo "IMT-OCR-LOG: Printing out the last text found on image"
                                        echo "IMT-OCR-LOG: ======================================================"
                                        cat OUT.txt
                                        echo "IMT-OCR-LOG: ======================================================"
                                        rm -rf $tmp_img OUT.txt
                                        exit 0          # text found: report success
                                fi
                        fi
                fi

                rm -rf OUT.txt

        done

rm -rf $tmp_img
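
The two helper functions in the script above can also be written as small pure functions that need no temporary files. The following is only a sketch: the name major_color is hypothetical, and it assumes histogram lines of the form "count: (r,g,b) #HEX name", which may differ between ImageMagick versions.

```shell
#!/bin/bash
# Sketch of the script's two helpers as pure functions (no temp files).

invert_color() {
    # $1 = six-digit hex color without '#', e.g. F5F5F5.
    # Subtracting from FFFFFF inverts each channel, as in the original script.
    printf '%06X\n' $(( 0xFFFFFF - 0x$1 ))
}

major_color() {
    # Reads the output of `convert IMAGE -format %c histogram:info:` on stdin.
    # ASSUMPTION: each line looks like "  34393: (245,245,245) #F5F5F5 grey96".
    # Prints everything after the '#' of the line with the largest pixel count.
    awk '{ n = $1 + 0; if (n > max) { max = n; line = $0 } }
         END { sub(/^[^#]*#/, "", line); print line }'
}

# Usage (requires ImageMagick):
#   convert test.jpeg -format %c histogram:info: | major_color
#   invert_color F5F5F5
```

With the values from the log output in this question, invert_color F5F5F5 prints 0A0A0A.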

Here is a sample run that shows the problem, using the image (test.jpeg) http://www.igoipad.com/wp-content/uploads/2012/07/03-Word-Collage-iPad.jpeg

[admin@ba-callgen image-magick-tesseract-processing]$ sh imt-ocr.sh test.jpeg Common
IMT-OCR-LOG: Scaling image to 100% in attempt #1
Tesseract Open Source OCR Engine with Leptonica
IMT-OCR-LOG: Found 0 occurence(s) of text 'Common' in attempt #1
IMT-OCR-LOG: Getting major color of image in attempt #1
IMT-OCR-LOG: Major color of image is 'grey96' with hex value of F5F5F5 in attempt #1
IMT-OCR-LOG: Bluring image in attempt #1
IMT-OCR-LOG: Inverting the major color of image from 0xF5F5F5 to 0x0A0A0A in attempt #1
IMT-OCR-LOG: Sharpening image in attempt #1
Tesseract Open Source OCR Engine with Leptonica
IMT-OCR-LOG: Found 0 occurence(s) of text 'Common' in attempt #1
IMT-OCR-LOG: Scaling image to 200% in attempt #2
Tesseract Open Source OCR Engine with Leptonica
IMT-OCR-LOG: Found 1 occurence(s) of text 'Common' in attempt #2
IMT-OCR-LOG: Printing out the last text found on image
IMT-OCR-LOG: ======================================================
Settings M...
Text
Common words
Exclude numbers
word case
Theme & Layuul
Color theme
Fnnl
Word layout
Clrien lalion
7301
Lrmclsc ape
‘OTC
Ergl sw v.-ords >
li( `
I):Jntc1'\:1r\qa )
Landon Spring >
Hough Trad >
H3'fJ|1d :-Ialf >
H L

IMT-OCR-LOG: ======================================================
[admin@ba-callgen image-magick-tesseract-processing]$ sh imt-ocr.sh test.jpeg Portrait
IMT-OCR-LOG: Scaling image to 100% in attempt #1
Tesseract Open Source OCR Engine with Leptonica
IMT-OCR-LOG: Found 0 occurence(s) of text 'Portrait' in attempt #1
IMT-OCR-LOG: Getting major color of image in attempt #1
IMT-OCR-LOG: Major color of image is 'grey96' with hex value of F5F5F5 in attempt #1
IMT-OCR-LOG: Bluring image in attempt #1
IMT-OCR-LOG: Inverting the major color of image from 0xF5F5F5 to 0x0A0A0A in attempt #1
IMT-OCR-LOG: Sharpening image in attempt #1
Tesseract Open Source OCR Engine with Leptonica
IMT-OCR-LOG: Found 0 occurence(s) of text 'Portrait' in attempt #1
IMT-OCR-LOG: Scaling image to 200% in attempt #2
Tesseract Open Source OCR Engine with Leptonica
IMT-OCR-LOG: Found 0 occurence(s) of text 'Portrait' in attempt #2
IMT-OCR-LOG: Getting major color of image in attempt #2
IMT-OCR-LOG: Major color of image is 'grey96' with hex value of F5F5F5 in attempt #2
IMT-OCR-LOG: Bluring image in attempt #2
IMT-OCR-LOG: Inverting the major color of image from 0xF5F5F5 to 0x0A0A0A in attempt #2
IMT-OCR-LOG: Sharpening image in attempt #2
Tesseract Open Source OCR Engine with Leptonica
IMT-OCR-LOG: Found 0 occurence(s) of text 'Portrait' in attempt #2
IMT-OCR-LOG: Scaling image to 300% in attempt #3
Tesseract Open Source OCR Engine with Leptonica
IMT-OCR-LOG: Found 0 occurence(s) of text 'Portrait' in attempt #3
IMT-OCR-LOG: Getting major color of image in attempt #3
IMT-OCR-LOG: Major color of image is 'grey96' with hex value of F5F5F5 in attempt #3
IMT-OCR-LOG: Bluring image in attempt #3
IMT-OCR-LOG: Inverting the major color of image from 0xF5F5F5 to 0x0A0A0A in attempt #3
IMT-OCR-LOG: Sharpening image in attempt #3
Tesseract Open Source OCR Engine with Leptonica
IMT-OCR-LOG: Found 0 occurence(s) of text 'Portrait' in attempt #3
[admin@ba-callgen image-magick-tesseract-processing]$

As you can see, the script finds the text "Common" but not "Portrait". The cause appears to be the font color of "Portrait". Any help improving this script would be appreciated.

I am using CentOS 5.

Recommended answer

Do not artificially limit yourself to evaluating only one or two methods when you manipulate your input image. You currently seem to use only -blur and -scale.

You should also consider using the following operations:

-contrast
-despeckle
-edge
-negate
-normalize
-posterize
-type grayscale
-monochrome
-gamma
-antialias / +antialias
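
One way to evaluate these operations is to apply them one at a time and check whether Tesseract then finds the search text. The sketch below does that; the function name try_operators and the operator arguments ("-edge 1", "-posterize 3", "-gamma 2") are illustrative assumptions, not tuned values.

```shell
#!/bin/bash
# Sketch: apply one ImageMagick operator at a time and report which ones
# let Tesseract find the search text. Operator arguments are guesses.

try_operators() {
    local image=$1 txt=$2 workdir op
    workdir=$(mktemp -d) || return 1
    # One operator (plus its argument, if any) per line of the here-document.
    while read -r op; do
        # $op is unquoted on purpose so "-edge 1" splits into two words.
        convert "$image" $op "$workdir/variant.tif" </dev/null || continue
        tesseract "$workdir/variant.tif" "$workdir/OUT" >/dev/null 2>&1 </dev/null || continue
        if grep -q -- "$txt" "$workdir/OUT.txt" 2>/dev/null; then
            echo "found with: $op"
        fi
        rm -f "$workdir/OUT.txt"
    done <<'EOF'
-contrast
-despeckle
-edge 1
-negate
-normalize
-posterize 3
-type grayscale
-monochrome
-gamma 2
+antialias
EOF
    rm -rf "$workdir"
}

# Run only when called with arguments, e.g.: ./try-ops.sh test.jpeg Portrait
if [ "$#" -ge 2 ]; then
    try_operators "$1" "$2"
fi
```

Single operators may well not be enough on their own; the point of the loop is only to see quickly which ones change the OCR result before combining them.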

See, for example, what this command produces from the input image (03-Word-Collage-iPad.jpeg):

convert 03-Word-Collage-iPad.jpeg             \
    -scale 1000%                              \
    -blur 1x65535 -blur 1x65535 -blur 1x65535 \
    -contrast                                 \
    -normalize                                \
    -despeckle -despeckle                     \
    -type grayscale                           \
    -sharpen 1                                \
    -posterize 3                              \
    -negate                                   \
    -gamma 100                                \
    -compress zip                             \
     a.tif

Output image:
(Sorry, when uploading a TIFF to this website it gets auto-converted to PNG. So you don't really get my TIFF when downloading the image you see above, but you'll nevertheless see a close-enough picture of my real result.)

Note 1: I tested this with the following ImageMagick version:

convert -version
  Version: ImageMagick 6.7.6-9 2012-05-12 Q16 http://www.imagemagick.org
  Copyright: Copyright (C) 1999-2012 ImageMagick Studio LLC
  Features:

Note 2: Older or newer versions of ImageMagick may behave differently, especially when it comes to -posterize!
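
Since results depend on the ImageMagick version, it can help to record the version next to each OCR result. The following is a minimal sketch with a hypothetical helper name (im_version); it assumes the "Version: ImageMagick X.Y.Z-P ..." line format shown in Note 1.

```shell
#!/bin/bash
# Sketch: extract the version string from `convert -version` output.

im_version() {
    # Reads `convert -version` output on stdin and prints the third field
    # of the "Version: ImageMagick ..." line, e.g. 6.7.6-9.
    awk '$1 == "Version:" && $2 == "ImageMagick" { print $3; exit }'
}

# Usage (requires ImageMagick):
#   convert -version | im_version
```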

And this is the result of Tesseract's OCR for a.tif:

tesseract a.tif OUT  &&  cat OUT.txt

Tesseract Open Source OCR Engine v3.01 with Leptonica
   Page 0
   Text
   Common words Remove English words >
   Exclude numbers
   Word case Don't change 1+
   Theme & Layout
   Color theme London Spring >
   Font Rough Trad >
   Word layout Half and Half >
   Orientation
   Landscape
   Q
   u
   -0
   "H
   I
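
To check OCR output like this against a search text, the occurrence count used in the question's script can be isolated in a small function. This is a sketch with hypothetical names (count_occurrences, has_enough); unlike the original script, it uses grep -F so the search text is treated as a fixed string rather than a regular expression.

```shell
#!/bin/bash
# Sketch: count how often a search text occurs in an OCR output file.

count_occurrences() {
    # $1 = search text (fixed string), $2 = OCR text file (e.g. OUT.txt)
    # grep -o prints every match on its own line; wc -l counts them.
    grep -o -F -- "$1" "$2" | wc -l | tr -d ' '
}

has_enough() {
    # $1 = search text, $2 = file, $3 = required occurrences (default 1)
    local found
    found=$(count_occurrences "$1" "$2")
    [ "$found" -ge "${3:-1}" ]
}

# Usage:
#   tesseract a.tif OUT && has_enough "Common" OUT.txt 1 && echo "found"
```

The match is case-sensitive, so "Word" and "words" are counted separately.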

 
 
 
 
 
Update:

I verified that the most recent version of ImageMagick, 6.7.9-0 (released yesterday), does not produce exactly the same result as the command and screenshot shown above (made with version 6.7.6-9). Here is the difference:

In any case, I'm sure that if you tweak my command a bit and experiment with the various parameters, you'll get it to work for you, whatever your ImageMagick version is.
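
A simple way to experiment with the parameters is to generate a grid of candidate commands and review or run them one by one. The sketch below only prints the commands. The function name print_candidates, the value grids and the output file names are assumptions, and the pipeline is a shortened variant of the command above (one -blur and one -despeckle instead of several).

```shell
#!/bin/bash
# Sketch: print variations of the convert pipeline for a grid of
# -scale and -posterize values. Nothing is executed here.
# Input file names with spaces are not handled by this listing.

print_candidates() {
    local input=$1 scale levels
    for scale in 300 500 1000; do
        for levels in 2 3 4; do
            printf 'convert %s -scale %s%% -blur 1x65535 -contrast -normalize -despeckle -type grayscale -sharpen 1 -posterize %s -negate -gamma 100 -compress zip out-s%s-p%s.tif\n' \
                "$input" "$scale" "$levels" "$scale" "$levels"
        done
    done
}

# Run only when called with an argument, e.g.:
#   ./candidates.sh 03-Word-Collage-iPad.jpeg
if [ "$#" -ge 1 ]; then
    print_candidates "$1"
fi
```

Each resulting TIFF can then be passed to tesseract and its output compared, which makes it easier to see which parameter a given ImageMagick version is sensitive to.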
