Programmatically divide scanned images into separate images
Problem description
In order to improve OCR quality, I need to preprocess my scanned images. Sometimes a single scan contains several separate items lying at different angles on the page, for example a few paper documents scanned at one time.
Is it possible to divide such an image programmatically into separate images, one per logical document, for example with a tool like ImageMagick? Are there any existing solutions or techniques for this problem?
Recommended answer
In ImageMagick 6, you can blur the image enough that the text runs together and then threshold it, so that each text box becomes one large black region on a white background. You can then use connected-components to find each separate black, gray(0), region and its bounding box, and crop the original image to each of those bounding boxes.
Input:
Unix syntax (adjust the blur to be just large enough to keep the text regions solid black):
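infile="image.png"
# Base name of the input file, without directory or extension
inname=`convert -ping "$infile" -format "%t" info:`
OLDIFS=$IFS
IFS=$'\n'
# Blur and threshold so each document becomes one black blob, then list the connected components
arr=(`convert "$infile" -blur 0x5 -auto-level -threshold 99% -type bilevel +write tmp.png \
    -define connected-components:verbose=true \
    -connected-components 8 \
    null: | tail -n +2 | sed 's/^[ ]*//'`)
num=${#arr[*]}
IFS=$OLDIFS
for ((i=0; i<num; i++)); do
    #echo "${arr[$i]}"
    # Each listing line reads: id: bounding-box centroid area color
    color=`echo ${arr[$i]} | cut -d' ' -f5`
    bbox=`echo ${arr[$i]} | cut -d' ' -f2`
    echo "color=$color; bbox=$bbox"
    # Crop only the black regions; the gray(255) region is the white background
    if [ "$color" = "gray(0)" ]; then
        convert "$infile" -crop $bbox +repage -fuzz 10% -trim +repage ${inname}_$i.png
    fi
done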
Textual listing:
color=gray(255); bbox=892x1008+0+0
color=gray(0); bbox=337x430+36+13
color=gray(0); bbox=430x337+266+630
color=gray(0); bbox=202x147+506+252
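On noisier scans the listing may also contain small black specks that are not documents. A minimal sketch of one way to skip them, assuming each verbose listing line has the layout "id: bounding-box centroid area color": keep only gray(0) regions whose area reaches a minimum pixel count. The function name filter_regions, the default threshold of 1000 pixels, and the centroid and area values in the sample are illustrative assumptions, not part of the original answer.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the original answer): read lines of the
# connected-components verbose listing on stdin and print the bounding box
# of every black region whose area is at least min_area pixels.
# Assumed line layout: "id: WxH+X+Y centroid area color".
filter_regions() {
    local min_area="${1:-1000}"   # assumed default; tune it for your scans
    awk -v min="$min_area" '
        # awk ignores leading blanks, so $2 is the box, $4 the area, $5 the color
        $5 == "gray(0)" && $4 + 0 >= min { print $2 }
    '
}

# Sample listing with made-up centroid and area values in the assumed layout.
sample='  0: 892x1008+0+0 445.5,503.5 700000 gray(255)
  1: 337x430+36+13 204.0,227.5 144910 gray(0)
  2: 12x9+700+40 705.5,44.0 87 gray(0)'
printf '%s\n' "$sample" | filter_regions 1000
```

Each bounding box printed this way can be passed to -crop in the same manner as the bbox variable in the script; the right threshold depends on scan resolution and on how small the smallest real document is.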
tmp.png showing the blurred and thresholded regions:
Cropped images: