Using Python and PIL, how can I grab a block of text in an image?

Question

I have an image (*.png) which contains two blocks of text. I am trying to grab each block of text individually using the Python Imaging Library (PIL) in Python 2.7.

I have tried to blur the image and then find the edges of the blurred block so that I can recover the boundaries of each block (for use later with "crop"). However, when I blur the image (I've tried several iterations), the "find_edges" filter simply seems to grab the edges of each character.

from PIL import Image, ImageFilter

pic = Image.open("a.jpg")
out = pic.filter(ImageFilter.BLUR)
out = out.filter(ImageFilter.FIND_EDGES)

I guess I'm looking for something similar to the Photoshop "Magnetic Lasso Tool". Any idea what approach may be better?

Recommended answer

I would start by making a histogram of the image projected onto one axis. Take your image and crop it to the outer bounding box first. An example of the histogram projected onto the y-axis:

from PIL import Image
import numpy as np

im = Image.open("dummytext.png")
pix = np.asarray(im)
pix = pix[:,:,0:3] # Drop the alpha channel
pix = 255 - pix  # Invert the image
H = pix.sum(axis=2).sum(axis=1)  # Sum the color channels, then sum along each row (one value per y)

From here, identify the largest block of white space, that is, the longest run of rows where the histogram is zero. This determines the best y-coordinate to split at. In a plot of this histogram the gap is obvious. If the two text blocks are closer together you'll need a better criterion; adapt the method to fit your needs. Once split, you can crop each block separately.
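
The splitting step described above could be sketched roughly as follows. This is only one possible reading of the approach, not the answerer's own code: the helper names `largest_gap` and `split_text_blocks` and the `threshold` parameter are made up for illustration, and the sketch assumes dark text on a light background with the blocks stacked vertically.

```python
from PIL import Image
import numpy as np


def largest_gap(profile, threshold=0):
    """Return (start, end) of the longest blank run between two inked rows.

    `profile` holds one value per row; a row counts as blank when its value
    is <= threshold (hypothetical parameter). `end` is exclusive. Margins
    above the first and below the last inked row are ignored. Returns None
    when no interior gap exists.
    """
    ink_rows = np.flatnonzero(np.asarray(profile) > threshold)
    if ink_rows.size < 2:
        return None
    jumps = np.diff(ink_rows)  # a jump > 1 means blank rows in between
    i = int(np.argmax(jumps))
    if jumps[i] <= 1:
        return None
    return int(ink_rows[i]) + 1, int(ink_rows[i + 1])


def split_text_blocks(im, threshold=0):
    """Split a PIL image into two crops at the middle of the largest gap."""
    # Convert to grayscale and invert so that ink is bright, background 0.
    pix = 255 - np.asarray(im.convert("L")).astype(np.int64)
    profile = pix.sum(axis=1)  # projection onto the y-axis
    gap = largest_gap(profile, threshold)
    if gap is None:
        return [im]  # nothing to split
    split_y = (gap[0] + gap[1]) // 2
    width, height = im.size
    return [
        im.crop((0, 0, width, split_y)),
        im.crop((0, split_y, width, height)),
    ]
```

Calling `split_text_blocks(Image.open("dummytext.png"))` would then return the two crops, which can be saved or processed separately. For anti-aliased or noisy scans the background is rarely pure white, so a `threshold` above zero is probably needed; the right value depends on the image.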
