Tesseract not picking up different colored text


Problem description

I am trying to make a program that scrapes the text off a screenshot using tesseract and Python. I have no issue getting one piece of it, but some of the text is lighter colored and is not being picked up by tesseract. Below is an example of a picture I am using:

I am able to get the text at the top of the picture, but not the 3 options below it.

Here is the code I am using for grabbing the text:

result = pytesseract.image_to_string(
    screen, config="load_system_dawg=0 load_freq_dawg=0")

print("below is the total value scraped by the tesseract")
print(result)

# Split up newlines until we have our question and answers
parts = result.split("\n\n")

question = parts.pop(0).replace("\n", " ")
q_terms = question.split(" ")
q_terms = list(filter(lambda t: t not in stop, q_terms))
q_terms = set(q_terms)

parts = "\n".join(parts)
parts = parts.split("\n")

answers = list(filter(lambda p: len(p) > 0, parts))

When I have plain black text without a colored background, the answers array is populated with the 3 options, but not in this case. Is there any way I can fix this?

Recommended answer

You are missing a binarization (thresholding) step.

In your case you can simply apply a binary threshold to the grayscale image.

Here is the image with threshold = 177.

You can learn more about thresholding with the OpenCV Python library in the OpenCV documentation.
