How to store a byte[] image in MongoDB for doOCR processing

Question

How can I store an image as a byte[] in MongoDB and then run doOCR on it (from public static void main)? Is it possible to run doOCR on an image stored in MongoDB?

Model:

public class Photo {
    @Id
    private String id;
    private byte[] image;
    // getters and setters
}

Controller:

@Controller
public class PhotoController {
    @GetMapping("/photos/upload")
    public String uploadPhoto(Model model) {
        model.addAttribute("message", "hello");
        return "uploadPhoto";
    }
}

After storing the image, I passed the MongoDB path to it in the code below. Is that the right way to do it?

MAIN:

public static void main(String[] args) {
    SpringApplication.run(StackoverflowApplication.class, args);

    // Is this MongoDB data path right for doOCR?
    File image = new File("mongodb://localhost:27017//test-db//user");

    // Encode and decode sample
    String encodedString = Base64.getEncoder().encodeToString(originalInput.getBytes());
    byte[] decodedBytes = Base64.getDecoder().decode(encodedString);
    String decodedString = new String(decodedBytes);

    Tesseract tessInst = new Tesseract();
    tessInst.setDatapath("C:\\Users\\Administrator\\Desktop\\tessdata");
    try {
        String result = tessInst.doOCR(image);
        System.out.println(result);
    } catch (TesseractException e) {
        System.err.println(e.getMessage());
    }
}

Is this possible, or is Base64 (or something else) needed for it?

Solution

Since you are using Spring, you can use MultipartFile to receive the file in your controller and then use Binary (org.bson.types.Binary) to store it in MongoDB, provided the image is smaller than 16 MB. For images larger than 16 MB you can use GridFS.
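The size rule above can be sketched as a small check made before saving. This is only an illustration: the class name, method name, and the 64 KiB headroom reserved for the other fields are assumptions, not part of any MongoDB or Spring API. The 16 MiB figure is MongoDB's maximum BSON document size.

```java
// Sketch: decide whether an image can be stored inline as a Binary field
// or should go to GridFS instead. All names here are hypothetical.
final class ImageStorageChoice {

    // MongoDB caps a single BSON document at 16 MiB.
    static final long MAX_BSON_DOCUMENT_BYTES = 16L * 1024 * 1024;

    // Assumed headroom for the other fields (id, name) and BSON overhead.
    static final long ASSUMED_OVERHEAD_BYTES = 64L * 1024;

    /** Returns true when an image of this size should fit inline in one document. */
    static boolean fitsInline(long imageSizeBytes) {
        if (imageSizeBytes < 0) {
            throw new IllegalArgumentException("size must be non-negative");
        }
        return imageSizeBytes <= MAX_BSON_DOCUMENT_BYTES - ASSUMED_OVERHEAD_BYTES;
    }

    public static void main(String[] args) {
        System.out.println(fitsInline(2L * 1024 * 1024));   // 2 MiB image: true
        System.out.println(fitsInline(20L * 1024 * 1024));  // 20 MiB image: false
    }
}
```

In a controller you could call something like fitsInline(file.getSize()) and fall back to GridFS when it returns false.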

You only need to add one dependency to your project: spring-data-mongodb.

Let's take an example of a User collection which looks like this:

@Document
public class User {
    @Id
    private String id;
    
    private String name;
    private Binary image;
    // getters and setters
}

Here the Binary image field holds your image file.

Now create a repository for this User collection using MongoRepository:

public interface UserRepository extends MongoRepository<User, String>{

}

Create a controller for demo purposes. Use @RequestParam MultipartFile file to receive the file in your controller, get the bytes from the file, and set them on the user object with user.setImage(new Binary(file.getBytes())). A complete example is below:

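import java.io.IOException;
import java.util.Base64;
import java.util.Base64.Encoder;
import java.util.Optional;

import org.bson.types.Binary;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.multipart.MultipartFile;

@RestController
public class UserController {
    @Autowired
    private UserRepository userRepository;

    // Stores the uploaded file's bytes as a Binary field on a new User
    @PostMapping("/users")
    User createUser(@RequestParam String name, @RequestParam MultipartFile file) throws IOException {
        User user = new User();
        user.setName(name);
        user.setImage(new Binary(file.getBytes()));

        return userRepository.save(user);
    }

    // Returns the stored image bytes as a Base64 string
    @GetMapping("/users")
    String getImage(@RequestParam String id) {
        Optional<User> user = userRepository.findById(id);
        Encoder encoder = Base64.getEncoder();

        return encoder.encodeToString(user.get().getImage().getData());
    }
}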

Start the server and send a POST request to the /users endpoint (for example from Postman) with a name parameter and a file part.

Your data is stored in MongoDB in BinData format. To read it back from the database, refer to the getImage method in the code above.
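Since getImage returns the bytes as a Base64 string, a client has to decode that string to get the original bytes back. Here is a minimal sketch of the round trip using only java.util.Base64. The class and method names are hypothetical, and the four-byte array merely stands in for real image data.

```java
import java.util.Arrays;
import java.util.Base64;

// Sketch: the encode step mirrors getImage; decode is what a client would do.
final class Base64ImageRoundTrip {

    /** Encodes raw image bytes as a Base64 string, as getImage does. */
    static String encode(byte[] imageBytes) {
        return Base64.getEncoder().encodeToString(imageBytes);
    }

    /** Decodes the Base64 response body back into the raw bytes. */
    static byte[] decode(String base64) {
        return Base64.getDecoder().decode(base64);
    }

    public static void main(String[] args) {
        // Stand-in for image data: the first four bytes of the PNG signature
        byte[] original = {(byte) 0x89, 'P', 'N', 'G'};
        String encoded = encode(original);
        byte[] restored = decode(encoded);
        System.out.println(encoded);                            // iVBORw==
        System.out.println(Arrays.equals(original, restored));  // true
    }
}
```

Base64 is only needed to carry the bytes through a text response. The bytes stored in MongoDB are the raw image data, not Base64.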

EDIT:

The asker is using the tess4j library to extract text from an image, and doOCR is a method in that library. I followed these steps to extract text from an image in my Spring Boot application.

  1. Install tesseract-ocr on your system:

    sudo apt-get install tesseract-ocr

  2. Download the eng.traineddata training data from https://github.com/tesseract-ocr/tessdata and move it to the project root folder.

  3. Add the dependency below to your project:

   <dependency>
        <groupId>net.sourceforge.tess4j</groupId>
        <artifactId>tess4j</artifactId>
        <version>3.2.1</version>
   </dependency>

  4. Add the code below to the existing project:

@GetMapping("/image-text")
String getImageText(@RequestParam String id) {
    Optional<User> user = userRepository.findById(id);
    ITesseract instance = new Tesseract();
    try {
        ByteArrayInputStream bais = new ByteArrayInputStream(user.get().getImage().getData());
        BufferedImage bufferImg = ImageIO.read(bais);
        String imgText = instance.doOCR(bufferImg);
        return imgText;
    } catch (Exception e) {
        return "Error while reading image";
    }
}
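One detail in the byte-to-image step is worth knowing: ImageIO.read returns null, rather than throwing, when no registered reader recognizes the data. The sketch below isolates that conversion with an explicit null check, using only the JDK. The class and method names are hypothetical, and toPngBytes exists only to produce sample bytes in place of data loaded from MongoDB.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import javax.imageio.ImageIO;

// Sketch: turn stored image bytes into a BufferedImage suitable for doOCR.
final class ImageBytes {

    /** Decodes raw image bytes; throws if no ImageIO reader recognizes the format. */
    static BufferedImage toBufferedImage(byte[] data) throws IOException {
        BufferedImage img = ImageIO.read(new ByteArrayInputStream(data));
        if (img == null) {
            // ImageIO.read signals an unknown format with null, not an exception
            throw new IOException("Unsupported or corrupt image data");
        }
        return img;
    }

    /** Helper for the demo: encodes an image as PNG bytes, standing in for stored data. */
    static byte[] toPngBytes(BufferedImage img) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        ImageIO.write(img, "png", out);
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] stored = toPngBytes(new BufferedImage(4, 3, BufferedImage.TYPE_INT_RGB));
        BufferedImage decoded = toBufferedImage(stored);
        System.out.println(decoded.getWidth() + "x" + decoded.getHeight());
    }
}
```

With a check like this, a bad upload produces a clear error instead of a null being passed on to doOCR.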
