How to store a byte[] image in MongoDB for the doOCR process
How can I store an image as a byte[] in MongoDB and then run doOCR on it (from public static void main)? Is it possible to run doOCR on an image that is stored in MongoDB?
Model:
public class Photo {
    @Id
    private String id;
    private byte[] image;
    // getters and setters
}
Controller:
@Controller
public class PhotoController {
    @GetMapping("/photos/upload")
    public String uploadPhoto(Model model) {
        model.addAttribute("message", "hello");
        return "uploadPhoto";
    }
}
After the image was stored, I passed the MongoDB path to the image. Is that the right way to do it?
MAIN:
public static void main(String[] args) {
    SpringApplication.run(StackoverflowApplication.class, args);
    // Is a MongoDB path like this valid as the input for doOCR?
    File image = new File("mongodb://localhost:27017//test-db//user");
    // Base64 encode and decode sample
    String encodedString = Base64.getEncoder().encodeToString(originalInput.getBytes());
    byte[] decodedBytes = Base64.getDecoder().decode(encodedString);
    String decodedString = new String(decodedBytes);
    Tesseract tessInst = new Tesseract();
    tessInst.setDatapath("C:\\Users\\Administrator\\Desktop\\tessdata");
    try {
        String result = tessInst.doOCR(image);
        System.out.println(result);
    } catch (TesseractException e) {
        System.err.println(e.getMessage());
    }
}
Is this possible, or is some additional Base64 step needed?
Since you are using Spring, you can use MultipartFile to receive the file in your controller and then use the Binary class from org.bson.types to store it in MongoDB, provided the image is smaller than 16 MB (for images larger than 16 MB you can use GridFS).
You need to add only one dependency to your project: spring-data-mongodb.
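The 16 MB threshold above is MongoDB's limit for a whole BSON document, so the image has to leave some room for the other fields. As a minimal sketch of that decision, the check below uses only the JDK; the class name, the method name, and the 64 KB allowance for other fields are assumptions made for illustration, not part of Spring Data or the MongoDB driver.

```java
public class ImageStorageChoice {
    // MongoDB caps a single BSON document at 16 MB (16 * 1024 * 1024 bytes).
    static final int MAX_BSON_DOCUMENT_BYTES = 16 * 1024 * 1024;

    // Hypothetical allowance for the id, name and BSON framing; tune as needed.
    static final int OTHER_FIELDS_ALLOWANCE_BYTES = 64 * 1024;

    /** Returns true when an image of this size can be embedded as a Binary field. */
    static boolean fitsInSingleDocument(long imageSizeBytes) {
        return imageSizeBytes >= 0
                && imageSizeBytes <= MAX_BSON_DOCUMENT_BYTES - OTHER_FIELDS_ALLOWANCE_BYTES;
    }

    public static void main(String[] args) {
        long smallImage = 2L * 1024 * 1024;  // 2 MB
        long largeImage = 20L * 1024 * 1024; // 20 MB
        System.out.println(fitsInSingleDocument(smallImage) ? "Binary field" : "GridFS");
        System.out.println(fitsInSingleDocument(largeImage) ? "Binary field" : "GridFS");
    }
}
```

In a controller, a check like this could be applied to the size of the uploaded file before choosing between a Binary field and GridFS.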
Let's take an example of a User collection which looks like this:
@Document
public class User {
    @Id
    private String id;
    private String name;
    private Binary image;
    // getters and setters
}
Here the Binary image field represents your image file.
Now create a repository for this User collection using MongoRepository:
public interface UserRepository extends MongoRepository<User, String>{
}
Create a controller for demo purposes. Use @RequestParam MultipartFile file to receive the file in your controller, read its bytes, and set them on the user object with user.setImage(new Binary(file.getBytes())).
The complete example is below:
@RestController
public class UserController {
    @Autowired
    private UserRepository userRepository;

    // Stores the uploaded file as a Binary field on a new User document.
    @PostMapping("/users")
    User createUser(@RequestParam String name, @RequestParam MultipartFile file) throws IOException {
        User user = new User();
        user.setName(name);
        user.setImage(new Binary(file.getBytes()));
        return userRepository.save(user);
    }

    // Returns the stored image bytes as a Base64 string.
    @GetMapping("/users")
    String getImage(@RequestParam String id) {
        Optional<User> user = userRepository.findById(id);
        Encoder encoder = Base64.getEncoder();
        return encoder.encodeToString(user.get().getImage().getData());
    }
}
Start the server and send a multipart POST request to the /users endpoint (for example from Postman) with a name parameter and a file part.
Your data is stored in MongoDB in BinData format. To read it back from the database, refer to the getImage method in the code above.
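Since getImage returns the image as Base64 text, a caller has to decode that text to get the original bytes back. Base64 is only a text encoding for transport; the bytes stored in MongoDB are the raw image. The sketch below uses only java.util.Base64; the class and method names are made up for illustration, and the four sample bytes stand in for real image data.

```java
import java.util.Arrays;
import java.util.Base64;

public class Base64RoundTrip {
    /** Mirrors the encoding step in getImage: raw bytes to Base64 text. */
    static String encode(byte[] imageBytes) {
        return Base64.getEncoder().encodeToString(imageBytes);
    }

    /** What a client would do with the response body to recover the bytes. */
    static byte[] decode(String base64Text) {
        return Base64.getDecoder().decode(base64Text);
    }

    public static void main(String[] args) {
        // Stand-in data: the first four bytes of the PNG file signature.
        byte[] original = {(byte) 0x89, 0x50, 0x4E, 0x47};
        String text = encode(original);
        byte[] restored = decode(text);
        System.out.println(text);
        System.out.println(Arrays.equals(original, restored)); // prints "true"
    }
}
```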
EDIT:
The question asker is using the tess4j library to extract text from an image, and doOCR is a method in this library. I followed these steps to extract text from an image in my Spring Boot application.
- Install tesseract-ocr on your system: sudo apt-get install tesseract-ocr
- Download the eng.traineddata training data from https://github.com/tesseract-ocr/tessdata and move it to the project root folder.
- Add the below dependency to your project:
<dependency>
    <groupId>net.sourceforge.tess4j</groupId>
    <artifactId>tess4j</artifactId>
    <version>3.2.1</version>
</dependency>
- Add the below code to the existing project:
@GetMapping("/image-text")
String getImageText(@RequestParam String id) {
    Optional<User> user = userRepository.findById(id);
    ITesseract instance = new Tesseract();
    try {
        // Decode the stored bytes into an image in memory; no File is needed.
        ByteArrayInputStream bais = new ByteArrayInputStream(user.get().getImage().getData());
        BufferedImage bufferImg = ImageIO.read(bais);
        String imgText = instance.doOCR(bufferImg);
        return imgText;
    } catch (Exception e) {
        return "Error while reading image";
    }
}
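One detail worth guarding against in the decoding step: ImageIO.read returns null, rather than throwing, when no registered reader recognizes the bytes. A minimal JDK-only sketch of that check follows. The class name and the tryDecode and samplePng helpers are hypothetical, and samplePng exists only to produce test data without a database.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Optional;
import javax.imageio.ImageIO;

public class ImageBytesDecoder {
    /** Decodes stored bytes; empty when they are not a readable image. */
    static Optional<BufferedImage> tryDecode(byte[] imageBytes) {
        try {
            // ImageIO.read yields null if no registered reader accepts the data.
            return Optional.ofNullable(ImageIO.read(new ByteArrayInputStream(imageBytes)));
        } catch (IOException e) {
            return Optional.empty();
        }
    }

    /** Builds a blank PNG in memory, standing in for bytes loaded from MongoDB. */
    static byte[] samplePng(int width, int height) {
        BufferedImage blank = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            ImageIO.write(blank, "png", out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(tryDecode(samplePng(4, 3)).isPresent());
        System.out.println(tryDecode("not an image".getBytes()).isPresent());
    }
}
```

With a helper like this, the endpoint could return a specific message for undecodable data before the image is ever passed to doOCR.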