Answers from chat (source) - 🍻 Selber Bier Brauen

This is a comment on rc3 - Chaos Communication Congress, posted by WikiAdmin on 30.12.2020 12:39

Source of the page Answers from chat

we tried a lot of stuff with opencv to preprocess images to improve tesseract results for unstructured sparse text, but it seemed that you need large datasets to train the model properly
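The chat does not say which preprocessing steps were tried. As one illustration of the kind of step people commonly try before OCR, here is a minimal sketch of Otsu binarization in plain Python. This is an assumption about what "preprocess" could mean, not the method used above; the names `otsu_threshold` and `binarize` are made up for this example. OpenCV offers the same idea via `cv2.threshold` with the `cv2.THRESH_OTSU` flag.

```python
def otsu_threshold(pixels):
    """Return the grey level (0-255) that maximizes between-class variance.

    `pixels` is a flat list of integer grey values in the range 0-255.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_bg = 0    # number of pixels at or below the candidate threshold
    sum_bg = 0  # sum of grey values at or below the candidate threshold
    for t in range(256):
        w_bg += hist[t]
        if w_bg == 0:
            continue  # no background pixels yet
        w_fg = total - w_bg
        if w_fg == 0:
            break  # everything is background from here on
        sum_bg += t * hist[t]
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        # Between-class variance, up to a constant factor.
        var = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def binarize(image):
    """Map a 2D list of grey values to pure black (0) and white (255)."""
    flat = [p for row in image for p in row]
    t = otsu_threshold(flat)
    return [[255 if p > t else 0 for p in row] for row in image]


if __name__ == "__main__":
    # Hypothetical 2x3 image: three dark pixels, three bright pixels.
    sample = [[30, 40, 200], [35, 210, 220]]
    print(binarize(sample))
```

Whether a step like this helps on unstructured sparse text depends heavily on the images, which matches the mixed experience described above.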

for regular documents, tesseract works fairly ok - but Google Vision/AWS Textract are a lot better for unstructured text

Try one with tesseract.js at https://tesseract.projectnaptha.com/ (just drag and drop a file onto the page). To compare results with Google's stuff, you could upload the image to Google Drive and then right-click and open it with Google Docs (iirc)

https://github.com/manisandro/gimagereader

https://www.systutorials.com/docs/linux/man/1-tesseract/
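
For trying the command-line tool from the man page above on sparse text, a rough sketch of calling it from Python could look like this. It assumes Tesseract 4 or later is installed and on the PATH (older versions spell the option `-psm`), and that page segmentation mode 11 ("sparse text") is a reasonable starting point. The helper names `build_tesseract_command` and `run_ocr` and the file name `scan.png` are made up for this example.

```python
import shutil
import subprocess


def build_tesseract_command(image_path, psm=11, lang="eng"):
    """Build the argument list for the tesseract CLI.

    "stdout" as the output base makes tesseract print the recognized
    text instead of writing a .txt file. psm=11 selects sparse-text mode.
    """
    return ["tesseract", image_path, "stdout", "--psm", str(psm), "-l", lang]


def run_ocr(image_path, psm=11, lang="eng"):
    """Run tesseract on one image and return the recognized text."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract binary not found on PATH")
    result = subprocess.run(
        build_tesseract_command(image_path, psm=psm, lang=lang),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if tesseract fails
    )
    return result.stdout


if __name__ == "__main__":
    # Only shows the command; call run_ocr("scan.png") to actually run it.
    print(" ".join(build_tesseract_command("scan.png")))
```

Comparing the output for a few `psm` values on the same image is a cheap way to see how much the segmentation mode matters before looking at the cloud services.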