Salient Object Detection for Web Document Scanning

Document scanner apps are often used on smartphones to scan everyday documents; they use UI tools and sometimes AI to improve the user experience. Most are native apps that take advantage of the phone's processing power and storage. Web alternatives, on the other hand, have been very limited so far. A web scanner can be a more cost-effective alternative, and it is a good starting point for demos of use cases involving NLP or computer vision. Image cleaning is essential for these tasks in order to get accurate results.

The idea is to create a web document scanner that doesn’t require any manual editing: background removal, cleaning, enhancing and text deskewing are all handled automatically.

Some results showcasing how this SOD model works:

The U²-NET model can be trained from scratch on custom datasets, which can significantly improve its performance on specific tasks (low light, complex backgrounds…). I think it is a more robust approach than hand-coding computer vision algorithms to detect document edges and corners.
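
To make the background-removal step more concrete, here is a minimal sketch of what can be done with a saliency map once a SOD model such as U²-NET has produced one. It assumes the map is a grid of values between 0 and 1, the same size as the image; the function names, the 0.5 threshold and the white fill value are illustrative choices, not taken from the app. Plain Python lists are used to keep the example self-contained, whereas a real pipeline would more likely use NumPy or OpenCV arrays.

```python
from typing import List, Optional, Tuple


def mask_from_saliency(saliency: List[List[float]], threshold: float = 0.5) -> List[List[int]]:
    """Binarize a saliency map (values in [0, 1]); 1 marks the document."""
    return [[1 if value >= threshold else 0 for value in row] for row in saliency]


def bounding_box(mask: List[List[int]]) -> Optional[Tuple[int, int, int, int]]:
    """Return (left, top, right, bottom) of the foreground, or None if it is empty."""
    rows = [y for y, row in enumerate(mask) if any(row)]
    cols = [x for row in mask for x, value in enumerate(row) if value]
    if not rows:
        return None
    return (min(cols), min(rows), max(cols), max(rows))


def remove_background(image: List[List[int]], mask: List[List[int]], fill: int = 255) -> List[List[int]]:
    """Keep pixels where the mask is 1 and paint everything else with `fill`."""
    return [
        [pixel if keep else fill for pixel, keep in zip(image_row, mask_row)]
        for image_row, mask_row in zip(image, mask)
    ]


if __name__ == "__main__":
    # Toy 4x4 grayscale image and a made-up saliency map for it.
    image = [[10, 20, 30, 40], [50, 60, 70, 80], [90, 100, 110, 120], [130, 140, 150, 160]]
    saliency = [[0.0, 0.1, 0.0, 0.0], [0.1, 0.9, 0.8, 0.0], [0.0, 0.7, 0.9, 0.1], [0.0, 0.0, 0.2, 0.0]]
    mask = mask_from_saliency(saliency)
    print(bounding_box(mask))
    print(remove_background(image, mask))
```

The bounding box can then be used to crop the scan before cleaning and OCR, so that later steps only see the document.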

Tesseract is a popular open-source OCR engine used for extracting text from images, but it has other capabilities as well. The most important one for our use case is detecting text orientation, which is essential for handling documents scanned upside down.
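
As a rough illustration of the orientation check, the sketch below reads Tesseract's orientation and script detection (OSD) report through the `pytesseract` wrapper and extracts the suggested rotation. It assumes `pytesseract`, Pillow and a Tesseract install with OSD data are available; the helper names `parse_osd` and `suggested_rotation` are hypothetical, and this is not necessarily how the app itself calls Tesseract.

```python
import re
from typing import Dict

# Labels as they appear in Tesseract's OSD text report, mapped to short keys.
_OSD_FIELDS = (
    ("Orientation in degrees", "orientation"),
    ("Rotate", "rotate"),
    ("Orientation confidence", "confidence"),
)


def parse_osd(osd_text: str) -> Dict[str, float]:
    """Extract the orientation-related fields from an OSD report string."""
    fields: Dict[str, float] = {}
    for label, key in _OSD_FIELDS:
        match = re.search(rf"^{re.escape(label)}:\s*([\d.]+)", osd_text, flags=re.MULTILINE)
        if match:
            fields[key] = float(match.group(1))
    return fields


def suggested_rotation(image_path: str) -> int:
    """Return the rotation (in degrees) that Tesseract reports for this image."""
    # Imported lazily so parse_osd stays usable without these packages installed.
    import pytesseract
    from PIL import Image

    osd_text = pytesseract.image_to_osd(Image.open(image_path))
    return int(parse_osd(osd_text).get("rotate", 0))


if __name__ == "__main__":
    sample = (
        "Page number: 0\n"
        "Orientation in degrees: 180\n"
        "Rotate: 180\n"
        "Orientation confidence: 2.54\n"
        "Script: Latin\n"
        "Script confidence: 1.11\n"
    )
    print(parse_osd(sample))
```

A value of 180 in the `Rotate` field is the upside-down case described above. The confidence field can be used to ignore the suggestion when Tesseract is unsure, for example on pages with very little text.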

Sometimes, when the scanned image is skewed by an angle between 5 and 85 degrees, Textcleaner.sh and Tesseract struggle to deskew the document image correctly.
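
The text here does not spell out how the angle is recovered in these cases. One possible approach, offered purely as an assumption and not as the method used in the app, is to estimate the skew from the binary document mask itself by computing the orientation of its principal axis from second-order central moments. The function name and the list-based mask format below are illustrative.

```python
import math
from typing import List


def principal_axis_angle(mask: List[List[int]]) -> float:
    """Angle (degrees) of the mask's principal axis relative to the x axis.

    Uses second-order central moments of the foreground pixels. Image
    coordinates are assumed, so y grows downward and a positive angle
    means the axis tilts toward increasing y.
    """
    points = [(x, y) for y, row in enumerate(mask) for x, value in enumerate(row) if value]
    if len(points) < 2:
        return 0.0  # not enough pixels to define a direction

    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n

    mu20 = sum((x - cx) ** 2 for x, _ in points) / n
    mu02 = sum((y - cy) ** 2 for _, y in points) / n
    mu11 = sum((x - cx) * (y - cy) for x, y in points) / n

    # Standard orientation formula for the major axis of the pixel distribution.
    theta = 0.5 * math.atan2(2.0 * mu11, mu20 - mu02)
    return math.degrees(theta)


if __name__ == "__main__":
    horizontal = [[0, 0, 0, 0, 0], [1, 1, 1, 1, 1], [0, 0, 0, 0, 0]]
    diagonal = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
    print(principal_axis_angle(horizontal))
    print(principal_axis_angle(diagonal))
```

This only gives the direction of the document's long axis, so it cannot tell upright from upside down, and it becomes unreliable for nearly square documents. In a sketch like this, the remaining ambiguity would still have to be resolved by an orientation check such as Tesseract's.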

The results seem promising: we tested one document image at multiple angles, and the results are shown below.

This is a cost-effective, responsive web app demo template for testing NLP and computer vision models that deal with photos taken with a smartphone camera.

Finding a sweet spot between processing time, accuracy and hardware cost is the main challenge. It takes around 7 to 10 seconds to get scan results (using a serverless service), but there is room for improvement in both code and hardware. For example, after attaching an NLP model such as NER to this app, we can still get a response time below 10 seconds, which is not bad!

Any ideas for improving processing time without decreasing overall performance are welcome.
