Salient Object Detection for Web Document Scanning

Document scanner apps are often used on smartphones to scan everyday documents. They use UI tools, and sometimes AI, to improve the user experience. Most are native apps that take advantage of the phone's processing power and storage. Web alternatives, on the other hand, have been very limited until now. A web scanner can be a more cost-effective alternative, and it is a great starting point for demos of NLP or computer vision use cases. Image cleaning is essential for these tasks in order to get accurate results.

The idea is to create a web document scanner that doesn't require any manual editing: background removal, cleaning, enhancing and text deskewing are all handled automatically.

Some results showcasing how the salient object detection (SOD) model works:

The U²-NET model can be trained from scratch on custom datasets, which can significantly improve its performance on specific tasks (low light, complex backgrounds…). I think it is a more robust approach than hand-coding computer vision algorithms to detect document edges and corners.
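As a rough illustration of how a saliency map can drive background removal, here is a minimal sketch. It assumes the model output has already been resized to the photo's dimensions and scaled to the range 0 to 1. The function names, the 0.5 threshold and the white fill are illustrative choices, not details taken from the app.

```python
import numpy as np

def remove_background(image, saliency, threshold=0.5, fill=255):
    """Keep pixels whose saliency is at least `threshold`; paint the rest `fill`.

    image:    H x W x 3 uint8 array (the photo).
    saliency: H x W float array in [0, 1] (assumed: a resized SOD output map).
    """
    if image.shape[:2] != saliency.shape:
        raise ValueError("image and saliency map must share height and width")
    mask = saliency >= threshold          # boolean H x W document mask
    result = np.full_like(image, fill)    # start from a plain background
    result[mask] = image[mask]            # copy only the salient pixels
    return result, mask

def crop_to_mask(image, mask):
    """Crop to the bounding box of the mask; return the image unchanged if the mask is empty."""
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    if rows.size == 0:
        return image
    return image[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]
```

A hard threshold keeps the sketch short. Using the saliency map as a soft alpha channel would give smoother edges at the cost of a little more arithmetic.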

Tesseract is a popular open-source OCR engine used for extracting text from images, but it has other features too. The most important one for our use case is text orientation detection, which is essential for dealing with documents scanned upside down.
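One way to read the orientation from Python is through the pytesseract wrapper, whose `image_to_osd` call returns Tesseract's orientation and script detection report as text. The sketch below is an assumption about how this could be wired up, not the app's actual code. It also assumes the Tesseract binary and its OSD data are installed, and the helper names are hypothetical.

```python
import re

def parse_osd_rotation(osd_text):
    """Return the angle in degrees from the 'Rotate:' field of Tesseract OSD text."""
    match = re.search(r"Rotate:\s*(\d+)", osd_text)
    if match is None:
        raise ValueError("no 'Rotate:' field found in OSD output")
    return int(match.group(1)) % 360

def detect_rotation(image):
    """Run orientation and script detection on a PIL image and return the reported angle."""
    # Imported lazily so the parser above can be used without Tesseract installed.
    import pytesseract
    return parse_osd_rotation(pytesseract.image_to_osd(image))
```

For an upside-down page the reported angle is 180, and a half turn (for example `image.rotate(180)` in Pillow) puts the text upright again.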

Sometimes, when the scanned image is rotated by an angle between 5 and 85 degrees, Textcleaner.sh and Tesseract struggle to deskew the document image correctly.

The results seem promising. We tested one document image at multiple angles, and the results are shown below.

Scanner results 1
Scanner results 2
Visual Demo

This is a cost-effective responsive web app demo template for testing NLP and computer vision models that deal with photos taken with a smartphone camera.

Finding a sweet spot between processing time, accuracy and hardware cost is the main challenge. It takes around 7 to 10 seconds to get scan results (using a serverless service), but there is room for improvement in both code and hardware. For example, after attaching an NLP model such as NER to this app, we can still get a response time under 10 seconds, which is not bad!

Any ideas on improving processing time without decreasing overall performance are welcome.
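To see where those seconds go, it can help to time each stage of the pipeline separately. The helper below is a small sketch using only the standard library. The class name and the stage names in the usage note are hypothetical and not part of the app.

```python
import time
from contextlib import contextmanager

class StageTimer:
    """Collect wall-clock durations for named pipeline stages."""

    def __init__(self):
        self.durations = {}

    @contextmanager
    def stage(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Accumulate, so a stage that runs several times is summed.
            elapsed = time.perf_counter() - start
            self.durations[name] = self.durations.get(name, 0.0) + elapsed

    def total(self):
        return sum(self.durations.values())

    def report(self):
        """Return (name, seconds, share of total) tuples, slowest stage first."""
        total = self.total() or 1.0
        ordered = sorted(self.durations.items(), key=lambda kv: kv[1], reverse=True)
        return [(name, secs, secs / total) for name, secs in ordered]
```

Wrapping each step in something like `with timer.stage("background_removal"):` and printing `timer.report()` shows which stage dominates the response time, which is where optimization effort pays off first.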
