Google OCR tech now understands over 200 languages

Shane McGlaun - May 8, 2015, 6:30am CDT

Google has been pushing an ambitious scanning project for years in an attempt to get printed books and magazines into digital form where they can be searched. Google has also fought lawsuits alleging that it violates copyrights by scanning books into digital form. Those suits haven’t stopped Google from working to scan even more material, and the search giant has announced a significant milestone for its OCR technology.

OCR, or Optical Character Recognition, technology turns pictures of text into text that can be searched, indexed, and edited. Google is now able to turn images of text in over 200 languages into documents that can be edited and searched in Google Docs.

With support for over 200 languages, Google says that its OCR tech can now support all of the world’s major languages. The support goes beyond languages alone: the software also recognizes 25 different writing systems. To use the new OCR tech, the user uploads a document in its current form, such as an image or PDF, to their Google Drive account.

Users then right-click the document in the Drive interface and select the option to open it with Google Docs. Google Docs opens the file with the original image at the top and places the extracted text below it. The OCR system automatically determines what language the original image is in. The OCR feature is also available in the Drive app for Android.

SOURCE: Google

