Text Recognition in React Native
At Adapptor, a Perth app developer, we’re sometimes tasked with creating prototypes for potential projects, which can be great fun as we’re frequently trying something that nobody else in the team has tried before. In this case I was asked to investigate the feasibility of reading and recognising specific text in a React Native app.
Google provides a library called MLKit which can read text, recognise landmarks, track objects and do many other pretty amazing things, so that seemed like the perfect starting point. However, sadly it doesn’t support React Native.
Luckily there’s an open source project in the React Native Community called react-native-camera, which wraps MLKit and gives you hooks into React Native to use many of its features.
At the moment it uses an older version of MLKit, but as it’s a widely supported community library I expect it will catch up soon, so it seemed a safe bet knowing that it would continue to be supported.
I’m going to cover iOS here as that’s the platform I was interested in initially, but Android is also supported and is covered in all the relevant guides.
Installation
As with most React Native projects, we add react-native-camera through our favoured package manager, in our case yarn. To add the iOS-specific code we use CocoaPods.
If you want to use any of the optional features beyond just taking photos and capturing videos, such as Face, Text or Barcode detection, these need to be added separately. Thankfully this is as simple as adding a line to the Podfile. In my case I wanted the TextDetector, so I added the following.
pod 'react-native-camera', path: '../node_modules/react-native-camera', subspecs: ['TextDetector']
Ok, so our Podfile is good to go. Let’s run the install.
yarn add react-native-camera
cd ios
pod install
Permissions
As we’re going to be accessing the phone’s hardware, we have to request permission to do so. The framework does the actual requesting; we just need to set the message we’d like to display to the user, which is done using the OS plist permissions. Always try to make it clear why you’re requesting the permission, or the user may well reject it and your app won’t work as intended.
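For the camera, that means adding the NSCameraUsageDescription key to your Info.plist (and NSMicrophoneUsageDescription if you record video with audio). The message text below is just a placeholder to adapt to your app:

<key>NSCameraUsageDescription</key>
<string>We use the camera to scan and recognise text for you.</string>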
Firebase
As this uses the Firebase library under the hood, you’ll need a Firebase account set up for your app and the Firebase SDK configured in it. I won’t go into that here, but a guide to setting up React Native Firebase can be found here.
RNCamera
The first code we need to add is the setup of the camera component that performs the image capture. Doing that is pretty simple.
First import RNCamera:
import { RNCamera } from "react-native-camera";
Then add it to your component render method:
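A minimal sketch of what that can look like (styles.camera and the ocrElements state are my own illustrative names, not part of the library):

render() {
  return (
    // captureAudio={false} avoids also triggering a microphone permission prompt
    <RNCamera
      style={styles.camera}
      type={RNCamera.Constants.Type.back}
      captureAudio={false}
      onTextRecognized={this.textRecognized}
    >
      {/* Debug overlay, discussed below; assumes this.state.ocrElements
          was initialised to an empty array */}
      {this.state.ocrElements.map(this.renderOcrElement)}
    </RNCamera>
  );
}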
In this case I add a callback for the onTextRecognized prop, which tells RNCamera that I want it to tell me about any text it sees in the frame. Similar callbacks are available for face detection, barcode scanning, etc. RNCamera also includes options for camera control in terms of image size, video quality, zoom levels and so on, so depending on your use case you may want to play around with these settings to get the best results.
The line inside the RNCamera tag referring to this.state.ocrElements is rendering the output for debugging purposes and we’ll discuss that later.
My textRecognized handler looks a bit like the code block below. RNCamera returns blocks of text, which you can step down through to lines and then to words (or at least what the scanner thinks are words). In our example we only go down as far as grabbing the lines, then write those to our local state.
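Something like this sketch, following the block/line shape the library’s callback provides (the ocrElements shape is my own):

textRecognized = ({ textBlocks }) => {
  // Each block's components array holds its lines; keep each line's
  // text and bounding box for the debug overlay
  const ocrElements = [];
  textBlocks.forEach((block) =>
    block.components.forEach((line) =>
      ocrElements.push({ bounds: line.bounds, text: line.value })
    )
  );
  this.setState({ ocrElements });
};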
Pretty much anything can be put inside the camera view like this, so you can overlay the text or barcode with what’s scanned, outline objects, add buttons, etc. to provide feedback and interact with the user.
My renderOcrElement function, which provides this debug info by displaying the outline and the actual text the library has returned, looks like this:
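Roughly like so, with View and Text imported from react-native; the colours are purely for visibility, and depending on your camera view size the bounds may need scaling:

renderOcrElement = (element, index) => {
  const { origin, size } = element.bounds;
  return (
    <View
      key={index}
      style={{
        // Position the overlay over the recognised line using its bounds
        position: "absolute",
        left: origin.x,
        top: origin.y,
        width: size.width,
        height: size.height,
        borderWidth: 1,
        borderColor: "red",
      }}
    >
      <Text style={{ color: "red" }}>{element.text}</Text>
    </View>
  );
};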
It’s quite hard to see the results, and I’ll admit I didn’t spend much time on the styling, as this was purely a proof of concept and just the debugging output rather than the end result, but you get the idea. You can clearly see each line of text being pulled out and rendered by the app. Accuracy is pretty good, although it depends largely on the source text. In this case it’s quite clear, large lettering, so it works really well. Background images, watermarks, etc. can cause problems for the recognition software.
In our project we required a fair bit of extra processing, as the order and accuracy of the text items were important, but for basic text scanning that should be enough to get you up and running.
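Scanning barcodes and QR codes works much the same way. A sketch using the library’s onBarCodeRead callback, again with an assumed styles.camera:

<RNCamera
  style={styles.camera}
  captureAudio={false}
  onBarCodeRead={({ type, data }) => {
    // Fires when a barcode or QR code is detected, with its type and
    // decoded string contents
    console.log(`Scanned ${type}: ${data}`);
  }}
/>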
Once you’ve mastered one of these detectors, the others are trivial to implement. The uses for this excellent library are endless, so give it a go and have fun!