Text Recognition in React Native

At Adapptor, a Perth app developer, we’re sometimes tasked with creating prototypes for potential projects, which can be great fun as we’re frequently trying something that nobody else in the team has tried before. In this case I was asked to investigate the feasibility of reading and recognising specific text in a React Native app.

Google provides a library called MLKit which can read text, recognise landmarks, track objects and do many other pretty amazing things, so that seemed like the perfect starting point. Sadly, however, it doesn’t support React Native directly.

Luckily there’s an open source project in the React Native Community called react-native-camera, which wraps MLKit and gives you hooks into React Native for many of its features.

At the moment it uses an older version of MLKit, but as it’s a widely supported community library I expect it will catch up soon, so it seemed a safe bet knowing it would continue to be supported.

I’m going to cover iOS here as that’s the platform I was interested in initially, but Android is also supported and is covered in all the relevant guides.

Installation

As with most React Native projects, we add react-native-camera through our favoured package manager, in our case yarn. To add the iOS-specific native code we use CocoaPods.

If you want any of the optional features beyond taking photos and capturing video, such as face, text or barcode detection, these need to be added separately. Thankfully this is as simple as adding a line to the Podfile. In my case I wanted the TextDetector, so I had to add this:

pod 'react-native-camera', path: '../node_modules/react-native-camera', subspecs: ['TextDetector']

Ok, so our Podfile is good to go. Let’s run the install.

yarn add react-native-camera

cd ios

pod install

Permissions


As we’re going to be accessing the phone’s hardware, we have to request permission to do so. The framework will do the actual requesting; we just need to set the message we’d like to display to the user, which is done using the iOS plist permission keys. Always try to make it clear why you’re requesting the permission, or the user may well reject it and your app won’t work as intended.

<!-- Required with iOS 10 and higher -->
<key>NSCameraUsageDescription</key>
<string>Your message to the user when the camera is accessed for the first time</string>

<!-- Required with iOS 11 and higher: include this only if you are planning to use the camera roll -->
<key>NSPhotoLibraryAddUsageDescription</key>
<string>Your message to the user when the photo library is accessed for the first time</string>

<!-- Include this only if you are planning to use the camera roll -->
<key>NSPhotoLibraryUsageDescription</key>
<string>Your message to the user when the photo library is accessed for the first time</string>

<!-- Include this only if you are planning to use the microphone for video recording -->
<key>NSMicrophoneUsageDescription</key>
<string>Your message to the user when the microphone is accessed for the first time</string>

Firebase

Under the hood this uses the Firebase library, so you’ll need a Firebase account set up for your app and the Firebase SDK configured. I won’t go into that here, but a guide to setting up React Native Firebase can be found here.
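As a rough sketch only, the core steps look something like the lines below. The @react-native-firebase/app package name is the current one and may differ for the MLKit version react-native-camera targets, so treat this as an assumption and follow the guide above for your setup:

yarn add @react-native-firebase/app
cd ios
pod install

You’ll also need to download your project’s GoogleService-Info.plist from the Firebase console and add it to the Xcode project so the SDK can find your Firebase configuration.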



RNCamera

The first code we need to add is the camera component that performs the image capture, which is pretty simple.

First import RNCamera:

import { RNCamera } from "react-native-camera";

Then add it to your component’s render method:

<RNCamera
  style={{ flex: 1 }}
  onTextRecognized={this.textRecognized}
  captureAudio={false}
>
  {this.state.ocrElements &&
    this.state.ocrElements.map((element) => this.renderOcrElement(element))}
</RNCamera>

In this case I add a callback via the onTextRecognized prop, which tells RNCamera that I want to be told about any text it sees in the frame. Similar callbacks are available for face detection, barcode scanning and so on. RNCamera also includes props for controlling the camera, such as image size, video quality and zoom level, so depending on your use case you may want to play around with these settings to get the best results.
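For example, a configuration using a few of these props might look like the sketch below. The type, flashMode and zoom props come from the react-native-camera documentation; treat the exact values as an illustration rather than a recommendation:

<RNCamera
  style={{ flex: 1 }}
  type={RNCamera.Constants.Type.back}          // use the rear-facing camera
  flashMode={RNCamera.Constants.FlashMode.off} // leave the flash off while scanning
  zoom={0}                                     // 0 = no zoom, 1 = maximum zoom
  captureAudio={false}
  onTextRecognized={this.textRecognized}
/>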

The line inside the RNCamera tag referring to this.state.ocrElements is rendering the output for debugging purposes and we’ll discuss that later.

My textRecognized handler looks a bit like the code block below. RNCamera returns blocks of text, which you can then step down through to lines and then to words (or at least what the scanner thinks are words). In our example we only go down as far as grabbing the lines. We then write those to our local state.

private textRecognized = ({ textBlocks }: { textBlocks: TrackedTextFeature[] }) => {
  const ocrElements: Array<OcrElement> = [];
  textBlocks.forEach((textBlock) => {
    textBlock.components.forEach((textLine) => {
      ocrElements.push({
        bounds: textLine.bounds,
        text: textLine.value,
      });
    });
  });
  this.setState({
    ocrElements: ocrElements,
  });
};
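TrackedTextFeature comes from react-native-camera’s type definitions; OcrElement is just my own holder for the pieces I care about. A minimal definition, assuming the origin/size shape of the bounds returned by the library, could look like this:

// Hypothetical shape for the recognised lines stored in state
interface OcrElement {
  bounds: {
    origin: { x: number; y: number };
    size: { width: number; height: number };
  };
  text: string;
}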

Pretty much anything can be put inside the camera view like this, so you can overlay what’s been scanned on top of the text or barcode, outline objects, add buttons and so on to provide feedback and interact with the user.

My renderOcrElement function, which provides this debug info by displaying the outline and the actual text the library has returned, looks like this:

private renderOcrElement = (element: OcrElement) => {
  return (
    <View
      style={{
        borderWidth: 1,
        borderColor: "blue",
        position: "absolute",
        left: 0,
        top: element.bounds.origin.y,
        right: 0,
        bottom: 50,
      }}
    >
      <Text>{element.text}</Text>
    </View>
  );
};

It’s quite hard to see the results. I’ll admit I didn’t spend much time on the styling, as this was purely a proof of concept and just the debugging output rather than the end result, but you get the idea. You can clearly see each line of text being pulled out and rendered by the app. Accuracy is pretty good, although it depends largely on the source text. In this case it’s quite clear, large lettering, so it works really well. Background images, watermarks and the like can cause problems for the recognition software.


In our project we required a fair bit of extra processing, as the order of the text items and the accuracy were important, but for basic text scanning the above should be enough to get you up and running.
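As a sketch of the kind of extra processing I mean (assuming the OcrElement shape above; the 10-point threshold is an arbitrary illustration), the recognised lines can be roughly sorted into reading order, top to bottom and then left to right:

const sortIntoReadingOrder = (elements: OcrElement[]): OcrElement[] =>
  [...elements].sort((a, b) => {
    // Treat lines whose tops are within 10 points of each other as the same row
    // and order them left to right; otherwise order them top to bottom.
    const dy = a.bounds.origin.y - b.bounds.origin.y;
    return Math.abs(dy) > 10 ? dy : a.bounds.origin.x - b.bounds.origin.x;
  });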

Scanning barcodes and QR codes is very similar, so once you’ve mastered one the other is trivial to implement. The uses for this excellent library are endless, so give it a go and have fun!
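As an illustration, a barcode version of the camera component might look like the sketch below, using the library’s onBarCodeRead callback (check the docs of the version you install for the exact event shape):

<RNCamera
  style={{ flex: 1 }}
  captureAudio={false}
  onBarCodeRead={({ data, type }) => {
    // data is the decoded barcode or QR content, type is the detected symbology
    console.log(`Scanned ${type}: ${data}`);
  }}
/>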

Dave Cumming

David, who is originally from Scotland, brings a wealth of experience to the team from a multitude of industries, including retail, travel, industrial and entertainment, in the UK and in Australia. He has moved from technology to technology over the years, as well as having managerial and team-lead experience, so could be considered an all-rounder, a jack of all trades ready to take on pretty much any task, big or small.

Throughout his career he has worked for many larger companies, including lastminute.com, CSC and Rank Group, but has also enjoyed the startup and agency scenes, and likes to bring these varied experiences to bear in helping both Adapptor and our clients meet not only their IT needs but their overall business needs.

Amongst the many brands David has worked with and for: while in Scotland he was part of a team which produced the iOS app for Fanduel, a US-based sports gaming company. The app not only had over 1 million downloads in its first month but went on to win two Webby awards. Fanduel was so pleased with the product that they bought the company!

David has been part of the Adapptor development team for three years and has helped develop a range of mobile apps, including Lotterywest, Wilson Parking, QLDTraffic, Swift Networks, and Grand Cinemas.

Joined Adapptor in 2017.

Presentations

David spoke at DDD 2019 on the subject of cross-platform development solutions.

Technical Skills

React Native, React, Objective-C, Java, JavaScript, TypeScript, AWS, Python, AppCode, Xamarin (C#), SQL, Android, iOS, HTML5, Git, SVN, Fastlane.

Qualifications

University of Paisley: Graduate BSc Computing Science

AWS Certified Developer
