How a NASA Facility is Digitizing over 90,000 Planetary Mission Images
University of Arizona uses Matrox Imaging OCR software to read text-field data from Surveyor missions in record time and with near-perfect accuracy
The University of Arizona's Lunar and Planetary Laboratory (LPL) is home to the Space Imagery Center, a NASA Regional Planetary Image Facility. Founded in 1960, LPL was one of the few places engaged in studies of the solar system at that time.
In 2015, NASA partnered with the University of Arizona, providing funding to digitize the film images and data from the Surveyor moon landers that have been in storage since the 1960s. The goal is to create an archive for inclusion in the NASA Planetary Data System (PDS), a collection of data products from NASA planetary missions. As John Anderson, senior media technician at LPL, describes it, his "focus and primary area of responsibility is the digital recording of the images, extracting and decoding the encoded image data optically recorded on each film frame, and processing the pictures for viewing in a digital format."
Between 1966 and 1968, the five successful Surveyor missions returned over 92,000 individual images of the moon's surface. The film images were created by pointing a 70 mm film camera at a precision CRT display monitor and photographing the displayed pictures onto special recording film.
In the 50 years since, the computer files and videotape records have disappeared or become obsolete; the only surviving copies of the images are the film rolls.
Many frames from the Surveyor missions had seemingly legible text, which the operators initially thought conventional optical character recognition (OCR) software could read easily. They soon discovered that the characters were formed from a 7x9 dot matrix, a teletype-style character similar to those of old dot-matrix printers. This made it difficult to find OCR software capable of accurately reading the text fields. A comprehensive OCR solution was needed.
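To illustrate why a 7x9 dot matrix calls for a different approach than stroke-based OCR, here is a minimal sketch of one generic technique. It samples the brightness at each expected dot center and compares the resulting 9-row by 7-column bit grid against character templates by Hamming distance. This is not Matrox's algorithm. The glyph shapes in `TEMPLATES`, the dot pitch, the threshold and the `max_mismatch` rejection limit are all hypothetical, and a real reader must first locate each character cell.

```python
from typing import Dict, List, Optional, Tuple

ROWS, COLS = 9, 7  # a 7-dot-wide by 9-dot-tall character cell
Grid = Tuple[Tuple[int, ...], ...]

# Hypothetical glyphs ('#' = dot, '.' = no dot); NOT the actual Surveyor font.
TEMPLATES: Dict[str, List[str]] = {
    "1": ["...#...", "..##...", ".#.#...", "...#...", "...#...",
          "...#...", "...#...", "...#...", ".#####."],
    "7": ["#######", "......#", ".....#.", "....#..", "...#...",
          "..#....", "..#....", "..#....", "..#...."],
    "0": [".#####.", "#.....#", "#....##", "#...#.#", "#..#..#",
          "#.#...#", "##....#", "#.....#", ".#####."],
}

def to_bits(rows: List[str]) -> Grid:
    """Convert a '#'/'.' drawing into a tuple-of-tuples bit grid."""
    return tuple(tuple(1 if ch == "#" else 0 for ch in row) for row in rows)

TEMPLATE_BITS = {label: to_bits(rows) for label, rows in TEMPLATES.items()}

def sample_cell(image: List[List[int]], x0: float, y0: float,
                pitch_x: float, pitch_y: float, threshold: int = 128) -> Grid:
    """Threshold the pixel at each expected dot center of one character cell.

    image is a 2-D list of grayscale values with bright dots on a dark field;
    (x0, y0) is the center of the top-left dot position.
    """
    return tuple(
        tuple(
            1 if image[round(y0 + r * pitch_y)][round(x0 + c * pitch_x)] >= threshold else 0
            for c in range(COLS)
        )
        for r in range(ROWS)
    )

def hamming(a: Grid, b: Grid) -> int:
    """Number of dot positions where two grids disagree."""
    return sum(x != y for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def classify(cell: Grid, max_mismatch: int = 6) -> Optional[str]:
    """Return the closest template label, or None if nothing is close enough."""
    label, dist = min(((lab, hamming(cell, t)) for lab, t in TEMPLATE_BITS.items()),
                      key=lambda pair: pair[1])
    return label if dist <= max_mismatch else None

def render(label: str, pitch: int = 4, on: int = 220, off: int = 30) -> List[List[int]]:
    """Draw a template as a synthetic image (one bright pixel per dot), for testing."""
    img = [[off] * (COLS * pitch) for _ in range(ROWS * pitch)]
    for r, row in enumerate(TEMPLATES[label]):
        for c, ch in enumerate(row):
            if ch == "#":
                img[r * pitch + pitch // 2][c * pitch + pitch // 2] = on
    return img

# Usage: a "7" with its top-left dot lost to film damage still reads as "7".
img = render("7")
img[2][2] = 30
print(classify(sample_cell(img, 2, 2, 4, 4)))
```

Each character is a fixed grid of dots. One or two missing or extra dots from film damage therefore add only one or two to the Hamming distance, and the nearest template still wins. Cells that match nothing closely enough come back as `None` for manual review.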
A stellar solution
This is where Matrox comes in. Anderson notes, "Lorne Trottier, co-owner of Matrox, saw an article in Planetary Report about the NASA PDS project. He reached out to the university through Arnaud Lina, director of research and innovation at Matrox Imaging, offering assistance using Matrox's OCR software to read LPL's text information. [LPL] selected some cropped images to upload for a test and the results were amazing. It was very encouraging, especially with the failure of other OCR products to read the human readable text (HRT)."
The overall project involves creating a searchable archive that will outlast conventional physical media repositories. Given the long-term reference potential of the images and data, the resources need careful and accurate treatment.
The workflow was built around an image scanning system from Stokes Imaging, which captured between four and eight frames per minute as high-resolution TIFF images. By the end of the scanning phase, LPL had over 92,000 individual images.
Operator interaction was intensive during the original scanning process. Although the Stokes Imaging System was automated, the film itself was not uniform in spacing, indexing, exposure, or processing. Once the frames were scanned, Adobe® Photoshop® and MATLAB software were used to pick out details and create large composite mosaics from the image files. The process also required manual error checking, since decoding the dot-field data relied on calibration lookup tables created from the original 1966 pre-launch test data.
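The quoted scanning rate gives a rough sense of the capture effort. The sketch below converts the range of four to eight frames per minute into hours and into working days. It assumes exactly 92,000 frames and ignores film loading, re-scans and operator checks. The eight-hour working day is a hypothetical figure used only for scale.

```python
FRAMES = 92_000      # "over 92,000" images, so this is a lower bound
HOURS_PER_DAY = 8    # hypothetical working day, not from the source

def scan_hours(frames: int, frames_per_minute: float) -> float:
    """Pure capture time in hours at a constant scanning rate."""
    return frames / frames_per_minute / 60.0

for rate in (8, 4):  # fastest and slowest quoted rates
    hours = scan_hours(FRAMES, rate)
    print(f"{rate} frames/min: {hours:,.0f} h, about {hours / HOURS_PER_DAY:.0f} working days")
```

At these rates, capture alone comes to roughly 190 to 385 hours, or about 24 to 48 eight-hour days, before accounting for any operator interaction.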
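The format of the Surveyor calibration tables is not described here. The sketch below therefore shows only the general technique of applying a calibration lookup table: a measured value is linearly interpolated between table entries. Readings outside the table's range are flagged, which is one place manual error checking would come in. `CAL_TABLE`, its values and the `calibrate` function are hypothetical.

```python
from bisect import bisect_left
from typing import List, Optional, Tuple

# Hypothetical calibration table of (measured gray level, calibrated value) pairs,
# sorted by gray level. The real tables came from 1966 pre-launch test data;
# these numbers are invented for illustration.
CAL_TABLE: List[Tuple[float, float]] = [(20.0, 0.0), (60.0, 10.0), (120.0, 25.0), (200.0, 40.0)]

def calibrate(gray: float, table: List[Tuple[float, float]] = CAL_TABLE) -> Optional[float]:
    """Piecewise-linear lookup of a measured gray level in a calibration table.

    Returns None for readings outside the calibrated range (candidates for manual checking).
    """
    xs = [x for x, _ in table]
    if gray < xs[0] or gray > xs[-1]:
        return None
    i = bisect_left(xs, gray)  # index of first table entry with x >= gray
    if xs[i] == gray:
        return table[i][1]
    (x0, y0), (x1, y1) = table[i - 1], table[i]
    return y0 + (y1 - y0) * (gray - x0) / (x1 - x0)

readings = [40.0, 90.0, 250.0]
decoded = [calibrate(g) for g in readings]
needs_review = [i for i, v in enumerate(decoded) if v is None]
print(decoded, needs_review)
```

Linear interpolation is the simplest choice between table entries. A table derived from test data could equally call for a nearest-entry lookup or a fitted curve.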
We have liftoff
The project began in February 2015 with the assembly of the Stokes system and continues to process, catalog, and data-mine the information contained in the images.
Although the film stock has sprocket perforations, the original recording transport was sprocketless. This produced inconsistent frame spacing and frames that drift relative to the edge perforations. The LPL team could not determine a consistent film advance, and with each new roll of film the frame spacing and the lateral position of the image shifted. As a result, the text appeared in different places from image to image, and some images were tainted with artifacts. Moreover, the data fields contain HRT with a varying number of characters.
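Because the frame position drifted from roll to roll, a reader cannot rely on fixed crop coordinates for the text fields. One common way to recover the offset is to sum each column of a strip into a brightness profile. The reader then finds the shift that best aligns this profile with a reference profile from a known-good frame. This is a hypothetical sketch, not the method LPL or Matrox used. `column_profile`, `best_shift` and the `max_shift` search window are illustrative names and parameters.

```python
from typing import List, Sequence

def column_profile(image: Sequence[Sequence[float]]) -> List[float]:
    """Sum each column of a 2-D strip; bright text fields show up as peaks."""
    return [float(sum(col)) for col in zip(*image)]

def best_shift(reference: Sequence[float], profile: Sequence[float], max_shift: int) -> int:
    """Shift s in [-max_shift, max_shift] that maximizes the cross-correlation
    sum(reference[i] * profile[i + s]) over indices where both are defined."""
    best_s, best_score = 0, float("-inf")
    for s in range(-max_shift, max_shift + 1):
        score = sum(reference[i] * profile[i + s]
                    for i in range(len(reference)) if 0 <= i + s < len(profile))
        if score > best_score:
            best_s, best_score = s, score
    return best_s

# Usage: a reference strip with a text block in columns 2-4, and a drifted
# frame where the same block sits in columns 5-7.
ref_strip = [[0, 0, 1, 1, 1, 0, 0, 0, 0, 0] for _ in range(3)]
new_strip = [[0, 0, 0, 0, 0, 1, 1, 1, 0, 0] for _ in range(3)]
shift = best_shift(column_profile(ref_strip), column_profile(new_strip), max_shift=4)
print(shift)  # 3: move the text-field crop window 3 columns to the right
```

The same idea applies vertically by transposing the strip. Plain cross-correlation works when the background is dark and uniform. Normalized cross-correlation is more robust when exposure varies between frames, as it did on this film.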
Matrox's solution, based on one of its efficient and accurate OCR software tools, beautifully addressed the problem of reading dot-matrix characters and reduced the time required to a few minutes per roll.
The initial review of the Matrox OCR solution showed almost perfect reads across nearly 4,500 image files. For example, on roll 1 of Mission 5, the Matrox OCR solution scanned 846 files and read 15,191 individual fields with a staggering 99.77% accuracy. Rolls 2 and 9 of Mission 5 did even better, with accuracy rates of 99.92% and 100%, respectively.
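The roll 1 figures can be put in concrete terms. The article does not define accuracy, so the sketch below assumes it is measured per field. On that basis it estimates the number of fields per file and the number of misread fields implied by 99.77%. It also compares that with the 75% to 85% accuracy Anderson cites for the original approach.

```python
FIELDS = 15_191  # fields read on roll 1 of Mission 5
FILES = 846      # files scanned on the same roll

def misreads(fields: int, accuracy_pct: float) -> int:
    """Misread fields implied by a per-field accuracy percentage (rounded)."""
    return round(fields * (1 - accuracy_pct / 100))

print(f"fields per file: {FIELDS / FILES:.1f}")
for pct in (99.77, 85, 75):
    print(f"{pct}% accuracy -> about {misreads(FIELDS, pct):,} misread fields")
```

Under this assumption, roll 1 averages about 18 fields per file. At 99.77%, only about 35 fields need manual correction, compared with roughly 2,300 to 3,800 at 75% to 85%.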
Looking to the future
The University of Arizona Lunar and Planetary Laboratory's Space Imagery Center, a NASA Regional Planetary Image Facility, serves as a repository for many images and resources from NASA missions. To date, the Matrox software has helped process data from Surveyor 5. It is expected to prove a valuable tool for cataloguing and error-checking data from Surveyors 6 and 7, along with other mission materials from NASA projects and explorations.
The Matrox OCR software has been an instrumental addition to the archiving project. Continued use of the system will accelerate the recording of text information from the Surveyor image files, improve the accuracy of the metadata, and streamline what can be a very labor-intensive and tedious task.
Anderson notes, "Compared with accuracy rates of 75% to 85% achieved with the original approach, there is no doubt as to the better result. Our project has been greatly enhanced and the progress of reading and cataloging the data with high accuracy would not have been possible without the gracious assistance of the Matrox team."
The team at the University of Arizona’s LPL would like to recognize the following individuals for their contributions to the NASA PDS project:
Justin Rennilson, primary technology resource person for the PDS project.
Shane Byrne, director of the Space Imagery Center.
Maria Schuchardt, data manager of the Space Imagery Center.
John Anderson, senior media technician at LPL.
Leon Palafox and Rodrigo Savage, for overseeing MATLAB coding and processing.
John T. Stokes, president of Stokes Imaging, supplier of the scanning system and technical consultant to the project.