Why we don’t use computers to transcribe these historical weather records

We’ve had a few queries on our current Zooniverse project Forum page about why we aren’t using machines or artificial intelligence (AI) to transcribe these weather journals. Sometimes, this type of data transcription is called Machine Learning (ML) or Optical Character Recognition/Reader (OCR).

The reason we need actual people to transcribe these documents is that machines still have trouble accurately deciphering the old handwriting from historical weather observations, especially when it’s in tabulated format like meteorological records. It can also be quite fiddly work. So, this research still relies on people reading, interpreting and transcribing the observations.

An OCR weather rescue trial in 2018 found there was only a 50% accuracy rate for machine transcriptions – compared to 99% for manual transcriptions. There’s a slightly higher accuracy rate for printed weather journals compared to handwritten weather journals, but it’s still an expensive and resource/time-intensive process to set up.

The latest “Guidelines on Best Practices for Climate Data Rescue” (2016) by the World Meteorological Organisation has some information on OCR, stating: “At this time, OCR is of limited value as it requires specialized forms for readability”.

And, the “Best Practice Guidelines for Climate Data Rescue” (2019) by the Copernicus Climate Change Service, says:

“OCR of tabular weather data is a different problem from OCR of prose: the analysis of the layout of each page is essential, i.e., finding the locations of data values on the page to be transcribed, and there is less opportunity to guess character values from preceding and following characters.”

We had help with our pilot project design last year from world-leading weather rescue experts, including Dr Andrew Lorrey, Prof Ed Hawkins, Prof Rob Allan and Dr Philip Brohan, all of whom have looked into the possibility of using OCR for weather rescue projects. We discussed with them the possibility of using OCR for Climate History Australia’s project, but we concluded that the technology isn’t yet resource-effective or accurate enough to use for our historical weather rescue.

Dr Lorrey has been working on a fascinating machine learning project with Microsoft on Zooniverse to recognise characters in old weather records. There’s more detail on Dr Lorrey’s work on New Zealand in this article (here: https://www.nzgeo.com/audio/the-week-that-snowed-shedding-new-light-on-old-weather-records/) explaining that the AI technology isn’t quite ready yet. And the Southern Weather Discovery team shared their results on Zooniverse to date, here.

Dr Brohan has also dabbled with a range of logbook image types, but the tricky part seems to be ensuring the image is segmented appropriately.

In our current Zooniverse project for Perth, a total of eight volunteers will transcribe each observation, and this will give us a high level of accuracy for the transcriptions.

In our pilot project for Adelaide last year, we were pleased to find a high level of agreement in the transcriptions entered through the Zooniverse portal. On average, around 90% of the observations showed that six out of eight transcriptions agreed with each other. Even among the remaining 10%, we still saw a consensus emerge, giving us strong confidence in the results.
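For readers curious how agreement between volunteers can be checked, here is a minimal Python sketch of a simple majority-vote approach. It is only an illustration, not the project's actual processing: the function name `consensus`, the six-vote threshold and the sample readings are all assumptions made up for this example.

```python
from collections import Counter


def consensus(transcriptions, min_agree=6):
    """Return (most common value, its vote count, whether it meets the threshold).

    Hypothetical helper for illustration; min_agree=6 mirrors the
    "six out of eight" figure quoted above and is an assumption.
    """
    # Strip stray whitespace so "29.85 " and "29.85" count as the same entry.
    cleaned = [t.strip() for t in transcriptions]
    value, votes = Counter(cleaned).most_common(1)[0]
    return value, votes, votes >= min_agree


# Made-up entries from eight volunteers for a single pressure reading.
entries = ["29.85", "29.85", "29.85", "29.35", "29.85", "29.85", "29.85", "29.86"]
value, votes, accepted = consensus(entries)
print(value, votes, accepted)
```

In this made-up case six of the eight entries match, so the reading would be accepted. An observation that falls short of the threshold could be set aside for a closer look rather than discarded.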

Machine learning work on character recognition is gradually improving machine transcriptions, but it’s still early days for use in weather rescue. Probably the most promising OCR for weather rescue at present is Weather Wizards. Watch this space for further developments in this area! We’ll certainly keep a close eye on it too.

So, as you can see, this research really relies on people like you to help transcribe important historical weather data. Want to uncover Australia’s climate history better than a computer? Join in here.