Backed by Google: ubisend and Archant Bring Old Newspapers to Life
UPDATE, 21/11/18: We've made tremendous progress! Watch a video introducing LocalRecall to understand the powerful impact this project will have on local news and local history.
UPDATE, 12/09/18: Phase one of this project is almost live, and we are looking for volunteers! Read the article below, then fill out this form to be a part of this amazing project: Be a part of LocalRecall.
We are incredibly proud to announce ubisend is partnering up with publishing company Archant in an exciting, Google-funded project.
What is the Google DNI fund?
The Google Digital News Initiative (DNI) is a Google fund designed to support high-quality journalism through innovation. The fund was launched in 2016 and has, since then, offered over €73 million to 359 projects across Europe.
This year, UK-based publisher Archant and ubisend were fortunate enough to receive Google’s backing on an audacious and exciting project.
This is the story.
Wanna skip straight to the official Google announcement of the project? Read this.
Turning millions of words written on dusty, moth-eaten archived newspapers into an AI-driven chatbot
Archant approached us this summer with a crazy project.
They are a well-known UK publisher, going strong since 1870, publishing many papers and magazines daily. Over the years, they have kept hard copies of everything down in their basement archives.
As time passed, these archives turned from a valuable asset into binders that are falling apart, filled with yellowing and cracking paper.
In comes ubisend.
We have been turning digital, written words into chatbots for a long time. Taking PDFs, Word documents, web pages, and other ‘digital things’ and turning them into a conversation is our forte.
Archant came to us with the crazy idea of turning those old, yellowing, almost illegible 150-year-old real-life newspapers into a chatbot.
Sometimes, a video paints a thousand words. Watch the introduction to LocalRecall below.
We met and assessed the size of the work, and it was by no means a simple task. There are many hundreds of ledgers (typically one per year per journal), hundreds of thousands of pages, and many millions of words.
Just thinking about crawling millions of words looking for an answer to a question in real time made our chief geek’s brain hurt.
It is a massive project, and we love every aspect of it: working with one of the UK’s best-known publishers, the community-editing open-source software we will build (more on that another day), the nostalgia of saving dying words, and, of course, working with Google.
We could not pass on the opportunity.
Our audacious goals
So, what is it we are doing?
The premise is simple. Archant has hundreds of thousands of pages of content in analogue format. We need to turn it all into a real-time text and voice conversation.
Our first goal is for a user to be able to ask things like:
‘What happened today in 1934?’
‘What was the headline news on the 4th of January 1899?’
This would return the news that happened on these particular dates in a natural, conversational way.
The ultimate goal is even more exciting. Once we have processed enough data from Archant’s archives, the users will be able to ask things like:
‘Tell me the headlines on the Queen’s coronation.’
‘When was the last time Norwich Football Club won a game 6-0?’
‘What else happened on the day the Second World War was declared?’
The way we will, over time, label, store, and resurface the data we get from Archant’s archives will make this possible. It will also enable us to deliver topic-specific information (sports, weather, politics, etc.).
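To make the idea a little more concrete, here is a minimal sketch of how labelled articles could be stored and resurfaced by date and by topic. It is an illustration only, not the LocalRecall implementation: the `Article` and `ArchiveIndex` names, their fields, and the placeholder headlines are all assumptions made up for this example.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Article:
    """Hypothetical record for one digitised article."""
    published: date
    headline: str
    topics: frozenset  # labels such as "sports" or "weather"


class ArchiveIndex:
    """Hypothetical in-memory index keyed by publication date and by topic."""

    def __init__(self):
        self._by_date = defaultdict(list)
        self._by_topic = defaultdict(list)

    def add(self, article):
        # Store the article under its date and under every topic label.
        self._by_date[article.published].append(article)
        for topic in article.topics:
            self._by_topic[topic].append(article)

    def on_date(self, day, topic=None):
        # Headlines published on an exact date, optionally filtered by topic.
        hits = self._by_date.get(day, [])
        if topic is not None:
            hits = [a for a in hits if topic in a.topics]
        return [a.headline for a in hits]

    def on_this_day(self, year, today):
        # 'What happened today in <year>?' maps today's month and day onto that year.
        try:
            target = date(year, today.month, today.day)
        except ValueError:
            # 29 February does not exist in a non-leap year.
            return []
        return self.on_date(target)


# Placeholder data only; these are not real headlines from the archive.
index = ArchiveIndex()
index.add(Article(date(1899, 1, 4), "Placeholder headline A", frozenset({"politics"})))
index.add(Article(date(1899, 1, 4), "Placeholder headline B", frozenset({"sports"})))
index.add(Article(date(1934, 6, 1), "Placeholder headline C", frozenset({"weather"})))

print(index.on_date(date(1899, 1, 4)))
print(index.on_date(date(1899, 1, 4), topic="sports"))
print(index.on_this_day(1934, today=date(2018, 6, 1)))
```

Keying on the full date keeps the first goal (questions about a specific day) a simple lookup, while the topic labels are what would let the more ambitious questions be narrowed to sports, weather, politics, and so on.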
Backed by Google
This is an exciting and sizeable project, and Google’s DNI fund was the perfect opportunity to fund it.
Our project is about saving archives no one is doing anything with, and making this content available for the world to use. Imagine the power of being able to naturally question over 150 years of history. Imagine the knowledge we will gather from it.
It is this that excited Google. After many late nights of work on the application, we got the good news: we were one of the projects to get funding this year.
With just under €1 million in funding, Archant, ubisend, and Google will work closely to make this project successful over the next two years.
The ubisend team is hugely proud to be part of this project. We have the opportunity to build something truly unique, something no one has ever done before.
We hope you all follow us along for the ride.
So, so much more
This project -- called LocalRecall -- has much more to it than I explained with the 600ish words above. It involves image recognition, OCR, machine learning, natural language understanding, local community engagement, and more.
Google’s funding will help us in every aspect of the LocalRecall project, from technical builds to marketing the final product. As this is a particularly exciting project (and, for once, one we are not under NDA for!), I will make sure we keep you all up to date with everything that is happening.
Stay tuned, stick around -- it is going to be a fun ride.
Looking for volunteers
LocalRecall has progressed quite a bit since we published this article! We are updating this in the middle of September 2018 with some exciting news. The first phase of the project is almost live.
We have started digitising and parsing lots of news through our processes. We are now looking for volunteers to help us contribute to and enhance some of the more complex content. If you are interested in local news, Norwich/Norfolk, or would simply like to spend some time on this amazing project, have a read through LocalRecall's page.
We'd love to have you!
LocalRecall in the press
Eastern Daily Press chatbot will give access to 150 years of local news (Matthew Moore, The Times)
Project begins to bring newspaper archive to life through cutting-edge technology (Doug Faulkner, EDP)
Lorna Willis (Archant Digital Executive) talks through the project at Google Paris.