National Film and Sound Archive of Australia unlocks collection


Builds a full-text search website.

The National Film and Sound Archive of Australia (NFSA) has updated its collection search website, adopting ‘vector’ search capabilities in place of traditional data retrieval methods in an effort to make information easier to find.


The archive implemented MongoDB Atlas vector search, entity recognition and the LangChain framework, deploying a production-ready search API on AWS, said Shahab Qamar, software engineering manager at the NFSA.

Speaking at a recent AWS Summit in Sydney, Qamar said the work focused on extracting knowledge from NFSA's audiovisual collection at scale using the new capabilities. 

“We're growing at about one petabyte a year; we have eight petabytes so far in our collection. 

“The NFSA has three main objectives. We collect, we preserve and we share,” Qamar said. 

The Australian audiovisual archive aims to preserve news, documentaries, movies, films, VR content and social media posts, and employs specialised staff and systems “that grab all this material and preserve it.”

As an example, Qamar pointed to the Australian movie Mad Max: Fury Road, which would be considered the “main item” under the new search function.

“We call it the title version …everything gets catalogued under the main title, which is the main work of art.

“Then at the bottom, we have information about where that item is. It's something called the FRBR model in the archiving universe.

“The title is your overarching category of what is it that you've collected – ‘version’ is a more specific version of that and ‘media’ is where that item is.”

He said “all of that is put into [a] relational database”, allowing users to search the collection against the data.
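The title, version and media hierarchy Qamar describes can be pictured as three nested record types. The Python sketch below is illustrative only: the class names, field names and sample values are assumptions made for this example, not the NFSA's actual schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Media:
    """Where a copy of the item is held (hypothetical fields)."""
    location: str
    carrier: str


@dataclass
class Version:
    """A more specific version of a title, such as a particular release."""
    label: str
    media: List[Media] = field(default_factory=list)


@dataclass
class Title:
    """The overarching work that everything is catalogued under."""
    name: str
    versions: List[Version] = field(default_factory=list)

    def locations(self) -> List[str]:
        """Return every place a copy of this title is held."""
        return [m.location for v in self.versions for m in v.media]


# Made-up sample data, for illustration only.
title = Title(
    name="Mad Max: Fury Road",
    versions=[
        Version("Theatrical release", [Media("Vault A", "35mm print")]),
        Version("Home video release", [Media("Digital store", "file")]),
    ],
)
print(title.locations())
```

In a relational database, each of these types would typically map to its own table, with versions pointing to their title and media pointing to their version.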

“All this metadata we have, we used to have a collection search website, built in 2012, it was really, really old. 

“As soon as I saw that it was best viewed in IE6 [Internet Explorer 6], we decided to rebuild it as soon as possible. 

“We started looking for the fastest way to build a full-text search website, we looked at a few vendors and landed on MongoDB Atlas. 

“The main thing with full-text search was the full-text search - needed the ability to have facets, relevance-based search results and pagination. 

“We wanted to get there as quickly as possible. MongoDB Atlas allowed us to do that, because the database is fully managed, easy to deploy on an auto scale. It has a mature developer experience, search, transactions, analytics, all in one platform and it's also easy on the bank. 

“And with a small team like ours, we don't have time to manage infrastructure, we just want to build applications.”
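The three requirements Qamar lists for full-text search (facets, relevance-based results and pagination) map onto MongoDB Atlas Search aggregation stages. The sketch below builds such pipelines as plain Python data. It is not the NFSA's code: the index name `collection_search` and the fields `title`, `summary` and `medium` are hypothetical, and the faceted field would need to be indexed for faceting in the search index definition.

```python
def build_search_pipeline(query, page=1, page_size=20):
    """Build a $search pipeline that returns one page of results."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    return [
        # Atlas Search orders matches by relevance score by default.
        {"$search": {
            "index": "collection_search",  # hypothetical index name
            "text": {"query": query, "path": ["title", "summary"]},  # hypothetical fields
        }},
        # Pagination: skip the earlier pages, then cap the page size.
        {"$skip": (page - 1) * page_size},
        {"$limit": page_size},
        # Expose the relevance score alongside each result.
        {"$project": {"title": 1, "score": {"$meta": "searchScore"}}},
    ]


def build_facet_pipeline(query):
    """Build a $searchMeta pipeline that counts matches per facet bucket."""
    return [
        {"$searchMeta": {
            "index": "collection_search",  # hypothetical index name
            "facet": {
                "operator": {"text": {"query": query, "path": ["title", "summary"]}},
                "facets": {
                    # Hypothetical facet over a "medium" field (film, audio, ...).
                    "mediumFacet": {"type": "string", "path": "medium"},
                },
            },
        }},
    ]


pipeline = build_search_pipeline("mad max", page=3, page_size=10)
facets = build_facet_pipeline("mad max")
```

With the PyMongo driver, either list could be passed to `collection.aggregate(...)` against an Atlas cluster that has a matching search index.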

Qamar said the NFSA is now looking to build an Australian English-specific model that is based on an “open-source transformer, base foundation model” to help with its vector search functions. 

He explained the team will look at crowdsourced transcriptions and use this data to “fine tune” its model.

“We have partnerships with our Indigenous partners because there are many languages that are at risk of being lost forever and there's wisdom in those languages.”

He added that this too will be open-sourced, and that the team is also “working towards doing Named Entity Recognition [NER] using spaCy and GLiNER models.”

“We're also hoping to visualise all of that using Nomic Atlas, it lets you visualise your embeddings using this beautiful layout, so you can see what you don't know that you don't know,” Qamar said. 
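For readers unfamiliar with the idea, vector search ranks items by how close their embeddings are to the query's embedding, rather than by matching keywords. The toy Python sketch below shows the principle using cosine similarity. The three-dimensional vectors and item descriptions are hand-made assumptions for illustration; real embeddings come from a model and have hundreds of dimensions, and a production system would use an approximate index rather than this brute-force scan.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def vector_search(query_vector, documents, top_k=2):
    """Return the names of the top_k documents most similar to the query."""
    scored = [
        (cosine_similarity(query_vector, vector), name)
        for name, vector in documents.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:top_k]]


# Hand-made toy "embeddings", not output from any real model.
documents = {
    "desert car chase film": [0.9, 0.1, 0.0],
    "post-apocalyptic road movie": [0.8, 0.2, 0.1],
    "bird-watching documentary": [0.0, 0.1, 0.9],
}
query = [0.85, 0.15, 0.05]  # toy embedding of a user's query
results = vector_search(query, documents)
print(results)
```

The two film entries rank ahead of the documentary because their vectors point in nearly the same direction as the query, even though no keywords are compared.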

© Digital Nation