
Australian Web Archive


The Australian Web Archive (AWA) is a publicly available online database of archived Australian websites, hosted by the National Library of Australia (NLA) on its Trove platform, an online library database aggregator. It comprises the NLA's own PANDORA archive, the Australian Government Web Archive (AGWA) and the National Library of Australia's ".au" domain collections. Access is through a single interface in Trove, which is publicly available. The Australian Web Archive was created in March 2019, and is one of the biggest web archives in the world. Its purpose is to provide a resource for historians and researchers, now and into the future.

The PANDORA service started archiving websites in October 1996.

In 2005, the NLA started archiving annual snapshots of the entire Australian web domain (URLs with the suffix ".au"), collected via large crawl harvests. Later, the earliest websites from the .au web domain, dating back to 1996, were obtained from the Internet Archive. In 2019 this content was first made publicly accessible through Trove.

The PANDORA infrastructure, which works well for selective, small-scale archiving, does not adapt to large-scale "bulk harvesting" of web content. A new technical system therefore had to be developed: a web archiving service that integrates the delivery of archived websites within a live website interface and presents them seamlessly to the user, which is technically difficult to achieve.

Australian Government websites are Commonwealth records, and are therefore publications to be managed in accordance with the Archives Act 1983.

The Australian Government Web Archive (AGWA) consists of bulk-archived Commonwealth Government websites. The NLA began regular harvests of these websites in June 2011, after a significant obstacle was overcome: an administrative agreement made in May 2010 allowed the NLA to collect, preserve and make accessible government websites without seeking prior permission for each website or document, as had previously been required. The service uses the Heritrix web crawler for harvesting, WARC files for storage and OpenWayback for delivery. The government publishes a huge amount of material online, and preserving it poses many challenges, such as content disappearing suddenly. In March 2014, the AGWA was made publicly accessible.
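
As an illustration of the WARC storage format mentioned above, the following Python sketch writes and then reads back simplified WARC/1.0 records using only the standard library. It is not the NLA's implementation: the function names, the example URLs and the use of "resource" records with a plain HTML payload are assumptions made for brevity. Crawlers such as Heritrix typically store full HTTP responses in compressed files, and production code would normally rely on a dedicated WARC library.

```python
import io
import uuid


def build_warc_record(target_uri: str, payload: bytes, date: str) -> bytes:
    """Serialize one simplified WARC/1.0 'resource' record (hypothetical helper)."""
    headers = [
        ("WARC-Type", "resource"),
        ("WARC-Record-ID", f"<urn:uuid:{uuid.uuid4()}>"),
        ("WARC-Date", date),
        ("WARC-Target-URI", target_uri),
        ("Content-Type", "text/html"),
        ("Content-Length", str(len(payload))),
    ]
    # Version line, header lines, then a blank line separating headers from the block.
    head = "WARC/1.0\r\n" + "".join(f"{k}: {v}\r\n" for k, v in headers) + "\r\n"
    # Every record ends with two CRLF pairs after the content block.
    return head.encode("utf-8") + payload + b"\r\n\r\n"


def read_warc_records(stream):
    """Yield (headers, payload) for each record in an uncompressed WARC stream."""
    while True:
        version = stream.readline()
        if not version:
            return  # end of stream
        if not version.startswith(b"WARC/"):
            raise ValueError(f"unexpected line: {version!r}")
        headers = {}
        while True:
            line = stream.readline()
            if line in (b"\r\n", b"\n", b""):
                break  # blank line marks the end of the headers
            name, _, value = line.decode("utf-8").partition(":")
            headers[name.strip()] = value.strip()
        payload = stream.read(int(headers["Content-Length"]))
        stream.read(4)  # skip the trailing CRLF CRLF
        yield headers, payload


# Example: two captures stored back to back, as in a single WARC file.
archive = (
    build_warc_record("http://example.gov.au/", b"<html>home</html>", "2014-03-01T00:00:00Z")
    + build_warc_record("http://example.gov.au/about", b"<html>about</html>", "2014-03-01T00:00:05Z")
)
records = list(read_warc_records(io.BytesIO(archive)))
for headers, payload in records:
    print(headers["WARC-Date"], headers["WARC-Target-URI"], len(payload))
```

Because each record carries its own capture date and target URI, a replay tool can index a file by those two fields and serve the stored payload for a requested URL and timestamp.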

The AGWA meets the preservation and retention requirements for websites as "retain as national archives" (RNA) material under the Archives Act; however, videos and document files (such as PDFs or Word documents) are not always captured, so these must be managed separately.

As of early 2015, the AGWA included content dating from 2005, which amounted to about 144 million files occupying 15 terabytes. It only included Commonwealth Government websites collected through bulk harvests of nearly 1000 seed URLs. The scheduling of the harvests was not yet routinely established, but harvests were being conducted roughly three times per year.
