British Library begins web harvest
Published 05/04/2013 | 00:46
The British Library will begin to preserve the digital age for future generations when new regulations come into force on Saturday.
It aims to "harvest" the entire UK web domain to document current events and record the country's burgeoning collection of online cultural and intellectual works.
Billions of web pages, blogs and e-books will now be amassed along with the books, magazines and newspapers which have been stored for several centuries. The library could eventually collect copies of every public tweet or Facebook page in the British web domain.
Lucie Burgess, who is leading the project at the British Library, said the unprecedented operation would provide a complete snapshot of life in the 21st century, which increasingly plays out online.
She said: "If you want a picture of what life is like today in the UK you have to look at the web. We have already lost a lot of material, particularly around events such as the 7/7 London bombings or the 2008 financial crisis. That material has fallen into the digital black hole of the 21st century because we haven't been able to capture it. Most of that material has already been lost or taken down. The social media reaction has gone."
The operation to "capture the digital universe" will begin with an automatic "web harvest" of an initial 4.8 million websites - or one billion web pages - from the UK domain, she said. This will start on Saturday and is expected to take three months. It will then take another two months to process the data.
Until now the British Library could preserve only a relatively small number of websites. The Legal Deposit Libraries Act 2003 paved the way for such material to be stored, but copyright law forced the library to seek permission each time it wanted to collect web content.
Under the new regulations, which also extend to the Bodleian Library in Oxford, Cambridge University Library, the National Library of Scotland, the National Library of Wales and Trinity College Library in Dublin, the British Library has the right to receive a copy of every UK electronic publication.
Roly Keating, chief executive of the British Library, said: "The regulations now coming into force make digital legal deposit a reality, and ensure that the Legal Deposit Libraries themselves are able to evolve - collecting, preserving and providing long-term access to the profusion of cultural and intellectual content appearing online or in other digital formats."
Culture minister Ed Vaizey said: "Legal deposit arrangements remain vitally important. Preserving and maintaining a record of everything that has been published provides a priceless resource for the researchers of today and the future. So it's right that these long-standing arrangements have now been brought up to date for the 21st century, covering the UK's digital publications for the first time."