The Ever-Expanding Job of Preserving the Internet's Backpages
A quarter of a century after it began collecting web pages, the Internet Archive is adapting to new challenges. From a report: Within the walls of a beautiful former church in San Francisco's Richmond district, racks of computer servers hum and blink with activity. They contain the internet. Well, a very large amount of it. The Internet Archive, a non-profit, has been collecting web pages since 1996 for its famed and beloved Wayback Machine. In 1997, the collection amounted to 2 terabytes of data. Colossal back then, it would now fit on a $50 thumb drive. Today, the archive's founder Brewster Kahle tells me, the project is on the brink of surpassing 100 petabytes -- approximately 50,000 times larger than in 1997. It contains more than 700 billion web pages. The work isn't getting any easier. Websites today are highly dynamic, changing with every refresh. Walled gardens like Facebook are a source of great frustration to Kahle, who worries that much of the political activity that has taken place on the platform could be lost to history if not properly captured. In the name of privacy and security, Facebook (and others) make scraping difficult.
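As a quick sanity check on the figures quoted above, here is a minimal back-of-the-envelope sketch (assuming decimal units, i.e. 1 petabyte = 1,000 terabytes; the variable names are illustrative, not from the report) confirming the roughly 50,000-fold growth:

```python
# Back-of-the-envelope check of the growth figures quoted in the report.
# Assumes decimal units: 1 PB = 1,000 TB.
ARCHIVE_1997_TB = 2              # collection size in 1997, per the report
ARCHIVE_TODAY_PB = 100           # size the archive is said to be approaching today

archive_today_tb = ARCHIVE_TODAY_PB * 1_000
growth_factor = archive_today_tb / ARCHIVE_1997_TB

print(f"Growth since 1997: roughly {growth_factor:,.0f}x")  # ~50,000x
```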
Read more of this story at Slashdot.
from Slashdot https://ift.tt/inwg4mZ