The World Wide Web is perhaps the largest source of information in
a wide range of areas such as science and culture, news and
entertainment, social communities, and so on. It is not only rich in
content but, more importantly, also evolves at a rapid pace -- an estimated
50-80% of its content changes within a year. This
evolution results in the loss of much born-digital content,
ultimately leading to the loss of knowledge. Organizations such as the
Internet Archive have taken first steps towards preventing this loss
by archiving large parts of the Web.
In this talk, we focus on the next big challenge -- searching and discovering
knowledge buried in archived content and in the evolutionary history of
the Web. The first part of the talk covers scalable search over
archives and presents FluxCapacitor, a system we have built
for efficient time-travel search. The second part of the talk
will outline the latest research problems we are pursuing in
the areas of archive gathering and archive mining.