Tree and Web

Scott B. Weingart

From 1994 to 2014, Yahoo! Directory brought order to the web's chaos. Two Stanford grad students, Jerry Yang and David Filo, dreamt it up as a way to keep track of the ever-growing World Wide Web, which over the course of 1994 grew from about 700 websites to 10,000.¹ It was a labor of love, in its early years cataloging almost half of all existing websites.²

To denizens of the Wikipedia Generation, Jerry and David's organization scheme may seem unwieldy. They gave every website a specific home, nested deep within a hierarchy of categories. The New York Times website, for example, could be found by clicking through these successive links: 'Business and Economy > Companies > News > New York Times Company'. The website for the U.S. Patent Office could be found by navigating through 'Government > Executive Branch > Departments and Agencies > Department of Commerce > United States Patent and Trademark Office'. A very precise lid for every pot.

You can imagine how unwieldy this must have been to manage by June 2009, when Yahoo! Directory reached 3,068,086 websites. Try finding anything in that tangled tree. And remember, that's how you had to find a site, since early search engines were almost entirely unreliable.

To their credit, Yahoo! ensured the same site could be reached via many paths. Stanford University, for example, could be found in either 'Regional > Countries > United States > Education > Colleges and Universities' or 'Education > Higher Education > Colleges and Universities > United States'. It's not that Stanford was separately listed in both sections of the hierarchy, but that the second path simply redirected a user to the first path.
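
For the technically inclined, the arrangement can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Yahoo!'s actual implementation: the names `DIRECTORY`, `ALIASES`, `resolve`, and `sites_at` are hypothetical, and the listings are limited to the examples mentioned above.

```python
# Illustrative sketch only: a toy hierarchical directory in which some
# category paths are aliases that redirect to a single canonical path.
# All names and data are hypothetical, not Yahoo!'s actual design.

# Canonical category paths map to the sites filed under them.
DIRECTORY = {
    ("Regional", "Countries", "United States", "Education",
     "Colleges and Universities"): ["Stanford University"],
    ("Business and Economy", "Companies", "News",
     "New York Times Company"): ["The New York Times"],
}

# Alias paths hold no listings of their own; they point at a canonical path.
ALIASES = {
    ("Education", "Higher Education", "Colleges and Universities",
     "United States"): ("Regional", "Countries", "United States",
                        "Education", "Colleges and Universities"),
}


def resolve(path):
    """Follow alias redirects until a canonical path is reached."""
    path = tuple(path)
    seen = set()
    while path in ALIASES:
        if path in seen:  # guard against a loop of redirects
            raise ValueError("redirect loop")
        seen.add(path)
        path = ALIASES[path]
    return path


def sites_at(path):
    """Return the sites listed under a path, after following redirects."""
    return DIRECTORY.get(resolve(path), [])
```

Asking `sites_at` for either of the two Stanford paths returns the same single listing, because the second path only points at the first. Each site still has exactly one home in the tree; the aliases add shortcuts, not a second copy.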

A limited view of Yahoo! Directory's structure from 1996, showing how certain links in the hierarchy redirected to other areas.

When the web was young and relatively unpopulated, this organizational scheme launched Yahoo! into fast fame. Unfortunately, the hand-curated nested tree approach couldn't keep up with the 2.5 million websites that appeared between 1994 and 1998.⁴ Not only were people complaining that it took forever for their site to get listed by Yahoo! because of the backlog; the very structure itself did not match the way people produced and browsed the web.

Consider why Wikipedia is so successful. Two features stand out: the ability for anyone to contribute, and the deep interlinking between encyclopedia entries. Browsing Wikipedia feels fundamentally different from browsing the Yahoo! Directory. You start reading an entry on orangutans, and before you know it you're reading about natural gas (Orangutans > Tanjung Puting National Park > Indonesia > Natural gas). A million paths spread in every direction, unconstrained by any hierarchy. This is how the web works: anyone can link to anyone else, and you can browse through any path you choose.

While hierarchical trees are good for creating easy-to-browse interfaces, they fail to organize the web because the web is not inherently hierarchical. It's a chaotic, anything-goes mess, a network of interlinking pages without any inherent order. It is, in short, a web.

Which is why, four years and one million catalogued websites after Stanford grad students Jerry Yang and David Filo released Yahoo!, the time was ripe for something better, more suited to the shape of the web. That something turned out to be Google, founded by the slightly younger Stanford grad students Larry Page and Sergey Brin. The universe loves parallels, I suppose.

You wouldn't know it from the mostly-blank page that greets you at Google.com, but underlying the now-ubiquitous search engine is a network exactly as nonhierarchical and intertwingled⁵ as the web itself. Google's clever move was to let the World Wide Web dictate its own terms. Instead of Yahoo's hierarchy, Google maintained a database of how every website it could find linked to every other website. They relied on this web of interconnected links to determine which sites were good enough to show users in search results.

It worked like this: sites receiving the most links are considered better. Since "better" sites probably know what's what, the sites they in turn link to are also considered better. If a thousand pages all link to Wikipedia, Google ranks Wikipedia more highly. Since Wikipedia is ranked highly, when it links to Amazon, Google ranks Amazon highly as well, even if Amazon is linked to by very few other sites. Quality links to quality, so the reasoning goes, and so Google figures highly ranked sites ought to confer their status on those they link to. Since Amazon is now highly ranked, its own links to other sites must be worthy of merit. This continues forever.

Google's algorithm, called PageRank, roughly approximates the likelihood that someone will reach a particular website by clicking random links all day. You're more likely to stumble across a site with a high PageRank than one with a low PageRank. Google used this rank to determine the order of search results. Someone searching for "newspaper" should see the New York Times first, if the newspaper has the highest PageRank of its competitors. And what determined the New York Times's PageRank was its structural location within the network of links comprising the World Wide Web.
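
The random-clicker idea can be made concrete with a short Python sketch. It is the simplified textbook version of the calculation, not Google's production system, and it rests on several assumptions: the function name `pagerank`, the toy link graph `web`, the damping value of 0.85 (the share of the time the imagined reader follows a link instead of jumping to a random page), and the choice to treat a page with no outgoing links as if it linked to every page.

```python
# Illustrative sketch only: simplified PageRank by repeated redistribution.
# The graph, names, and parameter values below are assumptions for the demo.

def pagerank(links, damping=0.85, iterations=100):
    """Estimate how often a random clicker lands on each page.

    `links` maps each page to the list of pages it links to.
    """
    pages = sorted(set(links) | {t for ts in links.values() for t in ts})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start with equal rank everywhere
    for _ in range(iterations):
        # Every page gets a small baseline from random jumps.
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # Dead end: the clicker jumps to a page chosen at random.
                share = damping * rank[page] / n
                for p in pages:
                    new[p] += share
        rank = new
    return rank


# Toy web: three pages link to "wikipedia", which links only to "amazon".
# "obscure" also has exactly one inbound link, but from an unlinked page.
web = {
    "fan1": ["wikipedia"],
    "fan2": ["wikipedia"],
    "fan3": ["wikipedia"],
    "wikipedia": ["amazon"],
    "amazon": ["shop"],
    "shop": [],
    "lonely": ["obscure"],
    "obscure": [],
}
ranks = pagerank(web)
```

In this toy web, "amazon" and "obscure" each receive exactly one link, yet "amazon" ends up with the higher score because its single link comes from the well-linked "wikipedia". The scores always sum to one, which is what lets them be read as the chances of landing on each page.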

The strategy was so successful that everyone forgot other search engines existed. Earlier search engines returned sites that were often spam, untrustworthy, or worse; Google's search results were almost magically reliable in comparison. By mid-2000, Yahoo!, still struggling to hand-curate the web, wound up changing its website to run off of Google's search results. Anyone searching Yahoo.com between 2000 and 2004 was actually using Google.

Yahoo! was once the most popular website in the world.⁶ Two decades later, it quietly shuttered the hierarchical directory that made it famous.⁷ Other multi-million-site web directories still exist and grow, but do so in quiet obscurity. Business decisions and market forces certainly played a role in Yahoo!'s eventual negative $3 billion valuation,⁸ but much of its downfall can be attributed to the fact that Google's network-driven organization of the web was simply easier to navigate than Yahoo!'s ever-growing tree.

The fall and rise of these two giants is a millennium-long story writ small. Over the last thousand years, the Judeo-Christian Greco-Roman world's fascination with trees as the metaphor to order the world slowly shifted to networks. The changeover was more impactful than you might expect. It is woven into Aristotelianism, Christianity, and postmodernism, into our concepts of privacy and power, and into the way librarians organize the world and philosophers think about thinking. The new metaphor quietly influences scientific innovation and corporate development, and even shapes the way we write and argue points.

Trees and networks are geometries of thought, and the rest of this book is about why we think in them and what they do to thinking.

¹ Gray, Matthew. “Web Growth Summary.” Web Growth Summary. ://stuff.mit.edu/people/mkgray/net/web-growth-summary.html, 1996.
² McCullough, Brian. “On the 20th Anniversary – The History of Yahoo’s Founding.” Internet History Podcast, March 2015.
³ Callery, Anne. “Yahoo! Cataloging the Web.” Untangling the Web. ://misc.library.ucsb.edu/untangle/callery.html, 1996.
⁴ Internet Live Stats. “Total Number of Websites.” Internet Live Stats. ://www.internetlivestats.com/total-number-of-websites/, May 2016.
⁵ Morville, Peter. Intertwingled: Information Changes Everything. Semantic Studios, 2014.
⁶ BBC Journalist (no byline). “Yahoo Still First Portal Call.” BBC, June 1998.
⁷ Rossiter, Jay. “Progress Report: Continued Product Focus.” Blog. Yahoo! ://yahoo.tumblr.com/post/98474044364/progress-report-continued-product-focus, September 2014.
⁸ Somaney, Jay. “Yahoo Is Undervalued, Even If It’s Worth Zero.” Forbes, March 2016.