The idea of librarians creating a “Reference Extract” for the Web — a “credibility engine” of linked Web pages based on how well they help answer users’ questions — has kinda sorta been tried before by Microsoft, who then proceeded to pretty much bungle it.
When I worked as an editor at MSN Search in the mid-1990s, the main goal of our staff was to organize the content of the World Wide Web around the language used in the most commonly entered search terms on our site. The leaders and many members of this team were librarians, and we used library-like language to describe our work. We created synsets with disambiguators to group and describe these keywords, then assigned Web sites or individual Web pages to those synsets in the ranked order we wanted them to be returned to the user of the search engine.
For example, we might see a new term in our keyword logs one week: Saturn. First we’d need to determine whether that keyword was part of a larger keyphrase that was already in the database. So maybe we already had a synset for “Sega Saturn,” a sadly defunct video game console system. If so, maybe we could add “Saturn” as a new keyword to that synset and call it a day? But does Saturn mean anything else? Of course. There’s Saturn (planet), Saturn (car manufacturer), and Saturn (Roman god). That’s three new synsets to create. Now we need to search the keyword logs for other search queries that match each of these concepts. Does a search for “Saturn’s rings” get its own synset, or can that query be added to Saturn (planet)? And where do queries like “Saturn facts” or “Saturn information” go? Once the synset work is done, we have to find websites that the editorial staff thinks best answer each of these queries. We would even write the descriptions for these sites so that when a visitor to MSN Search entered the search term “Saturn,” they received a lovingly hand-crafted set of the best search results in return.
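For the programmers in the audience, here is roughly how I picture that Saturn workflow as a data structure. This is a minimal sketch in Python, not the actual MSN Search schema, which I can't reproduce here. The class names (`Result`, `Synset`, `SynsetIndex`), the `normalize` helper, and the example.com URLs are all hypothetical, and filing “Saturn’s rings” and “Saturn facts” under Saturn (planet) is just one possible answer to the questions above.

```python
from dataclasses import dataclass, field


@dataclass
class Result:
    """One hand-picked site with an editor-written description."""
    url: str
    description: str


@dataclass
class Synset:
    """A concept: a label, a disambiguator, its keywords, and ranked results."""
    label: str
    disambiguator: str
    keywords: set[str] = field(default_factory=set)
    results: list[Result] = field(default_factory=list)  # list order is rank order


def normalize(query: str) -> str:
    """Lowercase a query and collapse runs of whitespace (an assumed rule)."""
    return " ".join(query.lower().split())


class SynsetIndex:
    """Maps each normalized keyword to every synset that claims it."""

    def __init__(self) -> None:
        self._by_keyword: dict[str, list[Synset]] = {}

    def add(self, synset: Synset) -> None:
        for keyword in synset.keywords:
            self._by_keyword.setdefault(normalize(keyword), []).append(synset)

    def lookup(self, query: str) -> list[Synset]:
        return self._by_keyword.get(normalize(query), [])


# Hypothetical editorial decisions for the ambiguous keyword "Saturn".
console = Synset("Sega Saturn", "video game console", {"sega saturn", "saturn"})
planet = Synset(
    "Saturn",
    "planet",
    {"saturn", "saturn's rings", "saturn facts"},
    [Result("https://example.com/planet-saturn", "Placeholder editor-written blurb.")],
)
car = Synset("Saturn", "car manufacturer", {"saturn"})
god = Synset("Saturn", "Roman god", {"saturn"})

index = SynsetIndex()
for synset in (console, planet, car, god):
    index.add(synset)

for match in index.lookup("Saturn"):
    print(f"{match.label} ({match.disambiguator})")
```

A lookup for “Saturn” returns all four synsets, each carrying its own ranked list of sites, while a lookup for “Saturn's rings” returns only the planet. Every entry in those sets and lists was a decision a human editor had to make, which is exactly why 20 of us could only cover so much.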
In hindsight this approach has many flaws (Microsoft didn’t lose the search wars for nuthin’!). A team of 20 human editors can only tackle so much. We did not open the process up to other professionals — other librarians — to expand the number of credible contributors. We also tended to focus on new search terms and the most popular terms, so that once a search term was included in a synset it would not usually be revisited. And — crucially, I think — Microsoft did not promote the fact that there were reliable, credible human beings doing this work. In fact, they hid it. PR materials would describe MSN Search as being powered by “SmartSense.” What was SmartSense, you ask? We were SmartSense! The 20 librarians, indexers, and editors in the editorial suite out in Redmond! We even had t-shirts made: “Ask me: I have SmartSense.”
As a technology company, Microsoft was proudest of its technology solutions: its crawler, its results engine, its throughput. And of course, the real goals of the service were more about selling banner advertising and sponsored links, and driving users to other MSN resources. But having started down the path of creating a credible, human-powered system for connecting people with the information they were looking for on the Web, Microsoft did not have the SmartSense to see it.
–lori