Ask naturally

Since the days when Ask Jeeves entered the search engine fray with its know-it-all butler, Internet users have gravitated to the concept of asking queries in everyday language. In practice, for all but the simplest questions, they’ve been disappointed by the results.

Now, a new generation of startups believes it has the answer to the difficult question. Taking advantage of lower computing costs, a receptive funding climate and the promise of voluminous ad revenue, companies such as Powerset, Hakia and TextDigger are creating search engines that offer an alternative to standard keyword queries.

Developers of so-called natural language search technologies—which focus on gleaning meaning from the relationships between words—have raised more than $30 million over the last four months in early stage funding. Entrepreneurs and backers say they see opportunities in searchers’ frustrations with the limitations of “keywordese.”

Back to nature

“The initial promise of Ask Jeeves back in ’96 really hit a nerve,” says Charles Moldow, a general partner at Foundation Capital and a Powerset board member. “People definitely felt more comfortable doing their searches as natural language queries. It just didn’t work that well.” Moldow predicts 2007 will be “the year of natural language search.”

The most significant difference between 1996 and today is the availability of low-cost computing power, which has made natural language search feasible. “The underlying technology trends mean that you can afford to spend more memory and processing power on text,” says Oren Etzioni, a search expert and venture partner at Madrona Venture Group.

The steep reduction in the cost of data processing has been instrumental in enabling the underlying technology used by Hakia, says Melek Pulatkonak, chief operating officer of the New York-based startup. Hakia uses a technology called QDEX, or query detection and extraction, to analyze a Web page and figure out what questions could be extracted from it, essentially anticipating in advance the answer to a searcher’s query. Low-cost processing power is critical because, “You have to deliver in milliseconds an answer, and that time limit makes it difficult to deliver natural language search,” Pulatkonak says.
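The idea of anticipating questions ahead of time can be illustrated with a toy sketch. This is not Hakia's QDEX algorithm, whose details the article does not describe; it only assumes, for illustration, that a page's sentences are scanned once at indexing time and that any sentence of the form "X is Y" is stored under the question "What is X". The names `DEFINITION`, `normalize`, `build_index` and `answer`, and the sample sentences, are all hypothetical.

```python
import re

# Hypothetical pattern: sentences shaped like "<Subject> is <rest>."
DEFINITION = re.compile(r"^(?P<subject>[A-Z][\w\s]*?) is (?P<rest>.+)\.$")


def normalize(question):
    """Lowercase a question and strip punctuation so lookups are forgiving."""
    return re.sub(r"[^\w\s]", "", question).lower().strip()


def build_index(sentences):
    """Index-time step: map each anticipated question to its answer sentence."""
    index = {}
    for sentence in sentences:
        sentence = sentence.strip()
        match = DEFINITION.match(sentence)
        if match:
            key = normalize(f"What is {match.group('subject')}")
            index[key] = sentence
    return index


def answer(index, question):
    """Query-time step: a single dictionary lookup, or None if not anticipated."""
    return index.get(normalize(question))


# Made-up sample page text, used only to exercise the sketch.
page = [
    "Aspirin is a drug used to reduce pain and fever.",
    "It was first sold in 1899.",
]
index = build_index(page)
print(answer(index, "What is aspirin?"))
print(answer(index, "Does aspirin cure migraines?"))
```

In this sketch the expensive text analysis happens before any user arrives, so the query itself reduces to a cheap lookup, which is one way to stay within a millisecond budget. A question the indexer did not anticipate simply returns nothing.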

Another impetus behind natural language search is the increasing complexity of search engine queries, Pulatkonak says. No longer is a simple search on “aspirin” sufficient for a headache sufferer. Nowadays, she says, “They want to know: Does aspirin cure migraines?” (That’s something the beta version of Hakia did not directly answer.)

TextDigger CEO Tim Muskgrove sees room for a natural language-inspired approach to “pesky, difficult queries.” In a pitch to raise funding for his startup, Muskgrove relayed his difficulties using keyword search to locate a hotel room in San Francisco with a view of the Golden Gate Bridge. When he entered the same query in his company’s beta-stage search engine, the site requested clarifications for words like “view” to ensure it was looking for a room with “a visual perspective on a scene” and not “an opinion.”

Prep work

Powerset CEO Barney Pell sees a vast opportunity addressing a central limitation of today’s leading search engines: their almost exclusive focus on content-bearing terms, commonly ignoring prepositions like “by,” “for,” “about,” “of,” and “in” that shed light on the relationships between words. Pell compares keyword search to communicating at a basic level in a foreign language in which one is unable to express complex thoughts. Once a search engine can focus on the underlying grammar of a query, in addition to main content-bearing words, Pell predicts, “We are going to look back five to 10 years from now and say: ‘Remember when we used to search using keywords?’”
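The limitation Pell describes can be seen in a few lines of Python. The sketch below is a toy, not any real search engine's pipeline: `STOPWORDS` is a small hypothetical list built from the prepositions named above, and `keyword_view` is an illustrative helper that keeps only the remaining content-bearing terms.

```python
# Hypothetical stopword list; real engines use their own, larger lists.
STOPWORDS = {"a", "an", "the", "by", "for", "about", "of", "in"}


def keyword_view(query):
    """Return the content-bearing terms left after dropping stopwords."""
    return frozenset(w for w in query.lower().split() if w not in STOPWORDS)


queries = ["books by children", "books for children", "books about children"]
views = {q: keyword_view(q) for q in queries}

for query, terms in views.items():
    print(query, "->", sorted(terms))

# All three queries collapse to the same keyword set.
print(len(set(views.values())) == 1)
```

Three queries with three different meanings become indistinguishable once the prepositions are discarded, which is the gap a grammar-aware engine would aim to close.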

History isn’t much of a guide in evaluating such forecasts. While past trials of natural language search haven’t been a raging success, they haven’t been a disaster, either. Ask Jeeves, which now goes by the name Ask.com, raised $34 million in venture capital in the late 1990s and generated a market capitalization of $1.5 billion when it went public in 1999.

Although the days of skyrocketing search engine IPOs are long past, the query methodology Ask Jeeves pioneered remains popular for limited uses. Etzioni says the question-and-answer approach has long functioned well for factoid questions, such as “What’s the capital of Kansas?” or “What time is it in Bangalore?”

Better answers

However, the technique has never really worked for complex questions. Try asking something that requires analysis, Etzioni suggests, such as “Which nanotechnology companies on the West Coast are currently hiring?” Query results are certain to fall short.

The problem isn’t just primitive search technology, Etzioni says, but the way online content is organized. People know to look to a Wikipedia entry, for example, to find an answer to a question. But that information is included in a much longer entry. Getting an immediate answer requires isolating only the most relevant information from a long scroll of text. That’s something computers haven’t mastered.

Still, daunting as the challenges may be, Etzioni concedes the rewards will be phenomenal for a functional natural language search engine. With the way online advertising revenue is growing, even a search site with modest market share can deliver a sizeable return on investment.

By 2011, worldwide spending on ads related to keywords in searches will exceed $40 billion, up from $14 billion in 2006, according to Piper Jaffray analyst Safa Rashtchy.

The sharp rise in ad-sponsored search revenue hasn’t escaped the notice of venture capitalists or holders of search-related intellectual property. In February, Powerset signed a licensing deal with Xerox’s Palo Alto Research Center to commercialize search technology based on natural language processing.

Since late last year, VCs and angel investors have funded more than a dozen general-purpose and vertical search sites, including Farecast, an airfare forecasting website; Spock Networks, a site for finding people; and ChaCha, a human-powered search tool. Over the past year, venture investments in companies developing search engines and search-related technologies have exceeded $200 million, according to Thomson Financial.

The G word

What sets natural language search apart, says Moldow, is its potential to spawn a viable rival to Google. He compares current conditions to the state of search 10 years ago, when Google first came on the scene. Then, as now, dominant search sites were focused on diversifying into other areas of online media. Google succeeded by focusing on delivering a better search experience.

Of course, it’s not as if leading search engines haven’t thought about the potential of natural language search themselves. Google, Microsoft and Yahoo all actively recruit staffers with expertise in natural language processing technology, Etzioni says. Their search engines are also capable of processing simple queries posed as questions, although results for complex questions tend to disappoint.

So far, Madrona has passed on natural language search plays, although Etzioni says the Seattle-based fund might have been “tempted” by Powerset if the startup were located in the Pacific Northwest. That said, he sees no reason why somebody, eventually, won’t get it to work. “Why can’t search be like Star Trek?” asks Etzioni. “I’m sitting at the bus stop and I think of a question. Why not just pick up my cell phone, ask my question and move on?”