Venture’s emerging data gap

Venture capitalists have poured billions into machine learning and big data startups over the past decade, hoping to remake industries from healthcare to financial services to transportation.

Remaking venture might be next on the list.

An early phalanx of investors has begun harnessing the power of advanced data analytics, and so far their feedback seems positive. A warning to the uninitiated: It’s more difficult than you might think.

What these pioneers are building are systems that dwarf the proprietary in-house databases most firms employ today, some with trillions of data points and millions of companies. On top of these large data sets they layer machine-learning, or artificial intelligence, algorithms to surface patterns and insights that humans alone cannot find.

The benefits appear to be enhanced deal flow, quicker diligence, better portfolio support and market insight that might otherwise be missed. Their efforts could be the start of a new divide in venture’s crowded, competitive marketplace between firms devoting considerable time and money to data analytics and those that aren’t.

But it comes with a warning. Data analytics is not a quick study that firms can hope to master in a few short months, or even years. Cleaning data and training systems are big jobs requiring significant expertise and persistence, with cleaning data alone accounting for up to 90 percent of the work.

“I don’t believe AI will replace venture capitalists,” said Paul Arnold, founder of early mover Switch Ventures. But “I think it will supercharge them.”

The systems most typically focus on deal sourcing and due diligence, spitting out lists of companies to contact. Increasingly, they appear to show promise in LP communications and investment support, identifying potential portfolio company customers and partners.

Putting wind in the sails of advocates are several key transformations in data analytics. One is that compute power has become readily available and less expensive from cloud providers, such as Amazon Web Services. Second, large, granular data sets have become more available on the internet and from proprietary sources, giving ML algorithms reams of data to pore through.

Finally, the combination of the previous two has given well-established machine-learning algorithms more punch, heightening their ability to surface patterns and insights from the detailed data. This “clustering,” or pattern recognition, not only saves time by eliminating manual analysis, but also improves as humans interact with the data, in effect training and honing systems to heighten their value.

Among the firms pushing hardest are SignalFire, EQT Ventures, Social Capital, Switch and Hone Capital. Sequoia Capital also appears to be ramping up its efforts with the addition of chief data scientist Chandra Narayanan in September. Narayanan declined to discuss the firm’s strategy.

Correlation Ventures made waves half a decade ago with a related quantitative model.

“I think it will be a core part of a lot of funds,” Arnold said. “I can see up close how well it works.”

Switch’s system, for instance, tracks and ranks companies on their ability to outperform and identifies ones ripe for meetings. It also picks out early teams and calculates which will do well.

After three years of revisions, it draws data on people from about a dozen data sources.

The benefit is “we’re not fishing in the ocean,” said Arnold, who is migrating his effort to Google from Amazon. “We’re fishing in a pond stocked with trophy fish.”

Still, it is easy to miss bias in the data, he said. Building models and cleaning the data are hard, and keeping up has meant bringing onboard a full-time person working with the models and a part-time person working on data collection.

Also on the vanguard is SignalFire, with a system that tracks millions of companies as well as millions of people from the startup ecosystem, including engineers and sales staff. The system makes use of machine learning as well as big-data techniques, and has given jobs to eight engineers. There were months during the firm’s first fund when Amazon ate up more than half of management fees, said Chris Farmer, a managing director.

The system monitors performance indicators and flags companies when they outperform. It also understands that tracking an enterprise company requires monitoring different data than tracking a consumer internet company.
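For readers who want to picture the mechanics, the flagging idea can be sketched in a few lines of Python. This is an illustration only, not SignalFire's system: the indicator names, the two company types and the "twice the peer median" rule are all assumptions made up for the example.

```python
from statistics import median

# Hypothetical indicator sets: which metrics matter depends on company type.
INDICATORS = {
    "enterprise": ["headcount_growth", "job_postings_growth"],
    "consumer": ["web_traffic_growth", "app_downloads_growth"],
}


def flag_outperformers(companies, multiple=2.0):
    """Return names of companies whose average growth, measured on the
    indicators for their type, is at least `multiple` times the median
    of their same-type peers. The threshold rule is an assumption."""
    # Average each company's growth over the indicators relevant to its type.
    scores = {}
    for company in companies:
        keys = INDICATORS[company["type"]]
        scores[company["name"]] = sum(company["metrics"][k] for k in keys) / len(keys)

    flagged = []
    for kind in INDICATORS:
        peers = [c["name"] for c in companies if c["type"] == kind]
        if not peers:
            continue
        baseline = median(scores[name] for name in peers)
        # Only compare against a positive baseline to avoid flagging everyone.
        flagged += [n for n in peers if baseline > 0 and scores[n] >= multiple * baseline]
    return flagged
```

The point of the per-type lookup is the one made above: an enterprise company and a consumer company are each compared only with their own peers, on their own indicators.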

Chris Farmer, managing director and CEO, SignalFire. Photo courtesy of the firm.

“Our goal is to predict the present, not the future,” Farmer said.

Farmer argues that firms need to take their data systems beyond lead generation and filtering to get the most out of them. Systems are “doomed for failure” if they don’t address a broad range of tasks, he said.

For instance, Farmer said the majority of SignalFire’s data efforts are focused on portfolio support. That means delivering value through recruiting, lead generation, providing competitive intelligence, benchmarking against peers and permitting data collaboration across the portfolio.

“You’ve got to build this the way Google would build this,” he added.

Farmer said as a result he is winning deals at a higher rate than competitors, and that his follow-on funding rate is in the top 1 percent of the industry.

“We seem to be getting on base a lot,” he said. “We are winning deals at a high rate.”

Another firm blazing a path is EQT Ventures, the venture arm of the Swedish private-equity firm EQT. It opened its doors two years ago and built a machine-learning system, Motherbrain, as it created the firm from scratch. The system collects data from proprietary and public sources as it tracks millions of companies. This includes web traffic, but also investment data, founder background and information on company performance.

The aim is to identify companies the investment team will want to see at precisely the moment they want to see them. Then it helps with due diligence and company support, while monitoring market trends.

Analytics Partner Henrik Landgren said it has identified deals the firm would not have found on its own, though partners make investment decisions.

Because of the work necessary to get the system up and running, Landgren said he considers the firm “half VC and half startup. To get this machinery to work is not easy.”

By running analytics ordinarily done by hand, it frees the team to pursue the human side of the business, such as building relationships. It also identifies patterns in big data sets much better than humans, generating not just information to support a partner’s hypothesis, but observations team members wouldn’t have come up with on their own.

“This wasn’t possible 10 years ago,” Landgren said. “For me it’s a bit hard to see this direction would not be the future.”

Israeli firm Viola also recently kicked off a major initiative and believes it is reaping the benefits. The aim is to track personnel data within the Israeli startup ecosystem, charting not just backgrounds but where entrepreneurs go and what businesses they create. The data has surfaced a map of startup clusters that wasn’t obvious ahead of time, uncovering a migration of people from cybersecurity to fintech and even to specific subsectors of fintech.

“It’s a very clear drift,” said Daniel Tsiddon, founder of Viola FinTech.

The data are clear enough to enable investors to draw conclusions about whether a particular migration to a fintech subsector makes sense, Tsiddon added. That means partners don’t have to rely solely on founder interviews to judge whether an entrepreneur will be successful.

“You make decisions based on more information,” he said.

Investors also can test business assertions. An entrepreneur’s claim that product development can be completed in 1.5 years can be crosschecked with the experience of other companies in similar industries.
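A crosscheck of this kind can be as simple as asking how many comparable companies actually hit the claimed timeline. The Python sketch below is a hypothetical illustration, not Viola's method; the function name and the peer figures used with it are invented for the example.

```python
def share_meeting_claim(claimed_years, peer_years):
    """Return the fraction of comparable companies that finished product
    development within the claimed time, or None if there are no peers."""
    if not peer_years:
        return None
    met = sum(1 for years in peer_years if years <= claimed_years)
    return met / len(peer_years)
```

If only a small share of peers finished within 1.5 years, the claim deserves a harder look in diligence; a large share suggests it is in line with experience.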

The firm also draws on the data to generate reports for LPs and to handle LP requests.

Tsiddon said the data project made sense as the firm, which has 17 investors and 100 portfolio companies, expanded. To justify the work for just a handful of investors might be hard.

Another firm with a unique approach is Veronica Wu’s Hone Capital, which has access to AngelList and other data sources. Her data set includes 30,000 seed deals from the past decade.

Veronica Wu, managing partner, Hone Capital. Photo courtesy of the firm.

“My algorithms are not my secret,” Wu said. “It’s my ability to get the data.”

The system scores companies on their likelihood of raising an A round, and Wu uses it to guide deal selection. She said its accuracy is high at 84 percent. The result is a follow-on-funding hit rate of 42 percent for her 2016 seed deals and a greater than 62 percent hit rate for her 2015 seed deals.

“It’s a good early indicator” that the system works, she said. “Nobody has ever been able to do this at this scale.”

A second module is under development to identify companies expected to break out. The aim is to help her decide where to double down. It will look at revenue, traction and other factors.

She says her system’s observations aren’t always intuitive. For instance, a company with two founders has double the chance of raising an A round compared with a company with one, she said. Founders from different schools similarly are more successful than those from the same school. The different backgrounds bring together different perspectives.

A firm with a different strategy is Off The Grid Ventures, which uses machine-learning algorithms to generate personality profiles of founders. It then scores the quality of teams to screen companies and assist in due diligence.

“We’re playing the moneyball of venture capital,” said General Partner David Mes. “You’re basically fishing in a much better pond.”

The system scrapes data from the web on background, education and cultural identity, which includes social media posts and speeches.

Mes said if he is looking at a company and not ready to invest, a good founders score might lead him to take a second look. Or if a score isn’t strong, he might examine whether a specific hire will strengthen the team.

Wildcat Venture Partners’ predictive analytics system also looks into teams to gauge their expected success. It examines variables, such as where people went to school, how they did in school, their startup experience and if they worked for a big company, then scores them. It is especially useful where Wildcat doesn’t know the team.
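A scoring scheme along these lines might look like the Python sketch below. It is a guess at the general shape, not Wildcat's model: the variable names mirror the ones listed above, but the weights and the simple team average are assumptions, since the firm did not describe how it combines them.

```python
# Hypothetical weights; how the variables are actually weighted is not known.
WEIGHTS = {
    "top_school": 1.0,
    "strong_grades": 1.0,
    "startup_experience": 2.0,
    "big_company_experience": 1.0,
}


def founder_score(founder):
    """Add up the weights of every attribute the founder has."""
    return sum(weight for key, weight in WEIGHTS.items() if founder.get(key))


def team_score(team):
    """Average the founder scores; an empty team scores 0.0."""
    if not team:
        return 0.0
    return sum(founder_score(f) for f in team) / len(team)
```

Because the inputs are public background facts rather than a firm's private impressions, a score like this can be computed for teams the investors have never met.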

“There is a lot of data about these people,” said Bill Ericson, a founding partner. “It is clean data.”

But he cautioned that true machine-learning, or artificial intelligence, systems may still be off in the future.

“I don’t think we’ve seen an effective machine-learning approach,” he said.

Given that venture is a long-horizon business with exits sometimes taking 10 years, it’s no surprise doubt remains. What’s likely is that value is beginning to accrue for those willing to work at it. But none have produced a substitute for human decision-making.

“The judgment and work required to identify outliers is still a lot of art, and not only science,” said Joe Horowitz, a managing general partner at Icon Ventures.

And yet, these systems won’t go away. So getting to know them might be worth the investment.