Data Scientist:

The Sexiest Job of the 21st Century

Meet the people who

can coax treasure out of

messy, unstructured data.

by Thomas H. Davenport

and D.J. Patil

Artwork Tamar Cohen, Andrew J Buboltz

2011, silk screen on a page from a high school

yearbook, 8.5" x 12"

When Jonathan Goldman arrived for work in June 2006

at LinkedIn, the business

networking site, the place still

felt like a start-up. The company had just under 8 million

accounts, and the number was

growing quickly as existing members invited their friends and colleagues to join. But users weren’t

seeking out connections with the people who were already on the site

at the rate executives had expected. Something was apparently missing in the social experience. As one LinkedIn manager put it, “It was

like arriving at a conference reception and realizing you don’t know

anyone. So you just stand in the corner sipping your drink—and you

probably leave early.”

70 Harvard Business Review October 2012

Spotlight on Big Data


Goldman, a PhD in physics from Stanford, was

intrigued by the linking he did see going on and by

the richness of the user profiles. It all made for messy

data and unwieldy analysis, but as he began exploring people’s connections, he started to see possibilities. He began forming theories, testing hunches,

and finding patterns that allowed him to predict

whose networks a given profile would land in. He

could imagine that new features capitalizing on the

heuristics he was developing might provide value to

users. But LinkedIn’s engineering team, caught up in

the challenges of scaling up the site, seemed uninterested. Some colleagues were openly dismissive of

Goldman’s ideas. Why would users need LinkedIn to

figure out their networks for them? The site already

had an address book importer that could pull in all a

member’s connections.

Luckily, Reid Hoffman, LinkedIn’s cofounder and

CEO at the time (now its executive chairman), had

faith in the power of analytics because of his experiences at PayPal, and he had granted Goldman a high

degree of autonomy. For one thing, he had given

Goldman a way to circumvent the traditional product release cycle by publishing small modules in the

form of ads on the site’s most popular pages.

Through one such module, Goldman started to

test what would happen if you presented users with

names of people they hadn’t yet connected with but

seemed likely to know (for example, people who had shared their tenures at schools and workplaces). He did this by ginning up a custom ad that displayed the three best new matches for each user based on the background entered in his or her LinkedIn profile. Within days it was obvious that something remarkable was taking place. The click-through rate on those ads was the highest ever seen. Goldman continued to refine how the suggestions were generated, incorporating networking ideas such as “triangle closing,” the notion that if you know Larry and Sue, there’s a good chance that Larry and Sue know each other. Goldman and his team also got the action required to respond to a suggestion down to one click.

It didn’t take long for LinkedIn’s top managers to recognize a good idea and make it a standard feature. That’s when things really took off. “People You May Know” ads achieved a click-through rate 30% higher than the rate obtained by other prompts to visit more pages on the site. They generated millions of new page views. Thanks to this one feature, LinkedIn’s growth trajectory shifted significantly upward.

A New Breed

Goldman is a good example of a new key player in organizations: the “data scientist.” It’s a high-ranking professional with the training and curiosity to make discoveries in the world of big data. The title has been around for only a few years. (It was coined in 2008 by one of us, D.J. Patil, and Jeff Hammerbacher, then the respective leads of data and analytics efforts at LinkedIn and Facebook.) But thousands of data scientists are already working at both start-ups and well-established companies. Their sudden appearance on the business scene reflects the fact that companies are now wrestling with information that comes in varieties and volumes never encountered before. If your organization stores multiple petabytes of data, if the information most critical to your business resides in forms other than rows and columns of numbers, or if answering your biggest question would involve a “mashup” of several analytical efforts, you’ve got a big data opportunity.

Much of the current enthusiasm for big data focuses on technologies that make taming it possible, including Hadoop (the most widely used framework for distributed file system processing) and related open-source tools, cloud computing, and data visualization. While those are important breakthroughs, at least as important are the people with the skill set (and the mind-set) to put them to good use. On this front, demand has raced ahead of supply. Indeed, the shortage of data scientists is becoming a serious constraint in some sectors. Greylock Partners, an early-stage venture firm that has backed companies such as Facebook, LinkedIn, Palo Alto Networks, and Workday, is worried enough about the tight labor pool that it has built its own specialized recruiting team to channel talent to businesses in its portfolio. “Once they have data,” says Dan Portillo, who leads
