Data Scientist:
The Sexiest Job of the 21st Century
Meet the people who
can coax treasure out of
messy, unstructured data.
by Thomas H. Davenport
and D.J. Patil
Artwork: Tamar Cohen, Andrew J Buboltz,
2011, silk screen on a page from a high school
yearbook, 8.5" x 12"
When Jonathan Goldman arrived for work in June 2006
at LinkedIn, the business
networking site, the place still
felt like a start-up. The company had just under 8 million
accounts, and the number was
growing quickly as existing members invited their friends and colleagues to join. But users weren’t
seeking out connections with the people who were already on the site
at the rate executives had expected. Something was apparently missing in the social experience. As one LinkedIn manager put it, “It was
like arriving at a conference reception and realizing you don’t know
anyone. So you just stand in the corner sipping your drink—and you
probably leave early.”
70 Harvard Business Review October 2012
Spotlight on Big Data
Goldman, a PhD in physics from Stanford, was
intrigued by the linking he did see going on and by
the richness of the user profiles. It all made for messy
data and unwieldy analysis, but as he began exploring people’s connections, he started to see possibilities. He began forming theories, testing hunches,
and finding patterns that allowed him to predict
whose networks a given profile would land in. He
could imagine that new features capitalizing on the
heuristics he was developing might provide value to
users. But LinkedIn’s engineering team, caught up in
the challenges of scaling up the site, seemed uninterested. Some colleagues were openly dismissive of
Goldman’s ideas. Why would users need LinkedIn to
figure out their networks for them? The site already
had an address book importer that could pull in all a
member’s connections.
Luckily, Reid Hoffman, LinkedIn’s cofounder and
CEO at the time (now its executive chairman), had
faith in the power of analytics because of his experiences at PayPal, and he had granted Goldman a high
degree of autonomy. For one thing, he had given
Goldman a way to circumvent the traditional product release cycle by publishing small modules in the
form of ads on the site’s most popular pages.
Through one such module, Goldman started to
test what would happen if you presented users with
names of people they hadn’t yet connected with but
seemed likely to know—for example, people who
had shared their tenures at schools and workplaces.
He did this by ginning up a custom ad that displayed
the three best new matches for each user based
on the background entered in his or her LinkedIn
profile. Within days it was obvious that something
remarkable was taking place. The click-through
rate on those ads was the highest ever seen. Goldman
continued to refine how the suggestions were
generated, incorporating networking ideas such as
“triangle closing,” the notion that if you know Larry
and Sue, there’s a good chance that Larry and Sue
know each other. Goldman and his team also got the
action required to respond to a suggestion down to
one click.
It didn’t take long for LinkedIn’s top managers
to recognize a good idea and make it a standard feature.
That’s when things really took off. “People You
May Know” ads achieved a click-through rate 30%
higher than the rate obtained by other prompts to
visit more pages on the site. They generated millions
of new page views. Thanks to this one feature,
LinkedIn’s growth trajectory shifted significantly
upward.
A New Breed
Goldman is a good example of a new key player in
organizations: the “data scientist.” It’s a high-ranking
professional with the training and curiosity to make
discoveries in the world of big data. The title has been
around for only a few years. (It was coined in 2008
by one of us, D.J. Patil, and Jeff Hammerbacher, then
the respective leads of data and analytics efforts at
LinkedIn and Facebook.) But thousands of data scientists
are already working at both start-ups and well-established
companies. Their sudden appearance on
the business scene reflects the fact that companies
are now wrestling with information that comes in
varieties and volumes never encountered before. If
your organization stores multiple petabytes of data, if
the information most critical to your business resides
in forms other than rows and columns of numbers, or
if answering your biggest question would involve a
“mashup” of several analytical efforts, you’ve got a big
data opportunity.
Much of the current enthusiasm for big data focuses
on technologies that make taming it possible,
including Hadoop (the most widely used framework
for distributed file system processing) and related
open-source tools, cloud computing, and data visualization.
While those are important breakthroughs,
at least as important are the people with the skill set
(and the mind-set) to put them to good use. On this
front, demand has raced ahead of supply. Indeed,
the shortage of data scientists is becoming a serious
constraint in some sectors. Greylock Partners, an
early-stage venture firm that has backed companies
such as Facebook, LinkedIn, Palo Alto Networks, and
Workday, is worried enough about the tight labor
pool that it has built its own specialized recruiting
team to channel talent to businesses in its portfolio.
“Once they have data,” says Dan Portillo, who leads