The Data Science Skills Network

As a data scientist, I am usually heads-down in numbers, patterns, and code, but as crazy as it sounds, one of the hardest parts of my job is describing what I do. There are plenty of resources that describe the data scientist role and offer career guides. I've heard data scientists described as people who sit at the intersection of statistics, hacking skills, and domain expertise. Or as data analysts who live in San Francisco.

Rather than add a new definition to the collection, I thought I’d take a data-centric approach towards defining the role. I looked at what skills people with the title "Data Scientist" have listed on their LinkedIn profiles and aggregated the top ten by occurrence*.

*Corrected using TF-IDF (term frequency-inverse document frequency), a weighting that discounts terms that are common everywhere.
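The post doesn't say exactly how TF-IDF was applied, so the following is only a minimal Python sketch under one assumption: each profile's skill list is treated as a document, the term frequency is the share of "Data Scientist" profiles that list a skill, and the inverse document frequency is computed over a broader population of profiles. Under that assumption, skills that nearly everyone lists (the hypothetical "Microsoft Office" below) sink to the bottom of the ranking. The function name `rank_skills_tfidf` and the toy profiles are hypothetical.

```python
import math
from collections import Counter


def rank_skills_tfidf(ds_profiles, all_profiles, top_n=10):
    """Hypothetical helper: rank skills for one job title with a TF-IDF-style score.

    ds_profiles:  skill lists from profiles titled "Data Scientist"
    all_profiles: skill lists from a broader population (assumed to include ds_profiles)
    """
    # Term frequency: share of data scientist profiles listing each skill.
    # set() counts a skill at most once per profile.
    ds_counts = Counter(skill for profile in ds_profiles for skill in set(profile))
    tf = {skill: count / len(ds_profiles) for skill, count in ds_counts.items()}

    # Document frequency: how many profiles in the broader population list each skill.
    df = Counter(skill for profile in all_profiles for skill in set(profile))
    n_all = len(all_profiles)

    # Smoothed IDF, log((1 + N) / (1 + df)): a skill listed on every profile scores 0.
    scores = {
        skill: tf[skill] * math.log((1 + n_all) / (1 + df[skill]))
        for skill in tf
    }
    # Highest score first; ties broken alphabetically for a deterministic order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]


# Toy data (hypothetical): every profile lists "Microsoft Office".
data_scientists = [
    ["Python", "Machine Learning", "Microsoft Office"],
    ["R", "Statistics", "Microsoft Office"],
    ["Python", "SQL", "Microsoft Office"],
]
everyone = data_scientists + [
    ["Microsoft Office", "Sales"],
    ["Microsoft Office", "Recruiting"],
]

for skill, score in rank_skills_tfidf(data_scientists, everyone, top_n=3):
    print(f"{skill}: {score:.3f}")
```

With this toy data, Python ranks first because two of the three data scientist profiles list it, while Microsoft Office scores zero despite appearing on every data scientist profile, since it is equally common everywhere else.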

While this list sheds some light on which skills appear most frequently on data scientists' profiles, it's hard to see how those skills relate to each other from a static ranking alone. To dig deeper, I represented the skills as a network and visualized their relationships. The result is the Data Science Skills Network:

In the network, each node is a skill. Two skills are connected when they are listed together on a profile, and the connection grows stronger the more often they appear together. Since the goal was to visualize the relationships between skills, I grouped similar skills into clusters, each shown in its own color. I then sized each skill by its network centrality, a measure of how well connected it is and how much it influences other skills in the network. While there are plenty of conclusions to draw, both figures highlight a few key themes. Namely, today's data scientists typically:
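To make the construction concrete, here is a minimal Python sketch under stated assumptions: edge weights are raw co-occurrence counts (the number of profiles listing both skills), and "network centrality" is taken to be weighted eigenvector centrality, one common measure that reflects both how connected a node is and how connected its neighbors are. The post doesn't name the specific centrality or clustering method, so the helper names `build_skill_network` and `eigenvector_centrality` and the toy profiles are hypothetical, and the color clustering is not reproduced here.

```python
import itertools
import math
from collections import defaultdict


def build_skill_network(profiles):
    """Hypothetical helper: map each skill pair to the number of profiles listing both."""
    weights = defaultdict(int)
    for profile in profiles:
        # sorted(set(...)) removes duplicates and gives each pair a canonical order.
        for a, b in itertools.combinations(sorted(set(profile)), 2):
            weights[(a, b)] += 1
    return dict(weights)


def eigenvector_centrality(weights, max_iter=100, tol=1e-9):
    """Hypothetical helper: weighted eigenvector centrality by power iteration."""
    neighbors = defaultdict(dict)
    for (a, b), w in weights.items():
        neighbors[a][b] = w
        neighbors[b][a] = w
    nodes = sorted(neighbors)
    if not nodes:
        return {}
    score = {n: 1.0 for n in nodes}
    for _ in range(max_iter):
        # A skill scores highly when it is strongly tied to other high-scoring skills.
        # Adding the node's own score (iterating with A + I) keeps the power
        # iteration from oscillating on bipartite structures.
        new = {
            n: score[n] + sum(w * score[m] for m, w in neighbors[n].items())
            for n in nodes
        }
        norm = math.sqrt(sum(v * v for v in new.values()))
        new = {n: v / norm for n, v in new.items()}
        if max(abs(new[n] - score[n]) for n in nodes) < tol:
            return new
        score = new
    return score


# Toy data (hypothetical).
profiles = [
    ["Python", "Machine Learning", "Statistics"],
    ["Python", "Machine Learning", "SQL"],
    ["R", "Statistics"],
    ["Java", "Hadoop"],
]
network = build_skill_network(profiles)
centrality = eigenvector_centrality(network)
for skill, value in sorted(centrality.items(), key=lambda kv: -kv[1]):
    print(f"{skill}: {value:.3f}")
```

Here Python and Machine Learning, which co-occur twice and share neighbors, come out on top, while a disconnected cluster like the Java/Hadoop pair decays toward zero under this measure. If networkx is available, `networkx.eigenvector_centrality(G, weight="weight")` computes weighted eigenvector centrality on a `Graph` whose edges carry a `weight` attribute.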

Approach data with a mathematical mindset

  • Machine learning, data mining, data analysis, and statistics all rank highly in the network. This indicates that being able to understand and represent data mathematically, with statistical intuition, is a key skill for data scientists.

Use a common language to access, explore and model data

  • Python, R, and MATLAB are the three most popular languages for visualization and model development, and SQL is the most common for data access. Since extracting and exploring data and testing hypotheses are a big part of the job, it's no surprise to see these skills rise to the top.

Develop strong computer science and software engineering backgrounds

  • Computer science and software engineering skills also feature prominently, with Java, C++, algorithms, and Hadoop all occupying notable real estate in the network visualization. These skills are used primarily to architect systems that put data to work.

In my experience, most data scientists are not experts in all of these categories (math, tools, and software development) but instead specialize in one or two of them. The network is therefore best read as a holistic view of the skills represented across a typical data science team, rather than a profile of any single data scientist.

I hope this helps shed some light on what a data scientist is and what skills it takes to become one. These analyses are all drawn from the skills people list on their LinkedIn profiles, so hopefully this is also a reminder to keep your own profile up to date.

Thank you, and I’d be interested in hearing your thoughts below.

Roland Craeye

Digital Transformation Leader | Empowering E-Commerce & Content Creators with Amazon Influencer Insights | Expertise in Management, Sales & Recruiting | Proficient in Agile, Scrum, & Kanban | Dual US & EU Citizenship

8mo

I loved how you pulled this all together and visually represented it for a lay person to understand. Also, great meeting you at the Internet Marketing party. I definitely want to learn more about what you are doing with trends in real-time. Game changing!

Vishvapalsinhji Parmar

Researcher | Data Science | Data Engineering | Lifelong Learner

5y

It's worth reading, especially the way you represented the graph. Superb.

Lauren Delapenha

English Teacher at Greenwich Academy

7y

I'm just discovering this article, and it's still so useful even two years after it was written! Thank you for putting this together, Ferris! I especially appreciate the visualization, as I find many articles on this topic that seem to go off of people's opinions of what a data scientist actually is/does. But I'd be curious to know if anything has changed in these skill sets since you wrote it, especially since this field is evolving and growing so quickly. Any chance of an update? :)

Danny Ma

Helping 1M+ people pivot into data roles 📊

7y

Thanks for the research and awesome visualisation!

Priyanka Singh

Content Management at iDiva.com | Mensa

7y

Thanks for sharing your viewpoint on data scientist skills. A simple question: since data science is a booming career all over the world, and some institutes train people with no background in statistics or technical subjects, could you shed some light on how such people can become proficient in statistics and numbers, which may help them become good data scientists?
