The Inconvenient Truth About Data Science
- Data is never clean.
- You will spend most of your time cleaning and preparing data.
- 95% of tasks do not require deep learning.
- In 90% of cases, generalized linear regression will do the trick.
- Big Data is just a tool.
- You should embrace the Bayesian approach.
- No one cares how you did it.
- Academia and business are two different worlds.
- Presentation is key - be a master of PowerPoint.
- All models are wrong, but some are useful.
- There is no fully automated Data Science. You need to get your hands dirty.
photo cc-by Moyan Brenn (https://www.flickr.com/photos/aigle_dore/)
Group Chief Data Officer | Strategic Data Management and Analytics | Data-Driven Digital Transformation | AI and Machine Learning | Data Science
4y: I love these points! Anyone who says otherwise probably doesn’t know what they are talking about, and there are many of them. I agree the most with points 1, 2, 4, 8, 9, and 11.
Independent Researcher at n/a - between jobs - who wants me? I want to work!
5y: Start-up Vertical Data uses a mix of linguistic and statistical methods, viz. Semantic Web SUBJECT PREDICATE OBJECT (SVO) triples. Kullback-Leibler divergence for document classification + POS tagging + hypernyms (WordNet) + SVO gets > 90% precision. We have more methods under development. https://www.linkedin.com/pulse/entailment-hypernyms-semantic-web-technique-joined-nlp-vanderwilt/
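For readers unfamiliar with the Kullback-Leibler part of the comment above, here is a minimal sketch of how KL divergence can be used to assign a document to a class. It is not Vertical Data's implementation. The toy corpus, the function names (`word_distribution`, `kl_divergence`, `classify`), and the add-one smoothing are all assumptions made for illustration.

```python
import math
from collections import Counter


def word_distribution(text, vocabulary, alpha=1.0):
    """Smoothed word probabilities over a fixed vocabulary (alpha is an assumed smoothing constant)."""
    counts = Counter(text.lower().split())
    total = sum(counts[w] for w in vocabulary) + alpha * len(vocabulary)
    return {w: (counts[w] + alpha) / total for w in vocabulary}


def kl_divergence(p, q):
    """KL(p || q) in nats; p and q share the same keys and are strictly positive."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)


def classify(document, class_texts):
    """Return the label whose word distribution has the lowest KL divergence from the document's."""
    all_texts = list(class_texts.values()) + [document]
    vocabulary = sorted({w for t in all_texts for w in t.lower().split()})
    doc_dist = word_distribution(document, vocabulary)
    scores = {
        label: kl_divergence(doc_dist, word_distribution(text, vocabulary))
        for label, text in class_texts.items()
    }
    return min(scores, key=scores.get), scores


# Hypothetical toy corpus: one reference text per class.
class_texts = {
    "sports": "the team won the match and the fans cheered the goal",
    "finance": "the bank raised rates and the market sold stocks and bonds",
}

label, scores = classify("the fans cheered when the team scored a goal", class_texts)
print(label, scores)
```

A real pipeline would presumably combine such a score with the POS-tagging, hypernym, and SVO features the comment mentions. This sketch covers only the divergence step.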
World-Class Strategies & Transformational Technologies for Motivated SMEs.
5y: All of these are so true. I personally push 3 and 5 the most.
Data Scientist Praedicat
6y: Thank you for posting this. I am very partial to #3 and #6. I keep hoping that simpler models and particularly Bayesian methods will become more widely adopted. Unfortunately, it appears that deep neural networks have become the starting point for many data scientists.
Database Specialist at Amazon Web Services (AWS)
7y: I like #7 - Just show the insights!