What Does It Take to Be a Data Scientist?

Posted by Mario Wijaya on August 12, 2018

When I tell people that my job title is Data Scientist, they usually pause for a moment and ask what exactly a Data Scientist is. The simplest answer I can give is that Data Science is a combination of math, computer science, and engineering. For math and computer science, we usually have a problem statement (what we would like to solve), an approach (how we solve the problem), and finally the implementation (deployment using tools such as Python and R). A typical problem solved in data science is either a Classification task (e.g., predicting whether a picture shows a cat or a dog) or a Regression task (e.g., predicting a store's sales on a particular day based on the hour, the weather, and the store's location).

Now you must be thinking: what about the engineering part? To solve a problem well, a Data Scientist needs to be knowledgeable about the subject he/she is working on. For example, if you work at a Supply Chain company, you must understand how the Supply Chain works (its terminology and concepts). Understanding the fundamentals of what you are working on will make you a better problem solver.

To reiterate, here is what it takes to be a Data Scientist:

  • Math
    • Optimization (the foundation of Machine Learning)
    • Proofs: vital if you want to understand how mathematical formulas are derived
  • Computer Science
    • Data cleaning
    • Visualization
    • Good coding practice
  • Engineering: Approach the problem as an Engineer would (you will save a lot of time if you think the problem through before solving it)
  • Curiosity to keep learning and updating yourself with new skills and novel approaches (algorithms, modeling techniques, etc.) to solve problems
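The claim that optimization is the foundation of Machine Learning can be made concrete with the kind of Regression task described earlier. The sketch below is a minimal illustration, assuming a one-feature linear model fitted by plain batch gradient descent on mean squared error; the function names, learning rate, epoch count, and the synthetic data (generated from y = 2x + 1) are hypothetical choices for illustration, not the author's method.

```python
# Minimal sketch: fit y ~ w * x + b by batch gradient descent on mean squared error.
# Function names, hyperparameters, and data are hypothetical, chosen for illustration.

def fit_linear_gd(xs, ys, lr=0.05, epochs=2000):
    """Return (w, b) that approximately minimize MSE = mean((w*x + b - y)^2)."""
    n = len(xs)
    w, b = 0.0, 0.0
    for _ in range(epochs):
        # Residuals of the current model on every data point
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        # Partial derivatives of the MSE with respect to w and b
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        # Take a step against the gradient to reduce the loss
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def mse(xs, ys, w, b):
    """Mean squared error of the line w * x + b on the data."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


if __name__ == "__main__":
    # Synthetic data generated from y = 2x + 1 (no noise)
    xs = [0.0, 1.0, 2.0, 3.0, 4.0]
    ys = [2 * x + 1 for x in xs]
    w, b = fit_linear_gd(xs, ys)
    print(f"w={w:.3f}, b={b:.3f}, mse={mse(xs, ys, w, b):.6f}")
```

Libraries such as scikit-learn in Python or `lm` in R solve linear regression for you (typically in closed form rather than by gradient descent), but the underlying idea is the same: define a loss and find the parameters that minimize it.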

Where Do I Learn to Become a Data Scientist?

There are several paths one can take to become a Data Scientist:

  • Go to a university that offers an Analytics-related degree, or a Computer Science degree with a focus on Machine Learning
  • Teach yourself via MOOCs (Massive Open Online Courses): Coursera and edX, to name a few

The most important part is PRACTICE, PRACTICE, AND MORE PRACTICE! This is especially relevant on the coding side of things. Someone who codes regularly and then takes a month or two off for vacation will forget a lot of the syntax of that programming language. I highly recommend working on lots of interesting problems on Kaggle, where you can learn from the notebooks (coding and algorithmic approaches) posted by other users.

What Are My Favorite Programming Languages?

You must be rolling your eyes, because there are two camps: R > Python and Python > R. There are a lot of debates out there about which is better. My take is to use whichever language you feel most comfortable with. I use both languages interchangeably because some packages are available in R but not in Python, and vice versa. I also recommend that everyone learn SQL (Structured Query Language), because you will spend 60% of your time pulling the data that matters for analysis and cleaning it.
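To illustrate the SQL side of pulling and cleaning data, here is a minimal sketch using Python's built-in `sqlite3` module and an in-memory database. The `sales` table, its columns, the helper names, and the rows are all hypothetical; the query pulls only what the analysis needs and does some basic cleaning in SQL itself (dropping rows with missing amounts and normalizing inconsistently typed store names).

```python
import sqlite3

# Hypothetical table and data, for illustration only.


def build_demo_db():
    """Create an in-memory database with a small, deliberately messy sales table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (store TEXT, sale_date TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?, ?)",
        [
            ("downtown ", "2018-08-01", 120.0),  # stray whitespace, lowercase
            ("Downtown", "2018-08-01", 80.0),
            ("Airport", "2018-08-01", None),     # missing amount
            ("Airport", "2018-08-01", 50.0),
        ],
    )
    return conn


def load_clean_daily_sales(conn):
    """Pull total sales per normalized store name and day, skipping missing amounts."""
    query = """
        SELECT UPPER(TRIM(store)) AS store_name,
               sale_date,
               SUM(amount) AS total_sales
        FROM sales
        WHERE amount IS NOT NULL               -- drop incomplete records
        GROUP BY UPPER(TRIM(store)), sale_date -- merge spelling variants
        ORDER BY store_name, sale_date
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for row in load_clean_daily_sales(build_demo_db()):
        print(row)
```

Pushing simple filtering and normalization into the query keeps the data you pull small and consistent before it ever reaches Python or R.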

What Is the Job Market Like?

I believe the job market is best explained in this article by PwC. The job market is hot and demand is high; however, it will become more competitive as the market matures. According to PwC, the average annual salary for a Data Scientist in the US is $94,576. In my personal experience, most companies look for someone with a PhD or a Master's degree with 3-5 years of experience for a Data Scientist position. However, some companies will waive that requirement if you have proven that you are capable.

Conclusion

At the end of the day, you have to pursue the job that makes you happy. I hope this short write-up helps you see what it takes to be a Data Scientist. It takes a lot of passion and hard work to be a good Data Scientist, but it is quite satisfying when you can use data to come up with cool insights. Last but not least, remember this quote by W. Edwards Deming:

    In God we trust. All others must bring data.

Feel free to reach out on LinkedIn or by email if you have further questions.