Richard Feynman, the great physicist, is often credited with saying that to understand a subject, you should understand its fundamentals well enough to explain them to a 10-year-old kid.

Now, if you look back at how you were taught Mathematics, or any other subject for that matter, in pre-school, you were taught starting from the absolute building blocks: digits, numbers, counting, arithmetic operations and so on. So, a longer path was taken to reach a particular level of understanding.

Do you see what I am getting at?

Look at the picture below. If you have to go from A…


You must be aware of why we are being advised to stay at least 6 ft. away from other people whenever we go out. It is said that the closer we get to a COVID-positive patient, the higher the chance that we catch the disease as well.

The K-Nearest Neighbours algorithm works on a similar principle. It assumes that an entity shares the characteristics of its k nearest neighbours.

That is why it is rightly said, ‘You become like the 5 people you spend the most time with.’ Here, the value of k is considered…
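To make the nearest-neighbour idea concrete, here is a minimal sketch of a K-Nearest Neighbours classifier in plain Python. It is not taken from the article: the function name `knn_predict`, the toy points and the labels "A" and "B" are all made up for illustration, and Euclidean distance with a simple majority vote is assumed.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=5):
    """Predict a label for `query` by majority vote among its k nearest points."""
    # Pair every training point's distance to the query with that point's label.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Sort by distance only, then keep the labels of the k closest points.
    distances.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in distances[:k]]
    # The most common label among the neighbours is the prediction.
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical toy data: two well-separated groups of 2-D points.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
labels = ["A", "A", "A", "B", "B", "B"]

print(knn_predict(points, labels, (2.0, 2.0), k=3))
print(knn_predict(points, labels, (8.0, 9.0), k=3))
```

With these toy points, a query close to the first group picks up label "A" and a query close to the second group picks up label "B", which is the "you become like your neighbours" behaviour in miniature.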


How can Hypothesis generation help you understand your data more quickly and speed up your modelling process?

One who knows how to improvise and can deal with all kinds of situations is a winner, right? Similarly, a Data Scientist who knows how to adapt and improvise in various situations can solve any kind of business problem using Data!

Let’s say you have a problem at hand, but you have strict deadlines and there is not enough time to understand it. To be honest, there is never enough time to understand your data. But it is essential to understand the problem well. …


How can Data Science, besides helping you professionally, improve the quality of decisions you make daily?

Data Scientist as a term sounds so intimidating, right? Like someone who invents data-related stuff to serve humanity. No! I mean, yes, obviously some people have done their doctorates in Statistics or Computing and constantly work hard to come up with algorithms and all. But did you know that ‘Data Science’ is a very general term which was defined in the 21st century but has been out there since forever?

In fact, in the year 1962, John W. Tukey wrote in his paper “The Future of Data Analysis”:

“For a long time, I thought I was a statistician…


Understanding the problem of Overfitting in Decision Trees and solving it by Minimal Cost-Complexity Pruning using Scikit-Learn in Python

The Decision Tree is one of the most intuitive and effective tools in a Data Scientist’s toolkit. It has an inverted tree-like structure which was once used only in Decision Analysis but is now a brilliant Machine Learning Algorithm as well, especially when we have a Classification problem at hand.

Decision Trees are well known for their ability to capture the patterns in the data. But an excess of anything is harmful, right? They are infamous for clinging too closely to the data they’re trained on. …
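As a rough illustration of the Minimal Cost-Complexity Pruning named in the title above, the sketch below uses Scikit-Learn's `cost_complexity_pruning_path` and the `ccp_alpha` parameter of `DecisionTreeClassifier`. The dataset (`load_breast_cancer`), the single validation split and the variable names are assumptions chosen for the example, not the article's own setup. Picking the alpha on one held-out split is a simplification, and cross-validation would be the more careful choice.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Assumed example data: a small classification dataset bundled with scikit-learn.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

# An unpruned tree grows until it fits the training data almost perfectly.
full_tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Candidate pruning strengths: each alpha corresponds to a smaller subtree.
path = full_tree.cost_complexity_pruning_path(X_train, y_train)

best_alpha, best_score = 0.0, -1.0
for alpha in path.ccp_alphas:
    alpha = max(float(alpha), 0.0)  # guard against tiny negative rounding errors
    tree = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha)
    tree.fit(X_train, y_train)
    score = tree.score(X_val, y_val)  # accuracy on held-out data
    if score > best_score:
        best_alpha, best_score = alpha, score

# Refit with the chosen alpha to get the pruned tree.
pruned_tree = DecisionTreeClassifier(random_state=0, ccp_alpha=best_alpha)
pruned_tree.fit(X_train, y_train)

print("unpruned nodes:", full_tree.tree_.node_count)
print("pruned nodes:  ", pruned_tree.tree_.node_count)
print("chosen ccp_alpha:", best_alpha)
```

A larger `ccp_alpha` removes more branches, so the pruned tree trades a little training accuracy for a simpler structure that clings less tightly to the training data.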


Guidelines to make your Bar Charts appealing and how to do that using Python

Python, being an open-source language, has a variety of libraries that let us do in seconds tasks which would otherwise take days to code and hours to compute.

People who use Python as a Data Science tool are well aware of Matplotlib and Seaborn. Plotting a graph using a single line of code is one of the specialities of these libraries.

If you just want to plot in order to understand your data, then these one-liners come in handy. …

Sarthak Arora

A Data Apprentice, trying to learn how to solve problems, preferably using Data. Visit www.github.com/iasarthak to see what I have done till now.
