What if you could use data science the way everyone advertises it?
Hello, and welcome to this course on Practical Data Science: A Guide for New-Age Business Owners. I’m Pragyan, a data scientist and a business owner at The Click Reader.
In this course, I will share the insights I’ve gained working as a data-strategy consultant for Fortune 5000 companies as well as local SMEs. The course covers what is most crucial for a business to become data-empowered, including topics such as:
- How can you start looking at data science as an art form?
- How can you use the five V’s of data to keep track of data quality as part of the data collection process?
- What role does the law of attraction play in Exploratory Data Analysis?
- Why might you want to pursue data modeling as a business owner?
Most of the topics taught in this course are not covered in other courses, so this course will give you a new mindset as a business owner. If you’re excited to gain all of this knowledge, please join me in the course and let’s start learning.
Part 1: From Information to Inspiration
Let us start this course with an introduction to data scientists since, moving forward, you will need to think like one.
Decades ago, the term ‘scientist’ gave us the image of a white-haired researcher pacing around his/her laboratory in front of a chalkboard full of mathematical equations that looked like a cauldron of calculus. A century of innovation later, it is fair to say that the term has lost some of its grandeur. The status of a scientist is now easier to reach than ever before.
To lead today’s society, we do not need a laboratory equipped with an electron microscope or a collection of dirty flasks bubbling with chemicals. All we need is data. A scientist who studies data intensively is a data scientist, and data scientists, myself included, understand the language of numbers.
Okay, now that you have a picture of who data scientists are, let us talk a little about what they do.
Most people say that data scientists convert data into information. However, from my viewpoint, they convert information into inspiration. Think of it this way: if a musician is given an instrument that can play musical notes, he/she can arrange those notes into chords or patterns and create sounds that are music to the ears.
Similarly, if a data scientist is given valuable data, he/she can convert it into a revenue-generating product or an experience-elevating algorithm. Netflix’s recommender system, Tesla’s self-driving car and Google’s search engine are all such works of inspiration.
So, if you really want to turn your business into a full-fledged data-driven business, you should start looking at data from an artist’s perspective and not limit yourself to information retrieval.
Tremendous business growth happens when information is converted into inspirational work.
Part 2: The Five V’s of Data
‘Garbage in, Garbage out’ is one of the most used phrases in computer science.
Interpreting this in terms of data science, it means that if you provide faulty data as an input to even a properly built system, you will still get faulty outputs. So, how you collect and store business-related data is as important as analyzing or modeling the data.
Therefore, in this lesson we will be talking about the five V’s of data you should keep in check to maintain data quality.
The five V’s of data are Volume, Velocity, Variety, Veracity and Value of data. Let us talk about them one by one.
The first V is Volume. It refers to the amount of data you collect for your business. Looking at this V helps you understand whether you have enough data to implement data science strategies. However, more data does not necessarily mean better results, as we’ll see shortly.
The second V is Velocity. It refers to the speed at which data flows through your systems. It helps you understand whether your business decisions can be made in real time.
So, the next time you plan on hoarding multiple Excel files on a hard drive, think of how tiresome it may be to generate reports and share them across your team. Investing in a proper database management system is a good step toward keeping data velocity in check.
The third V, Variety, refers to the range of sources from which you collect data. It helps you understand whether your business has the data it needs to implement a business strategy.
Think about this: can you build a customer support chatbot with financial transaction data? Obviously not. To build such a chatbot, you’ll need transcripts of how your current customer support staff answer customer queries. This is a good example of why variety in data collection is important.
The fourth V, Veracity, refers to the quality of the data. Looking at it helps you understand whether your data has missing values, duplicated values and so on. Keep in mind that when the data is dirty, it can take twice as long to do whatever you intend to do with it.
Therefore, to save your team painful hours of manual data cleaning, make sure you have a script in place that automatically checks the veracity of the data and fixes common errors each time your business goes on a data collection spree.
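The following is a minimal sketch of such a script. It assumes that each collected record arrives as a Python dictionary. The first function counts missing values and duplicate rows. The second fixes two common errors (stray whitespace and exact duplicates) and drops incomplete rows. The function names `check_veracity` and `clean` are hypothetical, and so are the `order_id` and `email` fields. A real pipeline would more likely use a library such as pandas and apply business-specific rules.

```python
from collections import Counter


def _is_missing(value):
    """Treat None and blank strings as missing values."""
    return value is None or (isinstance(value, str) and value.strip() == "")


def check_veracity(records, required_fields):
    """Report missing values per field and the number of duplicate rows."""
    missing = Counter()
    for row in records:
        for field in required_fields:
            if _is_missing(row.get(field)):
                missing[field] += 1
    # Rows with exactly the same fields and values count as duplicates.
    row_counts = Counter(tuple(sorted(row.items())) for row in records)
    duplicates = sum(count - 1 for count in row_counts.values())
    return {"missing": dict(missing), "duplicates": duplicates}


def clean(records, required_fields):
    """Fix common errors: trim whitespace, then drop incomplete and duplicate rows."""
    cleaned, seen = [], set()
    for row in records:
        fixed = {k: v.strip() if isinstance(v, str) else v for k, v in row.items()}
        if any(_is_missing(fixed.get(f)) for f in required_fields):
            continue  # skip rows that are missing a required value
        key = tuple(sorted(fixed.items()))
        if key in seen:
            continue  # skip exact duplicates
        seen.add(key)
        cleaned.append(fixed)
    return cleaned


if __name__ == "__main__":
    orders = [  # hypothetical collected records
        {"order_id": "A1", "email": "a@example.com"},
        {"order_id": "A1", "email": "a@example.com"},
        {"order_id": "A2", "email": "  "},
        {"order_id": "A3", "email": "c@example.com "},
    ]
    print(check_veracity(orders, ["order_id", "email"]))
    print(clean(orders, ["order_id", "email"]))
```

On the sample records, the report flags one blank email and one duplicated order. The cleaned list keeps orders A1 and A3, with the trailing space trimmed from A3’s email. Running the check before and after cleaning on every collection run is what lets veracity be tracked automatically rather than by hand.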
And lastly, the fifth and final V is Value. It refers to whether or not your data is actually valuable. Unless your collected data is valuable, it can be turned neither into information nor into inspiration.
So, when you look at all of the five V’s of data together, you will form a clear picture of what kind of data you should be collecting and how you should be storing it.
Part 3: Exploratory Data Analysis (EDA) and the Law of Attraction
Once you’ve checked the five V’s of data, the next step is usually to perform Exploratory Data Analysis on the data you’ve collected.
By definition, Exploratory Data Analysis (EDA) is an approach for data analysis that makes use of various analytical and graphical techniques to:
- Better understand the data
- Extract important variables for data modelling
- Detect outliers and anomalies
- Generate and test one or more hypotheses about the data
There are many free resources on the internet for learning how to perform EDA programmatically, so I will not go in that direction in this lesson. Instead, I want to talk about a concept that greatly affects the effectiveness of EDA.
It is called the law of attraction.
The law of attraction is the belief that the universe creates and provides for you that which your thoughts are focused on. So, interpreting this law in terms of data science, you can say that the analytical outcomes of your EDA process are a result of the questions you focus on. Therefore, it is really important to ask good questions.
But what is a good question? A good question is clear, concise, purposeful and, most importantly, free from bias. Let us understand this with an example.
Consider that you have a dataset consisting of customer reviews for a product of your business. What are the questions you can ask the dataset in order to get inspirational work done? Pause, take a moment and try to come up with as many questions as you can.
Here’s a question that I would have liked to answer given the dataset – “How many people like and dislike the product and what are the top 5 product characteristics that influence such reviews?”
If we study this question, you can see that I’m not only focusing on the people who like my product. I’m removing bias by paying equal attention to people who like it and people who dislike it. Another takeaway is that I’m asking for the top 5 product characteristics influencing the reviews, so I can improve or innovate on them.
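The question above can be turned into a small analysis sketch. The sketch makes several hypothetical assumptions:
- Each review has a 1 to 5 star `rating` and a list of `characteristics` already tagged from its text.
- 4 or 5 stars count as "like", 1 or 2 stars count as "dislike", and 3-star reviews are left out.
- How often a characteristic is mentioned in each group serves as a simple stand-in for "influence".

Mention counts show association, not cause. They are a starting point for further questions, not a final answer.

```python
from collections import Counter


def summarize_reviews(reviews, top_n=5):
    """Count likes and dislikes, and the characteristics each group mentions most.

    Assumes each review is a dict with a 1-5 "rating" and a list of
    "characteristics" already tagged in the text (hypothetical fields).
    """
    liked = [r for r in reviews if r["rating"] >= 4]
    disliked = [r for r in reviews if r["rating"] <= 2]

    def top_characteristics(group):
        # Count each characteristic once per review so a single review
        # repeating a word cannot dominate the ranking.
        counts = Counter(c for r in group for c in set(r["characteristics"]))
        return counts.most_common(top_n)

    return {
        "likes": len(liked),
        "dislikes": len(disliked),
        "top_in_likes": top_characteristics(liked),
        "top_in_dislikes": top_characteristics(disliked),
    }


if __name__ == "__main__":
    reviews = [  # made-up reviews for illustration only
        {"rating": 5, "characteristics": ["battery life", "design"]},
        {"rating": 4, "characteristics": ["battery life"]},
        {"rating": 1, "characteristics": ["price", "battery life"]},
        {"rating": 3, "characteristics": ["design"]},
    ]
    print(summarize_reviews(reviews))
```

The function looks at likes and dislikes side by side. This mirrors the unbiased framing of the question: a characteristic that shows up often in both groups deserves a closer look before any decision is made.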
The goal of my question in short is to help me get from information to inspiration. So, let your questions help you do the inspirational work you aspire to do during the EDA process.
“Ask questions, the data will confess.”
Part 4: Modeling your Business Data
For most businesses, the data science journey usually ends at Exploratory Data Analysis.
From there onwards, different business departments pick up the analyzed insights, work on them and deliver results. These results get tracked and are further analyzed to improve efficiency. The cycle continues until EDA has become a fundamental part of how the business operates.
They become a data-driven business.
But there are a few businesses that move past the analytical stage and continue their data science journey toward data modeling.
By definition, data modeling is the process of creating an abstract mathematical model that organizes elements of data and finds relationships between them. For example, if you need to accurately forecast your sales numbers, you will need a data model that represents how your sales volume changes over time.
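To make the sales example concrete, here is a minimal sketch of about the simplest data model of sales over time: a straight-line trend fitted by ordinary least squares, then extended into future periods. It assumes two things. First, sales are recorded at evenly spaced periods (for example, monthly). Second, they follow a roughly linear trend. Real sales usually also show seasonality and noise, which dedicated forecasting methods handle better. The function names `fit_trend` and `forecast` and the sample numbers are illustrative, not part of any library.

```python
def fit_trend(sales):
    """Fit sales ~ intercept + slope * period by ordinary least squares.

    Periods are numbered 0, 1, 2, ... in the order the sales were recorded.
    """
    n = len(sales)
    if n < 2:
        raise ValueError("need at least two periods to fit a trend")
    periods = range(n)
    mean_x = (n - 1) / 2          # mean of 0..n-1
    mean_y = sum(sales) / n
    sxx = sum((x - mean_x) ** 2 for x in periods)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(periods, sales))
    slope = sxy / sxx             # average change in sales per period
    intercept = mean_y - slope * mean_x
    return slope, intercept


def forecast(sales, periods_ahead):
    """Extend the fitted straight line `periods_ahead` periods into the future."""
    slope, intercept = fit_trend(sales)
    n = len(sales)
    return [intercept + slope * (n + k) for k in range(periods_ahead)]


if __name__ == "__main__":
    monthly_sales = [100, 110, 120, 130]   # illustrative numbers, not real data
    print(forecast(monthly_sales, 2))      # prints [140.0, 150.0]
```

The fitted slope is the "relationship between sales volume and time" in its most basic form. A more realistic model would replace this straight line while keeping the same idea of learning that relationship from past data.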
If done correctly, data models can help tremendously in improving organizational efficiency, elevating customer experience and generating more revenue. Therefore, in this lesson I’ll try my best to help you find the right reasons for investing the time, budget and expertise needed to build data models (for large projects).
One major reason for developing data models as a business owner may be to compete better in an existing market by improving on an existing solution. That improvement should make it at least 10 times better than the original.
For example, Amazon has a very powerful data model in place: its product recommender system. The recommender system massively elevates customer experience and also brings in large chunks of revenue. Many e-commerce platforms have since adopted such recommender systems, but although many have tried, none has become a bigger name than Amazon.
On the other hand, a completely different reason for pursuing the development of data models as a business owner may be for innovating a new market.
For example, Tesla took on a large amount of risk to build self-driving electric cars. Its self-driving algorithms are a combination of multiple data models, and since Tesla succeeded in pulling it off, it now reaches a totally different market through its innovation.
So, based on your aspirations, you should only pursue the development of data models if you plan on improving a product, solution or experience by a factor of 10, or if you plan to innovate a new market. This is not a generally accepted rule. In my experience, however, companies that put time and effort into data models that fall short of these marks are typically not happy with the results.
“Inspiration leads to innovation.”
So that is it for this course.
Hopefully, by now, you’ve gained the confidence to approach data science more practically as a business owner. And if you’re just starting out on your data science journey, I recommend checking out www.theclickreader.com.
We’ve created a platform that contains a large array of courses which can help you in different aspects of your business. And, if you’re already far along in your programming journey, then you can check out our data science specialization courses on the platform as well. So, that is it for now, bye bye!