Data Analytics Full Course 2026 | Data Analytics Tutorial | Data Analytics Course | Simplilearn
Chapters29
An introductory overview of a comprehensive data analytics course, outlining core concepts, tools, and the goal of enabling learners to analyze data, ask the right questions, and communicate insights through reports and dashboards.
A practical, hands-on data analytics deep dive (Excel, SQL, Python, R, PowerBI) with real-world use cases, plus career prep and job-ready projects.
Summary
Simplilearn’s Data Analytics Full Course 2026 is a comprehensive, career-focused program designed to take you from fundamentals to advanced analytics skills. The course blends theory with practice, depth with breadth, and covers essential tools like Excel for analytics, SQL for data handling, Python and R for data manipulation, and PowerBI for visualization. You’ll learn data cleaning, preparation, exploratory data analysis (EDA), descriptive/predictive/prescriptive analytics, and how to communicate insights to stakeholders. The curriculum includes real-world use cases across finance, healthcare, marketing, logistics, and city planning, plus a thorough look at data science workflow, data warehousing vs. data lakes, and the five data analytics process steps (collection, preparation, exploration, modeling, interpretation). There’s a strong emphasis on practical skills: hands-on Python notebooks, R Studio exercises, Excel dashboards, PowerBI dashboards, and SQL querying—from simple selects to complex joins, aggregations, and window functions. The program culminates in portfolio-ready projects, interview prep, and guidance on landing data roles, with SimlpleLearn’s cohorts offering university partnerships and multiple certifications. Expect project-based learning (EDA, dashboards, KPIs, and predictive modeling), plus career-readiness modules and interview strategies that cover real-world analytics scenarios and presentable storytelling for stakeholders.
Key Takeaways
- Excel remains a foundational platform for data analytics: pivot tables, data cleaning, Power Query, and Excel-based dashboards are complemented by modern BI tools.
- SQL fundamentals are reinforced with real-world join types (inner, left/right, full), grouping, having vs. where, and practical examples like department-employee relationships and sales analytics.
- Python and R are taught through project-based notebooks (EDA, data preparation, modeling, visualization) and paired with visualization libraries (Matplotlib, Seaborn) and data-science workflows.
- Power BI is introduced through a hands-on path: data import, model creation, visualizations, slicers, and dashboard interactivity, including data-driven storytelling for executives.
- The course emphasizes data ethics and governance concepts like data quality, data lineage, validation, and the critical ACID properties (atomicity, consistency, isolation, durability) in data environments.
- Projects span real business contexts (fraud analysis, healthcare diagnostics, delivery/logistics optimization, targeted marketing) to build a tangible portfolio for interviews and job applications.
- Career readiness modules provide interview tips, resume guidance, and cohort-based job placement and university-partner certifications to accelerate job outcomes.
Who Is This For?
Essential viewing for aspiring data analysts, BI developers, and data scientists who want hands-on tool mastery (Excel, SQL, Python, R, Power BI) and a strong portfolio for data-driven roles. Great for professionals switching to data analytics or expanding their BI toolkit with modern practices and real-world case studies.
Notable Quotes
"“data analytics focuses on examining data to uncover patterns, trends, and insights that support decision making.”"
—Opening definition of data analytics and its purpose.
"“Excel stands as one of the most trusted tools for data analysis, even in the age of AI and automation.”"
—Emphasizes Excel’s enduring relevance as a starting point for analytics.
"“Descriptive analytics is about what happened; predictive is about what will happen next; prescriptive tells us what to do.”"
—Core taxonomy of analytics types highlighted in the course.
"“Power BI dashboards are interactive, with slicers and cross-filtering that tie visuals together for storytelling.”"
—Power BI module and dashboard interactivity details.
"“ACID—atomicity, consistency, isolation, durability—are the backbone of reliable databases.”"
—Foundational database concepts covered in data modeling and SQL.
Questions This Video Answers
- What is the best order to learn data analytics tools (Excel, SQL, Python/R) for a bootcamp?
- How can I build a portfolio using Excel dashboards and Power BI dashboards for a data analytics interview?
- What are descriptive, predictive, and prescriptive analytics, and how do they differ in business applications?
- How do you join tables in SQL (inner, left, right, full) and when should you use each?
- What is the role of data cleaning in analytics and what are practical techniques to handle missing values and inconsistencies?
Data Analytics Full Course 2026SimplilearnData Analytics FundamentalsExcel for Data AnalyticsSQL for Data AnalyticsPython for Data AnalyticsR for Data AnalyticsData Cleaning and PreparationEDAData Visualization (Power BI, Tableau, Plotting Libraries)
Full Transcript
Hey everyone, today we bring you the data analytics full course, a comprehensive learning program designed to take you from data analytics fundamentals to advanced analytical skills used in real world business environments. Now, as we know, data analytics focuses on examining data to uncover patterns, trends, and insights that support decision making. It plays a critical role across domains such as business, marketing, finance, operations and technology helping organizations measure performance and optimize outcomes. So in this course we will start with the basics understanding data analytics concept and tools and also gradually moving on to advanced topic such as data cleaning, visualization and analytical problem solving.
You will learn not just how to analyze data, but how to ask the right questions and communicate insights effectively to stakeholders. And by the end of this course, you will be confident working with data, performing analysis, creating reports and dashboards, and applying analytics techniques to solve real business problems. Having said that, let's take a look at the agenda for this course. We will start off with introduction to data analytics. Then we'll talk about data fundamentals. Moving on to Excel for data analytics, SQL for data analytics. We'll also learn about data cleaning and preparation. Exploratory data analysis.
We'll see how data visualization works and also learn about reporting. We'll also show you how to use Python for data analytics, statistics, and advanced analytics concept. Then we'll talk about some real world data analytics use cases. And at last, we'll have a look at career readiness and interview preparation. Hope I made myself clear with that agenda. Now that said, if these are the type of videos you would like to watch, then hit the subscribe button and hit the bell icon to get notified whenever we host. Also, just that you know, if you want to skill yourself, master data science and data analytics skill and land your dream job or growing your career, then you must explore SimplyLearn's cohort of various data science and data analytics programs.
Simple learn offers a variety of masters certifications and post-graduate programs in collaboration with some of the world's leading universities like IT Gojarti and many more. Through our courses, you will gain business and leadership knowledge along with work ready expertise in skills like Python, Tableau, PowerBI, Generative AI and over a dozen others. And that's not all. You'll also get the opportunity to work on multiple projects led by industry experts working on top tier data and product companies. After completing these courses, thousands of learners have transitioned into data science or data analytics role as a fresher or moved on to higher paying job and profile.
So if you're passionate about making your career in this field, then make sure to check out the link in the pin comments and the description box below. So what are you waiting for? Hurry up and enroll now. Now before we get started, here's a quick small quiz question for you. Which step in the data analytics process focuses on identifying patterns, trends, and relationships in data? Your options are data collection, data cleaning, exploratory data analysis, EDA, data deployment. Let us know your answers in the comment section below. And the two steps are analyze the data and then make decisions based on the data.
Applications of data analytics. Now the sky's is the limit on this. In today's world, almost every business act of life, your music on your Spotify are driven by data analytics. But some of the big players when you go in there job hunting are going to be your fraud analysis. Uh if you want to go make a lot of money and you're good at it and you like dealing with numbers, uh go join the banks and track down the criminals who are stealing money. It's a lot of, you know, it's a big thing to protect credit cards, protect uh sales purchases, bad checks, any of those things when you can track them down is huge.
Healthc care exploding. Uh there is everything from trying to find cures for uh the co virus or any of the viruses out there. Uh using your cell phone to diagnose different ailments. Uh that way you don't have to go in and see the doctor. you can actually just go in there and take a picture of the funky growth on your arm. Hopefully, it's not too big. And then they send it in there and the data analytics goes in there and looks at it and says, "Oh, this is what this is. This is a professional you need to go see or don't need to see." And that's just one aspect of healthcare.
uh the databases uh being generated by health care and getting the right doctors and helping the doctors analyze whether something is uh benign or malignant if it's cancerous all those things are now part of the ongoing healthc care growth in data analytics inventory management think one of those huge warehouses where they're shipping out all the goods how do you inventory that in such a way so that uh you maximize the stuff that's being purchased the most near the entrance and all the other stuff towards the back or even pre- ship it. Uh so it's huge to be able to inventory the and manage your inventory and pretty soon they'll just have a drone come in there and start picking up some of those boxes and move them around.
Also, delivery logistics. Again, this goes from uh getting from point A to point B. Uh you can combine it with our inventory. So you pre- ship stuff if you know a certain area is more likely to purchase it. How do you get it the delivery to the most destinations the quickest in the short amount of time? And then they even pre-stack the trucks going out. And that's all done with data analytics. How do we stack all that stuff so it comes out in the right order? Targeted marketing, huge industry, any kind of marketing, whether you're generating uh the right content for the marketing, who are you targeting with that marketing, researching the people, what they want, so you know what products to market out there.
All those things are huge. And these are just a few examples. You can probably go way beyond this from tracking forest fires to astrology and studying the stars. All of this is part of data analytics now and plays a huge role in all these different areas. Uh city planning is another one. You know, you can see a nice organized city like this one where you can get in and out of the neighborhoods. If you're a firet truck, uh police officers need to be able to get in and out. You want your tourists to be able to come in, yet you still want the place to look nice and you have the right commercial development, the right industrial development, like enough residence for people to stay.
All those things are part of your city planning. Again, huge in data analytics. So, sky's the limit on what you use it for. Let's take a look at types of data analytics. And this can be broken up in so many ways. Uh but we're going to start with looking at the most basic questions that you're going to be asking in data analytics. And the first one is you want descriptive analytics. What has happened? Hindsight. Uh how many cells per call ratio coming out of the call center? If we have 500 tourists in a forest and you have a certain temperature, how many fires were started?
How many times did the police have to show up to certain houses? Um all that's descriptive. The next one is predictive. Predictive analytics is what will happen next. We want to predict. Uh this is great if you want have a ice cream store and you want to predict how many people to work at the ice cream store in a certain day based on the temperature coming up in the time of the year. And then one of the biggest growing and most important parts of the industry is now prescriptive analytics. And you can think of that as combining the first two.
We have descriptive and we have predictive. Then you get predcriptive analytics. How can we make it happen? Foresight. What can we change to make this work better? In all the industries we looked at before, we can start asking questions. Uh especially in city development. There's a good one. If we want to have our city generate more income and we want that income to be commercialbased, uh, what kind of commercial buildings do we need to build in that area that are going to bring people over? Do we need huge warehouse sales, Costco sales buildings, or do we need little mom pod joints that are going to bring in uh people from the country to come shop there?
Or do you want an industrial setup? What do you need to bring that ind industry in there? Is there a car industry available in that area? uh if it's not a car industry, what other industries are in that area? All those things are prescriptive. We're guessing. We're guessing what can we do to fix it? What can we do to fix crime in area with education? What kind of education are we going to use to help people understand what's going on so that we lower the rate of crime and we help our communities grow better? That's all prescriptive.
It's all guessing. We want foresight into how can we make it happen? How can we make this better? And we really can't not go into enough detail on these three because a lot of people stumble on this when they come in and are doing analytics. Whether you're the manager, shareholder, or the data scientist coming in, you really need to understand the descriptive analytics where you're studying the total units of furniture sold and the profit that was made in the past. Uh here we go into predictive analytics, predicting the total units that would sell and the profit we can expect in the future.
gear up for how many employees we need, how much money we're going to make, and prescriptive analytics, finding ways to improve the sales and the profit so we can uh sell maybe a different kind of furniture. Uh we're going to guess at what the area is looking for and how that marketing is going to change. Data analytics process steps. So, let's take a look at some of the basic processing and what that looks like when you're working with this data. So, there's five basic steps. uh the five steps of processing and and this changes and then there's a lot of things that go on when they talk about um agile programming.
The whole concept of agile is you take some kind of framework like this and then you build on it depending on what your business needs. So the first step is data collection. And usually with a large company, you might have somebody who uh is responsible for the database management. Um you might have another one where they're pulling APIs and they're pulling data off of uh maybe the Census Bureau. Uh maybe something very very um specific uh domain specific. So if you're analyzing cancerous growths and how to understand them, then the data collection is going to be those measurements they take from the MRI or it might be even the MRI images, they've used those also.
Uh so there's a lot of things with data collection and how to control that and make sure it has uh what you need and is clean and you don't have misinformation coming in. Uh once you have the data collected, there's a data preparation. Uh so stage two is we take that data and we format it into something we can use. Probably one of the biggest formats that you see is when you're processing text. How do you process text? Well, you use what they call a one hot encoder and each word is represented uh by a yes no kind of setup.
So it' be like a long array of bits. Um that's one way to prepare it. And so you know bit number one is 'the' bit number two is has or whatever it is. Other preparations might be if you're using neural networks you might be taking integers or float numbers and converting them to a value between zero and one. That way you don't have one of them creating a bias in there. Uh so there's a lot of different things that go into data preparation. That is 80% of data science. So when we talk about the data analytics which is a little bit more on the math side and they usually say talk about a data scientist kind of being the overall preparer of this stuff you're going to spend 80% of your data preparation data exploration uh that's the fun part this is where you're exploring things uh and it is maybe 10 to 15% of what you do with the data you spend with the data exploration it is probably uh the most important step because this is where you got to start asking asking questions.
Uh if you ask your questions wrong, you're going to get some wrong information. If you're working with a company and they want to know the marketing values, then you really got to focus on, hey, how do we generate money for this company or fraud? How do we lower the fraud rate while still generating a profit? Four, data modeling. This is where we start actually getting into the data code, uh which model to use that predicts what's going to happen. Uh and then result interpretation. We want to be able to interpret those results. You usually see that in your mattplot library where you create nice, beautiful images so it shows up on their dashboard for the marketing manager or for the CEO so they can take a quick look and say, "Hey, I can see what's going on there." You want to reduce it to something they can easily read.
Uh they don't want to hear the scientific terms. They want to see something they can use. And we'll talk about that a little bit more when we start looking at some of this in a demo. So firstly let us understand why data analytics. Can somebody here tell me why data analytics and why is it used in organizations? So let us understand why data analytics and what is the use of data analytics. As we all know that data is growing exponentially year over year. It is collected and it is also available everywhere. Data is no more just available in structured format but it is also available in semistructured and unstructured format.
I'm sure you would have come across this term called as sentiment analysis pretty often. Right? What does that mean? Can we perform that in data analytics? Yes. Right? We use some natural language processing models try to identify what are the good reviews, what are the bad reviews spoken by the customer. Correct? So we try to classify the text based on the good, bad and the neutral. So that is what sentiment analysis about. So for that to perform that activity we have to also do some data analytics. Now that companies have realized the importance of these informations not just structured but also unstructured data format the companies have started utilizing these data to take some crucial business decisions which can boost their business and also which can increase the efficiency of the business.
So now that the raw data is accessible to the organizations, it becomes very important that the data is also stored. Well, I'm sure you all have pretty much heard about data warehouse. In the last few decades, the trend was mostly on the data warehouses, the business intelligence tools. So the data warehouse used to collect this data, pre-process the data and also filter the data and make it available in a structured format for further analysis. However, now that is not the scenario. A term coined as data lake is available and many firms are utilizing data lake because it is a central repository which stores the raw data in form of structured and unstructured data.
And now let us take a scenario. Let us choose one of the participants here Mark. So Mark has recently joined an organization as a data analyst or in essence a data scientist. The business connects with him and says that Mark we have a business problem for you and we expect you to provide us a data analytic solution. So Mark sits with the stakeholders and he listens very carefully to the business question. So the business question says that we have couple of products which are performing really well in the market and we see higher sales and some of the products are just not catching up in the market and we experience lower sales.
Can you help us identify what are the factors driving this higher sales and the lower sales? Now Mark has to think with the data scientist mindset and be prepared to ask them right questions. some questions such as do you have the price of all these products available in the database and can I also know what is the duration of data availability and also Mark can ask some question such as do you also have some features of these individual products already captured in your data base. So these are some of the interesting questions that makes sense to the business and the conversation continues which also means that data is not only information.
Data analysis is about unlocking insightful informations from this raw data. And hence data analysis plays an important role in discovering insightful information, asking questions or answering the right questions and also predicting the future or the unknowns. And to perform all these activities, we use data analytics. So are you all with me so far? Are we on the same page? Yes. Great. Now the question is what is data analytics? So can somebody tell me what is data analytics? So let us see what is data analytics. We understood why data analytics and the importance of it. But perform any activities there has to be a process right.
So data analytics is a process to extract meaningful insights from data. Now let us continue with the scenario of Mark. So Mark now understands the business problem and he has also started asking some relevant questions to the stakeholders and he has also got the answers in return. He may start thinking about what could be the suitable solution for this. He may have to perform some exploratory data analysis to unlock some hidden patterns to identify some correlation between the variables and to also know which are the key variables in the data set and he may also have to view the market trend which means he may have to see that how the sales has been performing across the years or across the months.
There might be some insightful information there. He may see that the sales has been growing exponentially for some of the products and some may be volatile. Some may have some seasonality pattern or a cyclical pattern etc. And also he may have to focus on the customer preferences via the customer reviews and do some sentiment analysis. So now let us understand what are those life cycles of data analytics. We will begin with the discovery phase. This is the first phase. Now that Mark has understood the business problem, he will also start focusing on identifying the resources that is the data resources.
Some may be internal data resources that is uh available within the firm. could be some transactional data and some external data sources maybe via web scrapping identifying and capturing some um competitor price on the products. After gathering all of these right data, Mark will focus on data preparation which is the next phase. Now Mark either individually or along with the team will start focusing on the data preparation which includes data wrangling which means cleansing the data imputing the records if there are any missing values or you know he may also go ahead by removing those records if they're not required and also doing some exploratory data analysis which can include some statistical analysis like Looking at the data distributions, understanding the summary of this data distribution at individual variable level, doing some biariate analysis and also trying to you know figure out which are the important variables that might be required for the model building phase.
After performing all of these EDA activities which also include some visualization, Mark will now sit with his team and try to identify the suitable models. The suitable models could be simple statistical techniques or it can also be some machine learning models. So let's say that Mark and his team has identified some five models that can provide the required result and out of these five models they will filter down and they will prioritize only three models. Now there are only three models that Mark and his team have finalized. After this they start focusing on model building activity.
Now for model building activity they have a data set already in place. So this data set will be split into training data set and test data set. It's not only training and test. We can also do some validation in between training and test. But here let's focus on training and test data set. He will separate 75% of the data as training and 25% of the data as test. Now if your question is that why perform this activity of splitting the training data set and test can't we just go with the one single data set? What happens if you just utilize one single data set?
Let us say that you have used the original data set and uh also executed this data in one of the selected model and you will also observe the accuracy. Let's say the accuracy return is about 98%. 98% is a very good accuracy percentage and you may also be overconfident because of this. That may be a case due to overfitting. Now what will happen when you add some new records into this data set and you rerun it? The executed model may not return you the same accuracy what you had seen. The accuracy might be 72%age. Now that's not fair, right?
To avoid these overfitting issues, we ensure that some new records are tested separately. So hence we locate 25% of the data to the test data set and then we predict this records the unknown records which is located in the test data set. we predict them and then we test the accuracy of training data set and the test data set and we make a comparison. Now let us say the accuracy result of training is 98% and the test is 97%. In this case we can say that the model is performing really well. However, while executing the model there are certain things that has to be considered.
for example, inclusion of the parameters, tuning the parameter which also will execute the optimal results. So this is very important. Now let's say after performing this model building the time comes to analyze the results. That is the next phase. Now the team will sit and analyze the result and they will notice that out of the three filtered models only two models are returning excellent accuracies. They will sit with the business team and they will also explain them the result and what are the activities that they have performed to obtain this result. Some of the stakeholders may be technically savvy, some of them may be non-technical people.
So it has to be very important that you also communicate these results accordingly. All right. Now that you have the results, you will also gauge them based on the business objective which was developed in phase one. Looking at the results of the two models. Now the business might select one of the model and say that okay this particular model seems to be returning some right information and it also appears valid to us. Let's go ahead with this one model. So finally the result and the model needs to be operationalized and that is where the team will start documenting the business problems the steps that were taken for executing the models and they will include all the codes and the findings and finally they will implement this model so that uh the business can view the results and also utilize them for strategic decision making at your firm.
home. So, are we good so far with the understanding of the life cycle? All right, great. Now, let us focus on the types of analytics. What are the types of analytics? Can somebody tell me which are the types of analytics you are aware of? Okay. Predictive analytics. Great. Descriptive analytics. Good. All right, good enough. Now let us focus on this example of Google maps. So as we look at this particular Google map, we understand that the blue color route is nothing but the route direction from Sacramento to Floren. And also we see a display of the duration estimated as well the distance to travel from Sacramento to Floren as well as we see another route here that is gray colored.
This is a substitute route or the connecting route just to avoid the traffic which is in orange within the blue color route. So the gray colored route as well shows us the estimated duration to travel as well the distance. Now let us understand what is descriptive analytics and why do we focus on this particular map example. As we understand the root map, the estimated duration as well the distance to travel via the blue color route as well the gray color route. This is one way of understanding descriptive analytics as in what is happening. But there is another way of understanding descriptive analytics that is by focusing on summarized past data.
And this is a descriptive analytics. We see that what had happened in the previous year. Now let us focus on predictive analytics. What is predictive analytics? This type of analytics looks into the historical and present data to make predictions of the future. What does this mean? So Google has already suggested the best route which is the blue color route to travel from Sacramento to Floren and the duration is 18 minutes and distance is 9.7 mi. Let's assume that Google map has already collected historical data of this particular route and based on the available data their model has predicted the best route and also the duration that will be taken to travel from Sacramento to Floren.
Now let us focus on prescriptive Prescriptive analytics describes the solution to a particular problem. What was the problem in this case on the predicted best route? Some predictions on the traffic congestions and that's when Google map recommends the substitute routes. Correct? Now these substitute routes are also prescribed by Google map as a recommendation. So prescriptive analytics is nothing but a solution and a recommendation provided for a problem. So in this case we have the best root and we also have other substitute truths. So let's quickly summarize. Descriptive analytics is about summarizing the past data or to see what is happening.
For example, in the Google map scenario, predictive analytics is about what would happen and prescriptive analytics is about prescribing the solution, the best solution and the recommended solutions. Now let us refer back to Mark's earlier scenario. Mark along with his team identified the best model. They also did some testing and they got identified the best results and they also finalized on one particular model based on that particular model. Now the results have to be provided in such a way that they are the best results and also the recommendations. Correct? So it may not be just one single solution.
it may be a solution with couple of other recommendations as well. So that is exactly what happens in the entire process of data analytics. I hope this has been clear so far. Yes. Okay. And now let us focus on benefits of using R. Why do companies extensively use R for data analysis and why is it chosen? Firstly, R is an open-source programming language, which means that there is no license required to work with R. And R does not require you to have a coding experience, which means that a non-technical person in your team can also learn Rasily and start coding or building models in few lines of codes.
R can also be used with other programming languages such as Java, C++, and Pythons. And the integration of R with other programming tools or BI tools is very simple and easy. And various statistical models are readily available in R. Which also means that there are plenty of inbuilt libraries and packages already available. And reporting the results of an analysis becomes easier by using these inbuilt packages and for creation of these models in just simple few lines code. With this understanding of the benefits of using R, let us quickly hop on to R Studio and start performing the hands-on exercise for data analysis.
For this exercise, we will use a data set named as demographics which is in a CSV file type. Firstly, let us load the data set to R studio and we will locate this in an variable named as demo. We also refer to this as a data frame. And now you will notice that a variable is created in an environment section which is in the bottom right hand side of the R studio window. And this particular variable comprises of 510 observations or records with eight variables. Let us simply expand this particular data frame and have a quick check on the data structure and understand the data types.
This particular data frame includes variables such as age, marital, income. The unit of income is dollar per day, education levels, the car price, car category with several levels, gender and retired status. Now let us view the top six records of this particular data set. For this, let's simply type head of demo. And the result is now visible in the console section. If you're interested to view all the records, then simply type view of demo. And this new window will show you every single record that is being loaded to our studio. You may also simply apply the filters and the filter section here on individual categorical variables.
Now that we have loaded the data set and also viewed individual records, let us focus on creating subsets of records by applying filters on individual variables or multiple variables. So firstly let us apply filter on gender. We will only retrieve the records with gender is equal to female. And we will locate these records in a variable named as demo 2. As you notice now in the environment section the second variable is also created that is demo. And this now is comprising of 250 observation which means the records are filtered down to only gender female. Next let us see how to apply a filter on income variable.
Let us only retrieve the records where income is greater than 100. Let's view the result. As we see here, all the records include income greater than 100. Now, let us modify this query and we will ensure that the retrieved records includes income greater than 100 and also specific variables are returned. Let's say we only want to have the first variable, third variable, and the seventh variable returned as the result. Let's have a quick check. So, we only have the first, third, and the seventh variable returned. How about we only exclude the variable 6 to 8?
For this we include a prefix of minus sign. And now let us see what is the result. We have the variables from first to the fifth variable. However, we don't have six, seventh and eighth variable. I hope it's clear so far. Yes. All right. Now let us see how can we apply condition by including both the variables that is gender and income and then we will filter the record and create a subset of data. Now let's view the result. Income is greater than 100 and the gender is only female. This is one way of creating subsets.
However, now let us see how to use the subset command and create the subset. Let's create a subset of records by applying filter on marital status and age. We'll only retrieve records where marital status is equal to married and age is greater than 35. Let's now view the result. So here we have the age greater than 35 and marital status is married. Let's use the same code and this time we will retrieve selected variables. Let's say variables ranging from 1 to 3. Let's have a quick check. So there are three variables. age is greater than 35 Now let us see how to structure the data by sorting the data frame in ascending and in descending order.
We will apply this order function on the variable income. Firstly let us see how to order income variable in ascending order. Let's do a quick check. And here we have income in ascending order. Now let's see how to modify the same code and view the records with income in descending order. So we have now the income in descending How about include two variables and sort the variables accordingly. Firstly, we will sort the records by ordering income and age in ascending Let's quickly view the result. We have income in ascending and age as well in ascending.
Let's now modify this code. This time we will order income in descending and age in ascending. Let us view the result. So the income is in descending and age is in ascending So hope this is clear on how to solve the data frame by ascending and descending order per variable or by using multiple variables. With this we will focus on learning statistical How to perform statistical analysis on individual variable or multiple variable? Let's start by understanding the data distribution of variable income so that we identify what is the minimum value of income, what is the maximum, what is the range, what is median, what is the mean and we will also focus on the quantile distribution which is also analyzed in a box plot.
What is the minimum value in the variable income? It's nine. And what is the maximum? So we have the maximum value. Now let us see what's the range. So the range shows you the result with minimum and the maximum value. How about the difference of maximum and the minimum? Now let us focus on other summaries of data distribution for this variable income. Let's identify what is the mean value of income. The mean is 78. Let's also understand what is the standard deviation. All right. So the standard deviation is $112. Let us see what is the variance.
The variance should be larger than the Now let's say what is the median absolute deviation. As you notice here the median absolute deviation value is lower than standard deviation. Why do we make this comparison? From this it is evident that median absolute deviation is robust to outliers and standard deviation is sensitive to outliers and also to the change in the mean value. Now let us understand the quantile distribution. This is the same analysis that is visualized in a box plot ranging from 0% to 100% identifying the individual data points. And we can also refer and compare this to the min, the max and the median values.
Let us quickly see what is the median value of income. As you notice here the median is 45 as well. The 50th% of quantile is 45 which means 0% is minimum and 100% is the maximum value. Now if your question is what is 25% and 75%. This is again used for identifying the range of interquartile. The interquartile range is nothing but the difference of 75% minus the 25%. Let's quickly see what is the IQR of income. The IQR of income is 58. Let us do a quick check. 75% of quantile is 86 and 25% of quantile is 28 and the value is 58 which is equal to the IQR result.
Now that we have focused on the statistical analysis of the individual variables data distribution, let us focus on the data visualization. In this we will have a pictorial representation of analysis to identify the outliers to see what is the minimum and where do we see the data densely populated and how is it scattered etc. We will begin with creating a histogram. Now histogram can be used for univaried analysis which means in this scenario we will consider income variable and we will see how the count of income ranges gets distributed in a histogram. For this we will have to install a package called as ggplot 2 and also call this library ggplot.
Let us install the package. And now let us call the library. All right. And we are ready now to begin with visualization. For this we will use the geometric object histogram on the data demo data frame. Let me expand this window so that the code is visible and also used an aesthetic mapping for variable income. This will be helpful for filling colors or filtrations etc. and only include 30 bins with individual bin size width of 100 which means there will be 100 incomes in individual bins. Let's quickly look at the distribution of this histogram. As you notice there are couple of outliers.
The counts of these income range are very limited. However, we see the densely populated income ranges with higher counts between 0 to 200 per day. This is also a way to identify and segment the customers based on their income ranges. Now, let us see how to change the color of this histogram and also the border of the histogram. For this we will include some additional options such as fill fill with blue color and the border color is black. Now as you notice here the executed code provides us the histogram with blue color bars and black color border lines.
Now we will focus on creating a faceted grid. Facet grid is also an aesthetic mapping object. We will see how to enable the multiple histograms across the marital status and the genders so that we identify how the income is distributed for individual marital status as well the genders. Let's zoom this view and have a look at it. As you notice here, there are some interesting outliers here in the data distribution. Female unmarried drawing higher income and male unmarried and married also drawing higher income as compared to the females. Whereas if you notice that the female unmarried is drawing much higher income than the male.
This may also be very much related to the age. Now let us see how to create a stacked histogram. When I say a stacked histogram, I mean instead of filling the color, we will fill the gender so that there is a stack within the histogram. So as you notice here I have made couple of changes. I have included fill equal to gender within the aesthetic mapping. Now let us look at this histogram. As you see here the gender is filled in the histogram. Hence we have stacked distribution of female and the male. Now let us focus on creating a bar chart with education versus income where we can identify the education levels and the income ranges for these education levels.
As you notice here, we are going to create a visualization where we have the aggregation in form of mean and the geometric object used here is bar plot. Now let's zoom this view and understand which education level have higher average income. So as we see here the blue color bar is the post undergraduate degree which means this education level draws higher average income as compared to other education levels. Now let's create a histogram where we will see car price and the number of cars for individual category. Let's look at this visualization. This visualization provides us some interesting insight just by looking at the distribution of the car prices and the counts of the cars at uh the car category economy and even the luxury.
Luxury car category or car price is pretty much distributed. Whereas economy car category is densed which means that we could also look back into the income and age variables and try to figure out further more insights and then segment the customers for further targeting of these customers. Now what happens if we simply change this bin width to 30. As you observe here, changing the bin width or increasing the bin width will also reduce the number of bins. Now we only have four bins here and the car category is filled. That is what we have enabled within the aesthetic mapping.
And we see some more interesting inside. As you look at the standard and the luxury car category, the car prices are pretty much overlapping for the car category, luxury and standard. This could be the starting car price of the luxury brands. Now let us create a clustered bar chart. In this visualization, as you observe, though we have enabled fill equal to gender in the aesthetic mapping, we do not have the view in stack form, but we have the bars one besides the other. It is also because we have enabled a position called as position equal to dodge in the code.
Now what is the insight that we can draw from this visualization? As you see post-graduate degree with female gender is drawing higher average income as compared to any other education level. Now let us see how to create a box plot for variable income across the genders. So the box plot can be enabled if there is a bariate analysis to be performed on a continuous variable and a categorical variable or multiple categorical variables with a continuous variable. Now let's look at this visualization. What does this say? We have data distribution of income for individual genders that is for female and the male.
And we also notice outliers here. Anything above this whisker is considered to be outliers. It might make more sense if we also include some coloring for these outliers. Maybe also enable shape. So now we have colored the outliers and it's colored orange. Let's see if we can also enable the shapes. And now we have here the outlier color as well the shape enabled. Now let us see how to enable a violin plot. What is the utility of violin plot? With a box plot we understand the analysis and the distribution of the data points is to identify the outliers to know what is the minimum value, what is the max, what is the median and what are the But what is the purpose of a violin plot?
Let us have a quick check. As you observe there is some concentration of data points in the bottom of every car category. However, the concentration is higher for standard car category as compared to economy and the luxury. Now, this is an interesting insight that you wouldn't have come across in box plot. The box plot is a very good representation for identifying outliers. However, violin plot will help you focus on the nuances which is not captured by the box plot. We can also simply combine the box plot and the violin plot together. Simply include this jaw object.
Let's zoom this. And now you have a representation of box plot and the violin plot both combined in a single visualization. Interestingly you notice the outliers as well the concentration in the bottom of this violin plot. So this could be some interesting insights that you draw and focus on these data points and understand what exactly is happening there. Now let's focus on the density plot that is density estimate of the histograms rather than just viewing the frequencies. Now we see the frequency in the y-axis across the income distributions. How about enabling the probability as true so that we enable the density instead of the frequency.
So now we have the density in the y-axis and in the x-axis we still have the income distribution. This is the way of also adding a line plot which is a density plot on the histogram. Now as you observe here the density plot is not in the same level as the bar. So let us adjust this line. For this we will include adjust let's say equal to three. And now let us see how the visualization appears. Now the density plot is on the same level as the bar. Now let us see how to create a cross table for car category and gender.
For this let us call the library deescr. Now let us create the visualization enabling cross table for car category and gender. Let's look at the result in console. As you see here now we see the counts of the gender for individual car category. The values over here represents that there are 67 females falling within the card category economy and 80 males within the card category economy. And for luxury we see that the count of female is higher than the male as well the proportions. Now how do you understand what proportions are presented here? We may simply turn off some of the proportions like the t test, the kai square etc.
Let us see how to enable that. Now let's look at the result. This looks better. Now that we have the counts, the female counts and the male counts across individual car category. We also see the percentages rather than just looking at the absolute value. So there are 45.6% of female within the car category economy and 54.4%age of male within the car category economy. Similarly across rest of the car categories. This kind of cross table or a contingency table is also helpful when you want to analyze the different categorical variables and identify the counts or the proportions.
Now let us see how to use a scatter plot of age versus income. Scatterplot is a visualization used for barvaried analysis. When you want to perform some analysis between two continuous variable at a data point level rather than performing the analysis at an aggregated level such as sum or mean. And now we have a scatter plot of age versus the income. Age in the x-axis and income in the y-axis. Though we do not see any kind of a positive correlation or a negative correlation, but we still see some interesting insights over here. Some of the data points are pretty much scattered and much away from densely populated data points.
I hope the learning has been informative and interesting. So far we have covered the concepts of data analytics as well. We have performed some hands-on doing some statistical analysis and also creating interesting visualization. The humble Microsoft Excel still stands as one of the most trusted and widely used tools for data analysis. Even in the age of artificial intelligence, automation and advanced analytics platforms, Excel stands tall. Despite the rapid rise of technologies like PowerBI, Tableau, Python, and machine learning, Excel continues to be the starting point and often the finishing line for millions of professionals analyzing data around the world.
Whether you're a financial analyst building forecast or marketeer tracking campaign performance or a student learning statistics, chances are Excel is where your data journey begins. Now let's start with the first module which is about what is data analytics and its purpose. So let's start with a simple but important question. What exactly is analytics? At its core, data analytics is a process of collecting, transforming and examining our data to uncover meaningful insights, patterns and trends that help support decision making. It's not about numbers. It's about turning data into stories that drive actions. For example, imagine you work for a retail company.
Every day, your store generates sales data, customer feedback, and inventory records. Data analytics helps you interpret that information to identify which products sell best, when sales peak, and what marketing campaigns are most effective. In a sense, analytics transforms confusion into clarity. It helps decision makers understand what's exactly happening in their business rather than relying on the gut feeling. The purpose of data analytics goes far beyond just collecting and storing data. Everyone can have data but not everyone can make sense of it. Analytics helps organizations to identify opportunities and risks early to improve operational efficiency, optimize marketing and sales strategies, enhance customer experience, and ultimately make datadriven decisions instead of assumptions.
As the saying goes, without data, you're just another person with an opinion. That's why analytics has become such an essential skill in today's digital world. Now let's go to the section two types of data analytics. Now data analytics isn't one sizefits-all. It actually comes in four major types. Each one answering a different question and serving a unique purpose in decision-m process. Let's go through one after the other. First one descriptive analytics. What happened is the question here. So this is the foundation the starting point of any analysis. Descriptive analytics looks at historical data to summarize and understand past performance.
For example, you might generate a sales report for the last quarter, calculate average monthly revenue, or track website visits over time. The goal here isn't to explain why something happened, but simply to describe what happened. Tools like Excel are great at this using formulas, pivot tables and charts to make data easy to interpret. Now the second type, diagnostic analytics. This answers why did it happen. Once we know what happened, the next logical step is to ask why. Diagnostic analytics digs deeper to uncover the causes behind the trends and anomalies. Maybe sales dropped in July. Was it due to seasonality, pricing issues or reduced marketing efforts?
Diagnostic analytics helps pinpoint the reasons using techniques like correlation analysis, variance analysis and drill down reporting. In Excel you can easily explore relationships between variables using charts, filters and regression tools. Followed by that we have predictive analytics. What might happen next is the main purpose of using predictive analytics. Predictive analytics uses statistical models, forecasting, and machine learning techniques to anticipate future trends. For example, an e-commerce company might forecast next month's revenue based on historical sales and market spends. In Excel, you can build predictive models using functions like forecast, trend, or even more advanced methods through Power Query or Excel's built-in AI tools.
Predictive analytics transforms historical insights into forward-looking predictions. Now the next one which is prescriptive analytics. What can we do about it? Finally, prescriptive analytics takes things to one step further. It doesn't just predict outcomes. It suggests actions. It answers questions. Given what we know, what should we do next? For instance, if predictive analysis shows a potential drop in customer retention, prescriptive analytics might recommend increasing loyalty rewards or adjusting pricing. While Excel has limitations here compared to specialized AI tools. So, it's what of analysis, goalsek, and solver add-ins allow users to test scenarios and determine optimal decisions making it an accessible gateway to prescriptive modeling.
So these four types descriptive, diagnostic, predictive and prescriptive form the foundation of modern analytics. Together they help organizations move from simply knowing what happened to confidently deciding what to do next. Now we enter the module three. Why Excel is still relevant for data analysis. Now you might be wondering with so many modern tools like PowerBI, Python, R or Tableau, why does still Excel still matter in today's world of advanced analytics? Well, the answer lies in its balance of simplicity, versatility, and power. Let's break down. First one, accessibility and ease of use. Excel is one of the most widely available tools on the planet.
From students to CEOs, almost everyone has access to it. And that universal familiarity makes it incredibly powerful. You don't need to be a programmer or data scientist to start analyzing data. Excel provides a low barrier to entry with drag and drop features, pre-built templates, and intuitive functions that allow anyone to learn. It's often the first step in a person's analytics journey. Many professionals use Excel as learning environment before transitioning into tools like PowerBI or Python. And because of it available on virtually every operating system, Excel remains a democratizing tool. Bringing data analysis capabilities to anyone anywhere.
Now powerful analytics capabilities is the second reason. Behind its familiar grid interface lies a surprisingly advanced analytical engine. Excel supports a vast range of functions, formulas, pivot tables, data visualization tools, and automation features. Here are a few examples. Pivot tables summarize and explore large data set instantly. Conditional formatting highlights trends and anomalies. Power Query allows for efficient data cleaning and transformation. Our pivot enables advanced modeling and relationship between tables. Charts and dashboards communicate insights visually with ease. And let's not forget Excel's AI powered tools such as analyze data and ideas which automatically suggest insights, trends or patterns.
With these capabilities, Excel isn't just a spreadsheet. It's a comprehensive analytics environment in Excel. Now, the third reason, integration and compatibility. Another major strength of Excel is how seamlessly it integrates with other platforms. You can connect Excel to databases, APIs, PowerBI, Google Sheets, SharePoint or even cloud services like Azure and AWS. This flexibility makes it a perfect bridge between simple data analytics and more advanced analytics platforms. For example, a financial analyst might export or import realtime data from SQL database, clean it with the help of Excel, and then publish the results to a PowerBI dashboard for interactive visualization.
This smooth workflow keeps Excel at the heart of many enterprise analytics ecosystems. The fourth reason is real world use cases across industries from finance and HR to marketing, logistics, and education. Excel continues to be an indispensable tool. A few examples, finance budgeting, forecasting, variance analysis and KPI tracking, marketing, campaign performance analysis, ROI calculation and customer segmentation, human resources, workforce analytics, attendance tracking and salary modeling, operations, inventory management, scheduling and performance dashboards. Even in organizations that use sophisticated BI tools, Excel is still used for quick exploratory analysis, ad hoc reporting, and presentation ready summaries. It's like a Swiss Army knife of all data analytics.
Simple enough for beginners, yet powerful enough for professionals tackling complex business challenges. Now, with that, let's dive deeper into data analytics. Excel continues to be a cornerstone for analytics around the world. In the next module, we will dive into Excel's interface and explore the key functions that make it such a reliable tool for any datadriven professional. Some of these essential functions include the following. Trim to remove unwanted spaces and text. Clean to remove unprintable characters. Text to format dates, numbers, and values. Average, median, standard deviation for statistical analysis, index match, and lookup to perform advanced data searches and relationships.
Mastering these functions will take your Excel skills and your analytical confidence to the next level. Now let's dive into the demonstration of building dashboards with Excel. So now let's get into data analytics. On my screen we have a banking data set. So here we have the customer details, gender, marital status, age, education, occupation, contact numbers, the state, the pin code, bank, IFSC, region, account number and apart from that we also have bank related data where we have the same customer details and I have added a few more columns here. So the reason is we will be attaching these entries to this particular sheet so that we have all the data segregated in one place.
You'll learn that on the go. We'll be using some lookup functions for that. And now here you can see uh the bank details. So here we have a few more details based on the account type, the account balance, what are the type of transactions happening in their account, transaction type, is it uh check transfer, car transfer, net banking, phone related transactions like Google pay or phone pay and uh is it uh any other transaction like any FDR or RTGS or is it about EMI options and a lot more options here. So all these data sets that I'm using today are not real data sets.
Those are created data sets. So you might find some names here which are not real names or doesn't belong to any of the real customers. Even the customer ids and all the uh phone numbers those are not real numbers. So you're just using this data set created by artificial intelligence for analytics purpose only. And all the names of the banks are also created names not the real names. Let's take care of it now. So here we have the account balance and transaction transaction type. Is it a wire transfer check transfer and we also have a payment network which we have considered.
We have a maestro, Visa, Mastercard etc. rupee and also the card type. Is it contactless card, virtual card or a student card, premium card? The card type as well. Is it a debit card, credit card? And then you also see here we have loan types and not all customers have loans. A few of them have savings accounts or current accounts or business accounts and a few of them have loans here. And also we have a wide variety of loan types available such as ST loan, educational loan, business loan etc. And here we have the principal amount, loan tenure in terms of months and uh rate of interest and the remaining balance they need to pay.
And if any of the accounts have converted into NP or nonprofiting or no, we have the details over here. So you can also have some blank cells which means that these customers do not have any accounts or loan accounts related to this particular bank. Now that we have a brief understanding of what exactly are we dealing with, let's get started with data cleaning, data segregation and then some data analytics. Now if you see on the first row we have customer details. So uh AI created customer details in this particular fashion which has the customer id along with the customer first name and the last name but in real time this might not be the good scenario to work with right so we will be cleaning this particular data set so for that let's insert a few more rows so here I'm using uh maybe one or two different rows.
And now the cleaning. Select the column that you want to be split or cleaned. Go to data. And here we have text to columns. Now what does a text to column operation do? So here if I click on or hover onto this particular option, it says split a single column into multiple columns. For example, you can separate a column full of names into separate first and last name columns. You can choose how to split it up fixed within or split at each comma, period or other character. So it is nothing but a delimiter, it can identify the delimiter and then based on the delimiter, it will split a column into one or more different columns.
So here we have a delimiter set as hyphen. Now there are multiple ways of cleaning this data. Let's go with the first option which is text to columns. So I'll click this text to columns and here I am selecting delimited with characters such as comma or tabs separate each field. Right? So in our case we have the hyphen right. So we will go with the limited option and then go to next. Here we have preset delimiters like tab semicolon comma space and others. So we have a hyphen. So we'll go with the other option and we'll mention the D limit up which is hyphen and you can already see the preview on the screen.
So we have the customer details column and then the first name column and the second name column. Once you're happy with the preview you can just click on next and then you will have this particular data format general text etc. So you can always go with general where Excel decides the data type based on the data present in the columns or you can also choose the type of data item or data type which you want for your data set. So I want text here. So I'll go with text. So all the customer details which is the customer ID is a text format and first name and last name is also a text format.
So I'll go with text option and then you can click on finish. And as soon as you do that, you will have the data already available. So replace it. So there you go. This is one way of doing it. Let me quickly undo it. And the next way is flash fill. Now that's one intelligent feature where you don't have to manually check for uh the option which is text column and all those things. You can copy this particular first customer ID and paste it over here. And then select the first name and copy it over here.
And then the last name, copy it and paste it over here. Now you have a template which Excel understands. If you doubleclick, right? Let me just quickly undo that. You can see an option, a small box, right? That is flashfill option. If you doubleclick it, Excel has automatically understood a pattern. It means it now can segregate the first names and the last names along with the customer ID all by itself. Now let's quickly perform that flashful option. Let me expand data a little bit. I'm just double clicking on the square box like this. I think there has been a small mistake but not not a problem.
It needs a little training. So what I'm doing is I'm trying to copy and paste the elements a little bit so that it understands what to do next. Yeah, that's the flashfill logo. There you go. I think there has been a small error but that's resolvable. Basically that's how the flash fill works. Now let's go with the initial approach of text to columns. Not a big deal. So I'll go with delimited option. Then comes next and uh our delimiter would be hyphen that's the preview and go to next and my data format would be text and then finish it.
The data already exists. Do you want to replace? Yes, I want to replace it. Now I'll rename the columns. So instead of customer details I'll have it as customer ID and then this will be customer first name. And this will be customer last name or second name. There you go. So far so good. Now here we have uh gender. I'll delete this column. We don't need this anymore. And we have gender, marriage status, age, education, occupation, phone number, and all those things. That's good. Now uh I see we have a lot of uh unique ages here.
We have 30, 39, 24, 19 etc. But when we are dealing with creating visualizations for transaction based on age types, this might be a lot clumsier, right? You'll have a wide range of barriers. Now what if I segregate them into age groups? something like aged between uh 25 to 60 are middle-aged and above 60 would be senior citizens and below 25 would be like students or something like that. It makes a lot of sense, right? So let's try that. I'll insert a new column here and uh I'll expand this a little more and I'll name it as age groups.
Now we have a wide range of opportunities to do this. We will go with if condition. So I have an if condition. If so you have the uh syntax readily available from the Excel here. So you can see if function works with the logical test and if the logical test passes what should I value it as? Value if true and if it doesn't pass if it fails should have a value for that as well. So value if false. This is the syntax for if condition. Now the logical test is we will have to provide the data for the logical test.
I'll go with the column f_sub_2. If this particular value is greater than 25 then middleaged else I'll term them as students. So don't forget to add the double quotes because this particular data is text type data. So we should not forget text representation which is double quotes. And apart from that we were also supposed to add the senior citizen. So I'll add another if condition and the same value will be considered the cell is greater than 60. Okay. greater than or equal to 60 then senior citizen. So let me make a quick change here. So I think we had a small error with the formula.
Let's quickly try that once again. So if condition logical condition is the cell value should be less than 25. If this is true then we will turn them as students. This is my condition. If they are not then we will have another if condition. So there is an age barrier where we will have uh students and apart from the students age which is more than 25 to 60 we can call them as middle-aged and for that we'll write a new condition. So it will be encapsulated inside another if condition. So I'll open an if condition and we'll consider the same cell and this should be less than or equal to age number 60.
If it's true then they are okay let's not take 60 let's go with 55. 55 then we can call them as middle-aged else we can call them as senior citizens. Let's consider 55 for retiring age. Actually it's 60. For the sake of data set, let's go with 55. And I'll call them as senior citizens. There you go. And you can go close the first if conditions, then the second if condition. Once you're okay with the formula, enter. Now we have the first middle-aged entry. Now remember the flash that we were trying to separate the first name, second name, and the customer ID.
We can also use this here. So you don't have to manually write the if condition to all these cells. So let me show you how many number of cells we have. So we have about 5,000 entries here. So we don't have time for writing 5,000 entries. So for the sake of saving time, we will use the flash fill. So here you have the small box. So just double click on it and automatically the formula will be applied to all the cells. Now you can quickly check that as well. So here we have a senior citizen and uh followed by that we have all the middle-aged people and also a student entry.
So all these are the ones. Now we have data segregation for customer first name, customer last name, age groups are also created and now comes the uh yeah banks. So if you check the data set properly, we don't have bank names. Even in the banking details, we don't have the bank names. So here you have account numbers and account types but no bank name and also in the customer details. Now how do you extract the bank name? That's the main question. That can be done. Don't worry. So let's create a few more columns and let's make use of some basic functions like left and if conditions again.
So here you can see we have sp i n a x i s i and uh pnb and a few more entries. So what we are going to do is we are going to extract the first four letters of this particular bank if based on that we will deem or we will term that particular bank first name to be the complete name. based on the first section of this IFSC code we can name or term that particular bank's name that's the idea so basically since I created this particular data set I have a clear idea that we have about five bank names here so based on that let's go ahead and name the banks so I'll use the function left and text so this is the text I want to go ahead with which is in M2 and how many number of characters you want to extract So s B I N.
So we have about 3 to four. So let's go with four. I'll go with four and enter. We have the name here. And based on this name, we will consider the name to be given to this particular bank. So we will use the same if condition here as well. So equals to if press tab to select the function. If this particular entry is equals to SBIN then name it as state bank of India or if this particular value is equals to axi which is axis then name it as axis. this bank another if this value is equals to HDFC then name it as HDFC bank.
Similarly, if this particular entry is PN B 0, I think we missed the cell address equals to PN B 0. Then name it as Punjab National Bank and I think we have covered 1 2 3 4 and one more is remaining which is ICICI. If this particular value is equals to ICIC then name it as IC IIC bank and there you go we can close the brackets. First bracket is done. First if condition is covered. Second if condition is also covered. Third if condition is covered and fifth if condition. No no I think we have covered the fourth one.
last fifth year that's the one there you go now enter now we have the first bank name which is state bank of India now we will quickly use flashfill to extract this left function to all the entries and similarly another flashfill function for filling the bank names so we can use the doubleclick option there you go we have all the first four letters of uh the IFSC code and similarly based on the IFSC quotes first four characters will also have the bank names. So we have bank name provided and uh the first forsc let's name it as this.
There you go. Now we have a better cleared data. We have dedicated customer ID. We have a dedicated first name, second name or the last name, gender, marital status, age and age groups as well and their education, occupation, phone numbers, state, regions, bank names, etc. Right? I think uh that's good enough. Now let's also check this. Do we have any other details to be added or not? That's okay. So we added bank name based on formula of if and left. Right? Now there was another method. Okay, this is just for learning purposes only. There's another method where we could have added these names manually to all the banks based on the IFSC code.
Now let me explain you how that can be done. Since we discussed that this data set is created by me for learning purposes only and we know uh there are five different banks. So let's add the names of five different banks here. ICICI Bank, HDFC Bank, State Bank of India, Access Bank and lastly Punjab National Bank. There you go. And these will be my banks. Right? Let me quickly arrange that in the form of a table. So here we are learning uh table formatting right. So that's how you can color and you can create a table add borders etc.
Now let's say I have an option called bank name. Select the bank name. So do I have a drop-down option where I can manually select the bank name? And can we create it? Yes, we can create it. And that's called data validation where you can validate the data and based on the selection you can have a drop-own menu. So to do that you need to first select the cell go to data and then here you have an option called data validation. So you can see it says pick from a list of rules to limit the type of data that can be entered to a cell.
For example, you can provide a list of values like 1 2 3 or only allow numbers greater than thousand as long as it's valid entries. Right? So here we have uh that option. So you can click on data validation and here you have the settings. So we have a wide variety of options to go ahead with. You can go with numerical values, add any value or you can go with the whole numbers or decimal numbers. You can also have a list of options like we have here. You can see we have ICIC, HDFC and all the bank names that we using in this particular data set and you can use a list and also you have an option to enter a calendar.
You can provide a specific date and a calendar there and you can select the date from the calendar. So there are a wide range of options in data validation for more learning more details on data validation go through the tutorial on data validation from SFAN. So now we'll go with the list as we discussed. So here you can provide the source. You can manually type the source like ICIC like we typed here or you can also provide the location where you have this particular source. So I have the location over here. So I'll select the location and then click on okay.
So we have uh the location ready. Now this is not just done yet. You can also make some modifications to it. So let me delete this particular validation. And here let's not delete. You can also directly edit. Here you can add an input message. So let me provide the title. Select her bank. And we can also um okay this can be the input message and this can be banks list. Now you can also have an error alert as well just in case if the user tries to enter any manual name apart from the provided list then it should be an error right so I want to have an error message here so I can use stop warning or warning or information I'll go with stop and the title should be saying wrong or the uh you can say bank not found and the message will be Please select from the list provided only.
There you go. So I'm done with my data validation and also the error messages. Click on okay. And there you go. You can see when you hover on this particular icon, you can see a message. This is a bank list. And you can select the bank name from this particular list. And let's say I select the list. And there you go. Using the same flash I can extend this particular list of options. And now this particular cell has data validation. Now apart from the available banks, let's go with uh some other bank name which is not provided in my list.
Let's go with uh random bank. And if I press enter, I'll find an error. And the error says please select from the provided list only. Right? So this is the error. Now this what this does is this avoids any kind of mis entries into your data set. That's one good functionality of uh data validation. Now you can select the list entries only and you can add the list whatever you want. There you go. Now that's how you do data validation. Now the next job is to segregate all the data into one place. Then we can perform the filter operations.
We can also eliminate the duplicates. Right? we have about 5,000 different entries. So they might be a possibility or may not be a possibility. Both could be possible, right? You may have a duplicate entry or may not have a duplicate entry. But the best thing is before you start with any data analytics on any platforms whether it is Excel, PowerBI or Tableau or any platform. The first step is data cleaning and segregation. So that's what we were doing so far. So we have segregated all the data that we needed. Now the remaining part is we need to collect all the data into one place.
Since you're working on Excel uh is not so complicated but if you're working on a analytics platform like PowerBI or Tableau that is called as data modeling where you connect all the different data models in one place and they have a relationship established between them. Now it's simple in Excel. All you need to do is just copy paste in the best way possible or you can also build a data model in Excel. So both are possible. So let's go with the simpler version. Now what we need to do is copy all the data and paste it in one single place.
So um so far I'm comfortable with the data whatever I have. So let me create a new sheet and here I'll copy the bank details. So I have year here I'll copy this and paste it here. Let's call this sheet as worksheet or workbook. And now go to the customer page. So we have selected all the data. So back to the workbook. Now before you paste the data, make sure you select this cell and then go to the paste options here and just click on the drop-down and here you have the option of paste values only.
Why we are going for the paste values only? So you might be having a question. So if you go back to the data set and here if you check the particular cell here we have a formula right and apart from this we have a formula for majorly all the cells over here. So when you're copying and pasting the cells with formula then you will face the reference error in the worksheet. To avoid that reference error, we will use paste values only option so that it gives you all the values without any kind of formulas in the cell and without any complications.
So you have the customer's data and first name, last name and all the details that we had in the customer details. And now let's head back to the banking details. And uh we don't want these cells that I've inserted. So let me eliminate these cells or you can just delete them in this one as well. Yeah. Now what remains is we needed the account number, account type and rest of all the data entries which are present here. And we'll copy these and again the same approach. Go to paste special and here values only. Now we have two entries of account number.
We can delete one not a big deal. So there you go. So again before you perform any analytics the first important major step is to clean and segregate your data. Now you can see the rows and columns are not properly aggregated or not aggregation basically organized. So you have a simple shortcut for that. All you need to do is select any one of the cells and click and ctrl A which selects all the cells in one go. Then you have a short trick key. Remember this. Press Alt and release once. Then alphabet H. Press and release once.
And then comes the alphabet O. Press and release once. And lastly, alphabet I. Press and release once. There you go. Now, what this does is it organizes the cells width and height. So this one was for width and there's another one for height which is for alt h o a will be for height and alt h o i will be for width. Now we have organized the cells in width. Now shift control and arrow key for selecting all the header cells. There you go. Now the header cells are also done. Now select all the cells and we have the borders.
Now the data is correctly aggregated and another important step is save the data after each and every iteration so that nothing is lost on the go. Now we have saved it. Now we also have some important u statistical functions like sum, count, average, mean, median, mode. So let's also go through these ones. So um the name itself speaks not too much complicated. So we have yeah account balance. So let's freeze this particular column. You can freeze this by selecting the column and going to the view option. And here you have the freeze panes. Uh freeze the column.
You can select the columns that you want to freeze. I'll freeze all these. Or you can simply shift this particular column so that you have a better readability. Press and hold. And there you go. You have moved it over here. Now you have a reference column with numbers to implement all these statistics functions, aggregate functions which are sum, count, average, median, mode, minimum, range, standard deviation, variance and quarter, percentile etc. So let's go with this one after the other. Now let's go with sum. So sum function. So why are we doing this? So that we have a quick idea on sum, sum ifs and all those functions, right?
So sum and select the range. It will give you the sum of all the entries. Give it a small error sum function. Enter. Okay, there is a data type error. Okay, let's consider a different column. I think it is due to the currency format and here we are selecting the whole number format. Not to worry. We shall now explore with this particular function so that uh we can also have a filter go to data and here apply the filter to this particular column own and eliminate blanks. Let's paste it here. Now we'll have the eliminate blanks.
There you go. Now you can work on there you go. It will give you the sum of overall numbers present here. And you also have the count function which basically counts the total number of cells we have about 2,000 entries. So, um, basically out of 3,000 employees or 3,000, uh, account holders, more than 2,000 account holders have gone with taking a loan with the bank. Now, let's find the average uh, tenure period provided to all the uh, account holders. So you can do that by considering the average number average function and then providing the cell entries and you can find the average amount of time given to clear the loan is about 96 months and you can also convert that to EOS.
And now here you have the median function which is similar to average. So median of the cell entries have the median here. equals to mode and the numbers provided we have the values. So now what do you mean by mean, median and mode? So uh let's have a definition for it. So mean is the sum of all values divided by the number of values and uh it will give you or it will represent the average value of a data set. And similarly for median the median is the middle value when all observations are arranged in ascending or descending order.
If there is an even number of values the median is average of all the two side of middle values and uh lastly the mode. The mode is the value of that occurs most frequently in the data set. So the data set with have no more one mode or unimodel or multiple modes. So basically uh the maximum uh number of repetitions for a dedicated number is called the mode. So here you can see 120 is repeated most number of times apart from the other provided number of months per dedicated loans. Right? So maybe it could be a home loan or a kind of a car loan or a bike loan.
So most of the most commonly taken loans are uh given the 120 10 year period. So that's the most repeated one. So in our terms mode is 20, median is most uh widely considered as the average number and mean is similar to average. So that's how it goes. Now we have the next two functions which is minimum and maximum. So it is self-explanatory. So you can provide uh the function minimum min of all the provided numbers here which gives you the minimum number of tenure which is 12 months and also the max number. So you can just type down max and press tap to select the function.
Get the range of tenure numbers and you have the maximum as 240 months tenure period. So which is that one and you also have uh range function. So what is range? So range is a measure of how spread out or diverse or disperse the data values are. It shows the difference between the highest and lowest values in a data set. So it's like maximum value minus the minimum value. So you can perform it in two ways. You can select uh this cell which is the maximum value minus the minimum value or you can simply use the range function from the available formulas or you can just imply this particular one which we have gone with and the next one is the standard uh deviation.
So standard deviation measures how much the data values vary or spread out from the average. It tells you how consistent or scattered the number of your data sets are. So you can perform standard deviation using the formula std deev. So you have uh two different types of standard deviation. So it is based on the entire population and the other one is based on a sample. So you can just go with uh the standard formula which is standard dev here or the entire population one or the one which you have over here. Right? and you can provide the range of numbers of the entire population in your cell and you have the standard deviation value here.
Now next we have the variance. So variance measures how far each data point is from the mean on average. It represents the average of the square differences between each value and the mean. In short, it shows how spread out the data is just like standard deviation but without taking the square root. So you can also use the variance formula. So you have covariance and covariance P. We can go with this one. We use the coariance in a wrong. So next we have percentile and you can provide the percentile and K is the percent you want.
Let's say I want 25% of the range then you can provide 25. Enter. And you have that. So if I wanted 21% of the entire period provided for tenure then I can provide uh the number as 0.21 which gives me 21% of the provided uh tenure. That's how you uh implement the percentile and similarly Q1 and Q4. This is also the percentile functions where you can provide a quarterly basis function. If you want uh the first quarter you can provide 0.25 similar to this one you can go with 0.50 50 0 0.75 and 100% or 1.0.
So that's how it works. So basically you provide 0.25 here, 0.5 for this one, 0.75 for this one and 1.0 for this one. This is the quarter 1, quarter 1, quarter 4. That's how it works. And weights. Now let's go back to the uh that was a quick uh scenario on the aggregate functions or the statistics functions in Excel. Now let's go back to the uh pre-worked uh workbook and we are all good to go. Now let's also create a duplicate of this workbook. Move or copy. So why am I creating a copy? Because I want let's say uh I want to work on uh the loan section only.
Right?…
Transcript truncated. Watch the full video for the complete content.
More from Simplilearn
Get daily recaps from
Simplilearn
AI-powered summaries delivered to your inbox. Save hours every week while staying fully informed.
![Generative AI Full Course 2026 [FREE] | Complete Generative AI Tutorial For Beginners | Simplilearn thumbnail](https://rewiz.app/images?url=https://i.ytimg.com/vi/wuk0LP9eRo8/maxresdefault.jpg)
![Generative AI Full Course 2026 [FREE] | Complete Generative AI Tutorial For Beginners | Simplilearn thumbnail](https://rewiz.app/images?url=https://i.ytimg.com/vi/Fc8HlmOoExk/maxresdefault.jpg)
![Applied Data Science With Python Full Course 2026 [Free] | Python For Data Science | Simplilearn thumbnail](https://rewiz.app/images?url=https://i.ytimg.com/vi/bhGuzBVtQO4/maxresdefault.jpg)
![Applied Data Science With Python Full Course 2026 [Free] | Python For Data Science | Simplilearn thumbnail](https://rewiz.app/images?url=https://i.ytimg.com/vi/GDqHxgUQj6k/maxresdefault.jpg)