Applied Data Science With Python Full Course 2026 [Free] | Python For Data Science | Simplilearn

Simplilearn | 08:07:23 | Apr 26, 2026
Chapters: 18
An introductory welcome outlining what the course covers: Python setup, core libraries, and hands-on learning with real-world data problems.

A practical, hands-on tour of data science with Python (NumPy, Pandas, Matplotlib, Seaborn) plus stats, time series, and real-world visualization patterns from Simplilearn’s 2026 course.

Summary

Simplilearn’s Applied Data Science with Python course guides you from Python basics through core libraries and visualization techniques. The instructor outlines a practical flow: install and set up Python, then dive into NumPy for array math, followed by Pandas for data structures, data cleaning, and filtering. The course emphasizes data visualization with Matplotlib, Seaborn, and Plotly-like concepts, illustrating charts such as line plots, histograms, scatter plots, box plots, heat maps, and more. Key topics include the mathematical foundations behind data science — linear algebra, probability, and statistics — plus practical data wrangling, feature engineering, and real-world datasets. In addition to theory, the course leans heavily on hands-on notebooks, exercises, and LMS materials, with interactive breaks and quizzes to reinforce learning. The program also covers time series basics (date ranges, resampling, time deltas) and introduces the basics of the model-building workflow (through Python libraries) without venturing into ML/deep learning in this module. Throughout, the instructor stresses the importance of data integrity, reproducibility, and careful interpretation of visualizations. Expect a heavy emphasis on coding practice (NumPy arrays vs Python lists, broadcasting, vectorization, and indexing), and on converting real datasets (CSV/Excel) into actionable visuals and statistics. It’s designed for beginners who want a structured, practical path into Python data science, culminating in certificates and LMS resources for further projects and portfolio-building.

Key Takeaways

  • NumPy arrays are faster and more memory-efficient than Python lists because they store data contiguously in RAM (ndarray) and support vectorized operations.
  • Pandas introduces labeled data structures (Series and DataFrame) that simplify tabular data handling, with powerful indexing (label-based via loc and integer-based via iloc) and built-in statistics helpers.
  • Data visualization with Matplotlib and Seaborn is central: learn line plots, scatter plots, histograms, box plots, heat maps, and advanced grids/faceting to compare multiple subsets of data.
  • Broadcasting and vectorization are the core performance enhancers in NumPy: arrays of compatible shapes are automatically broadcast, allowing elementwise operations without Python-level loops.
  • Time series basics are covered: date ranges, date parsing, time deltas, and resampling, enabling meaningful trends and aggregations over time.
  • Statistical fundamentals (mean, median, std, percentiles, correlation) are integrated into Pandas workflows, making descriptive analytics a first-class capability.
  • The course combines theory with hands-on practice in notebooks and LMS materials, emphasizing data integrity, clean preprocessing, and responsible interpretation of visuals and outputs.
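The NumPy points above (contiguous ndarray storage, vectorization, broadcasting) can be sketched in a few lines. This is an illustrative snippet with made-up values, not code from the course:

```python
import numpy as np

# A Python list needs an explicit loop for elementwise math;
# a NumPy ndarray applies the operation to every element at once.
prices = [10.0, 20.0, 30.0]
doubled_list = [p * 2 for p in prices]   # Python-level loop

arr = np.array(prices)
doubled_arr = arr * 2                    # vectorized: no Python loop

# Broadcasting: a (3, 1) column and a (3,) row have compatible shapes,
# so they combine elementwise into a (3, 3) grid.
col = np.array([[1], [2], [3]])
row = np.array([10, 20, 30])
grid = col + row                         # shape (3, 3)

print(doubled_arr)    # [20. 40. 60.]
print(grid.shape)     # (3, 3)
```

The vectorized form is faster mainly because the loop runs in compiled C over a contiguous buffer rather than over boxed Python objects.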

Who Is This For?

Essential viewing for aspiring data scientists and analysts who want a structured, beginner-friendly path to Python-powered data analysis, visualization, and basic statistics. Great for professionals transitioning from other domains who need a practical, project-driven introduction to NumPy, Pandas, and Seaborn.

Notable Quotes

"Data is the powerhouse and the data revolution is here, driven by AI and scalable storage; humans must verify AI outputs and maintain data integrity."
Sets the stage for responsible data science where AI accelerates work but human oversight remains essential.
"Broadcasting and vectorization are the core reasons NumPy is fast: compatible shapes allow elementwise ops without Python loops."
Explains a central performance concept in NumPy that students repeatedly encounter.
"A DataFrame is a tabular, label-based data structure; Series is a single labeled column, and both integrate with NumPy beneath the hood."
Clarifies Pandas’ two primary data structures and their relationship to NumPy.
"The five-number summary of a box plot—minimum, Q1, median, Q3, maximum—along with outliers, gives a compact view of distribution."
Highlights a key visualization tool discussed in the seaborn/matplotlib portion.
"Time series uses date ranges, resampling, and time deltas to turn raw dates into meaningful trends over time."
Emphasizes practical time-series techniques covered in the course.
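The time-series quote above can be illustrated with a short pandas sketch; the dates and sales values are invented for the example:

```python
import pandas as pd

# Build a daily date range and a toy series of daily sales (hypothetical data).
idx = pd.date_range("2026-01-01", periods=10, freq="D")
sales = pd.Series(range(10), index=idx, name="sales")

# Resample daily values into weekly sums to see the trend at a coarser grain.
weekly = sales.resample("W").sum()

# Time delta: the span between the first and last observation.
span = sales.index[-1] - sales.index[0]

print(weekly)
print(span)   # 9 days 00:00:00
```

`resample` is the date-aware cousin of `groupby`: it bins the datetime index into regular periods before aggregating.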

Questions This Video Answers

  • How do I start with NumPy in Python for data science and why is it faster than Python lists?
  • What is the difference between a Pandas Series and a DataFrame, and when should I use each?
  • Which plots should I use to show distribution, correlation, and time-series trends in Python?
  • How do broadcasting and vectorization work in NumPy, and why do they improve performance?
  • What are the best practices for loading and cleaning data from CSV/Excel in Pandas?
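A minimal sketch of the loading-and-cleaning flow the last question asks about, which also shows the Series/DataFrame and loc/iloc distinctions; the CSV contents and column names are invented, and an in-memory buffer stands in for a real file:

```python
import io
import pandas as pd

# Stand-in for pd.read_csv("sales.csv"): a tiny CSV with a missing value
# and an untrimmed header, two common real-world issues.
raw = io.StringIO("region, revenue\nNorth,100\nSouth,\nEast,250\n")
df = pd.read_csv(raw)

df.columns = df.columns.str.strip()      # clean header whitespace
df["revenue"] = df["revenue"].fillna(0)  # handle the missing value

# A DataFrame column is a Series; iloc is position-based, loc is label-based.
first_row = df.iloc[0]                               # row by integer position
east = df.loc[df["region"] == "East", "revenue"]     # rows by boolean label mask

print(df["revenue"].mean())
```

The same pattern works for Excel via `pd.read_excel`, with the cleaning steps unchanged.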
Tags: Python, NumPy, Pandas, Matplotlib, Seaborn, Data Visualization, Statistics, Probability, Linear Algebra, Time Series, Data Cleaning, Feature Engineering, Jupyter Notebooks
Full Transcript
Hey everyone, welcome to the Applied Data Science with Python course by Simplilearn. Today, data plays a huge role in almost every industry. It helps organizations understand patterns, make smarter decisions, and solve real business problems. So if you want to learn how to work with data using Python, this course is a great place to start. In this course, we will take things step by step and build your understanding from the ground up in a very simple and practical way. Here's what we'll cover. We'll begin with the basics of Python and the learning setup, so you will get comfortable with the environment and understand how to start working with code. Then we'll move on to NumPy, where you will learn how arrays work, how to perform mathematical operations, and how to handle data efficiently using dimensions, shape, indexing, vectorization, and broadcasting. After that we will explore Pandas, which is one of the most important libraries in data science. You will learn how to work with Series and DataFrames, organize data, filter it, and handle things like categorical values and datetime information. Next we'll look at data visualization using libraries like Matplotlib and Seaborn, where you will understand how to use charts like scatter plots, histograms, box plots, pair plots, and heat maps to explore patterns, correlation, outliers, and missing values in data. We'll also cover the mathematical foundations of data science, including linear algebra, vectors, matrices, and probability. These concepts are very important because they help you understand how data science and machine learning work behind the scenes, and by the end of this course you will have a solid understanding of the core concepts needed to start your journey in data science with Python. Also, if you want to build strong practical skills in Python and data science, check out Simplilearn's Data Science with Python course.
This is designed for learners who want to understand Python programming, data analysis, data visualization, and machine learning in a very structured and beginner-friendly way. You'll get to work with key libraries like NumPy, Pandas, Matplotlib, and Seaborn, and also gain hands-on exposure to important concepts like data cleaning, feature engineering, statistics, and real-world data analysis. On completing the course, you will receive a course completion certificate from Simplilearn which can help you strengthen your profile and showcase your skills. Check the description below for the link and start your data science journey with Simplilearn today. Before we jump in, here's a quick quiz for you: which Python library is mainly used for working with tables of data, like rows and columns? Your options are NumPy, Pandas, Matplotlib, or Seaborn. Let me know your answers in the comment section below. What the course is all about — yeah, I got it, I got it: Applied Data Science with Python. Okay. So if we talk about this course in particular, the learning path that we are going to take is: initially we will start with the basic course introduction, that is, introduction to data science, which focuses on data science and its applications — that's going to be covered today. After completing this, we start straight away with the advanced libraries of Python, that is, NumPy, which focuses on the concepts of NumPy and its uses. Then after NumPy we will move on to Pandas, which is definitely a basic library required for data analysis, including data types and data structures using Pandas and its functions. Then we move ahead to the data visualization libraries, which focus on visualization techniques and different types of charts. Another important aspect of data science is statistics and mathematics, which focuses on the fundamentals of statistics, its types, data categorization, and the concepts of scalars and vectors.
Then we will move forward to probability distributions and how a distribution is carried out. We will also understand advanced statistics concepts and probability in detail, which will help in analyzing results through hypothesis testing. And last but not least, we will practically load a dataset and perform analysis on it with the help of data wrangling and feature engineering. Getting my point? I hope today's introduction is helping you understand what is expected from you and how we are going to move forward in this data science journey. Yes — so, can the audio be turned on so that we can communicate directly instead of by chat? If you want to communicate with me, you can always ask me to unmute you so that we can discuss; that's not an issue. If you want, I can unmute you and we can have a discussion if you have any particular doubt, but unmuting everyone can create chaos in the session. Got it? Yeah. So again, here's the format: generally you can communicate through chat, but if you have issues you can tell me to unmute you and we can discuss. You can even share your screen and I can help you out with that. I keep my discussions very bidirectional, and we will go for a break in the middle of the session, after two hours, and a poll will be conducted. The last 10 minutes we will keep for discussion and Q&A if you have any issues. Got it, Dharun, Deep, Tulapati, Sudha, Sivagami? Now tell me if you still have any doubts regarding the course or the prerequisites. Please let me know if you are familiar with the format at Simplilearn: we give lots of hands-on exercises in Jupyter notebooks; at the end you're supposed to complete the course and projects, and of course there is the ebook and the reference material. So are we excited to start this journey? Yes. Learners, are we excited to start this journey? Yeah.
You will do the assignments, the daily assignments, and we can discuss them in the session itself; the project is submitted at the end. Got it? And I hope you are also aware of the LMS, your learning management system, from where you get the course material. Learners, are you aware of the learning management system? Yes? Learners, do you understand this or not? On the LMS question — just 15 days are required: after the course ends, you will get two weeks to submit the project. Okay. So this is the LMS. You can log in to your LMS, and this is the screen, or the dashboard. Since I am taking a lot of courses — you have to log in to your Applied Data Science course. You see, on the left-hand side you have the instructor slides, notebooks, lab guides, incremental capstone datasets; all this reference material I request you to download and be ready with. Clear? And any extra material that I share will also be shared on the LMS. All right. So, as you are all now very familiar with what data is — Deepak says, "I have five years of experience from a data background as a business analyst." Oh, you're already working in this domain, so you have a lot of knowledge. Are you working in Tableau or Power BI? What kind of business analyst work are you doing, Deepak? So you already have a software background — which languages are you familiar with? Can you just give me the background? HTML, web development, Power BI. Okay, that's good, so you have an idea about data analysis, Deepak — that's great. Good: SQL and VBM, with working experience in process engineering. Sagnik Singh is pursuing an MBA in rural management from Symbiosis — or is it XIMB? Sagnik has two years of experience in consulting. All right. Arun says: web application developer, full stack .NET, ten-plus years, on a break now. Okay, Arun. Lakshi is in quality assurance. Okay. Dan is pursuing a BE, maybe at a professional level.
"I used Python for simple tasks" — oh, great. "I have 2.5 years of experience in sales management" — okay, so that was Xavier Institute of Management Bhubaneswar, XIMB. Okay, got it. Great. Great to hear from you all, and as I move along I'll keep asking: please give me your background and how this is practically going to be useful to you. I love interacting with my learners, and you can always share your experience — especially in analysis, where these learnings can be really helpful. You can always share it with everybody, because it's learning for all of us, and you can always say, "I want to be unmuted to share my experience and knowledge"; it would be beneficial to all. Okay. Tulapati has 13 years of experience in production support. Okay, Tulapati, that's good. Devika says five-plus years in CFD and thermal simulation. What do I understand by CFD, Devika? I don't know what CFD is. Navd says 20 years in IT infrastructure and Azure cloud. So, Navd, you've been working in that and now you want to shift into this data analysis domain — is that the case? Devika says computational flu— okay, CFD is computational fluid dynamics. So how is it useful? What is computational fluid dynamics? I would like to hear something on that. Sudha says, "I have 10 years of experience in banking. I have been in operations and techno-functional roles, currently on a career break, looking into applications of machine learning in the banking industry, having knowledge of—" Okay, that's great: you have knowledge, but now you want to get fully into the data science era. So, Devika, can you shed some light on what computational fluid dynamics is and how it is really useful — data science being used for pre/post-processing automation of CFD? If you can help out with that, I can unmute you. Yeah, please unmute yourself and shed some light on that data analysis; this is something very new to me. Yes — Thea, we can't hear you.
Please check your mic settings; we can't hear you. It's unmuted — please check; log out and log in. Something is wrong with your mic. There we go — please check on that, and the moment you are ready you can always tell me to unmute you so that we can have a good discussion. Okay. So, we all understand what data is: data is nothing but raw facts and figures. And "I have 10 years of experience in automation" — okay, Sudha. So now let me start with the course, and then we'll do the interaction again. Now, my question is: is data only now a big hit, or has data always been part of human civilization? That's my next question, everybody. Always part of it? How can you say that? Tulapati and the others say yes, it has always been, but now it's being tracked and traced. How is it tracked and traced now? Why is it a big hit now, when it has always been part of human civilization? Tell me, Deep — why is it a big hit now? Let's talk about ancient man. What was the data that ancient man used to use? We are just trying to understand: why is data now easy to track and trace, and has it always been part of human civilization? If I talk about ancient man, who did not have anything, who lived in caves and made food by burning fire — even then there was data. How was data stored in those days? Tell me. Yeah — people used to write it, carve it on stones; not exactly paper, maybe on leaves and bark. People who study that kind of data are known as archaeologists. They study that ancient time: how old the coins are, what is written on them, and the different statues associated with that.
So they are known as archaeologists, and they have their own ways of analysis. With time — especially with the invention of the wheel and the steam engine — the industrial revolution became part of human civilization. Agreed? That was the next change: with the industrial revolution, everything became faster and production became large-scale. Do you all agree? After the steam engine, the next big revolution was the invention of computers — information technology, the internet — which is where we all converge. And the data that was being stored in the form of files and paper, where all the documents were kept, is now being stored in the form of digital data. That is why you see a big buzz globally, where people are talking about saving energy and about where this data is going to get stored: data centers, and the energy needed to run these huge amounts of data. Do we all hear about that now, yes or no? So Deepak says that data has always existed, but the scale, speed, and tools have now made it transformative. Absolutely correct. It has always transformed with evolution or revolution. And why does data science feel like a revolution? It's not about having data, but about finally being able to unlock its power, because with the help of computers the data is now digitally stored, and this digital data is being generated in huge amounts. We now understand data in terms of big data, with the five V's: volume, variety, veracity, velocity, and the value that we want to add to the data. So there is a huge amount of data on which we can do analysis and bring benefits to companies. Got it? And now we are living in the fifth big revolution of human civilization, which is known as the AI revolution. And the new oil for the whole world is not crude oil but data.
So every country is running for that particular power, because whichever country has the biggest power resources — the energy resources for storing and processing the data — will become the main player. Do we see that globally also? Such a big impact is happening globally; everyone wants to become the big powerhouse, right? Okay, can I call you Padhya? Because there are two or three Deepaks, so I think Padhya is a good name I can call you — if you allow me? Yeah, thank you so much. Right. So now, data is the powerhouse, along with semiconductor devices, because ultimately it is the hardware and the software that go together. Data is not something kept alone; it is the hardware and the software combined that move us forward. Got it? Getting my point? And if we look at data more technically, it is divided into two parts; one is known as the categorical part — but first, a question: can AI replace a data scientist? Yes, it can do the analysis. But the very important part, Padhya, that we need to understand with all the AI we see around us is: who is going to verify the results of the AI? There is no verification when we talk about the different LLMs, the GenAI tools, the automation happening. So human involvement — at least if you ask me personally, from my experience — is extremely important. We cannot be completely dependent on it.
If you ask me my personal view: even for a small response, a small answer that you get from ChatGPT, don't rely on it blindly. Verification of the fact, the answer, the code, the result, or the automation definitely needs to be done at the human level. Got it? Of course it is going to make the work faster. For example, making presentation decks used to take a long time, but now you can generate one in hardly two minutes — but who is going to verify whether the points are covered or not? Got it? Are you all getting my point? Learners, please be very clear on this point, because that's the big revolution. This was never part of the discussion — do you think it was part of the discussion even three years back in my career? No. But now I have to talk about GenAI, its output, and how it is affecting the whole thing. It was never part of the discussion even two or three years back. With the evolution of ChatGPT — and now Claude; Anthropic is going to make a big wave in the market — we will see these changes in front of our eyes; we are witnessing the revolution itself. Okay: so will we also learn autonomous AI using Python here? No, not here; that's a different course, the GenAI course. Here we will learn the very basics: how to code, how to understand the different libraries, and how analysis can be performed. We will not rely on any tools. First learn the basics; only then will you be able to analyze the results of the GenAI or AI tools and tell whether they are correct or not. Got it? Is this point now clear? Right. So I understand learners may have more confusion — there is so much to explore, even at my level, having been in this field for the past 16 years.
I have to learn a lot, and there are so many tools still out there in the market which I have to explore, and they are really helpful to us. What level of statistics knowledge is needed? Very basic statistics knowledge is required — mean, mode, median, the graphs, the charts; I think that's more than enough. And probability is going to play a major role: in the whole of AI and machine learning, probability has a major role to play, so a few concepts of probability will also be covered. Got it? Okay. So now, if we talk about data, data can be divided into two main parts: first is categorical and second is numerical. What is categorical data? Data which can be divided into categories. For example: what is your marital status — married, divorced, widowed, single, etc.? Which political party do you belong to? What eye color do you have — green, blue, brown, black? That is categorical data. Then numerical data is further divided into discrete and continuous. Here I'm talking about strictly technical knowledge: how we understand data for analysis. Discrete data are counted items — the number of children, the number of defects. And continuous data is about things like weight or voltage: data that can take any value in a range, including decimal values. For example, the weight of an object can vary from 10.37 grams to 115.34 kilograms, or even more than that. Getting my point? Right. So now let's understand how data science is practically used in the real world, or in the business domain. Let's try to understand that particular point.
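The categorical/discrete/continuous split the instructor describes can be sketched in pandas; the survey columns and values here are hypothetical:

```python
import pandas as pd

# Hypothetical survey: one categorical, one discrete, one continuous column.
df = pd.DataFrame({
    "eye_color": ["green", "blue", "brown", "blue"],   # categorical
    "num_children": [0, 2, 1, 3],                      # discrete (counts)
    "weight_kg": [61.2, 75.5, 80.1, 68.4],             # continuous
})

# Pandas can store categories explicitly, which saves memory
# and documents the intent of the column.
df["eye_color"] = df["eye_color"].astype("category")

print(df.dtypes)
print(df["eye_color"].cat.categories)
```

Marking a column as `category` also unlocks category-aware operations such as ordered comparisons and compact group-bys.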
So if we talk about data science, the model it works on is known as the DIKW model. What do we mean by the DIKW model? D stands for data, I stands for information, K stands for knowledge, and W stands for wisdom. Okay, so let's understand with a simple example what these terms mean. Data is raw facts and figures — for example, the sales of a car company for the last one year. So the data is nothing but the sales of the car company for the last one year. Now, processed data — please try to understand, there is a slight difference between data and information — processed data is known as information. What do we mean by that? When I have the sales of the company's cars, which months have the maximum sales? Maybe I can represent those with dark blue, and the months with minimum sales with green. Getting my point? Are you understanding? This is where statistics comes into the picture: finding maxima, minima, averages, the standard deviation, the variance of the data. Getting my point, learners? Now, this data and information covers the last one year. But knowledge comes with experience — knowledge will always come with experience. Saying "I'm very knowledgeable" doesn't mean you worked for one year and have the experience. So rather than taking the data for the last year, I look at the data for the last five years. What I observe is that the same pattern gets followed: during the months of festivities — maybe around Navaratri or Christmas time — the sales of cars increase. I don't observe this just in the last year but over the last five years, and there are a few months where the sales are very, very low. Right? So this is the insight.
This is the pattern I detect from the data, the insight that I get, and using this insight — the wisdom — I can decide how to use this knowledge to impact my business. So now I can tell my sales and marketing team to apply more offers during the peak months. That is how my sales and marketing team can work to have more impact on the business. Did you get this point, Sudha? So this is how we actually use data to have an impact on the business. And I'll give you one very practical example. It's old now — about 22 years old, since we are living in 2026 — but it still has a lot of impact. Hurricane Frances was about to hit Florida's Atlantic coast. Linda M. Dillman, Walmart's CIO, pressed that it would be better to analyze the data from Hurricane Charley, which had struck several weeks earlier. Her idea was: one hurricane struck a few weeks before, so let's analyze that data — what were people buying, what does the history show — and apply that to this hurricane as well. So what do you think are the items that people would want to buy when a natural calamity is about to hit? We see these events a lot now. Basically, that was the idea. Walmart's people would naturally think that people would buy more bottled water, flashlights, groceries, food items — exactly, shelter materials. Absolutely correct. But look at what happened in the actual scenario.
The New York Times reported that the experts mined the data and found that the stores would indeed need certain products — just not the usual flashlights. It was strawberry Pop-Tarts, whose sales increased to about seven times the normal rate. And ultimately, yes, Ms. Dillman noted that the pre-hurricane top-selling item was beer. Can you imagine? The item which actually came out to be a big hit, and which brought a big profit to the organization, was not what we would think logically, but what the data showed after the analysis. Getting my point? Are you understanding the power of data analysis and data science? Everybody getting this point? And it's a 22-year-old example. Okay. So here I've just prepared a few quick — okay, we'll do the Python questions later on; let's start. Yes, Sivagami, I'll share it. The example says that a hurricane was about to hit the Florida coast, but a hurricane had hit before that. So she said: let's start doing analysis of the data from where the hurricane hit before. During the analysis they found it was not the grocery items or the essential items which were the big sellers, but the strawberry Pop-Tarts and the beer — the sales had increased seven times. Got it now? Now, why was that? That's another level of exploration; you can check it out on the net. Ultimately, it's something beyond what our logic says — it's what the data speaks. That is your role as a data scientist: to explore something more than what we understand, and find what the data says. Got it, Malipuri, Devika, Sivagami? Hi, Karthik, Kirana, Kamill — are you all getting my point? If you have not downloaded the reference material, I request you all to download it. And now we begin with our lesson number two, to understand what data science is. Yes, learners.
Are you understanding? The PPT that I'm sharing at my level I will share after the session; whatever material is shared by me beyond the LMS will also be shared on the LMS. Okay, got it, everybody? Are we good to go, or does anybody still have any doubt? Please let me know. All right, got it, learners. So in this particular lesson, we will understand the basics of data science, how the different data science processes and steps are carried out, the different packages used for data science, and the different types of plots available for visualization. All right, so we begin. Do we now understand the term data science, learners, in a better way? Data science is a multi-disciplinary field. Why multi-disciplinary? Because it does not only involve computer science or programming languages, but statistics, maths, linguistics — every field is there — and it uses scientific methods. We don't draw conclusions randomly; the conclusions are drawn from certain facts and experiments. We don't randomly accept any facts. So it is the field that uses scientific methods, processes, algorithms, and systems to derive meaningful insights from structured and unstructured data. What do we mean by structured and unstructured data? Structured data refers to tabular data: your Excel files, CSV files, and SQL data are all structured data. And what do we mean by unstructured data? Anybody who understands this term? Yes — audio, video, and log files all come under unstructured data. Absolutely correct. Clear? Is the definition now clear to everybody? What is data science? Well, using a search engine or making a purchase on Amazon provides valuable data to the data-science-driven software systems operating in the background.
Data on interactions with online platforms is gathered to understand user preferences and suggest search results or items to buy. The idea is to give profit to the business, to give more meaning and insight to the whole exercise. So data science, as I told you, is a combination of subject-matter expertise, scientific methodologies, and technology: mathematical and statistical models, plus scientific tools and methods such as Python, which runs not only on Mac but on Windows and Linux as well. Different libraries are available, and different data processing tools also help in data science; Tableau and Power BI are very important tools, and even SQL is an important tool. Right. Now, if we look at the applications of data science we see all around us, the first is in healthcare. We all wear smartwatches, right? Our smartwatches can tell us about our health: they can measure BP and temperature, count how many steps we take, and remind us, if we've been sitting too long, to go take a walk. Is it helping? You might even have heard the story where a smartwatch predicted that a person was having a heart attack, and the person was able to rush to the hospital and be treated in time. That's the prediction the smartwatch made; you can check it out on the internet. That's the advantage: your movement and body symptoms were different, so it was able to predict the problem. All that data gets collected, transferred to the server, and analyzed, and it can help you make more informed decisions about improving your health, becoming a more diet- and calorie-conscious person.
So that's how it can help in decision making. Is this example clear to everybody? Yes or no? Only to Arun? What about the others? Come on, learners, you can respond. Okay, that's great. Another example we see all the time without realizing it is data science: whenever we type into the Google search box, it keeps recommending completions, whether we want "data science in healthcare" or "data science in healthcare research paper". Those kinds of recommendations make search fast, and such real-time analytics is made possible by modern, advanced infrastructure and tools. Now there is a big change in this kind of technology too. What do we generally do today when we search? Are we still typing? Can I call you Krishna? Not really autofilling; no, we are generally speaking, talking to the machines. AI is the general word, yes, but through voice the machines are now capable of understanding what we say to them. Isn't it? "Search this for me, find this image for me." We talk; we don't even want to type. Getting my point? Is this clear to everybody? And this is again a part of AI which comes under the domain of NLP. Please understand: technically this comes under the domain of NLP, natural language processing. Of course, that's not part of this course. We are here to understand data science, and the journey is a little bigger than that: machine learning, deep learning, then NLP, and then computer vision, etc. Okay. In the finance domain, again, data science has a major role to play. Are you eligible for a loan or not?
When you file a loan application, you provide details: what are your earnings, how many dependents are there in your family, do you have medical insurance or not, and what is your credit score (the CIBIL score)? Based on analysis of this data, your credit history, the approved amount, and the risk, a decision can be taken on whether you will be granted the loan. Clear? Is anybody from the banking domain? Sivagami, NLP is natural language processing. All right. So I think the point of how data science practically impacts our lives is now clear to everybody. Now let's understand the different steps in the data science process. First is problem definition. We need to understand what kind of problem we are looking at. You should be clear about the definition: what is the project, who are the people involved, and why do we want to analyze the data. Once that is clear, we move on to data collection. This is very important, because data comes in different forms: there is a variety of structured and unstructured data coming from different sources, so integration of the data is also important. The third and most important thing is the integrity, or rather the authenticity, of the data: the data needs to be reliable. If you are working on fake data, will you ever get correct results? You will never get correct results. Right. So we have problem definition, data collection, and then data cleaning and exploration. Got it? After data collection, this is where data science begins practically: we work on data cleaning and exploration, then move on to feature engineering, where categorical data is converted into numerical data, and data binning and feature scaling become important.
So till here is what this course will cover. Model building and training are part of the machine learning and deep learning courses. Then we go ahead with model evaluation and final deployment. Clear? I hope you understand that this is the journey we are trying to cover in our data science course. Yes, the difference between data cleaning and feature engineering: data cleaning deals with missing values, null values, and duplicate values; feature engineering is transforming the data so that it can be fed into a model. Cleaning and exploration only deal with null values, missing values, and duplicates, whereas feature engineering prepares the data for modeling; it involves scaling, binning, winsorization, and encoding, concepts we will understand under feature engineering. So now, are we clear? The first step is defining the goal, or the question to be addressed through the data analysis, forming the foundation for the subsequent steps. Data collection is gathering the relevant datasets or information sources necessary to address the defined problem. Data cleaning and exploration pre-process the data by handling missing values, outliers, and other inconsistencies, and explore the data to gain insights. Feature engineering transforms the features of the dataset to improve the model's performance. Model building and training requires understanding different algorithms so that we can fit them to the data, and finally we evaluate, optimize, and fine-tune the model for peak performance.
Python is a preferred language for data science because it is a high-level, readable, interpreted language, and moreover it supports many packages such as NumPy and Pandas for data cleaning and exploration. For visualization we will cover Matplotlib and Plotly in detail. Got it? That is why this course sits a little above basic Python: we start straight away with these libraries, NumPy, Pandas, etc. Got it? As I've been telling you, what is the biggest advantage of Python? It's an open-source, interpreted, high-level language that supports object-oriented programming, with ease of use, simple syntax, scalability, a wide variety of libraries, and compatibility with all major operating systems, and new data science libraries are created daily by a vast online community, because it is an open-source language. So it has a wide community, powerful visualization, etc. Clear. The different packages we are going to cover in this course: we'll start with NumPy, a Python library for scientific computing that supports large multi-dimensional arrays and matrices and includes a comprehensive mathematical library. Then we will move on to Pandas, for efficient storage and manipulation of structured data. After Pandas, we'll move on to SciPy, a scientific Python open-source library built on top of NumPy, used for implementing scientific formulas. statsmodels is another library, used for estimating many different statistical models and conducting statistical data exploration. So we have understood that statistics is going to be an important part of data science, and there are libraries that help in implementing the statistics part of it.
Then we have scikit-learn, a widely used open-source machine learning library for Python, known for its simplicity. We will not cover scikit-learn in much detail because it is mostly used for machine learning; and for deep learning it is PyTorch and Keras that are mostly used as frameworks. But we will cover in detail the data visualization library Matplotlib, used for building static, animated, and interactive visualizations: line plots, scatter plots, bar charts, histograms, pie charts, etc. Got it, learners? And if we talk about Seaborn: Seaborn is a data visualization library in Python built on top of Matplotlib. We will also cover Plotly, for creating interactive, publication-quality graphs and visualizations, which is suitable for web-based applications as well. Clear? Are we getting all the points so far? Any questions, any doubts till here? Just a question from a learner: is it necessary to learn arrays, and where do we apply them? Oh, we are going to learn a lot about arrays; they have immense applications. All the unstructured data, audio, video, and images, is stored in terms of arrays. That's the first thing we'll start with, so nothing to worry about. Now let's do a quick revision of the different types of plots, with examples. The first type of graph is the line plot. What does a graph consist of? An x-axis and a y-axis, right, learners? And what does a line plot do? Its points are connected by straight lines, and it is often used to visualize trends or the relationship between two variables over time. So which graphs show changes over time? It could be a weather report. It could be the stock market.
It could be sales over a month, which helps us take investment decisions. Agreed, learners? And just to improve the appearance of the line plot, I can add markers to it at specific data points: where there is a dip in sales, or a rise, or any kind of temperature, or any other quantity. A marker plot displays data points with markers; it could be your marks in a subject, or the results of a college or university. It is useful for scatter plots and for visualizing individual data observations, such as marking specific locations for a survey. Great. Then we also have the scatter plot. A scatter plot is simply a collection of points plotted on two axes, horizontal and vertical, where both the x-axis and the y-axis carry numerical data. For example, if I want to find the relationship between the height and weight of people, or between sales and the price of petrol: where both quantities are numerical, we use the scatter plot. So a scatter plot analyzes the relationship between two variables, like comparing height and weight across a population. Clear? Are we understanding the different plots? Another important graph is the area plot, also known as the stack plot. Please understand: the area plot is also called the stack plot because the series are built one on top of another. An area plot represents data with shaded areas, useful for showing cumulative totals or proportions over time, such as tracking total sales; for example, if I want to keep track of a company's sales, one band represents the first quarter's sales. I think we are all familiar with these graphs. We all understand these graphs.
Next is a very common, popular graph known as the bar graph, which is used for categorical data: rectangular bars that allow vertical or horizontal comparisons of values against the category axis, usually the x-axis. The data, 1, 2, 3, 4, 5, could be sales or prices being compared. And grid lines, where we divide the chart into horizontal and vertical lines, assist chart viewers in determining what value an unlabelled data point represents; grids help us read each data point's value in detail and enable side-by-side comparison of multiple plots, enhancing visual analysis. Now, the histogram: it shows the distribution of the data, and it is for numerical data. Please understand: this graph is used for numerical, continuous data, whereas the bar plot is used for categorical data. That is the technical difference; they are not the same. All right, learners, please try to understand things more technically: bar charts are used for categorical data, whereas histograms are used for numerical, continuous data, and they show the distribution of a dataset by dividing the values into bins. What are bins? Small groups of values, with the frequency of each bin represented by a bar. So histograms visualize the distribution of numerical data, like income levels or exam scores, and they help reveal the characteristics of the data and the underlying patterns, guiding the decision-making process.
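Binning, the idea behind a histogram, can be seen numerically with NumPy before any chart is drawn. A minimal sketch, using made-up exam scores purely for illustration:

```python
import numpy as np

scores = np.array([35, 48, 52, 55, 61, 64, 68, 72, 77, 81, 88, 95])

# Split the value range into 4 equal-width bins and count how many
# scores fall into each one; the counts are the bar heights a
# histogram would draw.
counts, edges = np.histogram(scores, bins=4)
print(edges)   # 5 boundaries defining the 4 bins
print(counts)  # frequency per bin
```

A bar chart, by contrast, would simply draw one bar per pre-existing category; here the bins themselves are computed from the numerical range.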
And last but not least, I think this is the graph almost everybody uses: the pie chart, where a whole entity is divided into its fractional parts. We all understand this circular graph, in which data are plotted as segments of the pie, and the idea is that it shows proportions of a whole, like market shares or survey responses. Got it, learners? So, finally, a quick recap of what we have done in lesson number two: data science involves the analysis and interpretation of data to generate actionable insights. Having understood the definition, we will now start working on the NumPy library, Numerical Python, which is an open-source library predominantly used when working with arrays. Seaborn is a data visualization library in Python built on top of Matplotlib, and Python is the preferred programming language for data science projects across the industry. NumPy is the go-to numerical Python library available, and since Python is open source, there is an open-source community behind NumPy. I'll send this link to you; please check out the NumPy site. NumPy helps us create powerful n-dimensional arrays and numerical computing tools; it is open source, interoperable, performant, and easy to use. All these things are the practical strengths of the NumPy library. You can run code there too, and it underlies Pandas, statsmodels, signal processing, image processing, graphs, and networks. It has enormous uses. Got it? Mani, I think that was your question. Sagnik, Sivagami, Lakshi, are you all getting this point? Right, so this library has lots of uses. Now let's start exploring it. Okay: fundamentals of NumPy.
This is the link. So NumPy, Numerical Python, is a free, open-source package mostly used for mathematical operations in scientific and engineering applications. What are its other advantages? It is a Python library for working with arrays: it provides a multi-dimensional array object and a collection of functions for manipulating arrays, and it conducts mathematical and logical operations on them. So it's a very powerful library for performing mathematical operations on arrays. Another important point to note: the array object in NumPy is called the n-dimensional array, ndarray. So NumPy is a package for computation that creates homogeneous n-dimensional arrays. Can you all tell me what the term homogeneous means? It means the array consists of elements of the same data type. We will see it practically. So what are the different properties of arrays? First, arrays are mutable. What does mutable mean in Python? That they can be changed, modified by the user. Absolutely correct. Second, they are homogeneous: all the elements are of the same type. Third, elements can be accessed by integer position, that is, indexing, just as we access elements of a list or a tuple. And what are the two types of indexing available in Python? Positive as well as negative indexing. Very good, Arun. Arrays deal with numeric data, and, last but not least, they give high performance in calculation. That's the beauty of NumPy arrays.
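The three properties just listed (mutability, homogeneity, and integer indexing) can be sketched in a few lines, assuming NumPy is installed:

```python
import numpy as np

a = np.array([10, 20, 30, 40])

# Mutable: elements can be modified in place
a[0] = 99
print(a)        # [99 20 30 40]

# Homogeneous: every element shares one dtype
print(a.dtype)  # a single integer type for the whole array

# Integer indexing, positive and negative, just like lists and tuples
print(a[1])     # 20
print(a[-1])    # 40
```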
Now let's understand how arrays differ from lists. The first and foremost point: a list can consist of heterogeneous data; it can hold integer, float, boolean, and string values together, and it stores pointers to the data locations. Please understand: a list stores pointers to where its data lives. A NumPy array, on the other hand, stores the data directly, and in contiguous memory locations, one element after the other; therefore accessing the elements is faster and easier. Is this point clear? And can anybody tell me in which memory the data, the variables, get stored? Excellent, Mani, you were very quick on that: it is RAM, the volatile memory, which stores all the variables. So how do we go about creating arrays? First and foremost, NumPy is not part of basic Python; therefore we need to import the library: import numpy as np. The second most important thing: np.array is the name of the function to create an array. If I pass the values 0, 1, 2, 3, the array gets created. And how do I know whether it's an array or not? Using the type function I can check, and it belongs to the class numpy.ndarray; ndarray stands for n-dimensional array. Then the different attributes of the array: a.ndim gives the dimension of the array, here one dimension. shape gives the number of rows and columns; this data consists of four elements along one axis, hence the value four. And the number of items is four, so the length is also four. Are you all getting this code, or do I need to repeat it? Everybody got how we create arrays? We import the library; then np.
array, that is np.array, is the function to create an array, and these are the functions that give me the type, the dimension, the shape, and the length of the array. Got it? Now, do we understand the range function in Python? When I give range(40), by default it starts from zero and goes up to 39. Similarly we have the np.arange function, and by using reshape I can now change these 40 elements into a two-dimensional array with five rows and eight columns. Can I change its shape to eight rows and five columns? Yes. Why? Because the number of elements is the same: 8 times 5 is 40. Can I change it to four rows and ten columns, or vice versa? Yes. Can I change it to four rows and four columns? No, because that would hold only 16 elements, and we have 40. Can I change it to 2 by 2 by 10, into three dimensions? Yes. That's the beauty of arrays. So what do we mean by a one-dimensional array? It has only one axis; the shape here is (4,), since it consists of elements at positions 0, 1, 2, 3, and one-dimensional arrays are also known as vectors. Getting my point? A two-dimensional array has axis 0 and axis 1: axis 0 refers to the rows, here two rows, and axis 1 to the columns, here three columns; therefore the shape becomes (2, 3), and a two-dimensional array is known as a matrix. Right? A 3-D array you can picture as a loaf of sliced bread: axis 0 is the slices, say four slices, and each slice consists of three rows and two columns. So in a 3-D array, the axes and the shape are strongly connected: however many dimensions you have, that many axes you have, and that defines the shape of the array.
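The arange-and-reshape behavior just described can be sketched as follows; the ValueError on the impossible reshape is the part to notice:

```python
import numpy as np

b = np.arange(40)                 # 0 .. 39, like range(40)
m = b.reshape(5, 8)               # 5 rows x 8 columns
print(m.shape)                    # (5, 8)
print(b.reshape(8, 5).shape)      # fine: 8 * 5 == 40
print(b.reshape(4, 10).shape)     # fine: 4 * 10 == 40
print(b.reshape(2, 2, 10).ndim)   # 3 axes, shape (2, 2, 10)

try:
    b.reshape(4, 4)               # 16 slots cannot hold 40 elements
except ValueError as err:
    print("reshape(4, 4) fails:", err)
```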
Clear? Now, if we look at the different attributes of an array: shape gives me the number of rows and columns; ndim gives me the dimension of the array; dtype gives me the data type of the elements. For int32, the 32 represents the number of bits the system uses to store each element, and if we divide it by 8 (why 8? what is one byte equal to? 8 bits) we get the item size, four bytes. Got it? Are you understanding all the attributes, learners? So have we understood the advantages of NumPy: it provides an array object that is faster than a traditional Python list, it provides supporting functions, arrays are frequently used in data science, and they are stored in one contiguous place in memory, unlike lists. Now run this first code. Whatever code you feel is missing, you can ask me; I can give you the code, or you can type it in. First, find out the version of your NumPy installation. Tell me, learners: run this code and tell me your NumPy version quickly. Sivagami, Sagnik, Tamil, Daadi? Mani has one version, Kamill an older one, even older than mine; Arun, with a fresh Anaconda install, has a pretty recent one. It's double underscore, yes: __version__ has double underscores on both sides. Okay, so everybody is able to run the code; nobody is getting an error. Great. So what is the first thing we are supposed to do? We are supposed to first import the numpy library. Do we see the code? Right. And then we are printing this array.
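A sketch of the notebook cell being walked through, assuming the three-element array from the narration (the version string will differ per installation):

```python
import numpy as np

print(np.__version__)      # note the double underscores on each side

a = np.array([1, 2, 3])    # the three-element array from the walkthrough
print(type(a))             # <class 'numpy.ndarray'>
print(a.ndim)              # 1
print(a.shape)             # (3,)
print(len(a))              # 3
print(a.dtype)             # an integer dtype, e.g. int32 or int64
```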
The type it belongs to is the class numpy.ndarray; the dimension is one, the shape is (3,), the length is three, and the data type is integer. Are you able to run this code, everybody? And what if I create an array from a string? Come on, this code is not in your file, right? Copy it. If I create an array from a name, this also belongs to numpy.ndarray, but the dimension is zero. Why? Because there is only one word associated with it, so the dimension is zero, the shape is empty, and the data type is Unicode (here shown as '<U11'). So strings are stored with a Unicode data type. Learners, are you there with me? Now, if I have an array where one value is a float and all the others are integers, what do you see in the output? It makes the integers floating point. Why? Because arrays are a homogeneous data structure: they contain a single data type, so all the elements become float64. Got it, learners? All the elements become of float64 type. And the moment I add a string to it, you see the output: all the elements have become strings, and the data type is now Unicode ('<U32'). Got it? Now let's start creating NumPy arrays and multi-dimensional arrays. An array with zero dimensions, that is, with only one value, is known as a scalar. Please look at this highlighted code: an array with one value is a scalar; I have shown it with an integer and with a floating value. Then a one-dimensional array has one axis. "Can you go back to the data type? Which data type?" The previous one. Yeah. What is the confusion? The data type is Unicode 32, and you might say, "Ma'am, here it is Unicode 11, right?"
So does it decide automatically? "It seems it's becoming a list instead of an array." No, no, this is an array only. Here it's a scalar value; there, the list is converted to an array. Now clear? Are lists and tuples homogeneous or heterogeneous? Heterogeneous; but arrays are homogeneous. That's one more big point of difference. "I have a question: does one data type supersede the others?" Yes. The floats supersede the integers, and the strings supersede the numerical values. Do you see this? If I define even one element as a string, all of them become strings. Yes, good question. But the irony is that, being a numerical Python library, it gives the string precedence over the numerical values. Isn't that ironical? Since it is a numerical Python library, you might expect it to give more weight to numerical values, but the promotion works the other way. That's the analysis which comes up. Clear now? Hets, is your issue resolved? Yeah. So here we have a scalar value. Then we have the one-dimensional array, known as a vector, and the two-dimensional array, known as a matrix. Now a quick trick to tell one-dimensional from two-dimensional: if the printed output starts with double square brackets, it is two-dimensional; triple square brackets, three-dimensional; single brackets, one-dimensional; and no brackets at all, zero-dimensional. Are you understanding this quick trick? Are you able to see the output, everybody? This output is a zero-dimensional array; this one is one-dimensional; this one has two rows and three columns. Now look at the three-dimensional array: since it consists of two matrices, the first axis becomes two; and each matrix consists of two rows.
Therefore, the next dimension is also two, and each matrix consists of three columns. So what is the shape of the three-dimensional array? (2, 2, 3): it consists of two matrices, each with two rows and three columns. Sivagami, Tulapati, are you all able to do it? Odkumar, Deepak, Kamill? Are you feeling that the basics are missing, that the beginner level is making the code difficult? The learners who said they don't know the basics of Python: are you able to cope with this code or not? Who all is facing difficulty? Sagnik says he needs to learn the basics. Are you finding it difficult to interpret the code, Sagnik? My suggestion is that, rather than struggling in this session, it's better to take up the Python refresher course; that will make the Python concepts clearer, and then you can come back to this course. I would suggest that to all learners who are facing difficulty understanding this code. Going forward the code only gets more involved, with NumPy, Pandas, and then visualization, so I request all learners who don't know basic Python to go through the basics and then return to the data science course. Right? Or, Sagnik, you would have to put in more effort to understand the basics. Got it? Anybody else who's facing difficulty, who's finding this course tough, very tough, or irrelevant? A quick feedback on that: Sudhanya, Shivagami, Tulapati, Deepak. "Will we learn the applications of arrays with examples?" Yes, we are learning arrays with examples, and not just here: as we move to linear algebra, and when we get to deep learning, arrays really come into the picture; tensors are all n-dimensional arrays. Mani, that's not an error; it says that the memory has been exceeded.
So that's the disadvantage: the list is unable to store that huge number of elements. Have you understood this point practically? And secondly: %time. This is a magic function which helps us measure the time required to multiply these one crore (ten million) elements by two, ten times; the loop runs ten times, for i in range(10): array * 2. When I run this for the array, I get the output in about 234 milliseconds. And if I want the same operation on the list, what time does it take? You see, it takes much longer than the array did. It's still running... there: 13.2 seconds. Do you see the huge difference? Learners, are you able to see it? So which one is better, arrays or lists? Arrays. Arrays are much faster in computation; that is why this library of homogeneous data has been built. Clear? Everybody has got this point, learners? Now let's move on to our next file, 3.02A. What we are seeing here is that, timing the same operation on arrays and on lists, lists take far longer, so arrays are definitely much faster and more efficient than lists. Now, what makes arrays faster than lists? Mani told you the reason. What is it? How are arrays stored in memory, and how are lists stored? The memory allocation part: NumPy arrays are stored in one contiguous place in memory, whereas a list stores pointers to the data. Got it now, everybody? Now I want everybody to be ready with 3.02. So what are the different attributes of a NumPy array? The first is ndim, which gives the dimension of the array; then the shape, the size, the data type, and the item size of the array. Yeah.
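Outside a notebook, the %time comparison just described can be approximated with the standard timeit module. A sketch with one million elements rather than one crore, to keep it quick; exact numbers depend on the machine, but the array version is typically far faster:

```python
import timeit
import numpy as np

n = 1_000_000
arr = np.arange(n)
lst = list(range(n))

# Vectorized multiply: one C-level loop over a contiguous buffer
t_arr = timeit.timeit(lambda: arr * 2, number=10)

# Pure-Python loop: one interpreted step per element
t_lst = timeit.timeit(lambda: [x * 2 for x in lst], number=10)

print(f"array: {t_arr:.4f}s   list: {t_lst:.4f}s")
```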
So, has everybody been able to load my file? What is the first thing required to create NumPy arrays? We import numpy as np — the library. Yeah. Can you tell me the dimension of this array? Why? How many opening square brackets are there? Two. So it has two dimensions, right? The dimension is two. The shape is (2, 3) because it consists of two rows and three columns. The size: it consists of six elements. The data type is integer. The item size is the bit width divided by 8, and array.data gives me the memory location. Look at the output. Yes, learners, are you there with me? So this is the array, this is the type, this is the dimension, shape, size; the array stores int32, and 32 divided by 8 gives the size of one array element in bytes — each element takes four bytes of memory — and the array's data is at this memory location. Good to go? Now, for this same two-dimensional array, is the size of one element eight for some of you? Yes? Then the dtype must be int64 for you: 64 divided by 8 is eight bytes. Got it? So it depends upon your system — the default integer width is platform dependent. Okay, learners. Moving ahead: the dimension is completely based on the axes. What does axis 0 represent in a matrix, or in a two-dimensional array? Tell me quickly. It represents the rows. Axis 0 represents the rows, and axis 1 represents the columns. Very good. Shape gives me the size of the array along each axis, and the output data type is always a tuple. So what is the shape? This one consists of two rows and three columns; this one consists of three rows and two columns. Got it? And when I talk about the size, it is the total number of elements in the array.
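The attributes walked through above can be inspected like this — a minimal sketch with an explicit dtype so the byte counts match on any platform:

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]], dtype=np.int32)

print(a.ndim)      # 2 -> two opening square brackets, two dimensions
print(a.shape)     # (2, 3) -> two rows, three columns (a tuple)
print(a.size)      # 6 -> total number of elements
print(a.dtype)     # int32
print(a.itemsize)  # 4 -> 32 bits / 8 = 4 bytes per element
```

Without the explicit dtype=np.int32, the default integer width (and therefore itemsize) is platform dependent, exactly as discussed in the session.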
It is equal to the product of the dimensions: 2 × 3 = 6. So here, what is the shape? (3, 4) — if you look at this particular matrix, it consists of three rows and four columns. So what will be the size? 3 × 4 = 12. Yes, learners, are you getting this? Moving on to the data type: it shows the data type of the elements in the array — numpy.int32, numpy.float64, numpy.int16, and numpy.int64 as well. The default depends on the operating system — on how many bits the platform uses to represent that particular element. Yes, learners, is this point clear to everybody? And of course, itemsize. Itemsize shows the length of one array element in bytes. So for float64, 64 divided by 8 — because one byte equals 8 bits — gives 8 bytes. For int32, 32 divided by 8 gives 4 bytes. And numpy.ndarray.data is an attribute offering direct access to the raw memory of the NumPy array. Clear, learners? Any questions, any doubts till here? Dabati, Sagnik, Kamill, Sivagami, Nansi, Hets, Badya, Dika — no? I hope these PPTs are helping you understand the concepts better. The PPTs that I prepared for you all — are they useful? Yeah. So now let's understand some basic functions of arrays. The first function is transpose. What does transpose do? It interchanges the rows and columns. No, these files are not from the LMS — I will be uploading them to the LMS after today's session. Okay? So nothing to worry about; there's nothing to lose, right? So, transpose interchanges rows and columns. If you look at the shape of this array, it is (2, 3), and when I use transpose it becomes (3, 2). Clear? Right. Another function is flatten. It converts any n-dimensional array to a one-dimensional array. There are two ways to perform this flatten.
One is row-major order, with the parameter order='C'. What does row-major mean? First the elements of the first row are flattened, then the elements of the second row — like 1, 2 then 3, 4. The other is column-major order: first the elements of the first column, then the elements of the second column are flattened, using the parameter order='F'. Again, when I say a.flatten(), by default it is row-major — that is, order='C' — first the elements of the first row, then the second row, and so on. But if I explicitly pass order='F', then the elements of the columns get flattened. Is there an order code for row-major? By default it is row-major only, but if you want to pass the parameter you can pass order='C'; otherwise there's no need. b = a.flatten() — by default this is row-major. Yes. And then we have the reshape function, where you can reshape the data from one dimension to higher dimensions and from higher dimensions to lower dimensions — no restriction on that. So if you look at the shape of this data: how many rows are there? Six rows and one column. (6, 1) can be reshaped to (2, 3), and can also be reshaped to (3, 2). Agreed? Similarly, an original shape of (3, 4) can be reshaped to (6, 2), (2, 6), (4, 3), (1, 12), (12, 1). Right? See, the product of the dimensions has to remain the same. But I cannot reshape it to 4 elements — I cannot do that. But can I reshape it to (2, 3, 2)? Can I? Right. Now I want to tell you one major difference: the difference between copy and view. Please look here, learners, and try to understand the code. We have created array A here, and I have created a copy of A and assigned it to X. Now if I make changes in my original array, will there be changes in X? No. Why? Because X points to a different memory location than A.
So if I make changes in my A, they do not get reflected in X — getting my point? But if I create a view of this array B, whatever changes I make in B are reflected in X. Why? Because B and X now point to the same memory location. So now let's understand these concepts practically, getting back to 3.02, right? First is the reshape function, which gives the current elements of the array a new shape, provided the total number of elements remains the same. So I have created this one-dimensional array of 12 elements, and this is the output I get. Learners, are you there with me? Now, can I change this one-dimensional array to shape (12, 1)? Is it possible? Yes. Can I reshape it to (3, 4)? Yes. Can I reshape it to (2, 6)? Very good. And can I reshape it to three dimensions, with shape (3, 2, 2)? Here the 3 means the number of slices or matrices, the first 2 is the rows, and the other 2 is the columns. So these are my three matrices, each with two rows and two columns. Got it? Can I reshape it to a further dimension, say (2, 3, 2, 1)? Yes. So with reshape you can change from lower dimensions to higher dimensions and from higher dimensions to lower. Agreed? Yes, learners, are you there with me? Great. And if I give reshape(-1), it will automatically flatten the array back to one dimension. Okay, now coming to the flatten function: these are the parameters that can be passed. The default is 'C', which means row-major order, as we have seen; 'F' stands for column-major order.
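The reshape possibilities discussed above, including reshape(-1), can be sketched quickly:

```python
import numpy as np

a = np.arange(12)  # 12 elements, one dimension

print(a.reshape(12, 1).shape)   # (12, 1)
print(a.reshape(3, 4).shape)    # (3, 4)
print(a.reshape(2, 6).shape)    # (2, 6)
print(a.reshape(3, 2, 2).shape) # (3, 2, 2) -> three 2x2 matrices

# reshape(-1) flattens back to one dimension
print(a.reshape(3, 4).reshape(-1).shape)  # (12,)

# The product of the new dimensions must stay 12;
# a.reshape(5, 3) would raise a ValueError.
```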
'A' and 'K' are rarely used — they come from older Fortran-style memory conventions. 'A' means flatten in column-major order if the array is Fortran-contiguous in memory, and in row-major order otherwise; 'K' means flatten the elements in the order they are laid out in memory. So, as we have seen: a is my array, b = a.flatten() — by default it flattens in row-major order — and a.flatten(order='F') flattens column-wise. So, Kamill, you want a break? The session is about to end in the next 40–50 minutes — do you want a break till then? Kamill is asking for a break, learners; do you want one or not? Maybe a 5-minute break is helpful; I can give you that. Okay. So let's get back to the discussion. So this is my flatten. By default, we understand, it is in row-major order: first row and then the second row. And when I give order='F', it is in column-major order. Is this clear? But the important point to note here is that B is a copy — a new array created in memory. So what we need to understand is that flatten creates a copy of the array, not a view of the array. What does that mean? If you change elements in array B, the elements in array A are not changed. Clear? Is the difference clear? So flatten creates a copy. And how can we check? If we set an element of B to 10, the change is reflected in B but not in A. Clear? Are you all there with me? Is everybody getting this point? Yes, learners. Right, moving on with the flatten function: it returns a copy of the array flattened into a one-dimensional array. So flatten will always convert any n-dimensional array to a one-dimensional array. So this is a three-dimensional array, and when I do array.flatten(order='F') — what does order='F' mean? Column-major order. So the elements come out column by column: first 1, then 7, then 4, then 10, then 2, then 8, then 5, then 11. Getting my point?
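The copy-versus-view distinction above can be demonstrated in a few lines (a minimal sketch — flatten() returns a copy, while .view() shares the underlying memory):

```python
import numpy as np

a = np.array([[1, 2],
              [3, 4]])

b = a.flatten()   # copy: new memory, independent of a
b[0] = 10
print(a[0, 0])    # 1 -> original unchanged

v = a.view()      # view: shares memory with a
v[0, 0] = 99
print(a[0, 0])    # 99 -> change is reflected in the original
```

This is exactly why the instructor stresses checking whether a function returns a copy or a view before mutating its result.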
Everybody is able to run the code, right? So this is how we get the three-dimensional array flattened into one dimension. If I do it by default, it gives me 1, 2, 3, 4, 5, and so on in row order. And when I run transpose, transpose interchanges the rows and columns: if this is my two-dimensional array with two rows and three columns, it gets changed into three rows and two columns. Clear, learners? So if this is my array and I now reshape it into a three-dimensional array — four slices or matrices, three rows, and one column. Can I specify the dimensions of the transpose? Yes, I can. Let's see the first example: if you look here, this is my first array, and when I take its transpose, the (2, 3) changes to (3, 2). That point is okay. Now look at another example. What am I doing here? I am creating a one-dimensional array and reshaping it into a three-dimensional one with the shape four matrices, three rows, one column — (4, 3, 1). Now when I run the transpose of it, the (4, 3, 1) gets reversed: this axis gets interchanged with this one, so now I have one matrix with three rows and four columns — (1, 3, 4). Got it? Mani, Devika, Kamill, Odla. And as you can see, every file has an assisted practice associated with it. Please look here, learners, and try to practice it — like over here, a temperature list is given, and once we convert it into an array you can find its dimension, shape, size, data type, and item size. With my files, Hetski, the advantage is that you also get the solutions to these assisted practices. So, will you all be able to do the assisted practice? Please do it so that you can understand it in a much better way. Dura is ready, Hitski is ready. Quick response, learners. Quick, quick.
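The (4, 3, 1) transpose example above can be verified directly — for a plain .T, NumPy reverses the axis order:

```python
import numpy as np

a = np.arange(12).reshape(4, 3, 1)  # four matrices, three rows, one column
print(a.shape)    # (4, 3, 1)
print(a.T.shape)  # (1, 3, 4) -> axis order reversed

# For a 2-D array this is the familiar row/column swap:
m = np.arange(6).reshape(2, 3)
print(m.T.shape)  # (3, 2)
```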
Karthik, Keruba, Laksha, Madhuberta, Darun, Ud, Sudana — great. Now look over here. What are we doing? This was the question that I asked you in the MCQ — you remember it? So what were we doing? list1 = [1, 2, 3], and when I do list1 * 2 it is not multiplying each element — it is repeating the list. Agreed, learners? Are you getting this point? But if I want to multiply each of the elements in the list by two, how do I go about it? I have to explicitly run a for loop. Yes, absolutely correct. So what am I doing here? I have first created an empty list; then, for i in list1, I keep appending each term multiplied by two into list2, and that's how I get the elements 2, 4, and 6. Is this point clear to everybody? Ready? The other way — do we understand this kind of notation? Anybody who understands this notation — what is this concept known as? Very good; only Manzi understands it in this batch, nobody else. This is known as list comprehension. It saves a lot of code and a lot of extra memory: I don't need to define another empty list — with [i * 2 for i in list1], every element gets multiplied by two in one single line. Clear? Right, so now do you see the difference? If I have to perform any arithmetic operation on a list, I have to explicitly run a for loop. But look at the beauty of arrays: when I just write a * 2 — without any for loop — I directly get the output, or if I'm adding two arrays, I directly get the output. Isn't that giving me fast mathematical results? Right? And why is it doing so? Because NumPy works on two main concepts.
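The three approaches compared above — list repetition, a list comprehension, and array arithmetic — side by side:

```python
import numpy as np

list1 = [1, 2, 3]
print(list1 * 2)                  # [1, 2, 3, 1, 2, 3] -> repetition, not math

doubled = [i * 2 for i in list1]  # list comprehension: explicit element loop
print(doubled)                    # [2, 4, 6]

arr = np.array(list1)
print(arr * 2)                    # [2 4 6] -> no explicit loop needed
```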
This is a very, very important point about the NumPy library that we are about to understand. It runs on two main concepts. Please try to understand, learners. The first concept is broadcasting — the process that makes the arrays the same shape. The second concept is vectorization — using an implicit for loop for element-by-element operations. Broadcasting makes the arrays the same shape; vectorization applies the operation element by element. Now, again, to make the concept clear: if it were a list, we would have to explicitly run the for loop where each element gets multiplied, and then we get the output. But with arrays, how does it work? First, the broadcasting happens — the arrays are made the same shape — and then automatically, at the back end, element-by-element multiplication happens to give the output. (Sorry about the pen — this slide always gives me that issue.) To understand it even better, let's take this example. I want everybody to concentrate here. What is the shape of this array? The shape — I'm not asking the dimension. Hudsky, it's (4, 3): it consists of four rows and three columns. What is the shape of the second array? Again, it is (4, 3). Why (5, 3), Sagnik? Count: 1, 2, 3, 4 — four rows. Right. So both arrays are of the same shape. Yes — both arrays are of the same shape, which means no broadcasting is required, and the element-by-element operation is done directly to give the output. So it is not necessary that broadcasting happens every time, but vectorization — the implicit for loop — will definitely always run to perform the mathematical operations.
Now look at the second example over here. What is the shape of this array? It's (4, 3), and the other one is (1, 3). Are they the same shape? No. But the columns match — do you see that? And can I expand this single row to four rows? Yes, I can broadcast it to four rows, and now the two arrays become the same shape, and therefore vectorization — the arithmetic addition — happens. Just give me a minute; my net is working slow. So this one is four rows and one column, (4, 1), and what is the other shape? (1, 3). Now you would say, ma'am, nothing is matching. Please try to understand: along each axis, the sizes must either match, or one of them must be one. In this case, one array has four rows and one column — one of its axes is one — so can I expand it to three columns? Yes. The other has one row and three columns — can I expand it to four rows? Yes. So broadcasting happens in both arrays to make them the same shape and size, and then the arithmetic operation happens. So the important point to note is that broadcasting does not always happen, and it does not necessarily happen for both arrays — it all depends on the dimensions. Let's understand it with more examples. Please look here, learners. First of all, we have np.arange with elements 0 to 2; the shape is one row and three columns. The other is a scalar value, so it gets expanded to one row and three columns, and then the vectorized element-by-element addition happens. Are you all getting this point — how a one-dimensional array and a scalar value get added?
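The two broadcast patterns above — a single row stretched down a matrix, and both operands stretched toward each other — in code:

```python
import numpy as np

a = np.arange(12).reshape(4, 3)    # shape (4, 3)
row = np.array([[10, 20, 30]])     # shape (1, 3) -> broadcast to four rows
print((a + row).shape)             # (4, 3)

col = np.arange(4).reshape(4, 1)   # shape (4, 1) -> broadcast to three columns
r = np.arange(3).reshape(1, 3)     # shape (1, 3) -> broadcast to four rows
print((col + r).shape)             # (4, 3): both sides were broadcast
```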
Then np.ones — this is (3, 3), meaning we have created a matrix of shape (3, 3); np.ones creates all values equal to one. And what is np.arange here? It is (1, 3). Now, since three and three are matching — if this had been four, it would not broadcast; it would give you an error — this single row can be expanded, so both arrays become the same shape and we get the output. So over here: three rows and one column. Do you see this? This one gets expanded to three columns, and this one gets expanded to three rows, to make them the same shape, and then the addition happens. Have you got this? This is the most important concept of the NumPy library — the reason arithmetic calculations are fast: broadcasting and vectorization. Clear, learners? This one? Yeah: np.ones means we are creating an array of three rows and three columns with all values equal to 1 — that point is clear. np.arange means it starts from zero and goes till two, and the shape is (1, 3). Now, since the columns are matching here, can I expand this one row to three rows? Yes. So it gets expanded to three rows, both shapes become (3, 3), and I get the output. Better? Any doubt there? But where will broadcasting not happen? Where the sizes do not match and neither of them is one. Suppose one is (1, 3) and the other is (1, 4) — these two are not matching, so the broadcasting will not happen. The concept is: if the arrays are not the same size, broadcasting will only happen if, along the mismatched axis, one of the sizes is one, or the dimensions match. Like this is one — it gets broadcast; this is one row — it gets broadcast; this is one column — it gets broadcast; this is (1, 3), so the single row gets broadcast.
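And the failure case called out above — sizes that neither match nor include a one — raises an error rather than silently guessing:

```python
import numpy as np

a = np.ones((1, 3))
b = np.ones((1, 4))

failed = False
try:
    _ = a + b   # 3 vs. 4: neither is 1, shapes cannot be reconciled
except ValueError as e:
    failed = True
    print("broadcast error:", e)
```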
If any one of them is one, it gets broadcast — here the scalar value 5 gets broadcast into (1, 3). Clear now? If you see, this is (1, 1), so it broadcasts into (1, 3). Clear, Hatsky? It is not necessary that broadcasting will always happen: if the two arrays are already the same shape, then only vectorization happens — the implicit for loop will always run. Better now? So now, if you look at this particular example — a * 2, or a + 2 — is broadcasting happening here? Yes, because it will make the scalar the same shape, and then the multiplication or addition happens. So vectorization will definitely happen; broadcasting depends on the shape and size of the arrays. Clear? So the different arithmetic operations that can be performed on NumPy arrays: np.add using the + symbol, np.subtract using the - symbol, np.negative using the unary - sign, np.multiply using *, and np.divide using /. What is the difference between / and //? Tell me quickly. Single division and floor division. What is the difference? This one, /, gives me the true quotient: what is 5 / 2? It is 2.5. But floor division, //, will only give me 2. And how do I get the remainder? Using the mod operator, %: if I do 5 % 2, the answer will be 1 — the remainder, the modulus. Clear? Is the difference between the operators clear? That's part of basic Python. So when I take an array of this shape and b — is broadcasting happening here? A one-dimensional array with a zero-dimensional array: if I am adding a one-dimensional array and a zero-dimensional array — this is a vector, this is a scalar — is broadcasting happening? Quick response, learners. Yes or no?
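The operator distinctions quizzed above, both on plain Python numbers and on arrays:

```python
import numpy as np

print(5 / 2)   # 2.5 -> true division
print(5 // 2)  # 2   -> floor division keeps only the quotient
print(5 % 2)   # 1   -> modulus gives the remainder

a = np.array([1, 2, 3])
print(np.add(a, 10))      # [11 12 13] -> same as a + 10 (scalar broadcast)
print(np.multiply(a, 2))  # [2 4 6]    -> same as a * 2
print(np.negative(a))     # [-1 -2 -3] -> same as -a
```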
Yes — the scalar becomes 10, 10, 10, and then element-by-element addition happens. Agreed? You can use np.add or the + sign. Let's understand subtraction: we use np.subtract or a - b. Is broadcasting happening here? What are the shapes of these two? This is (2, 3) and this is (2, 3). Is broadcasting happening? No — no need. But vectorization is happening, element by element: 30 - 10, 40 - 20, 60 - 30. Yes, good, Hatsky. And see, Kamill — now what about multiplication? Is broadcasting happening for A and B here? No, because again they are of the same shape. Vectorization, definitely yes. What about division, with A and B of the same shape and size — is broadcasting happening here? No, or yes, it is…

Transcript truncated. Watch the full video for the complete content.
