Applied Data Science With Python Full Course 2026 [Free] | Python For Data Science | Simplilearn
Chapters: 22
Overview of the data science course goals, practical workflow, and how Python will be used to clean, analyze, visualize, and model data.
A practical, start-to-finish intro to data science with Python, covering NumPy, Pandas, visualization, and the data science workflow from problem definition to modeling basics.
Summary
Simplilearn’s Tim (host) delivers a dense, hands-on tour of applied data science with Python. He outlines why data needs cleaning, visualization, and modeling to derive real insights, then anchors the course in practical tools like NumPy, Pandas, and Python plotting libraries. You’ll learn how to build and manipulate one- and two-dimensional data structures (Series and DataFrame), perform vectorized operations, and use built-in statistical helpers (mean, median, std, describe) to summarize data. The session also dives into time-series basics (datetime, resampling, date ranges) and touches on feature engineering (one-hot encoding with get_dummies, label encoding) and the difference between inputs vs. outputs for modeling. Throughout, there are actionable demonstrations: from loading CSV data into a DataFrame, inspecting with info(), head(), and shape, to sorting, filtering, and dropping columns. Finally, the video underscores Python’s ecosystem (NumPy, Pandas, scikit-learn later, Matplotlib/Seaborn/Plotly) as the industry-standard toolkit for data analysis and ML, with practical notes on notebooks vs. VS Code, and a peek at future ML training via the next course in the series.
Key Takeaways
- NumPy arrays enable fast, elementwise math with built-in stats (mean, median, std, var) and support multi-dimensional shapes (1D lists, 2D matrices, 3D tensors).
- Pandas Series are 1D labeled arrays (like a spreadsheet column) and DataFrames are 2D labeled tables; both sit atop NumPy and unlock convenient methods (head, tail, describe, info, unique, shape).
- One-hot encoding via pandas get_dummies converts categorical text data into numerical columns, enabling models to utilize categorical information efficiently.
- Date handling with Pandas (to_datetime, dt accessor, resample) enables powerful time-series work, including creating date ranges and shifting with time deltas.
- DataFrame operations such as sort_values, set_index, drop, and vectorized arithmetic are designed for speed and readability; avoid row-by-row loops when possible in favor of apply or direct vectorized operations.
- Data loading and inspection (pd.read_csv, df.info(), df.describe()) are foundational: you should routinely inspect the data before modeling and use head() to sanity-check new datasets.
- Visualization basics with DataFrame.plot provide quick insights and are built on Matplotlib; more advanced plotting will be covered with Matplotlib, Seaborn, and Plotly in later lessons.
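The takeaways above can be sketched in a few lines of pandas. This is a minimal illustration, not code from the video; the data and column names are invented:

```python
import pandas as pd
import numpy as np

# A Series is a 1D labeled array; a DataFrame is a 2D labeled table.
sales = pd.DataFrame({
    "city": ["Austin", "Boston", "Austin", "Boston"],
    "units": [10, 12, 7, 15],
})

# Built-in statistical helpers summarize the data.
print(sales["units"].mean())       # 11.0
print(sales["units"].describe())   # count, mean, std, min, quartiles, max

# One-hot encoding: categorical text -> numeric indicator columns.
encoded = pd.get_dummies(sales, columns=["city"])
print(encoded.columns.tolist())    # ['units', 'city_Austin', 'city_Boston']

# Time series: build an hourly date range, then resample to daily sums.
ts = pd.Series(np.arange(48),
               index=pd.date_range("2024-01-01", periods=48, freq="h"))
daily = ts.resample("D").sum()
print(daily)
```

The same pattern (load, inspect with `info()`/`head()`, then summarize or resample) recurs throughout the lessons that follow.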
Who Is This For?
Essential viewing for aspiring data scientists and Python developers who want a concrete, tool-focused foundation in data cleaning, exploration, and preparation for ML. It’s especially helpful for those transitioning from raw Python scripts to a notebook-driven data science workflow.
Notable Quotes
"Data helps businesses make smarter decisions every single day."
—Opening statement highlighting why data science matters in real-world applications.
"Python will be our friend here when we're doing data science."
—Emphasizes Python as the primary language for data science tasks.
"One-hot encoding is the name of this process of turning categorical data into numerical data."
—Defines a core preprocessing technique for modeling.
"A series is a one-dimensional array with an index; a DataFrame is a two-dimensional table with rows and columns."
—Clarifies the foundational data structures in Pandas.
"Resampling changes the frequency of time-series data (e.g., hourly to daily) by aggregating with functions like sum or mean."
—Explains a key time-series technique that will be essential later.
Questions This Video Answers
- What is the difference between a Pandas Series and a DataFrame?
- How do you perform one-hot encoding in Pandas?
- What is the purpose of pd.read_csv in data analysis workflows?
- How does Pandas handle time-series data and what is resampling?
- What’s the role of get_dummies in preparing data for ML models?
Python, NumPy, Pandas, Matplotlib, Seaborn, Plotly, DataFrame, Series, to_datetime, resample, get_dummies, one-hot encoding
Full Transcript
Hey everyone, welcome to this course on applied data science with Python. Today, data is everywhere, from shopping apps and social media to healthcare, finance, even entertainment. Data helps businesses make smarter decisions every single day. But raw data alone is not enough. What really matters is knowing how to clean it, analyze it, visualize it, and use it to solve real-world problems. And that is exactly what this course is all about. This course is designed to help you move beyond basic Python and start using it in practical data science workflows. You will learn how to work with data sets, create meaningful visualizations, build machine learning models, process text data, and understand network relationships using Python tools that are widely used in the industry.
Also, if you are interested in mastering the future of technology, do not forget to check out the Professional Certificate Course in Generative AI and Machine Learning, which is the perfect choice for you. This is offered in collaboration with the E&ICT Academy, IIT Kanpur, and it's an 11-month live interactive program providing you hands-on expertise in cutting-edge areas like generative AI and machine learning, and tools like ChatGPT, DALL-E 2, and even Hugging Face. You will gain practical experience through 15-plus projects, integrated labs, and live masterclasses delivered by esteemed IIT Kanpur faculty. Alongside, you'll earn a prestigious certificate from IIT Kanpur.
You'll also receive official Microsoft badges for Azure AI courses and career support through Simplilearn's JobAssist program. So what are you waiting for? Hurry up and enroll now; the course link is mentioned below. Now, before we get started, here's a quick quiz question for you: which Python library is commonly used for data cleaning and manipulation? Your options are pandas, Seaborn, NetworkX, and NLTK. Let me know your answers in the comment section below. Intro to data science. So, we're going to get some background on what data science is as a field, and then some of the processes, some of the packages, and some of the tools that we'll need to work on data science.
So by the end of this lesson, we want to be able to talk about data science in general: just define what it is and what we would be doing in data science. If you're a data scientist, what kind of things are you doing? We mainly want to focus on the process, too: what are the steps to solving problems systematically that we will end up following? And then, as I was saying, we will basically go through the steps to solve problems.
I definitely want to highlight that process. Then we want to look at some of the packages that are going to help us do different things in data science, like data analysis, data manipulation, and data visualization. So what are some common packages in the Python ecosystem that we will use to accomplish data science? And then lastly, there's a description of some plots. I'm going to skip those, because we're going to have a whole section on visualization where we'll go through those plots in more detail. Briefly, I can show what those look like, but we have a whole section dedicated to building those plots.
So, I'm not concerned about going through them right now, but those are at the end of this lesson. So, what is data science? If we had to put a definition to it: it is a multidisciplinary field that uses scientific methods, processes, algorithms, and systems to (and this is the big part of it) derive meaningful insights from structured and unstructured data. Obviously, data is in the name, data science. So our goal with data science is to derive meaningful insights from data.
And that can involve many things. That can involve building models, which we will learn how to do later on when we get into machine learning. That can involve building out visualizations. That can involve doing hypothesis testing. It can involve many different things. But we're deriving some type of insight, or finding out something interesting about the data we have, through different methods. I always like to say it's the science of working with data, which is kind of a weird way of saying it because that's obviously the name, data science, but it truly is the science of extracting insight from data.
There's a lot involved in this field, and it blends a lot of different disciplines together. That's why we say it's multidisciplinary: we bring together multiple aspects of math and stats, computer science, and domain expertise to be able to derive those insights. There are ideas and concepts borrowed from science, like doing hypothesis testing. There's obviously a lot borrowed from math and stats. There's a lot borrowed from visualization and analysis that we use in data science, but we blend that with technology, with computer science.
So being able to use Python is a big deal. That will be our language of choice for doing anything data science; it's why we reviewed it in the beginning. We'll certainly use Python, and we'll certainly use different data processing tools within Python to process data. We blend those together to form the field of data science: scientific methods, along with tools for working with data from technology like Python. Blend those together and we get data science. So where do we see data science today?
There are a lot of different applications, and we'll try to describe some of them to you. For example, wearable devices: think about Fitbits or Apple Watches. They're always crunching data from their sensors, right? So there are biometrics that are captured and sent over the internet, essentially, to allow us to do some type of analysis or derive some insight from that data. And then we can visualize that with some graphs or some meaningful metrics, so that the person wearing that device can make some sort of decision.
So there's some type of insight derived, and usually some type of algorithm being applied. Maybe a model's being built, working with the data that's captured from the wearable device, or we're just plotting that data, or we're summarizing it in some way, but making it really useful to the end user to make a decision off of. So, we're deriving insights there from all of the data collected by the wearable device. That's one application. Search engines use data science to personalize results or offer recommendations as people type in their queries.
So, essentially, you have suggestions, right? And these could be based on your previous browser history, all kinds of data like your cookies, or what's trending in the world, or your region. You've probably noticed this, right? When you're searching on search engines, you get these suggestions. Those recommendations are powered by data science: data is going into that to derive some insight that, hey, this is what we should suggest, and that gets surfaced there. One of the things we'll look into down the road when we get into machine learning, which is the next course after this, will be recommendation systems.
So, how do we actually build those models that do recommendation? In order to build those, you really need data. That's where we have to start: working with data to get in a position to model off of it. In finance, we see usage of data too. For instance, there could be a model that's built to determine a loan decision. There could be the application, and then additional data that's gathered via the details in the application, and all that data can be brought together and analyzed through the use of a model that could predict: yes, we should give this loan, or no, we should deny this loan.
So again, you're deriving some type of insight from that data to make a decision, right? To make some sort of loan decision in finance. Or, we see a lot of fraud decisions made as well: is this transaction fraud or not fraud? That's another big use case of data science. Based on the data of transactions, are they fraudulent or not fraudulent? Okay, there are a lot of other applications; those are just a few. And as we go along throughout this course, we'll really study a bunch more and get into some more use cases.
This is just a preview. But what I want to go to next is the process by which we attack data science problems. We know that we should be deriving insights from data; that's what data science does. And the question is: how do we go about doing that? What is a good systematic way of doing it? We're going to describe that in the process. A good question came in: what are the differences between data science versus data analytics versus data engineering? Let me start with data engineering.
Data engineering is mostly centered around building pipelines and building systems that can collect data. It's more about engineering systems to collect data, house it, store it, and facilitate data transfer and data availability: working with the data to make it available. And it's a lot more engineering-heavy, as the name suggests, because you're building pipelines and systems that will scrape data from sources, pull data from APIs, or do those kinds of things that collect data for you.
That's a little bit different from data science. Data science, remember, is generally going to be building models or doing visualization, or both, to derive some sort of insight: find some sort of pattern, or find something interesting about the data that's not readily known just by looking at it. With that being said, there's a lot of overlap between data science and data analytics, but I would say analytics is a little bit more static, in the sense that a lot of analytics is just basic rule systems or basic visualizations.
Data science usually goes a step further and builds more sophisticated models to make some sort of prediction or decision off of. So data science generally involves more modeling than you would see in data analytics, whereas data analytics goes a little bit deeper on visualizations, experimentation, and statistics gathering. Does all that make sense? Data science is going to lean more heavily on the modeling side to derive insights. That's a big deal: one of the main ways to derive insights is by building a model.
Yeah, that's a great question. I know there are a lot of those overlapping terms. Any other questions? Okay. All right. Let's talk about the process. The process is outlined roughly here, but I'm going to take us through every single step. If I had to break this down into a handful of key steps, it mostly starts with initial problem formulation, which involves collecting data. So usually you have problem formulation first. You need to know, for example: do we want to build a recommendation system for movies?
Do we want to figure out if transactions are fraud or not fraud? We're coming up with some problem that we want to solve, and then going out and collecting data that might help us analyze that problem further. We'll talk about that. Then comes what I would call the data preparation phase. Data prep means that we are preparing our data for modeling, with the goal in mind of actually building some sort of model to tell us something, to help us make a decision or derive some insight. We will do a couple of steps of data preparation (I'll describe what they mean in a moment) in order to get ready for modeling. Then comes the actual modeling phase, which involves building, training, evaluating, all of that.
So then comes the modeling phase, and after that comes the actual deployment of the model, meaning that we use it in the real world to bring our insights into an actual system, integrated so that those decisions can actually be made and used by an end user. Okay. A question: can you call it a data science process even if no model is built, just the first few steps? You can. I think that's fair to say: even if you're not building a model, you're thinking of just steps one, two, three, four.
Yeah, that's true. Or even just one, two, three. I think that's fair. But most of the time in this course, and in this program, we're really going to be building models. So I want us to keep this in mind: even if we stop here and don't do any model building, what we have our eyes on is the ability to model if we want to. But it's a fair point that you could imagine just getting your data together as a data science process. Another question came in about feature engineering and the features the model will use. I'm going to describe what feature engineering is in a moment.
I'm going to go into that in further detail in the next couple of slides, so just hang on; when we get to feature engineering, we'll talk about it. Another question: for A/B testing, there's no model; would you call that a data science project? That's up for debate. A/B testing sometimes falls under analytics, sometimes under data science. We'll talk about doing A/B testing even when there is no model, so I think it's fair to say it's part of data science, which would just be steps one, two, three, four. So yeah, I see where you're coming from. Traditionally, A/B testing is more like experimentation and analytics; I agree it's not traditional data science. But we will cover it, because it's an important aspect of data science, with the idea that deploying a model and seeing what the results are would be like an experiment. We think of it as an experiment where I build some model and see what the results are on one group versus a baseline control group that's not exposed to that model. So it's important for us to know what it is. You could argue the actual process of doing A/B testing is not really data science.
But if you add in the context that usually people are interested in using it in conjunction with building a model and exposing it to users as an experiment, it fits. Does that make sense? Yep. Okay. Very good. All right. Let's go into each of these steps and talk a little bit more about them. Usually things start at the beginning with a problem definition, which is a goal or a question that will be addressed through collecting data, analyzing it, and deriving insight from it. That's the very first step, and usually this is actually something you would work on together with other colleagues.
So usually you may come up with this with a product manager, or with another engineering group, or something like that, to come up with a problem like: hey, we need to build a recommendation system so our users get better recommendations for their movies, or their shows, or their products on our website. So there usually is going to be a question or a goal. Or: hey, we want to come up with a sales forecast for the next three quarters, given the data we already have for this quarter or the previous quarters.
So you usually have to start with a problem definition that you want to address, and honestly that goes hand in hand with the next step: once you have a problem in mind, you then have to collect data around that problem. You have to gather the relevant data sets, which could also involve working with external partners. Maybe you have to work with data engineers to help you go out and collect that data or make that data accessible.
Or you may have to do it yourself and go gather historical data that can help answer that problem. Sometimes this is up to you: you have to go out and collect that historical data, or whatever data is relevant to your problem. That is totally possible, but sometimes you're also working with external partners, like a data engineer, to make that possible. And of course, the key word here is relevant, right? The data needs to be relevant to the problem that you're solving, and that can be a challenge. I know these are listed as the first two steps and they seem pretty straightforward, but they can be the most challenging at times: defining the right problem and collecting the right data, getting that available, is not always trivial.
But in this course, we'll usually assume that we have these two things. We'll usually assume that yes, there's a problem we know we want to address, for practice purposes. And we also have data already available for us; we don't have to go out and fish for it, collect it, or scrape it from somewhere. We'll assume that's already been done, and we're just using the data we have available. So in this course, these two steps will usually be done for us, mainly so that we can practice all the other steps.
But in the real world, you would usually be working together with external stakeholders to formulate a problem, and you'd be gathering data either by yourself or with the help of someone like a data engineer. Okay. So again, we'll usually have these two things in this course, which leads us to the next phase: the data preparation phase. This is where, as I said, we need to clean the data and explore it a little bit. Usually this process can take some time.
This can be a very time-consuming process. But the typical tasks we're doing here are getting a handle on any missing data, so figuring out a strategy to handle missing data (we'll talk about that); how to identify and handle outliers (we'll talk about that); and maybe duplicate data or inconsistent data that doesn't make sense, which we can identify and get rid of or have some strategy to handle. So cleaning up our data is going to be an important task, and that might take some time. We have to learn how to write the proper code to clean it up and get it to a good state.
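As a rough sketch of those cleaning tasks in pandas (the column names, values, and the 0–120 sanity range are invented for illustration; real strategies depend on the data):

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    "age":    [25, 30, np.nan, 30, 240],   # NaN = missing, 240 = obvious outlier
    "income": [50_000, 62_000, 58_000, 62_000, 61_000],
})

# 1. Missing data: one strategy is to fill with the median
#    (another is dropping rows with df.dropna()).
df["age"] = df["age"].fillna(df["age"].median())

# 2. Duplicates: drop exact duplicate rows.
df = df.drop_duplicates()

# 3. Outliers: one simple strategy is to keep values within a sane range.
df = df[df["age"].between(0, 120)]

print(df)
```

Here the duplicate row and the impossible age are removed, and the missing age is imputed, leaving a clean table to explore.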
And then once we do that, we can start to explore it a bit to gain insights. This is where we'll usually use visualization. Maybe we'll build some graphs to quickly visualize, spot some patterns, see what the data looks like, and figure out if there are any relationships in the data. That's where learning about visualization will really help us. So when we say explore, we're mostly talking about building graphs to help us tell a story about what we see in that data.
Assuming that it's been cleaned up, right? Assuming we've removed all our missing values, outliers, and inconsistent values, we now have a clean set of data and we can start to explore it, usually with visualizations. It's not only visualization, though; it could also be summaries. Maybe it's really useful to tell a story like: what is the average for all these users, what is the average sales for the last few weeks, or what is the median sales? Those kinds of statistics might be really useful summaries to tell a story about the data.
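Those kinds of summaries are one-liners in pandas. A minimal sketch, with made-up weekly sales numbers:

```python
import pandas as pd

weekly = pd.DataFrame({
    "week":  [1, 1, 2, 2, 3, 3],
    "sales": [100, 120, 90, 110, 130, 150],
})

# Overall summary statistics that help "tell a story" about the data.
print(weekly["sales"].mean())    # average sales
print(weekly["sales"].median())  # median sales

# Per-group summaries: average sales by week.
print(weekly.groupby("week")["sales"].mean())
```

Paired with a quick `weekly.plot()` (which uses Matplotlib under the hood), this is typically the first pass of exploration before any modeling.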
Okay. So we have our steps here: problem definition, data collection, and cleaning/exploration. The first three steps. Any questions about those first three steps? Good. Okay. And again, as we go along, we will deal with these problems; we'll have actual examples that walk through this process, so we'll see it from end to end many times as we get into our examples later on. A question about bronze, silver, gold: usually those are data engineering terms that refer to basically how clean the data is.
So bronze is kind of the rawest form. It's usually data that has not been aggregated or cleaned up in any way; it's pretty raw. Silver might be cleaned up but not really aggregated. So silver might be: we've removed missing values, we've removed outliers. Silver is a bit cleaned up, but it's still kind of raw. And then gold would usually be when our final transformations have been applied; maybe we've done some averaging, we've done some transformations.
So usually those are terms you see for the different stages of quality, gold being the highest quality, the final data set. Yes, the model will run on gold; that's true. By the way, we don't really use those terms too often. I think you see them a lot in data engineering. You don't really see them that much here, just because the assumption is that we're always going to clean up: our models aren't going to be good unless we get to a gold state, right?
Unless we clean up our data and do the right transformations, our models really aren't going to be useful, so we're always going to be pushing to clean up and get to a good state. Okay. So, we have one, two, three. Let's look at the next few steps. Once our data is clean and we've explored it a little bit, we get to step four, which is feature engineering. This is the other data preparation step that I talked about. So feature engineering, what is that?
It is the creation or transformation of features. You might ask: what is a feature? A feature is just a variable that is an input to a model. I want you to think of an Excel spreadsheet: a feature is something like a column. It's an independent variable, like a column, that we would use to build a model off of; that will be one input. We call it a feature that's going into the model as an input. So feature engineering is the process of building new features, and those can come from simple transformations like scaling: dividing by 10, multiplying by 10.
That's simple scaling we can do to features. We could do more complex transformations, like a linear transformation. We could take a square root, a logarithm, or an exponential; there are many different transforms we can apply to our data to create new features. Sometimes it makes sense to do that; sometimes we don't need much feature engineering at all. That's something we will get a feel for as we start to do examples: when does it make sense to do feature engineering, and when do we not have to?
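A minimal sketch of those kinds of transformations with NumPy and pandas; the `price` feature and the chosen transforms are hypothetical examples, not from the video:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"price": [10.0, 100.0, 1000.0]})

# Simple scaling: divide by a constant.
df["price_scaled"] = df["price"] / 10

# Log transform: compresses large ranges, common for skewed features.
df["price_log"] = np.log10(df["price"])

# Square root transform: a milder compression.
df["price_sqrt"] = np.sqrt(df["price"])

print(df)
```

Each new column is a candidate feature; whether a model benefits from the raw value, the log, or the square root is exactly the "art" part the lesson describes.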
Feature engineering will be more of an art than a science, honestly. And we're going to do plenty of examples where we do feature engineering, to see what kinds of transformations we typically use. Is it part of the current data? Yes, everything we're talking about will be part of the current data we're working with. Is it always performed? No, we don't necessarily always do feature engineering, but it is a step we can take, and we should evaluate whether we should. So will we always do it? No. But we have the ability to do it, and it's a step worth calling out because it can be very valuable.
So again, we haven't learned how to do that yet, so it may not make that much sense to us, but I'm calling it out as a very important step in the process. And as we start to do examples later down the road, we're going to come back and spend some time on feature engineering, because it is often an important step. Yeah, I see how it can be a little confusing. In data science, a feature is just a variable: an independent variable that's an input to the model.
It's a little bit different in meaning from a feature as in a new piece of software you're adding to existing software, right? Slightly different terminology there. So this was step four, and this was data prep. Draw a line here, because this is all the data prep, and the next couple of steps are all modeling-oriented. Once we have our data prepared, we've cleaned it up, we've done exploration to determine what we should keep, and we've done some engineering to scale or transform the features we have, we can now build a model.
And this is something we will learn in our next course on ML: how to build models. But I'm calling it out, because that would be the next natural step: once we have our data cleaned up, to derive an insight from it we may want to build a model off of it. That will involve defining a model and training it on the data we've prepared, as step number five. And then step number six will be an evaluation step: determining whether we have a good enough model by evaluating it.
By the way, this process isn't necessarily linear, in the sense that we may iterate here and go back and repeat these steps. We may go back and forth, building a new model, determining if it's good enough, and repeating that process. And we may even go all the way back up and build some new features if the model isn't doing that well. So by no means is this process always linear; we may repeat, especially, the model training, building, and evaluation steps.
We definitely can repeat these back and forth until we converge on a good enough model. That's something we'll discuss when we get into modeling; I don't want to go into details now, but suffice to say we can iterate those steps quite a bit, and as a data scientist you can spend a lot of time on them. And, by the way, there's one more: there should be a step here on deployment, which is mentioned here, but I would argue it's its own step. Once we have the model done, we can deploy it.
Now, deployment means a lot of different things, and there are a lot of ways to do it. We won't get into it until much later in the program; it's not really a focus for us at the moment. We're mainly going to focus on all the data preparation steps and then the modeling, and leave deployment until the very end of the program. But it is an important part: once we build a model, we want it to be useful in the real world.
So we need a way for it to be integrated into existing systems, and there are different ways to do that. — What accuracy percentage does the model have to hit before you deploy it? — Oh no, that's not a silly question at all. It really comes down to your tolerance — whether you think it's good enough. For instance, I've worked with teams where we deployed models that were about 70% accurate, and that was okay, because we just wanted to get something out there to test with and get results.
So yeah, the target percentage differs depending on the problem. That's something we'll talk about in our next course on models and evaluation, because it turns out there are different ways to evaluate accuracy. Sometimes we care less about overall accuracy and more about limiting false positives versus false negatives — sometimes a false positive is more costly than a false negative, or vice versa: a false negative may be way more costly than a false positive.
So we'll have different ways to evaluate our models, and that can lead us to different thresholds. But it doesn't have to be perfect. I've seen a lot of models deployed in the 60–70% range, and that's okay. The main reason to do that is to get something out there so you can iterate on it — you get something into production and then improve it after the fact. You don't want to spend forever waiting for it to be perfect.
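To make the false-positive/false-negative point concrete, here's a toy sketch computing accuracy, precision, and recall by hand — all the counts are made up, and we'll define these metrics properly in the ML course:

```python
# Toy illustration (hypothetical counts): the same model can look great on
# overall accuracy while still missing a lot of the cases we care about.
tp, fp, fn, tn = 70, 5, 25, 900  # true/false positives, false/true negatives

accuracy = (tp + tn) / (tp + fp + fn + tn)  # overall correctness
precision = tp / (tp + fp)                  # of flagged cases, how many were real?
recall = tp / (tp + fn)                     # of real cases, how many did we catch?

print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
# accuracy=0.97 precision=0.93 recall=0.74
```

Here accuracy is 97%, yet the model misses about a quarter of the real cases — which is exactly why the "good enough" threshold depends on which errors are costly for your problem.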
All right, any questions about that process? I know we haven't covered all the details, but just remember the flow for now: we formulate a problem, gather data, do data prep, then do modeling. Data prep is something we'll focus heavily on in this course — we'll really key in on it in our last couple of lessons: things like data cleaning and feature engineering. Any questions? — Will we cover precision and recall?
Yes, we will, just not now. We'll cover that later in machine learning when we get to model evaluation; there's really no need to cover it before we're building models. All right, let's circle back to Python. I know we've spent some time on it already, but just to reiterate: Python will be our friend when doing data science. It's the preferred programming language for anything data science, and that's true in the industry. Python is widely used mainly because it has so many great packages for working with data — namely NumPy and Pandas, which are the first two we'll look at — plus many others for building models, like scikit-learn, which we'll get familiar with, and others for visualization that we'll study.
It basically has packages for most of the tasks we're interested in doing, which is why we'll stick with Python — it's really great for data science. We've talked before about why people prefer Python: it's open source, it's interpreted, and it has so many great packages oriented toward data science that make the work easy. A lot of people used to use R for data science, but there's been a shift toward Python because of its flexibility.
Python can integrate with other systems pretty easily, whereas R is more difficult to integrate. R is a scientific analysis language — a lot of statistics people like using R — but data science is now done almost exclusively in Python, so you don't see R too often. I've really only seen Python in industry, so no worries about R. Historically R has been around for a while, but Python is by far and away the most used data science language.
Okay, so I want to briefly tell you about some of the Python packages we're going to study in this course for doing data science. We'll of course have lessons dedicated to going into each of them — NumPy, Pandas, and all the visualization libraries. The first one is NumPy. NumPy is short for "Numerical Python" — that's where the name comes from. It's a Python package for scientific computing built around the array structures that NumPy provides.
Many things are built off of NumPy arrays and the ability to operate on them. NumPy introduced these multi-dimensional arrays, which are essentially matrices, along with a lot of computing tools around those arrays that so many other packages are built on. Pandas is built on NumPy; so is Matplotlib, which is for plotting; and so are many other packages. So it's a really foundational package for working with data, because data will be stored in NumPy arrays.
The NumPy array is the foundational data type of NumPy, and so many things work with NumPy arrays. We'll have a whole lesson dedicated to NumPy coming up next — it's the first place we'll start because it's so important for working with data. Its multi-dimensional arrays are extremely useful for storing and manipulating data. — What is a Fourier transform? — It's a transformation of data into a signal representation, basically a signal transformation.
You go from a time series into a frequency representation; it's used in signal processing. — Analog to digital? — Yeah, pretty much; it's used in signal processing. Okay, so the second package we'll study — we'll start with NumPy today, and right after this lesson we'll dive right into examples of NumPy arrays — is the library Pandas, which we'll spend a lot of time with. Pandas is a library built on top of NumPy.
It depends on NumPy and provides more structure for manipulating data. As I said earlier, if you're familiar with Excel, Pandas has a lot of functionality that mimics what you'd do with a spreadsheet — structured row-column data is what Pandas excels at. So Pandas is going to be a really fundamental package for manipulating data structured in a row-column, table-like format. It uses NumPy under the hood for all the manipulation, but Pandas provides its own data structures that put data into almost a spreadsheet format we can work with.
Really, Pandas is going to be powerful for manipulating data, and we'll use it all the time. If nothing else, you'll come out of this course as Pandas experts — of course you'll learn more than that, but I think you'll come away as really good users of Pandas, and NumPy for that matter. We'll study NumPy first and then have a lesson dedicated to Pandas right after. We'll have a lot more to say about it; I just wanted to preview that it's a really important package in the data science ecosystem because it helps us manipulate structured data in a row-column format, like a table.
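To give a feel for that spreadsheet-like structure, here's a minimal Pandas sketch — the column names and values are made up for illustration:

```python
import pandas as pd

# A DataFrame behaves like a small spreadsheet: labeled columns, numbered rows.
# (The column names and values here are invented for illustration.)
df = pd.DataFrame({
    "product": ["apples", "bananas", "cherries"],
    "units_sold": [120, 95, 40],
})

print(df.shape)                 # (3, 2) -> 3 rows, 2 columns
print(df["units_sold"].mean())  # column-wise stats, like a spreadsheet formula
```

We'll build these properly in the Pandas lesson; the point here is just that rows and columns come with convenient, labeled operations.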
Okay, then another package is SciPy, which is short for "Scientific Python". It's another open-source library built on top of NumPy, using NumPy arrays as its underlying data structures for the manipulations. SciPy contains a lot of scientific formulas and scientific computing tools that we'll use especially when we get into hypothesis testing — things like z-tests, t-tests, and probability distributions. It's tailored for that, and it also has things like the Fourier transform.
It has various linear algebra routines as well. SciPy will be really useful when we get into hypothesis testing and A/B testing: it has the distributions we'll need for tests like a Student's t-test or a z-test. So it's a really important package we'll see later when we do hypothesis testing. Another one that will be useful from time to time is the statsmodels package, which has a lot of statistics-oriented functionality.
It has some basic models in there, like linear regression and logistic regression. We'll generally favor a different package for those kinds of models, but statsmodels does have useful tools when it comes to statistical testing — there are chi-square tests and ANOVA tests that we'll borrow from statsmodels. So we'll use it when we get into hypothesis testing as well. These last two — SciPy and statsmodels — are the packages we'll use when we get to A/B and hypothesis testing.
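As a small preview of the kind of test we'll run later, here's a hedged sketch of a two-sample t-test with SciPy — the data is randomly generated for illustration, not from any real experiment:

```python
import numpy as np
from scipy import stats

# Sketch of a two-sample t-test: do two groups (think A/B test control vs.
# variant) have different means? The data here is synthetic.
rng = np.random.default_rng(42)
group_a = rng.normal(loc=10.0, scale=2.0, size=100)
group_b = rng.normal(loc=10.5, scale=2.0, size=100)

t_stat, p_value = stats.ttest_ind(group_a, group_b)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```

We'll cover how to interpret the t-statistic and p-value when we reach the hypothesis-testing lessons.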
That brings us to scikit-learn. This is going to be our primary package for machine learning — the one we'll build all of our models with when we get into our machine learning course. We won't really use scikit-learn in this current course, but when we get to machine learning, it will be our go-to package for everything. It's a fantastic library that's been developed over years to contain all the basic models we'd ever want to build.
Scikit-learn is really awesome: it can build models for so many different use cases, and it's a really easy package to use with a really nice interface. We'll see it later when we get into our next course on machine learning — just calling out that scikit-learn is a very popular data science library. When we get into modeling we'll use scikit-learn; when we do our data prep manipulations we'll use NumPy and Pandas. Finally, for visualization, we'll be using a library called Matplotlib.
Matplotlib is the foundational Python plotting library, and it borrows inspiration from MATLAB — if you've ever used MATLAB's plotting, Matplotlib is very much inspired by it, hence the name. It's going to be our main tool for building graphs. It's the foundational library for plotting: almost every other library that does visualizations is built on top of Matplotlib. When we get into our visualization course, we'll come back and talk about Matplotlib a lot and practice with it quite a bit.
— Excuse me, what is that course called? — Machine learning; the next course is called Machine Learning. Another visualization library we'll lean on heavily is Seaborn. This is one that's built on top of Matplotlib. Matplotlib is kind of like NumPy: it's the foundation, and a lot of things are built on top of it, Seaborn being one of them. Seaborn provides better aesthetics than basic Matplotlib — and not only that.
It also has more scientific, more interesting plot types than the regular ones you get out of the box with Matplotlib: really nice histograms, violin plots, heat maps, and statistical error displays like confidence-interval bars. It just builds better-looking plots than basic Matplotlib. Matplotlib is very basic, but really easy to use — you can build a lot of plots with it, as we're going to learn. Seaborn is really nice and makes things aesthetically pleasing, so we'll also use Seaborn from time to time.
It's another plotting library we'll get some practice with when we get into visualization. And another one is Plotly. So alongside Matplotlib we have Seaborn, which is built on top of it, and Plotly, which is its own library but fills a similar role. Plotly's specialty is building interactive graphs: when you build a Plotly graph, it renders in your web browser, kind of like Jupyter notebooks do, and you can click around the graph, mark points, zoom in and out. Excuse me, I'm just getting over a cold here, so don't mind the coughs.
You can zoom in, zoom out, and do a lot of interactions with Plotly, so if you want to build an interactive graph, Plotly is a good package. We'll get some practice with Plotly too. These three are what we'll practice with when we get into visualization: Matplotlib as the basic foundation, plus Seaborn and Plotly. The rest of these slides just go through some plots that we'll be building later on when we get into visualization.
I just wanted to briefly go through those to show you some of the different types of plots we'll do. The easiest kind is a line plot that connects points — for example, plotting something over time, like a stock price, sales values over quarters or weeks, or temperatures over time. It's a basic kind of plot we'll build no problem, and we can even mark individual points on it. That'll be easy to do with Matplotlib, Seaborn, or Plotly.
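As a rough preview of what that looks like in code, here's a minimal Matplotlib line-plot sketch — the sales numbers are invented:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; no display window needed
import matplotlib.pyplot as plt

# A minimal line plot: made-up weekly sales figures plotted over time.
weeks = [1, 2, 3, 4, 5]
sales = [100, 120, 90, 140, 160]

fig, ax = plt.subplots()
ax.plot(weeks, sales, marker="o")  # marker="o" marks each individual point
ax.set_xlabel("Week")
ax.set_ylabel("Sales")
fig.savefig("sales.png")
```

We'll walk through plots like this line by line in the visualization lessons.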
That'll be really easy to do. Again, we'll show you how to build these with code later when we get into visualizations; right now I'm just showing you the possibilities. Scatter plots — we'll do those too, with points scattered across two axes. They're usually helpful for seeing how data is clustered, or whether there's a relationship between two variables: do they tend to trend in the same direction, opposite directions, or are they just distributed all over the place?
So we'll be able to build scatter plots — that'll be helpful. Area plots, which stack cumulative areas on top of each other, will also be easy to graph — useful for tracking total sales over successive quarters, or showing the contributions of different categories. We'll do basic bar plots as well. All of these examples were built with Matplotlib, but they have equivalent versions in Plotly and Seaborn.
Again, those were built with Matplotlib. We can even put grids in the background to help the viewer see where the different points sit — that'll be easy to do. Now, histograms are going to be extremely useful for us. We'll build histograms a lot, because they help us visualize how data is distributed, which is extremely important to know: is it distributed like this picture, a bell curve, or is it flat?
Does it have two peaks? Knowing the distribution will be extremely useful to us, so we'll often build histograms like this one. We can also build pie charts, which show percentages — there may be situations where that makes sense for telling the story of our data. That'll be easy to do. Again, these are all just examples of what's possible; we'll show you how to build them when we get into the data visualization lesson, which is what the note at the bottom says.
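Here's a small preview of what building a histogram looks like — the data is randomly generated from a bell curve, just to show the single-peak shape:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen; no display window needed
import matplotlib.pyplot as plt

# 1,000 synthetic values drawn from a bell curve, so the histogram
# should show the classic single peak around 50.
rng = np.random.default_rng(0)
values = rng.normal(loc=50, scale=10, size=1000)

fig, ax = plt.subplots()
counts, bins, _ = ax.hist(values, bins=20)
ax.set_xlabel("Value")
ax.set_ylabel("Frequency")
fig.savefig("histogram.png")
```

The `counts` array holds how many values fell in each of the 20 bins — that per-bin count is exactly what the bars display.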
Once we get into that lesson, we'll show you the code to build all of these. So, to wrap up this first introductory lesson: we've shown you what data science is — deriving insight from data — and that we have a bunch of different packages to help us do that. We also have a process to guide us: define a problem, collect data, do data preparation, then do modeling. With those basic foundations in place, what we're going to do now is go into NumPy.
We'll start with NumPy and then go into Pandas after that — we're going to start studying the packages that help us do these different data science tasks. Any questions at this point? Okay, I'm going to open up our next lesson then. If you notice, lesson three is broken into several different notebooks, so we'll be transitioning into notebooks for these. Do you guys have those notebooks?
The lesson three notebooks — I can share them with you if you don't have them, or someone who has the folder can share it for those who don't. It should be a collection of several notebooks. Yeah, let me download it. Okay, give me a moment, I'll upload it. I have it right here. Okay, I just uploaded it, so you should have it now. Open those notebooks wherever you want — you could do it in your own local Jupyter.
You could do it in the lab environment, or in Colab. I recommend Colab just because it's so easy to work with — that's what I'm going to do and what I'll be using. All right, can everybody see the screen? I'm on the first notebook. The 3.01 notebook is the one we're going to start with: the introduction to NumPy. If you have a moment, open that one up.
Again, if you're working in Colab, you can upload the notebook: once you've extracted the folder, upload the notebook into Colab using the file upload option. That should work. Or, if you have Google Drive, you can upload the folder to your Drive and launch the notebook from there — it'll open in Colab. That works too. Is everyone able to open the notebook? Yeah — again, it doesn't matter where you open it, as long as you can run the cells, because that's what we'll be doing.
Good, good. All right, let's talk about NumPy. Remember, NumPy is the open-source library used for doing math and scientific computing on arrays. The first thing we'll look at is the NumPy array object. Now, the NumPy array behaves very much like a list — we learned about lists in our previous course, and the NumPy array is very similar: we can slice it like a list.
We can access elements like a list, and it's ordered like a list. But it's a lot faster to do mathematics with an array, and it comes with a bunch of built-in functions — mean, median, standard deviation, all these extras we don't get with a list. For instance, Python lists have no notion of an average: you can't calculate the average of a list without doing a manual calculation. A NumPy array, on the other hand, has a built-in mean function, and NumPy provides functions like average, median, and standard deviation that we can apply to an array.
So arrays are really advantageous to work with inside NumPy. Let's take a look at some examples of a NumPy array. In the first cell, I want to point your attention to two things. One is that in order to use NumPy, we import it. Do you see how we import numpy and do this thing called `as np`? That's an alias — we alias numpy as np, basically shorthanding it to np, which is an industry standard. Any time you're looking at code and see `np.something`, that's short for numpy.
In the industry, everyone shorthands numpy to np — that's just what people do. `as` is how we alias an import, so that when we use NumPy in our code we don't have to type out the full word numpy; we can just write np. That's why you see np here, and really throughout our code — it's the shorthand alias for the NumPy package we're using. So we're importing the package, meaning we're going to use it in our code, but we're aliasing it to np.
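Putting the alias and the built-in stats together, a minimal sketch looks like this:

```python
import numpy as np  # industry-standard alias

# Built-in stats come free with a NumPy array; a plain list has none of these.
data = np.array([3, 7, 1, 9, 5])

print(np.mean(data))    # 5.0
print(np.median(data))  # 5.0
print(data.std())       # standard deviation, also available as a method
```

Note that many of these work both as functions (`np.mean(data)`) and as methods on the array itself (`data.mean()`).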
This is the industry standard. Most of these packages have a nice alias — Pandas has one, Matplotlib has one — just to keep things short. — Do you have to import it in VS Code? — If you're using VS Code to run your notebooks, yes, you'll have to open the folder where the notebooks exist. But that's only if you're using VS Code.
You don't have to use it, but you can if you want. The next thing I want to point our attention to is building a NumPy array. Notice that we build it by calling `np.array`. `np.array` builds a NumPy array object, and what we pass in is just a list of data. So we have a list of integers that we pass into `np.array`, which builds a NumPy array out of the list. NumPy arrays can be built from lists, from tuples, and from other NumPy arrays.
There are many ways to build a NumPy array, but the most common is to pass in a list to convert it into an array — which is what we're doing here, and it's pretty typical. By the way, remember I said I'd be writing a lot of comments? I'll share these notebooks in our Slack after class, but I encourage you to do the same: write comments in your notebooks to outline what the code is actually doing.
— How do you install NumPy? — Inside a cell, you can run the command `pip install numpy`. Try running that inside a Jupyter cell. This command here won't work for you because it's a generic Windows command, but if you're inside a notebook, Mariel, just run it inside a cell — it should install it. Yeah, that works too.
You can also open your terminal and run pip install; if you do that, you'll probably have to restart your kernel. And no — Colab comes with NumPy already installed. That's another advantage of Colab: it already has NumPy, so we don't need to worry about it. If it says "requirement already satisfied" — which is what this will say if I run it — that means it's already installed. Yep, in Colab it already exists.
If you can't get it to work in VS Code, I really encourage you to use Colab as much as you can, just to get something that works, because NumPy is already installed in Colab — there's really nothing extra you need to do. Thanks, Tim, that'd be great. All right. So, going back to this: this builds a NumPy array off of a list. If we run this code, what's happening is we're building an array, storing it in the `array` variable, and printing it.
Now look at what the array looks like when printed. It kind of looks like a list, except — and this is how we can tell it's a NumPy array — the printed data doesn't have commas. That's because it's being treated as a NumPy array: at this point it's an array, not a list, so it prints slightly differently. And you can even see, when we print out the type, that this is a NumPy n-dimensional array, which is the foundational data type of NumPy.
This is a NumPy ndarray, the foundational data type of NumPy. We've created a NumPy array, and now you can see its type is `numpy.ndarray`. Were you able to run this first cell? If you run it, it should just show this: the printed array and then the printed type — you should see those two things. And do we see how this creates a NumPy array? `np.array` is the function we use to build one.
And we pass in a list of data to build that array. Yeah, it might take a moment to start up the kernel. All right, any questions on this so far? — What was ndarray? — It's short for n-dimensional array; it's the NumPy array object type. That's the data type we're working with now: a NumPy n-dimensional array, the generic NumPy array type. You can see that because when we call `type()` on the array we created, we get `numpy.ndarray`.
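In code, that first cell looks roughly like this (the exact values in the notebook may differ):

```python
import numpy as np

# Building an array from a list; note the comma-free printout and the type.
arr = np.array([10, 20, 30, 40])

print(arr)        # [10 20 30 40]  <- no commas, unlike a list
print(type(arr))  # <class 'numpy.ndarray'>
```

The missing commas in the printout are the quickest visual tell that you're looking at an ndarray rather than a plain Python list.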
So ndarray is short for n-dimensional array. What I want to do next is go to the next cell and talk about how we can create matrices — essentially multi-dimensional arrays. The array we've created so far is just a one-dimensional array, because it has only one dimension: basically one flat list of data. But of course, a lot of the time we'll want to work with multi-dimensional data, because that's typically what a spreadsheet has, right? Rows and columns. To give you an example, NumPy actually supports zero dimensions, which is basically a constant.
A single value is considered zero-dimensional — just one value. So if we built a NumPy array and passed in a single integer, or a float like 24.6, whatever it is, that would be a zero-dimensional array. We've already built a 1D array, which is just a single flat list of values.
We've already seen that. — It is a scalar. — Yes, we'd call a zero-dimensional value a scalar, exactly. So a single list of values is a one-dimensional array. Now, what gets interesting is when we have a list whose elements are themselves lists: a list of lists is a 2D array. I want you to see that we're still building a NumPy array out of a list — but look at what the elements of that list are.
They're lists themselves. See how within the overall list, the first element is the list [1, 1, 1]? That mimics a row: think of each inner list as a row in a matrix. So this two-dimensional array is really like a matrix. And of course we could have more than just these two rows — we could add a third list, say [4, 5, 6], and that would be valid as well.
So this would be a matrix with three rows, where each row has three items in it — so we'd think of it as having three columns. It's like a 3×3 matrix, but it is two dimensions: a two-dimensional array with rows and columns. Let me ask you: do we see how this has two dimensions to it? It's a list of lists, so it has two dimensions. — Do all the inner lists need to be the same length?
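A quick sketch of that 3×3 case — `ndim` and `shape` confirm the two dimensions:

```python
import numpy as np

# A list of lists becomes a 2D array: each inner list is a row.
matrix = np.array([
    [1, 1, 1],
    [2, 2, 2],
    [4, 5, 6],
])

print(matrix.ndim)   # 2 -> two dimensions (rows and columns)
print(matrix.shape)  # (3, 3) -> 3 rows, 3 columns
```

`shape` reads as (rows, columns) here, which is the spreadsheet intuition we've been using.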
What do you think? What do you think would happen? Let's try it — let's make one row shorter. Do you think that's going to be allowed? Is that what you're asking, whether one row can be shorter? Let's see. So yeah, this gives us an error — you're exactly right. We get an error that the dimensions don't match, so this is not allowed. But if I add the six back in, this should now be okay. And there it is.
No more error. So yes, it's still going to be an error if the shapes don't match. Again, if we made one row shorter, that would be an error, and NumPy would tell us that one dimension is "inhomogeneous", meaning it's not the same: one row's shape doesn't match the others. So we should correct that and make sure everything matches.
Okay, so that's a two-dimensional array: a list of lists. And by the way, it doesn't have to stop there — we could keep going. A three-dimensional array is a list of lists of lists: it basically has one of these matrices as each element. See how this has an overall list where each element is a 2D array — each element is a matrix? Here's one matrix as the first element, and here's another 2D matrix as the next element.
Together they form a three-dimensional array. If we print it out, we can see this 3D array where this matrix is the first item, this matrix is the second item, and so on. — I didn't understand how 2D is different from 3D. — Okay: do you see how with the 3D array, we take this whole 2D matrix and it becomes just one element? This 2D matrix is one element, and then we have another 2D matrix as the next element.
So matrices are now the elements of the 3D array, whereas the elements of the 2D array are just lists. It doesn't have to be two matrices — that's just the example we have. By the way, one thing that gives away the number of dimensions is how many opening brackets there are: see how this has two brackets, and this one has three? Yeah — you got it, Roberto. Perfect, you got it with the brackets.
So what's an example of using a 3D array? A good one is a batch of images. An image is like a 2D array, because it's a grid of pixels at whatever resolution. So a collection of those — say a hundred of them — is a 3D array. Does that example make sense?
A collection of images is a 3D array because every image is a two-dimensional matrix of pixels. Exactly: a 3D array is a collection of matrices. I'm glad that question came up, because it's a good way to think about what a 3D array is — every element is a 2D matrix. What's the maximum number of 2D elements in a 3D array? Ah — how many can you fit?
You can have as many matrices in a 3D array as you want — basically as much as your memory will allow. There's no hard limit until you run out of memory. Perfect. All right, so just to recap: we have the NumPy array. We're able to build NumPy arrays using np.array, taking a list of data and populating it into an array. And what we're going to do next is build off of this to learn how to manipulate that array and do different things with it.
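The batch-of-images idea above can be sketched like this — the sizes (100 grayscale images of 28×28 pixels) are hypothetical, just for illustration:

```python
import numpy as np

# A hypothetical batch of 100 grayscale images, each 28x28 pixels.
# Each element along the first axis is one 2D pixel matrix.
batch = np.zeros((100, 28, 28))
print(batch.ndim)      # 3
print(batch[0].shape)  # (28, 28): a single image is a 2D matrix
```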
All right, so let's go to the next notebook, 3.02. Before I do that — any questions about building an array? Perfect. Do you guys have the 3.02 notebook up? That's the next one we're going to do, so take a moment to pull it up. I see some thumbs up — nice. We're going to build off of the NumPy array by taking a look at some attributes of arrays. So, assuming we have an array, no matter how many dimensions it has, what are some attributes of it that are useful to us?
Okay, so let's take a look at an example where, again, we import NumPy and initially make a 2D array. Then we're going to print out a bunch of the attributes of this array and explain what they do. The first is: if we ever want to know how many dimensions an array has, there's actually an attribute for that, called ndim. By the way, let me call out how we access attributes of an object: the syntax is object.attribute.
In this case our array is the object — so, for example, array.shape: shape is an attribute of the array, and array is our object. That's the syntax for accessing attributes. So the first attribute we're going to learn about is ndim, which gives us the number of dimensions. When we print it out, you can see it's going to be 2, and that makes sense — it's a two-dimensional array. Could the 2D array be different? Yeah, that's fine.
That's fine — you could use that; it's still two-dimensional. So the first attribute is ndim, which gives us the number of dimensions, which equals 2 in this case. And that makes sense: we know it's a two-dimensional array because its elements are 1D lists — it's a list of lists, so it's two dimensions. Now, shape gives us the count of elements along each dimension — basically the number of rows and columns. In this case it's saying we have two rows and three elements in each row.
So shape gives us an idea of how many elements we actually have, or what the shape of the matrix is — that's why it's called shape. In this case we have a 2D array that looks like this: 1 2 3 in the first row, and 4 5 6 in the second. Two rows and three columns — hence we get a shape of 2x3. Does that make sense? Yes, Roberto, that should be true.
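The ndim and shape attributes just discussed can be sketched as:

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])
# Attributes use object.attribute syntax -- no parentheses.
print(arr.ndim)   # 2: a list of lists is two-dimensional
print(arr.shape)  # (2, 3): two rows, three columns
```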
Yep. Okay, so shape is going to be incredibly useful as we go forward, because a lot of the time, if we have an array, we just want to know how many rows and columns it has — which is the shape. So shape is a good attribute to know. Does the bracket count define 2D versus 3D? Yes — the fact that there are two brackets is what marks it as a 2D array: it's a list of lists, which is 2D. And yes, a 2D array can have more than two rows.
Let me correct something I said: it's not that the array is 2D because there happen to be two rows — it's that the shape has two entries because the array has two dimensions. The number of entries in the shape matches the number of dimensions. So sorry, Roberto, I misspoke earlier: we see two entries in the shape because it's two-dimensional, not because there are two rows.
By the way, how can we get a third row? We could just add in another list with three elements, like 7 8 9. This is still a two-dimensional array, but what I want you to notice is that the shape and size change while it remains 2D: see how the shape went from 2x3 to 3x3?
Yeah, the size is going to be the product — that's true, and I was going to get to that next. What's the shape with three rows? It's just 3x3 — do you see that now, Mariel? With a third row it's 3x3: three rows with three elements each, still a 2D array. Perfect. So now I want to talk about the size. The size is the total number of elements in the array. In this case the array is 3x3, so we have nine total elements.
So size will always be the product of the shape. Lots of questions — let me try to keep up. Yes, you can have as many rows and columns as you want; no limitation. Do I have a real-world example of a 3D array? Yeah, it's the one I gave earlier: a batch of images — and we're going to work with images quite a bit when we get into deep learning. A batch of images is a 3D array because its elements are matrices of pixels; each image is 2D, so a collection of them is 3D. Do all the rows in an array need to be of equal shape?
Yes, they do — we saw that example earlier. If we try to change the shape, even in the 2D case — if we removed an element from one row — that would give us an error. The row lengths need to match in order to build the array. Okay, do we see what size is? Size just gives us the total number of elements, which is really just the product of the shape.
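The size-is-the-product-of-the-shape relationship can be checked directly:

```python
import math
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(arr.shape)             # (3, 3)
print(arr.size)              # 9: total number of elements
# size is always the product of the shape entries:
print(math.prod(arr.shape))  # 9
```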
There are three rows with three items in each row, so 3x3 is nine total elements — that's the size. Now, the dtype tells us each element's data type. Maybe not that interesting, but it's the default to know: in this case they're all integers, stored as int64. We can also see how many bytes each element takes up with the itemsize attribute — each element's memory footprint, which here is eight bytes.
And, if we want — very rarely would we actually need this — we can access the array's underlying memory buffer via array.data. We won't really ever need to worry about this ourselves, but it's actually really important for pandas later on, because everything in pandas is built on top of NumPy. Pandas often needs to manipulate the raw memory to do calculations on the data, so it typically needs access to that data attribute.
But we generally will never need to know the memory address ourselves. Any questions about these attributes? I think the one we'll use the most is probably shape — we'll worry about the shape the most when working with NumPy arrays. One audience question: do all elements in an ndarray have to have the same data type? Strictly speaking, a NumPy array stores a single dtype, so if you mix types, NumPy will upcast everything to a common type (for example, ints and floats become floats) or fall back to a generic object dtype. So you can write mixed-looking data, but under the hood it gets unified. When we're working with data, the types will typically all match anyway.
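The dtype and itemsize attributes can be sketched like this (note the concrete integer dtype is platform-dependent — int64 on most 64-bit Linux/macOS systems, but it can differ, e.g. on Windows):

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])
print(arr.dtype)     # e.g. int64 (platform-dependent default)
print(arr.itemsize)  # bytes per element, e.g. 8 for int64

# Mixing types upcasts rather than storing mixed elements:
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)   # float64: the ints were upcast to floats
```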
Try it out for yourself and see what you get. All right, I want to show you a couple of functions we can use to manipulate the shape of an array. The first is that we can actually reshape an array using the array.reshape function. We call it and pass in a tuple with the new shape. In this case we pass in (4, 3), which is to say we want to take this existing one-dimensional array —
notice it's a 1D array — and turn it into a two-dimensional array that is 4x3, meaning four rows and three columns. But first of all, let me ask you: do you think this is even possible? What do you think needs to be true in order to reshape this properly? If I want to take a flat 1D array and put it into something that's 4x3, how many elements do I need? Perfect — you're right on top of it.
12. I need 12 elements. So what happens if I don't have 12? Do we think this is going to work? Let's try it. Error — and look at what the error tells us: it cannot reshape something of size 11 into shape 4x3. It tells us directly that we can't do that. So yes, the prerequisite for using reshape is that the total number of elements in the new shape must match the number of elements you start with — then it will work, and you can reshape into any shape you want.
So we could even reshape this into 3x4 — that's okay, it totals 12. We could do 2x6 — that's okay too. But could we do 2x7? No — 2x7 would need 14 elements, and we only have 12. What about 2x3x2? Sure, we could do that — that would be three-dimensional. Now we have a 3D array where each element is a 3x2 matrix, and we have two of them.
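The reshape behavior just walked through can be sketched as:

```python
import numpy as np

flat = np.arange(1, 13)              # 12 elements: 1..12, a 1D array
print(flat.reshape((4, 3)).shape)    # (4, 3): works, since 4 * 3 == 12
print(flat.reshape((2, 3, 2)).ndim)  # 3: reshape can add dimensions too

# The element counts must match, or reshape raises a ValueError.
try:
    flat.reshape((2, 7))             # would need 14 elements, we have 12
except ValueError as e:
    print("error:", e)
```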
So that's what reshape does: it takes an existing array and moves it into a new shape, assuming the element counts align. That's actually incredibly useful — we'll use reshape from time to time. And we can do the reverse of reshape: we can always take an array and flatten it out into 1D using the flatten function, which always gives us a 1D array.
No matter what shape we start with, we can flatten it into a one-dimensional version — flatten does a particular reshape that completely flattens the array. You can see it takes this three-dimensional array and flattens it into an exactly flat, one-dimensional array. So flatten always returns a 1D array — pretty straightforward. Are there any benefits? Yeah — sometimes we want to take something in one shape and move it into another because we're going to manipulate it under an assumed shape.
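A quick sketch of flatten always returning 1D, whatever the input shape:

```python
import numpy as np

a3 = np.arange(12).reshape((2, 3, 2))  # start from a 3D array
flat = a3.flatten()
print(flat.ndim)   # 1: flatten always returns a 1D copy
print(flat.shape)  # (12,): same total number of elements
```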
We'll see that later on when we get into deep learning, especially when working with images or text — it's going to be important to reshape things from time to time. So maybe not this second, but once we get into deep learning, we'll definitely be reshaping. Okay, one more I want to show you, and then we'll take a break: there's a transpose function. What it does is swap rows and columns.
So the row 1 2 3 now becomes a column. This was a 2x3 matrix, and it transposes into a 3x2 matrix. Transpose swaps our rows and columns, and that's going to be useful down the road for different linear-algebra calculations — we may need to transpose from time to time. Any questions on any of these functions? Hopefully they're not too bad — they're straightforward, just different reshaping: we have reshape, we have flatten, we have transpose, and they all change the shape of the array. So next we're going to start doing some arithmetic operations.
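Before moving on, the transpose behavior above can be sketched as:

```python
import numpy as np

m = np.array([[1, 2, 3], [4, 5, 6]])  # shape (2, 3)
t = m.T                               # .T is shorthand for m.transpose()
print(t.shape)  # (3, 2): rows and columns swapped
print(t)
# [[1 4]
#  [2 5]
#  [3 6]]
```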
Just to show you that when you have data in NumPy arrays, you can do elementwise operations — operations that go element by element, matching positions up and applying a mathematical operation between them: things like addition, subtraction, multiplication, and division. For instance, we have these two arrays of the same shape, and we can go ahead and add them together, meaning each position in the first array is added to the corresponding position in the second.
When we do that, we get 40 in every slot, because 30 + 10 is 40, 20 + 20 is 40, and 10 + 30 is also 40. But look at the syntax — there are actually two different ways to do it. One way to add is to use np.add and pass in your two arrays, a and b. We add those two and store the result.
And notice that the result is an array of the same shape, with each pair of elements from the original arrays added together. The other way is regular arithmetic: you can just write a + b. It doesn't really matter which one you use — I've seen both, and both produce the same kind of array. So we could do result = a + b and then print the result.
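The two equivalent syntaxes for elementwise addition, using the values from the example above:

```python
import numpy as np

a = np.array([30, 20, 10])
b = np.array([10, 20, 30])

# Both forms perform the same elementwise addition.
print(np.add(a, b))  # [40 40 40]
print(a + b)         # [40 40 40]
```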
Same thing as before — we get the same array. So either np.add(a, b) or a + b will do the elementwise addition. Pretty straightforward. We also have the same thing for subtract, multiply, and divide. For instance, now we have a 2D array — this one is two-dimensional — but the same exact thing happens: np.subtract(a, b) is the same as a - b; it takes a and subtracts b from it.
So 30 - 10 is 20, 40 - 20 is 20, then 60 - 30, then 50 - 40 — we're subtracting the elements in matching positions to get a 2D result, where every element comes from subtracting the corresponding elements of the original arrays. Again, a - b or np.subtract(a, b) — either way works. Could you do 2D plus 3D? Let's try it out.
So let's copy this one and build out the example — one of these arrays, and then another one, but with different numbers: 10, 15, 20, 25, 30, 45. Then np.subtract(a, b) and let's see what we get. We actually do get a result, and the reason is something called broadcasting, which is an interesting idea in NumPy: when shapes are compatible, NumPy will stretch the smaller array to align with the larger one when you do a mathematical operation — but you might get some unintended consequences from that.
For instance, some of these happen to work out: we get 30 minus 30, 40, 60, so we get zeros here for this first block. But notice that NumPy basically takes the smaller array and applies it across the larger one — it subtracts it from each block…
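A simplified sketch of the broadcasting behavior described above — the exact arrays in the session differ, but the mechanism is the same: the smaller array is stretched across the larger one when the trailing dimensions are compatible:

```python
import numpy as np

a = np.array([[30, 40, 60],
              [30, 40, 60]])   # shape (2, 3)
b = np.array([30, 40, 60])     # shape (3,)

# b is broadcast across every row of a, so each row subtracts [30, 40, 60].
print(np.subtract(a, b))
# [[0 0 0]
#  [0 0 0]]

# Incompatible trailing dimensions raise an error instead:
try:
    np.subtract(a, np.array([1, 2]))  # (2, 3) vs (2,): mismatch
except ValueError as e:
    print("error:", e)
```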
Transcript truncated. Watch the full video for the complete content.