RAG vs Fine Tuning vs Prompt Engineering: Use Cases And Key Differences Explained | Simplilearn
This chapter introduces prompt engineering, retrieval augmented generation (RAG), and fine-tuning, outlining the topics to be covered and the practical project structure for building and comparing these approaches.
RAG, fine-tuning, and prompt engineering each solve different AI challenges; use retrieval for up-to-date data, fine-tuning for consistent behavior, and prompts to guide outputs precisely.
Summary
Simplilearn’s video breaks down three core AI techniques—prompt engineering, retrieval augmented generation (RAG), and fine-tuning—and shows how they differ in purpose and application. The presenter explains RAG as a two-step process: retrieve relevant documents first, then generate answers using that context. Fine-tuning is described as specialized training to align a model with a specific task, style, or format, often via labeled data and even lightweight methods like LoRA or QLoRA. Prompt engineering is framed as crafting inputs to coax better, more structured results without altering model weights. The host also provides practical project-structure examples for building a RAG pipeline, including data ingestion, chunking, embeddings, a retriever, and a generator, plus the typical prompts folder and config setup. Throughout, comparisons are made: use RAG for dynamic, source-backed knowledge; use fine-tuning for consistent tone and task performance; and use prompt engineering to guide outputs without changing the model itself. The video ties these concepts to real-world use cases, including an HR chatbot and customer-support scenarios, and it points to official guides from OpenAI and Google to deepen understanding. If you’re building AI apps, this session offers a clear mental model for when to apply each technique and how they complement one another in practical pipelines.
Key Takeaways
- RAG (retrieval augmented generation) retrieves external documents first and then generates answers based on that context, reducing hallucinations and keeping information up to date.
- Fine-tuning adapts a pre-trained model to a specific task or style by additional training on task-specific examples, improving consistency and format alignment.
- Prompt engineering is about designing inputs to elicit desired outputs, often incorporating context, format, and examples without changing model weights.
- LoRA and QLoRA are lightweight, parameter-efficient fine-tuning methods used to tailor models more cheaply than full fine-tuning.
- In practice, use RAG when knowledge is dynamic or document-driven; use fine-tuning for stable, brand-consistent responses; and use prompt engineering to coax better performance from the base model.
- Typical RAG project structure includes data/raw sources, processed text chunks, a vector store for embeddings, a src module with loader/chunk/retriever/generator, and a prompts folder with templates.
- OpenAI and Google documentation emphasize prompts should include clear instructions, relevant context, and desired output structure to improve results.
Who Is This For?
Essential viewing for AI practitioners building production apps who must decide when to retrieve information, when to specialize a model, or when to craft smarter prompts. Great for teams deploying customer-support bots, internal HR assistants, or knowledge-grounded assistants.
Notable Quotes
"So rag stands for retrieval augmented generation."
—Definition of RAG and its core idea.
"prompt engineering is the process of designing and refining the input you give to an AI model."
—Definition of prompt engineering per OpenAI/Google sources.
"use rag when you need the latest or document based knowledge and use fine-tuning when you need consistent style, format or task performance."
—Practical guidance on when to use each technique.
"without fine-tuning the model may answer correctly sometimes but the style may vary."
—Illustrates the benefit of fine-tuning for consistency and brand alignment.
"first it searches relevant document or information from PDFs, databases, websites, knowledge bases or company file."
—RAG retrieval step explained.
Questions This Video Answers
- How does Retrieval Augmented Generation (RAG) keep AI answers up to date with external documents?
- What's the difference between full fine-tuning and LoRA/QLoRA for model specialization?
- How should I structure a RAG project folder for embeddings and vector stores?
- What are best practices for prompt engineering to improve LLM outputs?
- When should I prefer prompt engineering over changing the model weights for a project?
Retrieval Augmented Generation (RAG), Fine-tuning, Prompt Engineering, LoRA, QLoRA, OpenAI prompt design, Google prompt design, Vector stores, Embeddings, RAG project structure
Full Transcript
[music] Welcome to the video on prompt engineering versus RAG versus fine-tuning. These are some of the most important concepts used in modern AI applications. These terms sound technical at first, but once you understand what each one actually does, the difference becomes very simple. So guys, if you want to learn about it, then watch this video till the end. In this video, we will see how prompt engineering helps us improve the way we ask the model, how RAG helps the model answer using external documents and real data, and also, guys, how fine-tuning helps the model become more specialized for a particular task or style.
Along with these concepts, we'll also look at simple project structures and some practical examples, which are going to give you better clarity about these terms. Now, here's the agenda of today's session. First, we are going to start with an introduction to prompt engineering, RAG, and fine-tuning. After that, we are going to deep dive into what exactly is prompt engineering, what is RAG, and what is fine-tuning. Then we will discuss the key differences between all three. After that, we are going to take a look at some real-world examples, and we are also going to see the sample project structures.
Now, before we deep dive into the session, just a quick info, guys. Simplilearn has a Professional Certificate Program in Generative AI and Machine Learning. You are going to learn from IIT faculty and industry experts, and you're going to attend masterclasses delivered by IIT Guwahati faculty. Also, guys, you're going to experience campus immersion at IIT Guwahati and get alumni status from the E&ICT Academy, IIT Guwahati. So guys, hurry up now and join the course. The course link is mentioned in the description box. Now, here is a short quiz to test your knowledge.
Which one of the following is mainly used to help an AI answer using external documents or knowledge sources? And your options are prompt engineering, RAG, fine-tuning, or tokenization. Please mention your answers in the comment section below. So let's get started. Now guys, let us first try to understand what is RAG. So RAG stands for retrieval augmented generation. What is this, guys? It is a way of making an AI model answer using external information instead of relying only on what it already knows. In simple terms, RAG works in two steps. First, it searches for relevant documents or information from PDFs, databases, websites, knowledge bases, or company files.
Then it generates: it uses the retrieved information to create the final answer. So instead of saying "answer only from your training," RAG says "first look up the right information, and then answer from that." Now, why is RAG so useful? Guys, if I have to say, RAG is very useful because a normal model may forget details, give outdated answers, and guess when it's unsure. RAG helps because it can use the latest information, use company-specific documents, reduce hallucination, and also give you answers from a trusted source. It's very simple. Let's take one good example. Suppose you ask your AI system a question.
For example: how many casual leaves do employees get per year? Now, let's say you ask this question to a tool like ChatGPT. A normal AI model may give a generic HR answer. But with RAG, the system first checks your company's HR policy document, finds that full-time employees receive 12 casual leaves per year, for example, and then it could answer something like this: employees receive 12 casual leaves per year. So guys, the answer is based on the document, not on a guess. Let's take a real-world example: say there is a company, XYZ, and they have an HR chatbot.
Imagine a company which is building an internal HR assistant. Employees ask questions like: how many sick leaves do I get, can I carry forward casual leave, how do I apply for maternity leave, or what is the notice period? Without RAG, the AI may give very generic, internet-style answers. It can confuse one company's policy with another, and it may also give you an incorrect answer. But with RAG, the system searches the company HR policy PDF, finds the exact section related to the question, and gives an answer based on that section.
What happens behind the scenes? It takes the HR policies and stores them in a system. So let's say the user asks, "Can casual leave be carried forward?" Then RAG retrieves this line: casual leaves cannot be carried forward. So this is how RAG works in our system, and that's why it is so important in modern AI applications. Now let me give you a RAG flow which will help you to understand its architecture. Suppose a user is asking a question: can a casual leave be carried forward? Now your AI model is actually searching the document, the company PDF, for the relevant leave policy.
That is called the retrieval step, or the search step: it searches the HR policy documents for the relevant information. Then it retrieves the matching policy, which says that casual leaves cannot be carried forward. After that, there is a generation step, in which the model uses what was retrieved to create the final response. This all happens internally, and finally it gives you the answer: no, casual leaves cannot be carried forward. So this is how a RAG workflow works.
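To make the retrieval step concrete, here is a minimal sketch in Python, assuming the sentence-transformers library (the video does not name a specific embedding library; the policy snippets reuse the illustrative example above):

```python
# A rough sketch of the RAG retrieval step: embed the policy chunks and the
# question, then find the closest chunk by semantic similarity.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

policy_chunks = [
    "Casual leaves cannot be carried forward.",
    "Full-time employees receive 12 casual leaves per year.",
    "The notice period for full-time employees is 60 days.",
]
query = "Can casual leave be carried forward?"

chunk_emb = model.encode(policy_chunks, convert_to_tensor=True)
query_emb = model.encode(query, convert_to_tensor=True)
hits = util.semantic_search(query_emb, chunk_emb, top_k=1)[0]

best_chunk = policy_chunks[hits[0]["corpus_id"]]
print(best_chunk)  # the generation step would now answer from this context
```

The generation step then passes this retrieved chunk, together with the user's question, to the LLM.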
Now let me also show you how the project structure will look. So guys, let's say inside the project folder you have data, and inside data you have a subfolder called raw. Here you keep the complete knowledge source, for example company_policy.pdf, which holds the external information regarding the company's policies. Next we have another folder called processed, and inside it, let's say, a file called chunks.json. I will explain the purpose of each of these files.

Then we'll create another subfolder called vector_store, and inside it the index file for FAISS. Next we have the main folder, src. Inside src we are going to make our modules: loader.py, then chunker.py, then embedder.py, then retriever.py, then generator.py, then rag_pipeline.py, and finally utils.py.

Now we will create one more subfolder called prompts, and inside prompts we keep the prompt templates, for example rag_prompt.txt. Then the next file would be app.py, and the next file would be config.py. Finally, we add a requirements.txt to know what dependencies are involved. After that, we can have one environment variable file (.env) to store all our secrets, and lastly a test_query.py script.
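Assembled from that walkthrough, the layout would look roughly like this (the file names are as described in the video; the exact spellings are a best-effort reconstruction):

```
rag-app/
├── data/
│   ├── raw/            # original sources, e.g. company_policy.pdf
│   └── processed/      # cleaned chunks, e.g. chunks.json
├── vector_store/       # FAISS index files
├── src/
│   ├── loader.py       # read PDFs / txt / docx
│   ├── chunker.py      # split text into overlapping chunks
│   ├── embedder.py     # turn chunks into embeddings
│   ├── retriever.py    # similarity search over the vector store
│   ├── generator.py    # build the prompt and call the LLM
│   ├── rag_pipeline.py # wire everything together
│   └── utils.py        # helpers: cleaning, logging, formatting
├── prompts/
│   └── rag_prompt.txt  # prompt template
├── app.py              # entry point (CLI / Streamlit / Gradio / FastAPI)
├── config.py           # model name, chunk size, overlap, vector DB path
├── requirements.txt    # Python dependencies
├── .env                # secrets
└── test_query.py       # smoke-test the pipeline with a sample question
```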
So guys, this is the overall folder structure for a RAG application. Now let me explain what each folder does. The data/raw folder stores your original files, like PDFs, text files, or documentation. Then you have processed: this folder contains the clean text chunks or extracted content, for example from an HR policy PDF, company handbook, product manual, or FAQ document. Then you have the vector_store (or FAISS) folder, which stores the embeddings and the index files, for example a FAISS index, Chroma, Pinecone metadata, or Weaviate storage. This sounds a little technical, but this is where the system searches for relevant chunks.

Then you have src, which contains the main RAG logic. loader.py loads the documents from files: it reads PDFs, extracts the text, and loads txt or docx. Then chunker.py breaks large text into smaller chunks; for example, it splits every 500 words with a 50-word overlap. Then embedder.py converts the text chunks into embeddings, for example OpenAI embeddings, Hugging Face embeddings, or sentence transformers; a quick sketch of the chunking step follows below.
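As a minimal sketch of what chunker.py might do, here is the 500-word window with a 50-word overlap described above (the function name and return type are assumptions, not the video's code):

```python
# Split a document into 500-word windows that overlap by 50 words, so that
# a sentence cut at a chunk boundary still appears whole in the next chunk.

def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    words = text.split()
    step = chunk_size - overlap  # advance 450 words per window
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks

if __name__ == "__main__":
    sample = "word " * 1200
    print([len(c.split()) for c in chunk_text(sample)])  # [500, 500, 300]
```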
Then you see a file named retriever.py: this file finds the most relevant chunks based on the user question, for example the top three matching chunks by similarity search. After that, you have generator.py, which sends the retrieved context plus the user question to the LLM: it builds the final prompt and gets the final answer from the model. Finally, there is the interesting file called rag_pipeline.py. This is the file which connects everything together. The flow is going to be something like this: first it loads the query, then it retrieves the chunks, and then it generates the answer. So this is the overall pipeline that is going to be present in your rag_pipeline.py. Any helper functions you need in this project will be in utils.py, for example tasks like cleaning text, logging, or formatting output.

Then there is one of the most important folders, called prompts. It stores the prompt templates; for example, you might write something like: "You are a helpful assistant. Answer only from the provided context. If the answer is not found, say: not found in the document." Next is one of the most important files, app.py, which is the main entry point. It can be a CLI app, a Streamlit app, a Gradio app, or a FastAPI backend.

Finally, you have config.py, which stores configuration values, for example the model name, the chunk size, the overlap, and the vector DB path. Then you see there is a requirements.txt: all the Python libraries you'd require to build a RAG application will be listed in this file. And finally, guys, we have test_query.py, which is a small script to test the RAG system with a sample question; a toy version of the pipeline is sketched below.
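Putting those modules together, a toy, self-contained rag_pipeline.py could look like this. The retriever here ranks chunks by simple word overlap and the generator only assembles the prompt; in a real project those would be a vector-store lookup and an LLM call, and all names are illustrative rather than the video's exact code:

```python
def retrieve(query: str, chunks: list[str], top_k: int = 3) -> list[str]:
    """Toy retriever: rank chunks by how many query words they share."""
    q_words = set(query.lower().split())
    ranked = sorted(chunks,
                    key=lambda c: len(q_words & set(c.lower().split())),
                    reverse=True)
    return ranked[:top_k]

def generate(query: str, context: list[str]) -> str:
    """Stub generator: build the final prompt; a real one calls the LLM."""
    return ("Answer only from the provided context.\n\n"
            "Context:\n" + "\n".join(context) +
            f"\n\nQuestion: {query}")

def answer(query: str, chunks: list[str]) -> str:
    return generate(query, retrieve(query, chunks))

if __name__ == "__main__":
    docs = [
        "Casual leaves cannot be carried forward.",
        "Full-time employees receive 12 casual leaves per year.",
    ]
    print(answer("Can casual leave be carried forward?", docs))
```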
For example, the ingestion workflow of this structure is something like this: we have the document, it goes to the loader, then to the chunker, which breaks the given text down into chunks, then it goes to the embedder, and after that into the vector store. So this is how a RAG project is going to look. Now let us move to the next part and try to understand something called fine-tuning. Now guys, you can see I have opened the official documentation given by Google on this. In the official Google developers guide, it says that fine-tuning is a way to make a pre-trained model better at a specific task, format, or style by training it further, and you can provide your own examples.
Now, OpenAI describes it as training the base model by giving it the kinds of inputs and outputs you expect in your application, and getting a model that performs better on those tasks. Google's Vertex AI docs say a similar thing: they describe supervised fine-tuning as adapting model behavior with a labeled dataset by adjusting the model's weights. I know, guys, it sounds a bit technical, but let me explain it in a very simple way. We all know that a large language model already knows general language patterns because it was pre-trained on a lot of data. Fine-tuning gives it an extra round of learning on a smaller, targeted dataset so that it becomes better at something specific, such as replying in your company's tone, following a fixed output format, or doing a particular classification or summarization task.
Then it handles a narrow business workflow more consistently. In very easy words, you can think of it like this: the base model is like a smart graduate, and fine-tuning is job-specific training. So the model is not learning from scratch; it is just being made specialized. Now, what actually changes with fine-tuning? The model's internal parameters, its learned behavior, are adjusted based on training examples. In practice, current platforms also support lighter-weight approaches, such as supervised fine-tuning with parameter-efficient methods like LoRA or QLoRA, which are designed to adapt models more cheaply than full fine-tuning.
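As a minimal sketch of what a LoRA setup looks like in code, assuming the Hugging Face transformers and peft libraries (the base model choice and hyperparameters here are illustrative; the video does not show code):

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("gpt2")  # any causal LM

config = LoraConfig(
    r=8,                        # rank of the low-rank update matrices
    lora_alpha=16,              # scaling factor for the updates
    target_modules=["c_attn"],  # GPT-2's attention projection layer
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, config)
model.print_trainable_parameters()  # only a small fraction of weights train
# From here, train with your usual loop or transformers.Trainer on your
# labeled input/output examples.
```

Because only the small low-rank matrices are trained, this adapts the model far more cheaply than updating all of its weights.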
Now guys, let's say there is a company that wants an AI assistant for customer support. Without fine-tuning, the model may answer correctly sometimes, but the style may vary: one answer could be long, another could be casual, and another could miss the company format. With fine-tuning, you train on example pairs. For example, the input is a customer asking for their refund status, and the desired output is polite, short, and, most importantly, brand-aligned, following the company's standard format. While fine-tuning, we have to consider these parameters. After enough such examples, the model becomes much more consistent at producing that kind of response, and this matches the common use of fine-tuning for a specialized task, a particular output style the LLM should give, and organization-specific behavior; a sketch of what such training data looks like follows below.
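For illustration, supervised fine-tuning data is a set of input/output pairs. Here is a sketch in the chat format used by OpenAI's fine-tuning API, written out as JSONL from Python ("AcmeCorp" and the refund wording are made-up examples, not from the video):

```python
import json

examples = [
    {"messages": [
        {"role": "system",
         "content": "You are AcmeCorp support. Be polite, short, and brand-aligned."},
        {"role": "user", "content": "What is my refund status?"},
        {"role": "assistant",
         "content": "Thanks for reaching out! Your refund was approved and "
                    "should arrive within 3-5 business days."},
    ]},
]

with open("train.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")  # one JSON object per line (JSONL)
```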
Now let me also compare how RAG is a bit different from fine-tuning in this regard. Based on the past example, we have got a brief idea that RAG gives the model the right information at runtime by retrieving it from documents, while fine-tuning changes how the model behaves because it has been additionally trained on examples. So guys, use RAG when you need the latest or document-based knowledge, and use fine-tuning when you need consistent style, format, or task performance. Let's say you have an HR bot that must always answer in a fixed format and tone.
For example: the policy answer, the action the employee should take, and an HR contact note. Now, if you fine-tune the model on many HR examples in that exact structure, it can learn to answer in that style repeatedly. But if the policy changes often, you would still need RAG or another retrieval system for the latest policy text. This is because fine-tuning is better for behavior and formatting, while retrieval is better for dynamic knowledge. So in one line, if I have to say it: fine-tuning means teaching a pre-trained model to behave better for your specific use case by training it on your own examples.
Whereas with RAG, you give your model the latest context by looking it up in knowledge sources, external databases, or APIs. So I hope you have got a brief idea regarding RAG and fine-tuning. Now let us move ahead and explore prompt engineering. Now guys, I have opened the official documentation on prompt engineering, and if you are interested to learn about it, I would definitely suggest you go through this official documentation. Now, what exactly is prompt engineering, guys? Prompt engineering is the process of designing and refining the input you give to an AI model so that it produces more useful and accurate results.
So guys, now let us try to understand what exactly prompt engineering is. I've opened the official documentation of OpenAI, and let me tell you what OpenAI says. It says that prompt engineering is a process of designing and refining the input that you give to an AI model so that it produces more useful, accurate, and well-structured output. OpenAI describes it as using strategies to improve results from prompts, and Google describes prompt design as creating prompts that elicit the desired response from large language models. Now, it might sound a bit technical, but let me explain it in very simple words.
It means that it is basically how you ask the AI. The model may be powerful, but the quality of the answer often depends on how clearly you describe the task, how much context you provide, whether or not you specify the format, and whether you include examples or parameters. These are all common prompt design best practices. So guys, you can refer to OpenAI's prompt engineering guide, or you can refer to Google's prompt design guide; they have both released official documentation on this, and it is very helpful for understanding. You can see the given link all over here.
It says "prompt design strategies," and you are going to get the Gemini API docs. Similarly, you can search for OpenAI's prompt design strategies, and you can see they have listed out the same things. If you go to this documentation, you can see each type of prompt and, following it, the generated output. Then you have the same thing with partial input completion, and there is a lot more; I would suggest you go through it so that you get a better idea of the topic. Now, in very simple words, let me first show you what exactly prompt engineering is.
Let's say I go to ChatGPT, and in ChatGPT I just give a very random instruction. Let's say "explain this in simple English," and let's copy some technical material, say the documentation for the low-rank fine-tuning algorithm (LoRA), and I'm going to ask ChatGPT to actually explain it. Let me copy all of this; now it is getting uploaded, and I just give that prompt. In very simple words, I would say that this is a very vague prompt.
Now, here is how I can improve it. You can say something like this: "You are an AI expert, and I am referring to IBM's official documentation for LoRA, which is an LLM fine-tuning technique. Use the IBM information and explain it with a real-world example." You can see all over here how much my prompt design has improved. Now I'm actually giving some context to our LLM, and then it is going to return the respective response. There are a lot of strategies for improving your prompt design, and for that I have suggested you refer to those two guides.
This kind of thing is called prompt engineering. You are not exactly training the model, but you are guiding it in a very proper way. Now let me also tell you why prompt engineering matters. A well-structured prompt can improve response quality, accuracy, completeness, and formatting, and prompt engineering is usually an iterative process where you test and refine prompts based on the outputs you get. So guys, what usually goes inside a good prompt? According to the prompt design guidance from OpenAI and Google, effective prompts often include clear instructions, relevant context, examples when useful, and the desired output structure. So in prompt engineering you define a role, then you give the proper task, then you give the context, and after that you define the constraints involved; put together, that gives you a properly engineered prompt, as in the sketch below.
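Here is a minimal sketch of a structured prompt following that role/task/context/constraints pattern, reusing the HR example from earlier (the wording is illustrative, not from the video):

```python
# A structured prompt template: role, task, context, constraints.
PROMPT_TEMPLATE = """\
Role: You are a helpful HR assistant.
Task: Answer the employee's question.
Context:
{context}
Constraints: Answer only from the context above. If the answer is not
found there, say "Not found in the document." Keep it under 50 words.

Question: {question}
"""

prompt = PROMPT_TEMPLATE.format(
    context="Full-time employees receive 12 casual leaves per year.",
    question="How many casual leaves do employees get per year?",
)
print(prompt)  # send this to the model instead of the vague one-liner
```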
Now, if you compare prompt engineering versus RAG versus fine-tuning, it's very simple. You can consider prompt engineering as improving the instructions to get your required response, while fine-tuning is improving the model's learned behavior through additional training. So prompt engineering changes the request, not the model itself, and RAG and fine-tuning solve different problems. In a nutshell, prompt engineering is the practice of carefully writing and refining prompts so that an AI model gives the kind of answer you want. Now guys, that was all for today's session. I hope you have got a brief idea regarding what is prompt engineering, what is fine-tuning, and what is RAG. And also guys, if you like these kinds of videos, then I request you: do not forget to hit the notification bell icon so that you don't miss out on any video from our end.
Thank you guys for watching this video.