How to Set Up OpenCode & PI Agent with Llama.cpp (Qwen 3.6 Local LLM)

Tony Xhepa | 00:14:32 | Apr 18, 2026
Overview of using local models with the PI Agent and OpenCode, and how to install the necessary tools across platforms.

Tony Xhepa shows how to run a local LLM stack using Qwen 3.6 (35B) with Llama.cpp, wired to OpenCode and PI Agent, plus practical tips for server setup, context size, and installing skills.

Summary

Tony Xhepa guides viewers through setting up a local model workflow with OpenCode and the PI Agent, centered on a llama.cpp-served Qwen 3.6 35B model. He demonstrates installing dependencies with simple one-liner commands and explains why llama.cpp offers speed advantages over LM Studio. The video walks through selecting a model (Qwen 3.6 35B, 4-bit quantization, ~22 GB), comparing benchmarks against Gemma variants, and running a local server via llama.cpp. Tony then shows how to wire the local model into the PI Agent by editing models.json, including token windows (62k context, 32k max tokens) and local-only settings. He switches to OpenCode, creating an opencode.json that points to the local llama.cpp server, and runs the server with context flags to obtain a usable web UI. The tutorial covers practical steps like starting the local server, adjusting the port in the base URL, and validating token usage when chatting with the model. Finally, Tony expands the toolkit by installing the brainstorming and find-skills skills via npx, and demonstrates how OpenCode and the PI Agent share skills (e.g., front-end design) to enrich a local AI assistant. The overall takeaway is a reproducible, end-to-end workflow for running a fast local LLM with configurable agents and extensible skills.

Key Takeaways

  • The Qwen 3.6 35B model (4-bit quantization, ~22 GB) dominates benchmarks versus Gemma variants in Tony Xhepa's setup.
  • Llama.cpp provides faster local inference than LM Studio, and llama-server exposes a local OpenAI-compatible endpoint with a built-in web UI.
  • A local PI Agent setup requires editing models.json (including context window 62k and max tokens 32k) to point at the Qwen 3.6 35B model.
  • An opencode.json config points OpenCode at the local llama.cpp server's OpenAI-compatible endpoint, enabling a web UI and fully local responses.
  • Skills such as brainstorming and find-skills can be installed from skills.sh via npx, then attached to both OpenCode and the PI Agent.
  • The workflow includes installing dependencies with streamlined commands usable on macOS, Windows, and Linux, and swapping between OpenCode providers and local models.
  • Token usage visualization shows the model consuming a portion of the 62k context window (e.g., ~17% used for a 10k+ token prompt).
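The token-usage percentage in the last bullet is simple arithmetic; a quick sketch using the numbers read off the video's UI:

```shell
# Numbers from the video: ~10.7k tokens consumed out of a 62k context window
used=10700
window=62000
# Integer percentage of the window in use
echo $(( used * 100 / window ))
```

10,700 out of 62,000 works out to roughly 17%, matching the figure OpenCode displays.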

Who Is This For?

Essential viewing for developers who want a local, fast AI stack: combining OpenCode, PI Agent, and Llama.cpp with Qwen 3.6 35B. Great for those who want a reproducible path to run local LLMs with customizable tools and skills.

Notable Quotes

"Hello friends, Tony here. Welcome."
Opening greeting sets the casual, hands-on tutorial tone.
"we can work open code is great. Has also built-in agents with pi the install."
Tony notes the integration between OpenCode and PI Agent.
"I'm going to work with llama cpp and a local model which is the quen 3.6 35 billion."
Identifies the main model choice and local setup focus.
"you can copy this command and llama server is going to start a local openi compatible server with a web UI"
Shows how to launch the local server and access the UI.
"which tools do you have? I have access to four main tools. Read, bash, Edit, Write."
Demonstrates the tools exposed by the OpenCode/PI Agent combo.

Questions This Video Answers

  • How do I set up a local Llama.cpp model with Qwen 3.6 35B and run it with OpenCode?
  • What are the key steps to connect a local Llama.cpp server to OpenCode.json for a PI Agent?
  • Which MPX skills are most useful for enhancing a local AI agent with OpenCode and PI Agent?
  • What is the significance of a 62k context window and 32k max tokens in a local LLM setup?
  • How do I install and run the local server for a local OpenCode/PI Agent workflow on Windows, macOS, or Linux?
Topics: OpenCode, PI Agent, llama.cpp, Qwen 3.6 35B, 4-bit quantization, local model server, opencode.json, models.json, skills (brainstorming, find-skills), npx
Full Transcript
Hello friends, Tony here. Welcome. In today's video, I'm going to show you how to customize your agent with local models, and I'm going to work with this pi agent and also with OpenCode. With OpenCode, yeah, we can work. OpenCode is great, and has built-in agents too, the same as pi. To install OpenCode, you just copy this command, via npm, bun, or another package manager, just this command, and you can install it on every machine, so macOS, Windows, and Linux. The same for pi: just copy this command and install it on your machine. And in this example I'm going to work with llama.cpp and a local model, which is the Qwen 3.6 35 billion. Okay, I'm using the 4-bit quantization, which is this one, medium, 22 GB. And you can scroll down and see the benchmarks here. For example, here is the Qwen 3.6 35 billion and the 3.5 35 billion. The 3.6, which is this color, dominates in every benchmark, as you can see here: Terminal-Bench 51, the 3.5 is 40. And Gemma, which is, yeah, here we have Gemma 4 26 billion, is very low. Then we have a Gemma 4 which is 31 billion, which is this one; it's okay, but it can't compare to this 3.6. Okay. Also here, in every benchmark, this Qwen 3.6 wins, and Qwen 3.6 is trained on data until 2026. So let's close this, and if you want, yeah, you can install this llama.cpp, and then go here and select, for example; I have chosen this Unsloth one, but you can choose what you like, and then, for example, pick what I have chosen here and use this model, and you can use this model with llama.cpp or LM Studio. LM Studio has a better UI, but llama.cpp is faster.
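The install-and-download flow Tony clicks through can be sketched as shell commands. The Homebrew formula and the -hf shorthand are standard llama.cpp conventions; the exact Hugging Face repo:quant string below is a placeholder, so copy the one shown on the model page he uses:

```sh
# Install llama.cpp (macOS via Homebrew; see the llama.cpp README for Windows/Linux builds)
brew install llama.cpp

# Chat with a GGUF model pulled straight from Hugging Face.
# The repo:quant identifier is illustrative, standing in for the Unsloth 4-bit build.
llama-cli -hf unsloth/MODEL-GGUF:Q4_K_M
```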
So here it's going to suggest that you install llama.cpp, and then you can copy this command, and llama-server is going to start a local OpenAI-compatible server with a web UI. Or just run it in your terminal by saying llama-cli and then -hf, for Hugging Face, and the model. But as I told you, I'm going to use this with the pi agent, and also here you have pi: it's going to suggest you install llama.cpp, start the server, and then configure the model in pi. So install pi, and then you need to go to the pi agent directory and create this models.json. So let me show you. I am in the terminal, I'm going to cd, and then, let's zoom it, we need to go to the dot directory, and then we have pi and agent, and let's open this with VS Code. I'm going to open it with VS Code, and here I have created, in this directory, in the agent directory, the models.json file, and just pasted this code from here. Now, I have modified it a little bit. So I have added the ID, which is suggested here, and then I set the name to be Qwen 3.6 35 billion local, reasoning to true, input text, the context window to be 62k, and max tokens 32k, cost 0, but yeah, this we can remove because this is a local model. And that is for the pi agent. Now, if you want to work with OpenCode as well, you can go to the OpenCode documentation; we have providers, and here we have the list of providers for OpenCode. And if I come here and just run opencode, okay, here we are. By default we have this Big Pickle, okay, but we can switch to this one, which I have added as a favorite. But let me also show you what you have on the providers page: scroll down, and we have llama.cpp; you also have this LM Studio, but we need llama.cpp. Now you need to go and create this opencode.json and paste that code there. But where does that opencode.json file go? For pi we were inside the dot pi directory, but for OpenCode we are not going to add the configuration there; we're going to add it in .config, and then we have opencode.
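The models.json entry Tony describes for pi would look roughly like this. The field names are inferred from what he reads out on screen, so treat the schema as an assumption and check pi's own documentation:

```json
{
  "id": "qwen-3.6-35b-local",
  "name": "Qwen 3.6 35B Local",
  "reasoning": true,
  "input": ["text"],
  "contextWindow": 62000,
  "maxTokens": 32000,
  "cost": { "input": 0, "output": 0 }
}
```

As Tony notes, the cost block can simply be dropped for a local model.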
Here we are going to open this with the code editor. And then you need to create the opencode.json file, and paste that code in here. Okay. So here, I have added: we have options, the base URL is going to be 127.0.0.1:8080/v1, and I have also added the name. Now, the model name is what you have here, so just copy this one, or if you have another model, copy that one instead. And then I have also changed the display name to be Qwen 3.6 35 billion, and let's also say local here, and limit context 62,000. Save. And then if we come here, or let's go to the home directory, and if we open OpenCode... but first we need to run this model. And if we say, for example, use this with llama.cpp, start a local server: we need to say llama-server, and just copy this one. We also have some flags if you want. Let me just start my server. So llama-server, and I have added some flags: -ngl, and -c, which is for the context, and --jinja. I'm going to hit enter, and here we have the link, we have this URL. Okay, if you change the port to something else, you can change that here. I'm going to close this. And now, in OpenCode, let's choose this model. So, model: Qwen 3.6. I'm just going to say hi here, and I want you to look at the tokens it's going to spend, because we just said hi, and the model is going to answer us. And I'm going to do the same thing with the pi agent. So let's just work with this one first and then close it. Okay, so here we have "how can I help you?", and just by saying hi, "how can I help you?", we have a context of 10.7K tokens, so 17% used, because we have only 62,000 tokens. Okay. Now let's close this and open the pi agent. I'm going to zoom in here, and, yeah, you can see pi is working on this one.
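An opencode.json along the lines Tony sets up, pointing OpenCode at the local llama-server endpoint. The provider/limit key names follow OpenCode's documented custom-provider pattern, but verify them against the current OpenCode docs:

```json
{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "llamacpp": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "llama.cpp (local)",
      "options": { "baseURL": "http://127.0.0.1:8080/v1" },
      "models": {
        "qwen-3.6-35b": {
          "name": "Qwen 3.6 35B Local",
          "limit": { "context": 62000, "output": 32000 }
        }
      }
    }
  }
}
```

If you launch llama-server on a different port, change the baseURL to match.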
You can also change the model here: session, models, and right now I have only this one, which is this Qwen 3.6. Login, for example /login, and you can log in to Anthropic, GitHub Copilot, Google, and others. Okay, now if I say hi to this one, we are on this 3.6. Just say hi. We're going to see the tokens used by this one: 2.5% of 62, and it's only 1.5K tokens. Okay, here, "how can I help you?" But yeah, we have a bot here. So I'm going to ask: which tools do you have? "I have access to four main tools. Read; Bash, which is going to run terminal commands like ls, grep, find and so on; Edit, to make precise, targeted edits to existing files; and Write, to create new files or completely overwrite existing ones." And for this one we have 1.6K tokens, read 1.5, and 3.2%. Now, if I open OpenCode, we're inside, we're using this Qwen 3.6, and I'm going to do the same prompt: which tools do you have? And as you can see, OpenCode has more tools: it has read, edit, write, glob, grep, bash, task, webfetch, and question tools. Now, if we want to work with Laravel and so on, I suggest you go, let's go to skills.sh, and here you can find skills. So, for example, you can install this find-skills, or front-end design from Anthropic, front-end design, web design. I'm also going to show you the brainstorming one. This is very popular; you can see it has 110k installs. And if you click here, here is how you can install it: npx skills add, and then the skill, brainstorming. And you can see the GitHub stars, almost 160k stars. Okay. So right now I don't have the brainstorming skill installed, so I'm going to copy this, and let's go here. Let's close this one also, let's close pi, and I'm going to paste this in. We are inside the home directory. I run the npx install command, so I'm going to say y, yes, and yes. Here: which agents do you want to install? We have universal.
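The skills.sh install step is an npx one-liner; the exact CLI name, subcommand, and skill identifier below are taken from what Tony pastes and should be double-checked against the command shown on the skill's own page:

```sh
# Copy the install command from the skill's page on skills.sh; it is an npx
# one-liner of roughly this shape (the identifier here is a placeholder)
npx skills add brainstorming
```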
So in .agents/skills, always included are the universal agents: amp, antigravity, claude, codex, cursor, opencode. For additional agents we have Augment, then I'm going to scroll down, we also have the claude code, and others. I just want to show you, yeah, we have pi, okay, pi, in the pi skills. So I'm going to select pi as well, and claude code, sorry, not opencode, because opencode is universal, it's already here, and I'm going to hit enter. Now, installation scope: "in the project" installs in the current directory, so if we are inside a project, it's going to install in the project directory, but I prefer the home directory, okay. And it's going to install brainstorming in .agents/skills. I'm going to hit enter. Install the find-skills? Yes. Done. So now, let's cd, and if I cd to the dot directory, we have this .agents, and then we have skills, and we have the brainstorming skill and also find-skills. Let's go to the home directory, and now I'm going to open OpenCode. And I'm going to ask: what skills do you have? We have brainstorming and also find-skills, only two skills. The same thing if I open pi: let me just ask, what skills do you have? And as you can see, I have two skills, the brainstorming skill and also the find-skills. This is how you can add skills to your agent. Now, yeah, you can search for more skills here. And this is the find-skills, which is the most downloaded skill, but we got that when we installed the brainstorming. Now, this front-end design is also very good, so I'm going to install that one too. So, just paste that in, and I'm going to do the same thing. Yeah, as you can see, now we have selected the claude code and also pi. If we scroll, yeah, I'm going to hit enter. I'm going to choose global symlink, recommended, and proceed with the installation. So if I run pi again and ask what skills... you can see we don't even need to ask, because here are the skills: brainstorming, find-skills, and the front-end design.
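Skills installed to the home directory land under ~/.agents/skills, which is why both OpenCode and pi see the same list. A tiny simulation of the resulting layout (using a temp dir instead of the real home directory; the skill folder names are assumptions based on the video):

```shell
# Recreate the directory layout the installer produces, under a temp root
root=$(mktemp -d)
mkdir -p "$root/.agents/skills/brainstorming" "$root/.agents/skills/find-skills"

# Any agent that reads this folder would list both skills when asked
ls "$root/.agents/skills"
```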
Okay friends, that's all for this video. What I wanted to show you is how you can configure your pi agent and also OpenCode to work with a local model, in this case the Qwen 3.6. So, all the best, and I'll see you in another video. Thank you very much.
