Build Your Own AI Coding Assistant: A Step-by-Step Guide

Aaro Alhainen
4 min read · Feb 12, 2025


Image co-created by MS Copilot

Inspired by the recent news about the DeepSeek models, I tested how well they actually perform on my local machine, a MacBook Pro with an M3 Pro chip. Surprised by performance I never expected, I started to think about what I could use these models for. I had used GitHub Copilot for almost two years, but due to the nature of my projects it didn’t demonstrate its value as effectively as I had hoped. Based on that experience, I started to wonder: could I build my own, fully local Copilot clone without any additional cost 🤔

Even more important than cost is security. The news has lately shown that you should be very cautious about what you feed into these models, since the only thing you can be sure of is that the data you send is stored somewhere and you no longer have any control over it. And that’s not just about your own things: if you use these tools at work, customer data and other internal information is so precious that you don’t want to give it away, and in most cases it isn’t even a question of wanting to, you simply can’t.

For testing AI models I used LM Studio. It is a very easy-to-use tool for the job and offers a nice way to try out different models. LM Studio also has a feature that lets you run a server exposing the selected model through an OpenAI-style API. The only problem is that the LM Studio license only covers personal use, not work; for work you need to request approval/a license. After a quick round of googling, a good alternative turned out to be Ollama, which is also open source! By following these steps you can get it to work on your machine as well:

1. Let's install Ollama

To get started, install Ollama on your machine. After that, open a terminal window and type ollama --help to confirm that the installation succeeded. You should see the following:

Ollama is installed correctly
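
If you prefer to copy-paste, the same check looks roughly like this in the terminal (the exact output will differ between versions and installs):

ollama --help       # lists the available subcommands (serve, pull, run, ...)
ollama --version    # prints the installed version, e.g. "ollama version is 0.5.x"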

2. Downloading and installing the model

The model we are interested in is called qwen2.5-coder, a version of Qwen2.5 tuned specifically for coding tasks. To get the model, run ollama pull qwen2.5-coder:7b. This command pulls the 7-billion-parameter version of the model to your laptop. The 7b model requires at least 8 GB of available RAM to run, so depending on how much RAM you have you could go smaller or even bigger. For this tutorial we are sticking with the 7b model since it offers quite a good ratio between system requirements and performance. To test the model you can run ollama run qwen2.5-coder:7b. You should see the following prompt, where you can ask anything you would like:

Ollama running correctly

This confirms that the model is fully working and usable on your system.
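
Continue will talk to this same model through Ollama's local HTTP API, which listens on port 11434 by default, so it is worth a quick sanity check that the API answers as well. A minimal example with curl, assuming the default port and Ollama's /api/generate endpoint:

curl http://localhost:11434/api/generate -d '{
  "model": "qwen2.5-coder:7b",
  "prompt": "Write a Python function that reverses a string.",
  "stream": false
}'

If you get back a JSON object with a "response" field containing the generated code, the server side is ready for the editor integration.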

3. Add Continue extension to your Visual Studio Code

The next step is to prepare Visual Studio Code to utilize this model. Let's start by installing the Continue extension into VS Code.

Continue extension in VS Code marketplace

4. Configuring the Continue extension

After installing the extension you should see the text “Continue” with a check mark in the bottom right corner. The next step is to configure the extension to use the local model we installed a moment ago. To open the configuration file, click the “Continue” button in the bottom right corner and then select “Configure autocomplete options”:

Continue config.json configuration file

In this config.json we want to set “models” and “tabAutocompleteModel” to point to our model. The important part is to set “provider” to “ollama”, which tells the extension what kind of API it is dealing with and where to find it. “model” then tells Ollama which model it should use to respond to the request. Below you can find the changed parts:

{
  "models": [
    {
      "model": "qwen2.5-coder:7b",
      "provider": "ollama",
      "title": "Qwen-coder"
    }
  ],
  "tabAutocompleteModel": {
    "title": "Qwen-coder",
    "provider": "ollama",
    "model": "qwen2.5-coder:7b"
  },
  ...
}

Now just save the config.json file and we are ready! 🎉 You can see the model in action:

Your local coding assistant in action
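
One more note: Continue can only reach the model while the Ollama server is running in the background. The desktop app keeps it running automatically, but if you installed only the CLI and the extension can't connect, you can start the server manually:

ollama serve

After that, reloading the VS Code window should be enough for Continue to pick the model up.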

Now you have your own local coding assistant running! If this post was helpful and you liked it, please drop a like and see you next time! 😄
