# Using Local AI Models
The AI Assistant can connect to locally hosted AI models instead of cloud services. This offers several advantages:
- Privacy — Your project data never leaves your machine
- No API costs — Run as many queries as you want for free
- Offline use — Works without an internet connection
- Full control — Choose and customize your own models
## Supported Local Providers

| Software | Provider Setting | Default Host | Default Port |
|---|---|---|---|
| Ollama | Ollama | localhost | 11434 |
| LM Studio | LlamaCpp | localhost | 1234 |
| llama.cpp server | LlamaCpp | localhost | 8080 |
## Setting Up a Local Provider

Ollama is the easiest way to run local AI models on your machine.
1. **Install Ollama**

   Download and install Ollama from ollama.com.

2. **Pull a model**

   Open a terminal and download a model that supports function calling:

   ```shell
   ollama pull llama3
   ```

3. **Configure Paquet Builder**

   - Open the AI Settings dialog in Paquet Builder
   - Select Ollama as the provider
   - The default host (`localhost`) and port (`11434`) work automatically
   - Enter the model name (e.g., `llama3`) in the Model field

4. **Start chatting**

   The AI Assistant will connect to your local Ollama instance. Ollama starts automatically when a request is received.
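Before pointing Paquet Builder at Ollama, you can confirm the server is reachable from a terminal. This is a quick sketch using Ollama's REST API (`/api/tags` lists the models you have pulled), assuming the default host and port from the table above:

```shell
# Endpoint matching the defaults in the provider table above.
OLLAMA_URL="http://localhost:11434"

# /api/tags returns a JSON list of locally available models.
# If the request fails, Ollama is not running or not reachable.
curl -s "$OLLAMA_URL/api/tags" || echo "Ollama is not running"
```

If the output lists your pulled model (e.g., `llama3`), the AI Assistant should be able to connect with the same settings.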
LM Studio provides a user-friendly desktop application for downloading and running local models.
1. **Install LM Studio**

   Download and install from lmstudio.ai.

2. **Download a model**

   Use the built-in model browser to search for and download a model. Look for models labeled with "tool use" or "function calling" support.

3. **Start the local server**

   - Go to the Local Server tab in LM Studio
   - Load your downloaded model
   - Click Start Server; it will listen on port 1234 by default

4. **Configure Paquet Builder**

   - Open AI Settings
   - Select LlamaCpp as the provider
   - Set the port to 1234
   - The Model field can be left empty (LM Studio uses the currently loaded model)
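You can verify that the LM Studio server is up before configuring Paquet Builder. A minimal sketch, assuming the default port 1234: LM Studio exposes an OpenAI-compatible API, so `/v1/models` reports the currently loaded model.

```shell
# Default LM Studio local-server address (port 1234, per the table above).
LMSTUDIO_URL="http://localhost:1234"

# LM Studio serves an OpenAI-compatible API; /v1/models lists the
# model currently loaded in the Local Server tab.
curl -s "$LMSTUDIO_URL/v1/models" || echo "LM Studio server is not running"
```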
For advanced users, llama.cpp provides a lightweight, high-performance server.
1. **Download or build llama.cpp**

   Get the latest release from the GitHub repository or build from source.

2. **Start the server**

   ```shell
   llama-server -m your-model.gguf --port 8080
   ```

3. **Configure Paquet Builder**

   - Select LlamaCpp as the provider
   - Set the port to 8080 (or whatever port you used)
   - Leave the Model field empty
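As a quick sanity check, `llama-server` exposes a `/health` endpoint you can query before connecting Paquet Builder. A sketch, assuming port 8080 as in the command above:

```shell
# Address matching the --port 8080 example above.
LLAMA_URL="http://localhost:8080"

# llama-server's /health endpoint reports whether the model is
# loaded and the server is ready to accept requests.
curl -s "$LLAMA_URL/health" || echo "llama-server is not running"
```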
## Recommended Local Models

For the AI Assistant's function calling (tool use) to work properly, the model must support structured tool calls. Here are recommended models:
| Model | Size | Function Calling | Notes |
|---|---|---|---|
| Llama 3 (8B) | ~5 GB | Good | Best balance of speed and capability |
| Llama 3 (70B) | ~40 GB | Very Good | Requires 48+ GB RAM or powerful GPU |
| Qwen 2.5 (7B) | ~4 GB | Good | Strong tool use, fast inference |
| Mistral (7B) | ~4 GB | Good | Reliable function calling |
## Limitations of Local Models

Local models are generally slower and less reliable at complex, multi-step tool use than cloud-hosted models, especially at smaller sizes; the hybrid approach below is one way to work around this.

## Custom Host and Port

If your local AI server runs on a different machine or a custom port:
1. Open AI Settings
2. Modify the Host field (e.g., `192.168.1.100` for a server on your local network)
3. Modify the Port field as needed
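If the server runs on another machine, it must listen on a network-reachable address rather than loopback only. With Ollama, for example, this is controlled by the `OLLAMA_HOST` environment variable; a sketch for the remote machine (`0.0.0.0` binds all interfaces):

```shell
# On the remote machine: make Ollama listen on all network
# interfaces instead of localhost only, then start the server.
export OLLAMA_HOST=0.0.0.0:11434
ollama serve
```

Your workstation can then reach it using the machine's LAN address in the Host field.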
## Hybrid Approach

You can switch between local and cloud providers at any time by changing the provider in AI Settings. A practical approach is to:
- Use a local model for quick, simple tasks (changing a setting, listing components)
- Switch to a cloud provider for complex tasks (building custom action workflows, full project setup)