Why Self-Host AI?
Cloud AI services charge per token and process your data on their servers. Self-hosting with Ollama means your conversations never leave your infrastructure. You pay nothing per query, and sensitive business data stays private.
Hardware Requirements
The amount of RAM you need depends on the model size:
| Model Size | RAM Required | Example Models | Quality |
|---|---|---|---|
| 7–9B parameters | 4–8 GB | Llama 3.1 8B, Mistral 7B, Gemma 2 9B | Good for simple tasks, summaries, code help |
| 13B parameters | 8–16 GB | Llama 2 13B, CodeLlama 13B | Better reasoning and writing quality |
| 70B parameters | 40–48 GB | Llama 3.1 70B | Near commercial-grade quality |
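These figures follow a rough rule of thumb: memory is roughly the parameter count times the bytes stored per weight (0.5 bytes for 4-bit quantization, Ollama's usual default), plus runtime overhead for the KV cache and buffers. A minimal sketch; the 20% overhead factor is an illustrative assumption, not a measured value:

```python
def estimate_ram_gb(params_billion: float, bits_per_weight: int = 4,
                    overhead: float = 1.2) -> float:
    """Rough RAM/VRAM estimate for an LLM.

    bits_per_weight: 4 for q4 quantization, 16 for fp16.
    overhead: extra headroom for KV cache and runtime buffers (assumed ~20%).
    """
    weight_gb = params_billion * bits_per_weight / 8  # weights alone, in GB
    return round(weight_gb * overhead, 1)

for size in (7, 13, 70):
    print(f"{size}B @ 4-bit: ~{estimate_ram_gb(size)} GB")
```

The 4-bit estimates land near the low end of each range in the table; higher-precision variants (8-bit, fp16) scale the footprint proportionally.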
GPU acceleration: A dedicated GPU dramatically speeds up inference. Without a GPU, responses will be slower but still functional for smaller models.
Step 1: Deploy Ollama
In Panelica, go to Docker → App Templates and deploy Ollama. This sets up the inference engine that runs the AI models.
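Under the hood, the template runs a standard Ollama container. A hypothetical compose sketch of the equivalent setup (service and volume names are assumptions; the image, port, and model path follow Ollama's documented defaults):

```yaml
services:
  ollama:
    image: ollama/ollama
    ports:
      - "11434:11434"          # Ollama's default API port
    volumes:
      - ollama:/root/.ollama   # persist downloaded models across restarts
    restart: unless-stopped
volumes:
  ollama:
```

The named volume matters: without it, every container recreation would re-download your models.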
Step 2: Deploy Open WebUI
Next, deploy Open WebUI from the same App Templates page. This gives you a polished ChatGPT-like interface that connects to Ollama.
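The equivalent compose sketch for Open WebUI (service and volume names are assumptions; the image, internal port, and `OLLAMA_BASE_URL` variable follow Open WebUI's documented Docker setup, and `ollama` here assumes both services share a Docker network):

```yaml
services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    ports:
      - "3000:8080"                            # WebUI listens on 8080 inside the container
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434    # point the UI at the Ollama service
    volumes:
      - open-webui:/app/backend/data           # persist users and chat history
    restart: unless-stopped
volumes:
  open-webui:
```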
Step 3: Download a Model
Open the WebUI and go to Settings → Models. Type a model name and click download:
- `llama3.1:8b`: Meta's latest open model, a great starting point
- `mistral:7b`: excellent for European languages
- `codellama:13b`: optimized for code generation
- `gemma2:9b`: Google's open model, strong reasoning
The model downloads once and is stored locally; later sessions load it straight from disk with no re-download.
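You can also trigger downloads programmatically: Ollama exposes an HTTP API on port 11434. A minimal sketch that builds the pull request (the `/api/pull` endpoint and its `model` field follow Ollama's API docs; the URL assumes a local default install):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # default Ollama API endpoint

def pull_request(model: str) -> urllib.request.Request:
    """Build the POST request Ollama expects for a model pull."""
    payload = json.dumps({"model": model}).encode()
    return urllib.request.Request(
        f"{OLLAMA_URL}/api/pull",
        data=payload,
        headers={"Content-Type": "application/json"},
    )

req = pull_request("llama3.1:8b")
print(req.full_url)  # http://localhost:11434/api/pull
# urllib.request.urlopen(req) would then stream download progress as JSON lines
```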
Step 4: Start Chatting
Select your downloaded model from the dropdown and start a conversation. The interface supports:
- Multi-turn conversations with context memory
- System prompts to customize behavior
- Code highlighting with syntax-aware formatting
- File uploads for document analysis (supported models)
- Multiple users with separate conversation histories
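System prompts and multi-turn context can also be driven through Ollama's chat API directly. A sketch of the payload shape (field names follow the documented `/api/chat` endpoint; the model tag and prompt text are placeholders):

```python
import json

def chat_payload(model: str, system: str, user_msgs: list[str]) -> dict:
    """Assemble an Ollama /api/chat payload with a system prompt and user turns."""
    messages = [{"role": "system", "content": system}]
    messages += [{"role": "user", "content": m} for m in user_msgs]
    return {"model": model, "messages": messages, "stream": False}

payload = chat_payload("llama3.1:8b", "Answer in one sentence.", ["What is Docker?"])
print(json.dumps(payload, indent=2))
```

POSTing this to `http://localhost:11434/api/chat` returns the assistant's reply; appending that reply and the next user message to `messages` is what gives the conversation its memory.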
Performance Tips
- Start with a 7B model to test your hardware, then try larger models
- Keep one model loaded at a time to avoid memory pressure
- Monitor memory usage in Panelica's Monitoring dashboard
- If responses are too slow, try a smaller or quantized model variant (e.g., `llama3.1:8b-q4_0`)
Summary
Running your own AI assistant costs nothing beyond server resources. With Panelica's Docker templates, you can deploy Ollama and Open WebUI in minutes, giving your team a private, unlimited AI assistant with zero per-query fees.