Why Self-Host AI?
Cloud AI services charge per token and process your data on their servers. Self-hosting with Ollama means your conversations never leave your infrastructure. You pay nothing per query, and sensitive business data stays private.
Hardware Requirements
The amount of RAM you need depends on the model size:
| Model Size | RAM Required | Example Models | Quality |
|---|---|---|---|
| 7–9B parameters | 4–8 GB | Llama 3.1 8B, Mistral 7B, Gemma 2 9B | Good for simple tasks, summaries, code help |
| 13B parameters | 8–16 GB | Llama 2 13B, CodeLlama 13B | Better reasoning and writing quality |
| 70B parameters | 40–48 GB | Llama 3.1 70B | Near commercial-grade quality |
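These figures follow a rough rule of thumb: memory is roughly the parameter count times the bytes stored per weight (0.5 bytes for 4-bit quantization, Ollama's usual default), plus runtime overhead for the KV cache and buffers. A minimal sketch; the 20% overhead factor is an illustrative assumption, not a measured value:

```python
def estimate_ram_gb(params_billion: float, bits_per_weight: int = 4,
                    overhead: float = 1.2) -> float:
    """Rough RAM/VRAM estimate for an LLM.

    bits_per_weight: 4 for q4 quantization, 16 for fp16.
    overhead: extra headroom for KV cache and runtime buffers (assumed ~20%).
    """
    weight_gb = params_billion * bits_per_weight / 8  # weights alone, in GB
    return round(weight_gb * overhead, 1)

for size in (7, 13, 70):
    print(f"{size}B @ 4-bit: ~{estimate_ram_gb(size)} GB")
```

The 4-bit estimates land near the low end of each range in the table; higher-precision variants (8-bit, fp16) scale the footprint proportionally.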
GPU acceleration: A dedicated GPU dramatically speeds up inference. Without a GPU, responses will be slower but still functional for smaller models.
Step 1: Deploy Ollama
In Panelica, go to Docker → App Templates and deploy Ollama. This sets up the inference engine that runs the AI models.
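Under the hood, the template runs a standard Ollama container. A hypothetical compose sketch of the equivalent setup (service and volume names are assumptions; the image, port, and model path follow Ollama's documented defaults):

```yaml
services:
  ollama:
    image: ollama/ollama
    ports:
      - "11434:11434"          # Ollama's default API port
    volumes:
      - ollama:/root/.ollama   # persist downloaded models across restarts
    restart: unless-stopped
volumes:
  ollama:
```

The named volume matters: without it, every container recreation would re-download your models.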
Step 2: Deploy Open WebUI
Next, deploy Open WebUI from the same App Templates page. This gives you a polished ChatGPT-like interface that connects to Ollama.
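The equivalent compose sketch for Open WebUI (service and volume names are assumptions; the image, internal port, and `OLLAMA_BASE_URL` variable follow Open WebUI's documented Docker setup, and `ollama` here assumes both services share a Docker network):

```yaml
services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    ports:
      - "3000:8080"                            # WebUI listens on 8080 inside the container
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434    # point the UI at the Ollama service
    volumes:
      - open-webui:/app/backend/data           # persist users and chat history
    restart: unless-stopped
volumes:
  open-webui:
```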
Step 3: Download a Model
Open the WebUI and go to Settings → Models. Type a model name and click download:
- `llama3.1:8b`: Meta's latest open model, a great starting point
- `mistral:7b`: excellent for European languages
- `codellama:13b`: optimized for code generation
- `gemma2:9b`: Google's open model, strong reasoning
The model downloads once and is stored locally; later sessions load it straight from disk with no re-download.
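You can also trigger downloads programmatically: Ollama exposes an HTTP API on port 11434. A minimal sketch that builds the pull request (the `/api/pull` endpoint and its `model` field follow Ollama's API docs; the URL assumes a local default install):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # default Ollama API endpoint

def pull_request(model: str) -> urllib.request.Request:
    """Build the POST request Ollama expects for a model pull."""
    payload = json.dumps({"model": model}).encode()
    return urllib.request.Request(
        f"{OLLAMA_URL}/api/pull",
        data=payload,
        headers={"Content-Type": "application/json"},
    )

req = pull_request("llama3.1:8b")
print(req.full_url)  # http://localhost:11434/api/pull
# urllib.request.urlopen(req) would then stream download progress as JSON lines
```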
Step 4: Start Chatting
Select your downloaded model from the dropdown and start a conversation. The interface supports:
- Multi-turn conversations with context memory
- System prompts to customize behavior
- Code highlighting with syntax-aware formatting
- File uploads for document analysis (supported models)
- Multiple users with separate conversation histories
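System prompts and multi-turn context can also be driven through Ollama's chat API directly. A sketch of the payload shape (field names follow the documented `/api/chat` endpoint; the model tag and prompt text are placeholders):

```python
import json

def chat_payload(model: str, system: str, user_msgs: list[str]) -> dict:
    """Assemble an Ollama /api/chat payload with a system prompt and user turns."""
    messages = [{"role": "system", "content": system}]
    messages += [{"role": "user", "content": m} for m in user_msgs]
    return {"model": model, "messages": messages, "stream": False}

payload = chat_payload("llama3.1:8b", "Answer in one sentence.", ["What is Docker?"])
print(json.dumps(payload, indent=2))
```

POSTing this to `http://localhost:11434/api/chat` returns the assistant's reply; appending that reply and the next user message to `messages` is what gives the conversation its memory.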
Performance Tips
- Start with a 7B model to test your hardware, then try larger models
- Keep one model loaded at a time to avoid memory pressure
- Monitor memory usage in Panelica's Monitoring dashboard
- If responses are too slow, try a smaller or quantized model variant (e.g., `llama3.1:8b-q4_0`)
Summary
Running your own AI assistant costs nothing beyond server resources. With Panelica's Docker templates, you can deploy Ollama and Open WebUI in minutes, giving your team a private, unlimited AI assistant with zero per-query fees.