Run Your Own AI Assistant with Ollama and Open WebUI

March 12, 2026

Why Self-Host AI?

Cloud AI services charge per token and process your data on their servers. Self-hosting with Ollama means your conversations never leave your infrastructure. You pay nothing per query, and sensitive business data stays private.

Hardware Requirements

The amount of RAM you need depends on the model size:

Model Size        RAM Required   Example Models                         Quality
7–9B parameters   4–8 GB         Llama 3.1 8B, Mistral 7B, Gemma 2 9B   Good for simple tasks, summaries, code help
13B parameters    8–16 GB        Llama 2 13B, CodeLlama 13B             Better reasoning and writing quality
70B parameters    40–48 GB       Llama 3.1 70B, Mixtral 8x7B            Near commercial-grade quality
GPU acceleration: A dedicated GPU dramatically speeds up inference. Without a GPU, responses will be slower but still functional for smaller models.
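As a sanity check against the table above, you can estimate memory needs from the parameter count. The constants below are my own rough assumptions (about 0.55 bytes per parameter at 4-bit quantization, plus ~20% overhead for the context/KV cache), not figures from the Ollama docs:

```shell
# Back-of-envelope RAM estimate for a 4-bit quantized model.
# Assumptions: ~0.55 bytes/parameter (q4-class quantization),
# plus ~20% overhead for the KV cache and runtime.
estimate_ram_gb() {
  awk -v p="$1" 'BEGIN { printf "%.1f", p * 0.55 * 1.2 }'
}

for p in 7 13 70; do
  echo "${p}B parameters: ~$(estimate_ram_gb "$p") GB"
done
```

The estimates (~4.6 GB, ~8.6 GB, ~46.2 GB) land inside the table's 4–8 GB, 8–16 GB, and 40–48 GB bands; higher-precision quantizations (q8, f16) need proportionally more.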

Step 1: Deploy Ollama

In Panelica, go to Docker → App Templates and deploy Ollama. This sets up the inference engine that runs the AI models.

Step 2: Deploy Open WebUI

Next, deploy Open WebUI from the same App Templates page. This gives you a polished ChatGPT-like interface that connects to Ollama.
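If you prefer deploying outside Panelica's templates, a minimal docker-compose file for the pair looks roughly like this. The images, ports, and `OLLAMA_BASE_URL` variable come from the two projects' documentation; the volume names and host port 3000 are my own choices:

```yaml
services:
  ollama:
    image: ollama/ollama
    volumes:
      - ollama-data:/root/.ollama    # downloaded models persist here
    ports:
      - "11434:11434"                # Ollama API

  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
    volumes:
      - webui-data:/app/backend/data # users, chats, settings
    ports:
      - "3000:8080"                  # browse to http://localhost:3000
    depends_on:
      - ollama

volumes:
  ollama-data:
  webui-data:
```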

Step 3: Download a Model

Open the WebUI and go to Settings → Models. Type a model name and click download:

  • llama3.1:8b — Meta's open Llama 3.1 model, a great starting point
  • mistral:7b — Excellent for European languages
  • codellama:13b — Optimized for code generation
  • gemma2:9b — Google's open model, strong reasoning

The model downloads once and is stored locally; later sessions only need to load it from disk into memory, with no repeat download.
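Models can also be pulled from the command line via the Ollama CLI inside the container. A sketch, assuming the container is named `ollama` (the actual name depends on how the template deployed it):

```shell
# Pull a model through the Ollama CLI inside the container.
# Assumption: the container is named "ollama"; adjust to your deployment.
model="llama3.1:8b"
pull_cmd="docker exec ollama ollama pull $model"

echo "run: $pull_cmd"
# Afterwards, `docker exec ollama ollama list` shows the locally stored models.
```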

Step 4: Start Chatting

Select your downloaded model from the dropdown and start a conversation. The interface supports:

  • Multi-turn conversations with context memory
  • System prompts to customize behavior
  • Code highlighting with syntax-aware formatting
  • File uploads for document analysis (supported models)
  • Multiple users with separate conversation histories
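A system prompt can also be baked into a reusable model with an Ollama Modelfile, so every chat with it starts from the same behavior. A minimal sketch (the model name `support-bot` and the prompt text are examples, not anything from this setup):

```
FROM llama3.1:8b
SYSTEM """You are a concise internal support assistant. Answer briefly."""
PARAMETER temperature 0.7
```

Build it with `ollama create support-bot -f Modelfile` (or the same command via `docker exec`), and it appears in the model dropdown alongside the base models.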

Performance Tips

  • Start with a 7B model to test your hardware, then try larger models
  • Keep one model loaded at a time to avoid memory pressure
  • Monitor memory usage in Panelica's Monitoring dashboard
  • If responses are too slow, try a smaller or quantized model variant (e.g., llama3.1:7b-q4_0)
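Before switching to a larger model, it's worth checking how much memory is actually free. A Linux-only sketch; the ~5 GB figure is my rough estimate for an 8B model at 4-bit quantization, so adjust it per model:

```shell
# Compare available RAM (Linux, /proc/meminfo) against a model's
# approximate quantized size. Assumption: ~5 GB for llama3.1:8b at q4.
need_gb=5
avail_gb=$(awk '/MemAvailable/ { printf "%d", $2 / 1048576 }' /proc/meminfo)

if [ "$avail_gb" -ge "$need_gb" ]; then
  echo "OK: ${avail_gb} GB free, model needs ~${need_gb} GB"
else
  echo "Tight: only ${avail_gb} GB free; try a smaller quant (e.g. q4_0)"
fi
```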

Summary

Running your own AI assistant costs nothing beyond server resources. With Panelica's Docker templates, you can deploy Ollama and Open WebUI in minutes, giving your team a private, unlimited AI assistant with zero per-query fees.
