TextGen by oobabooga¶
Source: github.com/oobabooga/textgen
Author: oobabooga
Date: Active development (5,600+ commits, 112 releases)
Core Project¶
TextGen is an open-source desktop application designed to run large language models (LLMs) locally on consumer hardware. It provides both a privacy-first web interface and an OpenAI/Anthropic-compatible API, enabling users to self-host chat, vision, tool-calling, and web-search capabilities without relying on third-party cloud services or sending data externally.
Key Features¶
- Local-First & Private: No telemetry; all inference runs on your own machine.
- Multiple Interaction Modes: Supports
instruct,chat-instruct, andchatmodes. - OpenAI-Compatible API: Drop-in replacement for OpenAI/Anthropic APIs, enabling integration with existing tools.
- Vision & Tool-Calling: Supports multimodal inputs (vision) and agentic tool use.
- Web Search: Built-in web search capabilities within the UI.
- Training & Extensions: Supports model fine-tuning, image generation via
diffusers, and extensions for TTS, voice input, and translation. - Cross-Platform: Available for Linux, Windows, and macOS (Intel and Apple Silicon).
Installation Methods¶
1. Portable Desktop App (Easiest)¶
Download a pre-built portable release from GitHub Releases. Includes all dependencies (CUDA, Vulkan, ROCm, CPU-only). Compatible only with single-file GGUF models.
2. One-Click Installer (Web UI in Browser)¶
Run the OS-specific start script (start_windows.bat, start_linux.sh, start_macos.sh). Creates a local environment and launches the UI at http://127.0.0.1:7860. Supports environment variables for silent/automated installs.
3. Full Conda Installation (Most Flexible)¶
For users needing multi-file model support (Transformers, EXL3), training, or extensions:
- Requires Conda (Miniforge recommended) and Python 3.13
- Install PyTorch 2.9.1 with hardware-specific wheels (NVIDIA CUDA 12.8, AMD ROCm, Apple MPS, CPU-only)
- Clone repo and install from requirements/full/
- Requires ~10GB disk space
Model Support¶
| Format | File Structure | Portable Build | Full Install |
|---|---|---|---|
| GGUF | Single .gguf file in user_data/models |
✅ Yes | ✅ Yes |
| Transformers | Multi-file folder in user_data/models/<name>/ |
❌ No | ✅ Yes |
| EXL3 | Multi-file folder | ❌ No | ✅ Yes |
Users are directed to resources like LocalBench for recommended GGUF quantizations and a VRAM calculator for memory planning.
Architecture & Backends¶
- llama.cpp (GGUF inference)
- Transformers (Hugging Face models)
- ExLlamaV3 (Optimized inference for specific architectures)
- Supports CPU, NVIDIA CUDA, AMD ROCm, Apple Metal (MPS), and Vulkan
Privacy & Philosophy¶
TextGen is explicitly built for users who want full control over their AI workloads. By running locally, users avoid data leakage, API rate limits, and subscription costs. The project emphasizes ease of use for beginners while offering deep configurability for advanced users through command-line flags, persistent settings, and an extension ecosystem.
Ecosystem¶
- Documentation: GitHub Wiki
- Community: r/Oobabooga
- VRAM Calculator: Hugging Face Space