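Text Generation WebUI: A Local LLM Interface

Project Overview

Text Generation WebUI is a full-featured Gradio web interface for running a wide range of large language models. It supports multiple model formats and loaders, which makes it well suited to experimenting with LLMs locally.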

GitHub Stars: 46K+

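Key Features

Multiple formats - GGUF, GPTQ, AWQ, EXL2
Multiple loaders - llama.cpp, ExLlamaV2, Transformers
Extension system - speech, image generation, RAG
Role-play - custom characters and conversation modes
API compatibility - OpenAI API format

Installation

One-Click Install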

# Linux (on macOS, run ./start_macos.sh instead)
git clone https://github.com/oobabooga/text-generation-webui
cd text-generation-webui
./start_linux.sh

# Windows
start_windows.bat

Manual Install

conda create -n textgen python=3.11
conda activate textgen

pip install torch torchvision torchaudio
git clone https://github.com/oobabooga/text-generation-webui
cd text-generation-webui
pip install -r requirements.txt

Launching

Basic Launch

python server.py

Open http://localhost:7860 in a browser.

Common Options

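# Enable the API
python server.py --api

# Listen on the network rather than only localhost, on port 8080
python server.py --listen --listen-port 8080

# Create a public share link (publicly accessible)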
python server.py --share
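
When the server is launched from a script, the flags above can be assembled programmatically. The following is a minimal sketch that uses only the flags shown in this section; the helper name build_server_command is hypothetical and not part of the project.

```python
def build_server_command(api=False, listen=False, listen_port=None, share=False):
    """Assemble the argv list for `python server.py` from a few options."""
    cmd = ["python", "server.py"]
    if api:
        cmd.append("--api")
    if listen:
        cmd.append("--listen")
    if listen_port is not None:
        # The port is passed as a separate argument after the flag
        cmd += ["--listen-port", str(listen_port)]
    if share:
        cmd.append("--share")
    return cmd


# Print the command line instead of running it
print(" ".join(build_server_command(api=True, listen=True, listen_port=8080)))
```

The resulting list can be handed to subprocess.run from inside the text-generation-webui directory.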

Downloading Models

Download via the WebUI

Go to the "Model" tab
Enter the Hugging Face model name
Click "Download"

Download from the Command Line

python download-model.py TheBloke/Llama-2-7B-GGUF

Model Storage

text-generation-webui/
└── models/
    ├── llama-2-7b.Q4_K_M.gguf
    └── mistral-7b-instruct/
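
To check what the models folder contains, a short script can walk the layout above. This is a sketch under the assumption that a model is either a single .gguf file or a subdirectory of models/; the function list_models is hypothetical, and the demo builds a throwaway directory so it runs anywhere.

```python
import tempfile
from pathlib import Path


def list_models(models_dir):
    """Return sorted names of .gguf files and subdirectories in models_dir."""
    names = []
    for entry in Path(models_dir).iterdir():
        if entry.is_dir() or entry.suffix.lower() == ".gguf":
            names.append(entry.name)
    return sorted(names)


# Recreate the example layout in a temporary directory and list it
with tempfile.TemporaryDirectory() as tmp:
    models = Path(tmp) / "models"
    (models / "mistral-7b-instruct").mkdir(parents=True)
    (models / "llama-2-7b.Q4_K_M.gguf").touch()
    (models / "notes.txt").touch()  # ignored: neither a folder nor a .gguf file
    print(list_models(models))
```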

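Choosing a Loader

Loader	Format	Characteristics
llama.cpp	GGUF	CPU-friendly, supports quantization
ExLlamaV2	EXL2/GPTQ	Best GPU performance
Transformers	Original Hugging Face format	Full feature support
AutoGPTQ	GPTQ	Quantized inference on GPU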
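
The table can be turned into a simple lookup. The sketch below guesses a loader from the format tag in a model's name; suggest_loader is a hypothetical helper, the name-matching heuristic is an assumption, and AWQ is left out because the table does not list a loader for it.

```python
# Mapping taken from the table above; GPTQ could also use AutoGPTQ
LOADER_BY_FORMAT = {
    "gguf": "llama.cpp",
    "exl2": "ExLlamaV2",
    "gptq": "ExLlamaV2",
}


def suggest_loader(model_name):
    """Guess a loader from a format tag in the model name."""
    lowered = model_name.lower()
    for fmt, loader in LOADER_BY_FORMAT.items():
        if fmt in lowered:
            return loader
    # No quantization tag found: assume original Hugging Face weights
    return "Transformers"


for name in ["llama-2-7b.Q4_K_M.gguf", "TheBloke/Llama-2-7B-GPTQ", "mistral-7b-instruct"]:
    print(name, "->", suggest_loader(name))
```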

Loader Configuration

# models/config.yaml
llama-2-7b:
  loader: llama.cpp
  n_gpu_layers: 35
  n_ctx: 4096

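Conversation Modes

Chat Mode

A standard chat interface with support for:

System prompts
Conversation history
Parameter tuning

Notebook Mode

Free-form text editing, suited to:

Story writing
Code generation
Long-form editing

Instruct Mode

Instruction-style conversation that uses model-specific templates: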

Alpaca
Vicuna
ChatML
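
To show what a model-specific template looks like, here is a sketch that wraps a single user prompt in each of the three formats. The strings follow the commonly published forms of these templates; the ones shipped with the WebUI may differ in whitespace or system text, so treat them as assumptions. The names TEMPLATES and format_prompt are hypothetical.

```python
TEMPLATES = {
    "alpaca": (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request.\n\n"
        "### Instruction:\n{prompt}\n\n### Response:\n"
    ),
    "vicuna": (
        "A chat between a curious user and an artificial intelligence assistant. "
        "The assistant gives helpful, detailed, and polite answers to the user's questions. "
        "USER: {prompt} ASSISTANT:"
    ),
    "chatml": (
        "<|im_start|>user\n{prompt}<|im_end|>\n"
        "<|im_start|>assistant\n"
    ),
}


def format_prompt(template, prompt):
    """Wrap a single user prompt in the named instruction template."""
    return TEMPLATES[template].format(prompt=prompt)


print(format_prompt("chatml", "What is SQL injection?"))
```

Using the wrong template for a model usually degrades output quality, which is why Instruct mode selects the template per model.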

Character Setup

Creating a Character

# characters/assistant.yaml
name: Security Expert
context: |
  You are a cybersecurity expert with deep knowledge
  of penetration testing, vulnerability assessment,
  and secure coding practices.  
greeting: |
  Hello! I'm your security consultant. How can I help
  you with your security concerns today?  
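
The same character fields can also be reused outside the WebUI, for example when calling an OpenAI-format endpoint directly. The sketch below maps context to a system message and greeting to an opening assistant message. This mapping is an assumption for illustration, not a description of how the WebUI builds its prompts, and character_to_messages is a hypothetical helper.

```python
def character_to_messages(character, user_message):
    """Turn character fields into an OpenAI-style messages list."""
    messages = [{"role": "system", "content": character["context"].strip()}]
    greeting = character.get("greeting", "").strip()
    if greeting:
        # The greeting becomes the assistant's first turn
        messages.append({"role": "assistant", "content": greeting})
    messages.append({"role": "user", "content": user_message})
    return messages


character = {
    "name": "Security Expert",
    "context": (
        "You are a cybersecurity expert with deep knowledge "
        "of penetration testing, vulnerability assessment, "
        "and secure coding practices."
    ),
    "greeting": (
        "Hello! I'm your security consultant. How can I help "
        "you with your security concerns today?"
    ),
}

for message in character_to_messages(character, "How do I harden SSH?"):
    print(message["role"], ":", message["content"])
```

Some chat templates expect a user turn directly after the system message, so the greeting may need to be dropped for those models.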

Using a Character

Go to "Parameters" → "Character"
Select the character file
Start the conversation

Using the API

Enabling the API

python server.py --api --api-port 5000

OpenAI Format

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:5000/v1",
    api_key="not-needed"
)

response = client.chat.completions.create(
    model="llama-2-7b",
    messages=[
        {"role": "user", "content": "What is SQL injection?"}
    ]
)
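
If installing the openai package is not an option, the same request can be built with the standard library. This is a sketch that assumes the server accepts the standard chat completions route under the base URL used above and returns the usual OpenAI response shape; build_chat_request and extract_reply are hypothetical helper names.

```python
import json
import urllib.request


def build_chat_request(prompt, model="llama-2-7b", base_url="http://localhost:5000/v1"):
    """Build (but do not send) a POST request in OpenAI chat format."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(response_body):
    """Pull the assistant text out of an OpenAI-style JSON response."""
    return json.loads(response_body)["choices"][0]["message"]["content"]


request = build_chat_request("What is SQL injection?")
print(request.full_url)

# To send it against a running server:
# with urllib.request.urlopen(request) as resp:
#     print(extract_reply(resp.read()))
```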

Native API

import requests

# Legacy endpoint from older releases; newer versions may expose only the OpenAI-compatible API
response = requests.post(
    "http://localhost:5000/api/v1/generate",
    json={
        "prompt": "Explain XSS attacks:",
        "max_new_tokens": 200,
        "temperature": 0.7
    }
)

Extensions

Common Extensions

# Speech input/output
extensions/whisper_stt/
extensions/silero_tts/

# RAG
extensions/superboogav2/

# Multimodal
extensions/multimodal/

Enabling Extensions

python server.py --extensions whisper_stt silero_tts

Performance Tuning

GPU Settings

# Set the number of layers offloaded to the GPU
--n-gpu-layers 35

# Set the context size
--n_ctx 4096

Memory Optimization

# Use 4-bit quantization
--load-in-4bit

# Reduce the context size
--n_ctx 2048
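
A rough calculation shows why both options help. The sketch below estimates memory for the weights at a given bit width and for a 16-bit key/value cache at a given context size. The model figures (about 7 billion parameters, 32 layers, hidden size 4096, roughly those of Llama-2-7B) are assumptions, and real usage is higher because of runtime overhead and because quantized formats store extra metadata.

```python
def weight_memory_gb(n_params, bits_per_weight):
    """Approximate weight storage in GB (10^9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


def kv_cache_gib(n_layers, hidden_size, n_ctx, bytes_per_value=2):
    """Approximate KV cache size in GiB for full multi-head attention.

    Each layer stores one key and one value vector of hidden_size
    per token, hence the leading factor of 2.
    """
    return 2 * n_layers * n_ctx * hidden_size * bytes_per_value / 2**30


print("16-bit weights:", weight_memory_gb(7e9, 16), "GB")
print(" 4-bit weights:", weight_memory_gb(7e9, 4), "GB")
print("KV cache, n_ctx=4096:", kv_cache_gib(32, 4096, 4096), "GiB")
print("KV cache, n_ctx=2048:", kv_cache_gib(32, 4096, 2048), "GiB")
```

Under these assumptions, 4-bit quantization cuts weight storage to a quarter of the 16-bit size, and halving the context halves the cache.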
