Stable Diffusion XL LoRA Trainer
Fine-tune Stable Diffusion XL models with LoRA adapters using the diffusers library and Kohya-ss training scripts. Manages dataset preparation, training configuration, and checkpoint merging for custom image generation.
Installation
Use the upstream install or setup path that matches your environment:
- git clone https://github.com/Stability-AI/generative-models.git
- pip install hatch
- pip install "numpy>=1.17" "PyWavelets>=1.1.1" "opencv-python>=4.1.0.25"
- pip install --no-deps invisible-watermark
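After running the pip commands above, it can help to confirm that the installed packages meet the stated minimum versions. The following is a minimal sketch, not part of the upstream repository: the helper names (parse_version, meets_minimum, check_environment) are hypothetical, and the version comparison is a simplified numeric one that ignores pre-release suffixes.

```python
from importlib import metadata

# Minimum versions copied from the pip install line above.
MINIMUMS = {
    "numpy": "1.17",
    "PyWavelets": "1.1.1",
    "opencv-python": "4.1.0.25",
}


def parse_version(text):
    """Turn '4.1.0.25' into (4, 1, 0, 25); stop at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Return True if the installed version is at least the minimum."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    # Pad the shorter tuple with zeros so '1.17' compares equal to '1.17.0'.
    a = a + (0,) * (width - len(a))
    b = b + (0,) * (width - len(b))
    return a >= b


def check_environment():
    """Return a {package: status} report for each pinned dependency."""
    report = {}
    for name, minimum in MINIMUMS.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        if meets_minimum(installed, minimum):
            report[name] = "ok"
        else:
            report[name] = f"too old ({installed})"
    return report


if __name__ == "__main__":
    for name, status in check_environment().items():
        print(f"{name}: {status}")
```

The check only reads package metadata, so it does not import numpy or OpenCV and is safe to run before the rest of the setup is complete.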
Requirements and caveats from upstream:
- Run inference: python scripts/sampling/simple_video_sample_4d2.py --input_path assets/sv4d_videos/camel.gif --output_folder outputs (requires first downloading sv4d2.safetensors from HuggingFace...
- Run inference: python scripts/sampling/simple_video_sample_4d2.py --input_path <path/to/video>
- Run inference: python scripts/sampling/simple_video_sample_4d2.py --model_path checkpoints/sv4d2_8views.safetensors --input_path assets/sv4d_videos/chest.gif --output_folder outputs
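The inference commands above differ only in which flags they pass, so they can be assembled programmatically. This is a minimal sketch under stated assumptions: build_command is a hypothetical helper, not an upstream function, and it uses only the script path and the three flags that appear in the commands above (--input_path, --output_folder, --model_path).

```python
import shlex

# Script path as it appears in the upstream commands above.
SCRIPT = "scripts/sampling/simple_video_sample_4d2.py"


def build_command(input_path, output_folder=None, model_path=None):
    """Build the argument list for one inference run.

    Only flags shown in the upstream commands are emitted; optional
    flags are left out when their value is None.
    """
    cmd = ["python", SCRIPT]
    if model_path is not None:
        cmd += ["--model_path", model_path]
    cmd += ["--input_path", input_path]
    if output_folder is not None:
        cmd += ["--output_folder", output_folder]
    return cmd


if __name__ == "__main__":
    cmd = build_command(
        "assets/sv4d_videos/chest.gif",
        output_folder="outputs",
        model_path="checkpoints/sv4d2_8views.safetensors",
    )
    # Print a copy-pasteable shell line instead of executing it.
    print(shlex.join(cmd))
```

Returning a list rather than a single string keeps paths with spaces intact; the list can be passed to subprocess.run(cmd, check=True) from the repository root once the checkpoint has been downloaded.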
Basic usage or getting-started notes:
- To run SV4D 2.0 on a single input video of 21 frames, use the inference commands listed above.
- Low VRAM environment: To run on GPUs with low VRAM, try setting --encoding_t=1 (number of frames encoded at a time) and --decoding_t=1 (number of frames decoded at a time), or lower the video resolution, for example --img_size=512.
- The 5x8 model takes 5 frames of input at a time. The inference scripts for both models take a 21-frame video as input by default (same as SV3D and SV4D), so the model is run autoregressively until 21 frames are generated.
- Extracted from upstream docs: https://raw.githubusercontent.com/Stability-AI/generative-models/HEAD/README.md
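The low-VRAM advice in the notes above amounts to appending a few extra flags to an inference command. The sketch below shows one way to do that; low_vram_flags is a hypothetical helper, the flag names and example values come from the notes, and the base command is the sample invocation quoted earlier in this page.

```python
def low_vram_flags(encoding_t=1, decoding_t=1, img_size=None):
    """Return extra CLI flags for GPUs with limited memory.

    encoding_t / decoding_t: number of frames encoded / decoded at a time.
    img_size: optional lower video resolution (the notes suggest 512).
    """
    if encoding_t < 1 or decoding_t < 1:
        raise ValueError("encoding_t and decoding_t must be at least 1")
    flags = [f"--encoding_t={encoding_t}", f"--decoding_t={decoding_t}"]
    if img_size is not None:
        flags.append(f"--img_size={img_size}")
    return flags


if __name__ == "__main__":
    # Base command taken from the upstream sample invocation.
    base = [
        "python",
        "scripts/sampling/simple_video_sample_4d2.py",
        "--input_path",
        "assets/sv4d_videos/camel.gif",
        "--output_folder",
        "outputs",
    ]
    print(" ".join(base + low_vram_flags(img_size=512)))
```

Leaving img_size unset by default reflects the wording of the notes, which present the lower resolution as an alternative to try rather than a required setting.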
Quality
Deterministic score 0.45 from registry signals: indexed on GitHub topic:agent-skills · 8 GitHub stars · SKILL.md body (1,905 chars)