Stable Diffusion XL Pipeline
Orchestrates SDXL image generation via the Stability AI REST API with ControlNet conditioning, IP-Adapter style transfer, and automatic prompt enhancement using CLIP interrogation.
Installation
Use the upstream install or setup path that matches your environment:
- git clone https://github.com/Stability-AI/generative-models.git
- pip install hatch
- pip install "numpy>=1.17" "PyWavelets>=1.1.1" "opencv-python>=4.1.0.25"
- pip install --no-deps invisible-watermark
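After running the install commands above, it can help to confirm that the pinned minimum versions are actually present. The following is a minimal sketch, not part of the upstream project: the helper names (`parse_version`, `meets_minimum`, `check_installed`) are hypothetical, and the version comparison is a simplified numeric one that ignores pre-release suffixes rather than a full PEP 440 implementation.

```python
from importlib import metadata

# Minimum versions copied from the pip command above.
MINIMUMS = {
    "numpy": "1.17",
    "PyWavelets": "1.1.1",
    "opencv-python": "4.1.0.25",
}


def parse_version(text):
    """Turn '4.1.0.25' into (4, 1, 0, 25); stop at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Compare numeric release tuples, padding the shorter one with zeros."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_installed(minimums=MINIMUMS):
    """Return {package: status} where status is 'ok', 'missing', or 'too old (...)'."""
    report = {}
    for name, minimum in minimums.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        report[name] = "ok" if meets_minimum(found, minimum) else f"too old ({found})"
    return report


if __name__ == "__main__":
    for name, status in check_installed().items():
        print(f"{name}: {status}")
```

Running the file directly prints one status line per package; any "missing" or "too old" entry suggests repeating the corresponding pip command.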
Requirements and caveats from upstream:
- python scripts/sampling/simple_video_sample_4d2.py --input_path assets/sv4d_videos/camel.gif --output_folder outputs (after downloading sv4d2.safetensors from HuggingFace...
- Run inference: python scripts/sampling/simple_video_sample_4d2.py --input_path <path/to/video>
- Run inference: python scripts/sampling/simple_video_sample_4d2.py --model_path checkpoints/sv4d2_8views.safetensors --input_path assets/sv4d_videos/chest.gif --output_folder outputs
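The inference commands above differ only in which optional flags they pass. As a rough sketch, they can be assembled from Python as shown below. The function name `build_command` is hypothetical; the script path and the `--input_path`, `--output_folder`, and `--model_path` flags are taken from the commands listed above, and nothing here is verified against the script's full argument list.

```python
import shlex

# Script path as it appears in the upstream commands above.
SCRIPT = "scripts/sampling/simple_video_sample_4d2.py"


def build_command(input_path, output_folder=None, model_path=None):
    """Assemble the argument list for the SV4D 2.0 sampling script.

    Only flags that appear in the commands above are emitted;
    optional ones are skipped when not provided.
    """
    cmd = ["python", SCRIPT, "--input_path", input_path]
    if output_folder is not None:
        cmd += ["--output_folder", output_folder]
    if model_path is not None:
        cmd += ["--model_path", model_path]
    return cmd


if __name__ == "__main__":
    cmd = build_command(
        "assets/sv4d_videos/chest.gif",
        output_folder="outputs",
        model_path="checkpoints/sv4d2_8views.safetensors",
    )
    # Print a copy-pasteable shell line; to execute it instead, pass `cmd`
    # to subprocess.run(cmd, check=True) from the repository root.
    print(shlex.join(cmd))
```

Building the command as a list rather than a single string avoids quoting problems when paths contain spaces.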
Basic usage or getting-started notes:
- To run SV4D 2.0 on a single input video of 21 frames, use one of the inference commands listed above.
- Low-VRAM environments: on GPUs with little VRAM, try setting --encoding_t=1 (number of frames encoded at a time) and --decoding_t=1 (number of frames decoded at a time), or lower the video resolution, for example --img_size=512.
- The 5x8 model takes 5 frames of input at a time. The inference scripts for both models, however, take a 21-frame video as input by default (same as SV3D and SV4D), so the model is run autoregressively until 21 frames are generated.
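The low-VRAM flags and the 5-frames-at-a-time note can be illustrated with a short sketch. The flag names (`--encoding_t`, `--decoding_t`, `--img_size`) come from the notes above. The windowing scheme is an assumption made purely for illustration: the upstream text only says the model is run autoregressively until 21 frames exist, and does not state how consecutive 5-frame chunks overlap. Here each chunk is assumed to reuse the last frame of the previous one. Both function names are hypothetical.

```python
def low_vram_flags(encoding_t=1, decoding_t=1, img_size=None):
    """Return the extra CLI flags suggested above for low-VRAM GPUs."""
    flags = [f"--encoding_t={encoding_t}", f"--decoding_t={decoding_t}"]
    if img_size is not None:
        flags.append(f"--img_size={img_size}")
    return flags


def plan_windows(total_frames=21, window=5, overlap=1):
    """List (start, end) frame ranges, end exclusive, that cover the video.

    ASSUMPTION: consecutive windows share `overlap` frames. This is an
    illustrative guess, not the documented upstream behaviour.
    """
    if total_frames < 1:
        raise ValueError("total_frames must be at least 1")
    if not 0 <= overlap < window:
        raise ValueError("overlap must satisfy 0 <= overlap < window")
    stride = window - overlap
    windows = []
    start = 0
    while True:
        end = min(start + window, total_frames)
        windows.append((start, end))
        if end >= total_frames:
            break
        start += stride
    return windows


if __name__ == "__main__":
    print(" ".join(low_vram_flags(img_size=512)))
    for start, end in plan_windows():
        print(f"frames {start}..{end - 1}")
```

Under the one-frame-overlap assumption, 21 frames split into five 5-frame windows, since 1 + 5 × 4 = 21.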
Extracted from upstream docs: https://raw.githubusercontent.com/Stability-AI/generative-models/HEAD/README.md
Quality
deterministic score 0.45 from registry signals: indexed on github topic:agent-skills · 8 github stars · SKILL.md body (1,858 chars)