QWEN3-8B-VL Image/Video Caption (Uncensored)
About this model
Version 2 - This version is highly attuned to NSFW content. However, due to image-only training, it may caption some videos as if they were still images.
This version requires 24 GB or more of VRAM.
Full finetune (NOT a LoRA merge) of the 8B-parameter model, with the vision encoder frozen.
Trained in BF16/TF32; unfortunately, due to the size of the model, 8-bit Adam (Adam8bit) had to be used.
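To show why the optimizer choice matters at this scale, here is a rough back-of-the-envelope sketch. Adam keeps two moment estimates (m and v) for every trainable parameter; stored in FP32 that is 8 bytes per parameter, while an 8-bit optimizer stores them in about 2 bytes. The figure of 8 billion trainable parameters is a hypothetical round number (the real count is lower with the vision encoder frozen), and the sketch ignores the small per-block scaling constants 8-bit optimizers also keep.

```python
# Rough optimizer-state memory for Adam, which stores two moment
# estimates (m and v) per trainable parameter.
BYTES_FP32 = 4
BYTES_8BIT = 1


def adam_state_gib(n_params: float, bytes_per_state: int) -> float:
    """Memory for Adam's two per-parameter states, in GiB."""
    return 2 * n_params * bytes_per_state / 2**30


# Hypothetical round figure: ~8 billion trainable parameters.
N_TRAINABLE = 8e9

fp32_states = adam_state_gib(N_TRAINABLE, BYTES_FP32)
int8_states = adam_state_gib(N_TRAINABLE, BYTES_8BIT)
print(f"Adam (FP32 states): {fp32_states:.1f} GiB")
print(f"8-bit Adam states:  {int8_states:.1f} GiB")
```

Under these assumptions the optimizer states alone shrink by a factor of four, before counting weights, gradients, and activations.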
Version 2 can be used with nearly any LLM prompt. Version 1 should be used with the provided prompt, in whole or in part.
Details of the Version 1 training can be read here.
Note: No image-size safeguard is built in. I have captioned 4K images, which are processed into a very large tensor shape; however, downscaling images to about 1K is recommended.
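A minimal sketch of pre-resizing images before captioning, plus a rough estimate of why 4K inputs produce such large tensors. It assumes each visual token covers a square pixel tile of side `unit`; the default of 32 is an assumption for illustration only, so check the model's processor config for the real patch and merge sizes. The helper names `fit_within` and `approx_visual_tokens` are hypothetical.

```python
import math


def fit_within(width: int, height: int, max_side: int = 1024) -> tuple[int, int]:
    """Scale (width, height) down so the longer side is at most max_side,
    preserving aspect ratio. Images already small enough are unchanged."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    scale = max_side / longest
    return max(1, round(width * scale)), max(1, round(height * scale))


def approx_visual_tokens(width: int, height: int, unit: int = 32) -> int:
    """Rough visual-token count if each token covers a unit x unit tile.
    unit=32 is an assumed value; the real model may differ."""
    return math.ceil(width / unit) * math.ceil(height / unit)


# Example: a 3840x2160 (4K UHD) frame downscaled to fit within 1024 px.
w4k, h4k = 3840, 2160
w1k, h1k = fit_within(w4k, h4k)
print((w1k, h1k))
print(approx_visual_tokens(w4k, h4k), approx_visual_tokens(w1k, h1k))
```

Because the token count grows with image area, shrinking the longer side by roughly 4x cuts the visual sequence by roughly 14x under these assumptions. With Pillow, `Image.thumbnail((1024, 1024))` performs a similar in-place, aspect-preserving downscale that never enlarges the image.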
I have an Ampere-series card and cannot convert this model to FP8 or NF4 at high quality. If you have experience converting models on Linux with Transformer Engine, DM me.
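For context on why an FP8 or NF4 conversion would help, this rough sketch compares weight storage per format. It assumes a hypothetical round figure of 8 billion parameters and counts weights only; activations, the KV cache, runtime overhead, and NF4's per-block quantization constants are ignored, which is part of why the 24 GB requirement sits well above the BF16 weight figure.

```python
# Approximate weight storage per parameter for each format.
BYTES_PER_PARAM = {
    "BF16": 2.0,
    "FP8": 1.0,
    "NF4": 0.5,  # 4-bit; ignores per-block quantization constants
}


def weights_gib(n_params: float, fmt: str) -> float:
    """Approximate weight storage in GiB for the given format."""
    return n_params * BYTES_PER_PARAM[fmt] / 2**30


N_PARAMS = 8e9  # hypothetical round figure for an "8B" model

for fmt in BYTES_PER_PARAM:
    print(f"{fmt}: {weights_gib(N_PARAMS, fmt):.1f} GiB")
```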