Photanima

Photanima is an experimental finetune of Anima Base v1.0 to see whether it is a viable architecture for photography. Spoiler alert: it totally is.

Turbo LoRA baked in. If you're on a 30-series GPU, I recommend using this with the INT8 Toolkit + INT8 Lazy Torch Compile node for wicked fast gen times. All demo images generated with that combo. These are raw outputs; no upscaling or post-processing.

Demo images contain workflows with a custom sigma curve and ODE sampler. Both help significantly with realism. Alternatively, you can download a standalone workflow here.

❤️ If you enjoy Photanima, you can help offset the cost of training:

Buy liftweights a Coffee

🤓 Technical details

v2 was trained on ~2000 images for 45,000 steps. This is an expansion of my Snakebite 2.3 dataset with around 700 new images and captions reworked for Anima. Training took approximately 48 hours on a GeForce RTX 3090.
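For a rough sense of scale, the figures above work out as follows. This is only back-of-the-envelope arithmetic: the batch size is not stated here, so `ASSUMED_BATCH_SIZE = 1` is a placeholder, and the image count and training time are the approximate values quoted above.

```python
# Figures quoted in the text (all approximate).
IMAGES = 2000
STEPS = 45_000
TRAINING_HOURS = 48

# Assumption: the batch size is not stated, so 1 is used as a placeholder.
ASSUMED_BATCH_SIZE = 1

# Average wall-clock time spent on one training step.
seconds_per_step = TRAINING_HOURS * 3600 / STEPS

# How many times each image is seen on average at the assumed batch size.
passes_over_dataset = STEPS * ASSUMED_BATCH_SIZE / IMAGES

print(f"{seconds_per_step:.2f} s/step, {passes_over_dataset:.1f} passes over the dataset")
```

A larger batch size would scale the number of passes proportionally.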

Pros:

Extremely fast.
Extremely good prompt adherence.
Anatomy is pretty stable. If it screws something up, changing your steps by +1/-1 usually fixes it.
Supports up to nearly 2MP with little to no distortion.
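To check whether a resolution stays under that ceiling, a quick pixel count is enough. The sketch below assumes "2MP" means 2,000,000 pixels and treats that as a hard limit. The helper name `megapixels` is made up for illustration, and the two sizes are the preferred resolutions from the recommended settings.

```python
MAX_MEGAPIXELS = 2.0  # assumption: "nearly 2MP" read as a 2,000,000 pixel ceiling


def megapixels(width: int, height: int) -> float:
    """Return the pixel count of an image in decimal megapixels."""
    return width * height / 1_000_000


for width, height in [(1040, 1520), (832, 1216)]:
    mp = megapixels(width, height)
    verdict = "ok" if mp < MAX_MEGAPIXELS else "too large"
    print(f"{width}x{height}: {mp:.2f} MP ({verdict})")
```

Both preferred resolutions land well under the limit, so there is some headroom if you want to go larger.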

At first, I noticed that Photanima's style was inconsistent - it had a tendency to regress toward a cartoony/CGI look as my prompts became more complex. I was able to mostly overcome this by splitting Photanima into constituent content, style-early, and style-late blocks, then boosting the style blocks well past a strength of 1.

"Style-early" maps to blocks 7, 8, and 9 - these do alter composition to a degree, so we can't boost them as hard as "style-late."
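One way to picture the block split is as a per-block strength table applied to the finetune's change relative to the base model. The sketch below is illustrative only and is not the actual merge script. The only detail taken from the text is that blocks 7, 8, and 9 are style-early. The block count, the style-late indices, and the strength values are all hypothetical placeholders.

```python
STYLE_EARLY = frozenset({7, 8, 9})  # stated in the text


def block_strengths(num_blocks, style_late, content=1.0, early=1.25, late=1.75):
    """Return one strength multiplier per block index.

    The default strengths are hypothetical. The text only says that style
    blocks go well past 1 and that style-early is boosted less than style-late.
    """
    style_late = set(style_late)
    if STYLE_EARLY & style_late:
        raise ValueError("a block cannot be both style-early and style-late")
    strengths = []
    for index in range(num_blocks):
        if index in STYLE_EARLY:
            strengths.append(early)
        elif index in style_late:
            strengths.append(late)
        else:
            strengths.append(content)
    return strengths


def boost(base_weight, tuned_weight, strength):
    """Scale the finetune's change to a weight: base + strength * (tuned - base)."""
    return base_weight + strength * (tuned_weight - base_weight)


# Hypothetical layout: 16 blocks, with blocks 10-13 treated as style-late.
strengths = block_strengths(16, style_late={10, 11, 12, 13})
print(strengths)
```

With a strength of 1 the finetuned weight is left as trained. Anything above 1 exaggerates whatever the finetune changed in that block, which is why the composition-sensitive early blocks get the gentler boost.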

Images are pretty consistent now, but there are some notable drawbacks.

Cons in v2:

It loses a little knowledge of certain artistic terms like silhouette.
Microdetail quality is somewhere between SDXL and ZIT. Honestly, it's really good for a 2B model. Two-step upscaling with Anima doesn't help much, but I'm sure the results would be amazing if you sent a Photanima image to a different model for refinement. Or if that's too much work: just add a little film grain. It does wonders and requires no extra VRAM.
Text capabilities are not as good as those of base Anima. Anything beyond 3 or 4 words is likely going to require numerous re-rolls.
Excessive fluff tags like masterpiece, absurdres, hyperreal tend to fry the image. The model is photographic and highly aesthetic by default, so there's no need to drive it harder in that direction.

🛠️ Recommended Settings (v2)

8-10 steps.
er_sde sampler on "ODE" mode.
Custom sigma curve or simple scheduler: "1.0, 0.93, 0.9, 0.825, 0.55, 0.5, 0.2932, 0.29, 0.2, 0.0000"
CFG 1.
Preferred resolution: 1040x1520 or 832x1216.
For maximum realism, begin your prompt with real life photo. If that's not enough, add photo \(medium\) and increase its strength until satisfied. You can usually go up to a crazy strength value like 5 or 6 without breaking the image.
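The settings above can be sanity-checked with a few lines of Python before pasting them into a workflow. This is a minimal sketch: `parse_sigmas` and `weighted` are made-up helper names, the `(tag:strength)` form assumes ComfyUI-style prompt weighting, and the subject text in the prompt is just a placeholder.

```python
SIGMA_CURVE = "1.0, 0.93, 0.9, 0.825, 0.55, 0.5, 0.2932, 0.29, 0.2, 0.0000"


def parse_sigmas(text):
    """Parse a comma-separated sigma string and check that it is a usable curve."""
    sigmas = [float(part) for part in text.split(",")]
    if len(sigmas) < 2:
        raise ValueError("a sigma curve needs at least two values")
    if any(later > earlier for earlier, later in zip(sigmas, sigmas[1:])):
        raise ValueError("sigmas must not increase")
    if sigmas[-1] != 0.0:
        raise ValueError("the curve should end at 0")
    return sigmas


def weighted(tag, strength):
    """Wrap a tag in (tag:strength) form (assumed ComfyUI-style weighting)."""
    return f"({tag}:{strength:g})"


sigmas = parse_sigmas(SIGMA_CURVE)
steps = len(sigmas) - 1  # N + 1 sigma values describe N sampling steps

# Placeholder subject text; the first two entries follow the advice above.
prompt = ", ".join([
    "real life photo",
    weighted(r"photo \(medium\)", 2.0),
    "a woman reading in a cafe",
])

print(steps)
print(prompt)
```

The ten values in the curve describe nine sampling steps, which sits inside the recommended 8-10 range.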

Tips:

If textures are overcooked, lower steps to 6.
You can reduce the first number on the sigma curve to 0.95-0.99 to improve realism. This reduces saturation and adds a little noise, but makes the model less stable.
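The sigma tweak from the second tip is easy to script if you want to try several starting values. A small sketch, assuming the v2 curve from the recommended settings; `with_first_sigma` is a hypothetical helper, and the 0.95-0.99 bounds simply mirror the range suggested in the tip.

```python
V2_SIGMAS = [1.0, 0.93, 0.9, 0.825, 0.55, 0.5, 0.2932, 0.29, 0.2, 0.0]


def with_first_sigma(sigmas, first):
    """Return a copy of the curve with only the first sigma replaced."""
    if not 0.95 <= first <= 0.99:
        raise ValueError("the tip suggests a first sigma between 0.95 and 0.99")
    if first < sigmas[1]:
        raise ValueError("the first sigma must stay at or above the second one")
    return [first, *sigmas[1:]]


adjusted = with_first_sigma(V2_SIGMAS, 0.97)

# Format the curve back into a comma-separated string for pasting into a workflow.
print(", ".join(f"{sigma:g}" for sigma in adjusted))
```

Only the first value changes, so the step count stays the same as with the stock curve.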

Check top of post for standalone workflow.

Base model settings:

30-50 steps.
Euler sampler.
Simple scheduler.
CFG 4-6.
Use a bunch of fluff tags like masterpiece, score_9, absurdres, best quality, highres, photo \(medium\), real life. Note: do not do this with Turbo.

🗺️ Roadmap

I'm pretty excited about the potential of Anima, but let's be clear: I'm not claiming that this checkpoint is a "ZIT killer." The correct model to compare this against is SDXL/IL - and I'm confident that Anima can dethrone it with enough community effort.

Directions I'd like to explore next:

(✅ Done in v2) There are a handful of Anima "detailer" LoRAs on Civitai. These are not intended for photography, but with enough block pruning, you never know. The right mix could go a long way.
I suspect further increasing the dataset to ~3k images would help resolve remaining issues related to certain textures or model biases.
(✅ Done in v2) I'm eagerly awaiting the release of Anima Turbo 1.0. The current Turbo solution is based on Preview3 and I think it's holding back this model's potential a little.
I'm also looking forward to Anima support in OneTrainer. It will make trying experimental configs a lot less of a hassle compared to kohya-ss. For this v1 run, I stuck with safe values (prodigy, 1.0 LR, no fancy flags).

Thank you. As always, I look forward to your feedback. Please share the model and upload some images to help it gain traction.
