# sogni-gen — AI Image & Video Generation

> OpenClaw plugin powered by Sogni AI's decentralized GPU network.
> Repo: https://github.com/Sogni-AI/openclaw-sogni-gen

## What It Does

Generates AI images and videos from text prompts or reference media. When a user asks you to "draw", "generate", "create an image/video", or "animate" something, use this plugin to produce it.

## Install

```bash
openclaw plugins install sogni-gen
```

Then create Sogni credentials:

```bash
mkdir -p ~/.config/sogni
cat > ~/.config/sogni/credentials << 'EOF'
SOGNI_USERNAME=your_username
SOGNI_PASSWORD=your_password
EOF
chmod 600 ~/.config/sogni/credentials
```
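
Before the first generation it can help to confirm that the file is readable and that both keys have values. The helper below is a minimal sketch, not part of sogni-gen: the name `check_sogni_credentials` is hypothetical, and it only inspects the file layout shown above without contacting Sogni.

```bash
# Hypothetical helper (not shipped with sogni-gen): sanity-check the credentials file.
check_sogni_credentials() {
  local file="${1:-$HOME/.config/sogni/credentials}"
  local key
  if [ ! -r "$file" ]; then
    echo "missing or unreadable: $file" >&2
    return 1
  fi
  for key in SOGNI_USERNAME SOGNI_PASSWORD; do
    # Require a non-empty KEY=value line for each expected key.
    if ! grep -Eq "^${key}=.+" "$file"; then
      echo "no value set for $key in $file" >&2
      return 1
    fi
  done
  echo "credentials file looks OK: $file"
}
```

Call it with no arguments to check the default path, or pass another path as the first argument.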

Sign up at https://app.sogni.ai/ if you don't have an account. You can claim 50 free Spark tokens there each day.

## How to Generate

### Images

```bash
# Basic — returns a URL
node {{skillDir}}/sogni-gen.mjs -q "a cat wearing a hat"

# Save to file (then send via message tool with filePath)
node {{skillDir}}/sogni-gen.mjs -q -o /tmp/generated.png "a cat wearing a hat"

# Bigger image
node {{skillDir}}/sogni-gen.mjs -q -o /tmp/out.png -w 1024 -h 1024 "a dragon eating tacos"

# Higher quality (slower)
node {{skillDir}}/sogni-gen.mjs -q -m flux2_dev_fp8 -o /tmp/out.png "portrait of a wizard"
```

### Image Editing (needs a reference image)

```bash
# Edit an existing image
node {{skillDir}}/sogni-gen.mjs -q -c /path/to/photo.jpg -o /tmp/edited.png "make the background a beach"

# Use last generated image as input
node {{skillDir}}/sogni-gen.mjs -q --last-image -o /tmp/edited.png "make it pop art style"

# Restore a damaged photo
node {{skillDir}}/sogni-gen.mjs -q -c /path/to/old_photo.jpg -o /tmp/restored.png "restore this vintage photo, remove damage and scratches"
```

### Videos

```bash
# Text-to-video
node {{skillDir}}/sogni-gen.mjs -q --video -o /tmp/video.mp4 "ocean waves at sunset"

# Image-to-video (animate an image)
node {{skillDir}}/sogni-gen.mjs -q --video --ref /path/to/image.png -o /tmp/video.mp4 "camera slowly zooms in"

# Looping video
node {{skillDir}}/sogni-gen.mjs -q --video --looping --ref /path/to/image.png -o /tmp/loop.mp4 "gentle camera pan"

# Longer video (10 seconds)
node {{skillDir}}/sogni-gen.mjs -q --video --duration 10 --ref /path/to/image.png -o /tmp/video.mp4 "camera orbits around"

# Sound-to-video (lip sync / talking head)
node {{skillDir}}/sogni-gen.mjs -q --video --ref /path/to/face.jpg --ref-audio /path/to/speech.m4a -o /tmp/talking.mp4 "talking head"

# Motion transfer from another video
node {{skillDir}}/sogni-gen.mjs -q --video --ref /path/to/subject.jpg --ref-video /path/to/motion.mp4 --workflow animate-move -o /tmp/animated.mp4 "transfer motion"
```

### 360 Turntable

```bash
# Generate 8 angles of a subject
node {{skillDir}}/sogni-gen.mjs -q --angles-360 -c /path/to/subject.jpg "studio portrait"

# 360 video (looping mp4, requires ffmpeg)
node {{skillDir}}/sogni-gen.mjs -q --angles-360 --angles-360-video /tmp/turntable.mp4 -c /path/to/subject.jpg "studio portrait"
```

### Check Balance

```bash
node {{skillDir}}/sogni-gen.mjs --json --balance
```

## Image Models

| Model | Speed | Best For |
|-------|-------|----------|
| z_image_turbo_bf16 | ~5-10s | Default, general purpose |
| flux1-schnell-fp8 | ~3-5s | Quick iterations |
| flux2_dev_fp8 | ~2min | Highest quality |
| chroma-v.46-flash_fp8 | ~30s | Balanced speed/quality |
| qwen_image_edit_2511_fp8_lightning | ~8s | Fast image editing (auto-selected with -c) |
| qwen_image_edit_2511_fp8 | ~30s | Higher quality editing |

## Video Models (auto-selected by workflow)

| Workflow | Model | Speed |
|----------|-------|-------|
| t2v (text-to-video) | wan_v2.2-14b-fp8_t2v_lightx2v | ~5min |
| i2v (image-to-video) | wan_v2.2-14b-fp8_i2v_lightx2v | ~3-5min |
| s2v (sound-to-video) | wan_v2.2-14b-fp8_s2v_lightx2v | ~5min |
| animate-move | wan_v2.2-14b-fp8_animate-move_lightx2v | ~5min |
| animate-replace | wan_v2.2-14b-fp8_animate-replace_lightx2v | ~5min |

## Key Flags

| Flag | What It Does |
|------|-------------|
| -o /path | Save output to file |
| -q | Quiet mode (suppress progress) |
| -w, -h | Width/height in pixels (default 768x768) |
| -m MODEL | Choose a specific model |
| -c IMAGE | Context image for editing (repeatable, max 3) |
| --video, -v | Generate video instead of image |
| --ref IMAGE | Reference image for video |
| --ref-audio FILE | Audio for lip sync (s2v) |
| --ref-video FILE | Motion source for animate workflows |
| --looping | Seamless A-B-A loop (i2v only) |
| --duration SEC | Video length (default 5s) |
| --fps NUM | Frames per second (default 16) |
| --last-image | Reuse last generated image as input |
| --json | Machine-readable JSON output |
| --balance | Show Spark/Sogni token balances |
| --extract-last-frame VIDEO IMAGE | Extract last frame from a video file |
| --concat-videos OUTPUT CLIPS... | Concatenate multiple video clips |
| --list-media [images\|audio\|all] | List recent inbound media files |
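
As one way to combine `--extract-last-frame` and `--concat-videos`, the sketch below chains two image-to-video clips so the second starts where the first ends. It uses only flags from the table above, but the function name `chain_clips`, the `SOGNI` variable, and the intermediate file names are illustrative assumptions, and whether this matches the tool's intended workflow is not confirmed here.

```bash
# {{skillDir}} is the same template placeholder used throughout this document.
# Assumes the resolved path contains no spaces, since $SOGNI is expanded unquoted.
SOGNI="node {{skillDir}}/sogni-gen.mjs"

# Hypothetical helper: build a longer video from two chained i2v clips.
# $1 = starting image, $2 = final output video
chain_clips() {
  local start_image="$1" out="$2"
  local dir
  dir="$(dirname "$out")"
  # 1. Animate the starting image.
  $SOGNI -q --video --ref "$start_image" -o "$dir/clip1.mp4" "camera slowly zooms in" || return 1
  # 2. Save the final frame of clip 1 as the next starting image.
  $SOGNI --extract-last-frame "$dir/clip1.mp4" "$dir/clip1_last.png" || return 1
  # 3. Animate from that frame so the motion continues.
  $SOGNI -q --video --ref "$dir/clip1_last.png" -o "$dir/clip2.mp4" "camera orbits around" || return 1
  # 4. Join both clips into a single file.
  $SOGNI --concat-videos "$out" "$dir/clip1.mp4" "$dir/clip2.mp4"
}
```

For example, `chain_clips /path/to/image.png /tmp/long.mp4` would leave the intermediate clips next to the final output.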

## Agent Behavior Guidelines

0. If the user's request includes the keyword "photobooth" (case-insensitive), always use `--photobooth` with `--ref` pointing to the user's face image. Do not fall back to the `-c` edit flow for that request.
1. When the user asks to "draw", "generate", "create", or "make" an image: generate an image and send it.
2. When they ask to "animate", "make a video", or "create a video": use `--video` mode.
3. When they send a photo and ask to edit/change/modify it: use `-c` with their image.
4. When they send a photo and ask to animate it: use `--video --ref` with their image.
5. When they send a photo + audio and ask for lip sync: use `--video --ref IMAGE --ref-audio AUDIO`.
6. Always use `-q` (quiet) and `-o` (output to file) so you can send the result back.
7. After generating, send the file to the user via the message tool with filePath.
8. If you get "Insufficient funds", tell them: "Claim 50 free daily Spark points at https://app.sogni.ai/"
9. For transition/animation videos, always use this plugin's built-in flags (not raw ffmpeg). Use `--looping`, `--extract-last-frame`, or `--concat-videos`.
10. Default to 768x768 for images. Video width and height must each be divisible by 16 (min 480px, max 1536px).
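
The video size rule in the last guideline can be applied mechanically. The function below is a small sketch (the name `snap16` is hypothetical, not a sogni-gen feature) that clamps a requested dimension to the 480 to 1536 range and rounds it down to a multiple of 16.

```bash
# Hypothetical helper: make a requested video dimension valid.
# Clamps to 480..1536, then rounds down to the nearest multiple of 16.
snap16() {
  local n="$1"
  if [ "$n" -lt 480 ]; then n=480; fi
  if [ "$n" -gt 1536 ]; then n=1536; fi
  echo $(( n / 16 * 16 ))
}
```

For instance, `snap16 1080` yields 1072 and `snap16 100` yields 480. Assuming `-w` and `-h` also set video dimensions, the results can be passed as `-w "$(snap16 1080)" -h "$(snap16 720)"`.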

## Finding User-Sent Media

When users send images or audio via Telegram, WhatsApp, or iMessage, use the built-in `--list-media` flag:

```bash
# Recent inbound images (default)
node {{skillDir}}/sogni-gen.mjs --json --list-media images

# Recent inbound audio
node {{skillDir}}/sogni-gen.mjs --json --list-media audio

# All recent media
node {{skillDir}}/sogni-gen.mjs --json --list-media all
```

Do NOT use shell commands (`ls`, `cp`, etc.) to browse user media directories.

## Example Conversations

User: "Draw a sunset over mountains"
You: Generate image, send it.

User: *sends photo* "Make this look like a watercolor painting"
You: Use -c with their photo, edit prompt, send result.

User: *sends photo* "Animate this"
You: Use --video --ref with their photo, send video.

User: "Make a video of a cat playing piano"
You: Use --video (t2v), send video.

User: *sends photo + audio* "Make this person say this"
You: Use --video --ref photo --ref-audio audio (s2v), send video.

User: *sends photo* "Show me a 360 view of this"
You: Use -c with their photo plus --angles-360 --angles-360-video, send video.
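
The "generate, then send" step shared by these conversations can be sketched as a thin wrapper. This is an illustration under stated assumptions, not part of sogni-gen: the name `generate` and the `SOGNI` variable are hypothetical, and the wrapper assumes the tool exits non-zero on failure and that its error output contains the text "Insufficient funds".

```bash
# {{skillDir}} is the same template placeholder used throughout this document.
SOGNI="node {{skillDir}}/sogni-gen.mjs"

# Hypothetical wrapper: run one generation with -q and -o.
# $1 = output file, remaining arguments = extra flags and the prompt.
# Prints the output path on success so it can be sent via the message tool.
generate() {
  local out="$1"
  shift
  local log
  if log="$($SOGNI -q -o "$out" "$@" 2>&1)"; then
    echo "$out"
  else
    case "$log" in
      # Assumption: the error text contains this phrase.
      *"Insufficient funds"*) echo "Claim 50 free daily Spark points at https://app.sogni.ai/" ;;
      *) echo "generation failed: $log" ;;
    esac >&2
    return 1
  fi
}
```

A call such as `generate /tmp/out.png -c /path/to/photo.jpg "make it a watercolor painting"` covers the edit conversation, and adding `--video --ref` covers the animate one.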
