# 🍌 openclaw-paperbanana

An [OpenClaw](https://github.com/openclaw/openclaw) skill that generates publication-quality academic diagrams and statistical plots from text descriptions using [PaperBanana](https://github.com/llmsresearch/paperbanana).

> **Turn methodology descriptions into professional figures. Turn data into publication-ready plots. All from chat.**

## Demo

All figures below were generated by this skill from text prompts and JSON data:

| Architecture Diagram | Performance Comparison | Training Convergence |
|:---:|:---:|:---:|
| ![Architecture](demo/architecture.png) | ![Comparison](demo/comparison.png) | ![Convergence](demo/convergence.png) |
| *gpt-5.2 + gpt-image-1.5* | *gpt-5.2 + Matplotlib* | *gpt-5.2 + Matplotlib* |

The full demo paper (`demo/paper.pdf`) was compiled from these figures with LaTeX.

## Features

- **📐 Diagram Generation** — Methodology figures, architecture diagrams, pipeline illustrations from text descriptions
- **📊 Plot Generation** — Bar charts, line plots, scatter plots from CSV/JSON data using Matplotlib code generation
- **🔄 Iterative Refinement** — AI critic evaluates each iteration and provides feedback for improvement
- **📝 Evaluation** — Compare generated diagrams against human references (Faithfulness, Readability, Conciseness, Aesthetics)
- **🔧 Run Continuation** — Refine previous generations with natural language feedback
- **🤖 Auto-Triggering** — OpenClaw automatically invokes the skill when you ask to generate figures

## Supported Providers

| Provider | Env Var | Cost | Quality |
|----------|---------|------|---------|
| **Google Gemini** | `GOOGLE_API_KEY` | Free | Good |
| **OpenAI** | `OPENAI_API_KEY` | Paid | Best (gpt-5.2 + gpt-image-1.5) |
| **OpenRouter** | `OPENROUTER_API_KEY` | Paid | Depends on chosen model |

When `--provider` is not set, the skill uses the first provider whose API key is available, in this order: Gemini → OpenAI → OpenRouter. Pass `--provider` to choose one explicitly.

## Installation

### 1. Clone to your skills directory

```bash
cd /path/to/your/openclaw/workspace/skills
git clone https://github.com/GoatInAHat/openclaw-paperbanana.git paperbanana
```

### 2. Configure API keys

Add to `~/.openclaw/openclaw.json`:

```json5
{
  skills: {
    entries: {
      "paperbanana": {
        enabled: true,
        env: {
          // Pick one (or multiple for fallback):

          // Google Gemini — free, good quality
          GOOGLE_API_KEY: "AIzaSy...",

          // OpenAI — paid, best quality
          // OPENAI_API_KEY: "sk-...",

          // OpenRouter — paid, access to any model
          // OPENROUTER_API_KEY: "sk-or-...",
        }
      }
    }
  }
}
```

### 3. Verify `uv` is installed

The skill uses [`uv`](https://docs.astral.sh/uv/) for zero-config dependency management. Install if needed:

```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
```

That's it. No `pip install`, no virtual environments — `uv` handles everything automatically on first run.

## Usage

Once installed, OpenClaw automatically invokes the skill when you ask it to generate diagrams or plots. You can also run the scripts directly from the skill directory.

### Generate a Diagram

```bash
uv run scripts/generate.py \
  --context "Our framework uses a three-stage pipeline: an encoder, attention module, and decoder..." \
  --caption "Overview of the proposed architecture" \
  --iterations 3
```

**From a file:**
```bash
uv run scripts/generate.py \
  --input method_section.txt \
  --caption "System architecture overview"
```

**Options:**

| Flag | Default | Description |
|------|---------|-------------|
| `--iterations N` | 3 | Number of refinement rounds |
| `--auto-refine` | off | Keep refining until the critic is satisfied |
| `--provider` | auto | `gemini`, `openai`, or `openrouter` |
| `--format` | png | Output image format: `png`, `jpeg`, or `webp` |
| `--no-optimize` | off | Disable input optimization |

### Generate a Plot

```bash
uv run scripts/plot.py \
  --data '{"Model":["A","B","C"],"Accuracy":[85.2,91.3,88.7]}' \
  --intent "Bar chart comparing model accuracy"
```
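The `--data` value above is a JSON object that maps each column name to a list of values. When the numbers come from a Python script, building the string with `json.dumps` avoids hand-quoting mistakes. This is a minimal sketch: `build_plot_data` is a hypothetical helper, and the equal-length check reflects an assumption that each column holds one value per row.

```python
import json
import shlex


def build_plot_data(columns: dict[str, list]) -> str:
    """Serialize column-oriented data into the compact JSON shape used by `--data`."""
    lengths = {name: len(values) for name, values in columns.items()}
    # Assumption: every column must have one entry per row.
    if len(set(lengths.values())) > 1:
        raise ValueError(f"Columns have different lengths: {lengths}")
    return json.dumps(columns, separators=(",", ":"))


if __name__ == "__main__":
    payload = build_plot_data({"Model": ["A", "B", "C"], "Accuracy": [85.2, 91.3, 88.7]})
    # shlex.quote makes the JSON safe to paste into a POSIX shell command line.
    print(
        f"uv run scripts/plot.py --data {shlex.quote(payload)} "
        '--intent "Bar chart comparing model accuracy"'
    )
```

With these inputs the payload is identical to the JSON string in the command above.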

**From CSV:**
```bash
uv run scripts/plot.py \
  --data-file results.csv \
  --intent "Line plot showing loss over epochs"
```

### Evaluate a Diagram

```bash
uv run scripts/evaluate.py \
  --generated output.png \
  --reference human_drawn.png \
  --context "The methodology text..." \
  --caption "Figure caption"
```

### Refine a Previous Generation

```bash
# Continue the most recent run with feedback
uv run scripts/generate.py \
  --continue \
  --feedback "Make the arrows thicker and use more distinct colors"

# Continue a specific run
uv run scripts/generate.py \
  --continue-run run_20260228_143022_a1b2c3 \
  --feedback "Add labels to each component"
```
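The run ID in the example appears to encode a timestamp (`run_YYYYMMDD_HHMMSS_` followed by a short hex suffix). Assuming that pattern holds, "the most recent run" can be resolved by sorting on the embedded timestamp, as in the hypothetical helpers below. Where runs are stored on disk is not covered here, and the second run ID in the usage example is made up.

```python
import re
from datetime import datetime

# Assumed format, inferred from the example ID run_20260228_143022_a1b2c3.
RUN_ID = re.compile(r"^run_(\d{8})_(\d{6})_([0-9a-f]+)$")


def run_timestamp(run_id: str) -> datetime:
    """Parse the timestamp embedded in a run ID."""
    match = RUN_ID.match(run_id)
    if match is None:
        raise ValueError(f"Unrecognized run ID: {run_id!r}")
    return datetime.strptime(match.group(1) + match.group(2), "%Y%m%d%H%M%S")


def most_recent_run(run_ids: list[str]) -> str:
    """Pick the run with the latest timestamp, skipping names that don't match."""
    valid = [r for r in run_ids if RUN_ID.match(r)]
    if not valid:
        raise ValueError("No runs found")
    return max(valid, key=run_timestamp)


if __name__ == "__main__":
    runs = ["run_20260228_143022_a1b2c3", "run_20260301_090000_d4e5f6", "notes.txt"]
    print(most_recent_run(runs))  # → run_20260301_090000_d4e5f6
```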

## How It Works

PaperBanana uses a multi-agent pipeline:

```
Input Text → Retriever → Planner → Stylist → Visualizer → Critic
                                                  ↑          ↓
                                                  ← Feedback ←
```

1. **Retriever** — Finds relevant reference diagrams from the built-in example set
2. **Planner** — Generates a detailed diagram description from your methodology text
3. **Stylist** — Refines the description with academic styling guidelines
4. **Visualizer** — Generates the diagram image (or Matplotlib code for plots)
5. **Critic** — Evaluates the output and suggests improvements for the next iteration

For **methodology diagrams**, the Visualizer uses image generation models (DALL-E 3, gpt-image-1.5, or Gemini).
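For **statistical plots**, the Visualizer generates and executes Matplotlib code instead of calling an image model, so every mark is plotted from your data (Matplotlib can also save vector formats such as PDF and SVG).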
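Model-generated code is best run outside the calling process. One common pattern, sketched below under the assumption that the generated code is a standalone script, is to write it to a temporary file and run it with the current interpreter in a subprocess with a timeout. This is a general technique, not necessarily how PaperBanana executes its plotting code, and a subprocess is not a security sandbox. `run_generated_code` is a hypothetical helper.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_generated_code(code: str, workdir: Path,
                       timeout: float = 60.0) -> subprocess.CompletedProcess:
    """Run a generated script in a separate Python process and capture its output."""
    script = workdir / "generated_plot.py"
    script.write_text(code)
    # A separate process keeps crashes in generated code away from the caller;
    # it does NOT restrict what the code can access.
    return subprocess.run(
        [sys.executable, str(script)],
        cwd=workdir,
        capture_output=True,
        text=True,
        timeout=timeout,  # raises subprocess.TimeoutExpired if exceeded
    )


if __name__ == "__main__":
    generated = (
        "import matplotlib\n"
        "matplotlib.use('Agg')  # render without a display\n"
        "import matplotlib.pyplot as plt\n"
        "plt.bar(['A', 'B', 'C'], [85.2, 91.3, 88.7])\n"
        "plt.savefig('plot.pdf')  # PDF output keeps the plot as vector graphics\n"
    )
    with tempfile.TemporaryDirectory() as tmp:
        result = run_generated_code(generated, Path(tmp))
        print(result.returncode, result.stderr)
```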

## Model Configuration

Override default models via environment variables:

```bash
# OpenAI
export OPENAI_VLM_MODEL=gpt-5.2          # Vision-Language Model for planning/critique
export OPENAI_IMAGE_MODEL=gpt-image-1.5  # Image generation model

# Gemini
export GEMINI_VLM_MODEL=gemini-2.0-flash
export GEMINI_IMAGE_MODEL=gemini-2.0-flash-preview-image-generation

# OpenRouter
export OPENROUTER_VLM_MODEL=google/gemini-2.0-flash-001
export OPENROUTER_IMAGE_MODEL=google/gemini-2.0-flash-001
```
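Resolving these overrides amounts to an environment lookup with a fallback. The sketch below assumes the values listed above are the defaults the skill falls back to when a variable is unset or empty; `DEFAULT_MODELS` and `resolve_models` are hypothetical names, and `my-custom-vlm` is a placeholder model name.

```python
import os
from collections.abc import Mapping

# Assumption: the values documented above are the built-in defaults.
DEFAULT_MODELS = {
    "openai": {"OPENAI_VLM_MODEL": "gpt-5.2", "OPENAI_IMAGE_MODEL": "gpt-image-1.5"},
    "gemini": {
        "GEMINI_VLM_MODEL": "gemini-2.0-flash",
        "GEMINI_IMAGE_MODEL": "gemini-2.0-flash-preview-image-generation",
    },
    "openrouter": {
        "OPENROUTER_VLM_MODEL": "google/gemini-2.0-flash-001",
        "OPENROUTER_IMAGE_MODEL": "google/gemini-2.0-flash-001",
    },
}


def resolve_models(provider: str, env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Return each model setting for a provider, preferring non-empty env overrides."""
    defaults = DEFAULT_MODELS[provider]
    return {var: env.get(var) or default for var, default in defaults.items()}


if __name__ == "__main__":
    print(resolve_models("openai", {"OPENAI_VLM_MODEL": "my-custom-vlm"}))
```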

## OpenClaw Integration

This skill follows the [OpenClaw AgentSkills format](https://docs.openclaw.ai). When installed:

- **Auto-triggering**: OpenClaw reads the `SKILL.md` description and automatically invokes the skill when you ask to "generate a diagram", "create a figure", "make a plot", etc.
- **Output delivery**: Scripts print `MEDIA:/path/to/image.png` which OpenClaw auto-attaches to the chat response.
- **API key injection**: Keys configured in `skills.entries.paperbanana.env` are injected into the script environment automatically.
- **Zero-install deps**: `uv` with PEP 723 inline script metadata handles all Python dependencies in isolated environments.
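Because of the `MEDIA:` convention, a caller only needs to scan the script's standard output for lines with that prefix. The sketch below shows one way to do this in Python when running the scripts outside OpenClaw. It assumes each path sits on its own line, and `extract_media_paths` is a hypothetical helper, not part of OpenClaw.

```python
from pathlib import Path

MEDIA_PREFIX = "MEDIA:"


def extract_media_paths(stdout: str) -> list[Path]:
    """Collect file paths from output lines of the form MEDIA:/path/to/file."""
    paths = []
    for line in stdout.splitlines():
        line = line.strip()
        if line.startswith(MEDIA_PREFIX):
            paths.append(Path(line[len(MEDIA_PREFIX):].strip()))
    return paths


if __name__ == "__main__":
    sample = "(other output)\nMEDIA:/path/to/image.png\n"
    print(extract_media_paths(sample))
```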

## Project Structure

```
paperbanana/
├── SKILL.md              # OpenClaw skill definition (frontmatter + instructions)
├── scripts/
│   ├── generate.py       # Diagram generation + run continuation
│   ├── plot.py           # Statistical plot generation
│   └── evaluate.py       # Diagram quality evaluation
├── references/
│   └── providers.md      # Provider comparison + configuration reference
├── demo/
│   ├── paper.tex         # Demo LaTeX paper
│   ├── paper.pdf         # Compiled demo paper
│   ├── architecture.png  # Generated architecture diagram
│   ├── comparison.png    # Generated bar chart
│   └── convergence.png   # Generated line plot
├── LICENSE               # MIT
└── README.md             # This file
```

## Requirements

- **OpenClaw** (any version with skill support)
- **`uv`** ≥ 0.4 ([install](https://docs.astral.sh/uv/getting-started/installation/))
- **Python** ≥ 3.10
- At least one API key (Gemini is free)
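A quick preflight check for the Python and `uv` requirements can be scripted. The sketch below assumes `uv --version` prints a line such as `uv 0.4.18`, optionally followed by build information in parentheses; `parse_uv_version` and `check_requirements` are hypothetical helpers. API keys are not checked here, since under OpenClaw they are injected into the script environment rather than your shell.

```python
import re
import shutil
import subprocess
import sys


def parse_uv_version(output: str) -> tuple[int, ...]:
    """Extract a version tuple from `uv --version` output, e.g. 'uv 0.4.18 (...)'."""
    match = re.match(r"uv (\d+(?:\.\d+)*)", output.strip())
    if match is None:
        raise ValueError(f"Unexpected uv version output: {output!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def check_requirements() -> list[str]:
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    if sys.version_info < (3, 10):
        problems.append(f"Python >= 3.10 required, found {sys.version.split()[0]}")
    uv = shutil.which("uv")
    if uv is None:
        problems.append("uv not found on PATH")
    else:
        out = subprocess.run([uv, "--version"], capture_output=True, text=True).stdout
        if parse_uv_version(out) < (0, 4):
            problems.append(f"uv >= 0.4 required, found {out.strip()}")
    return problems


if __name__ == "__main__":
    print(check_requirements() or "All requirements met")
```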

## License

MIT — see [LICENSE](LICENSE).
