Gan Style Harness

Low Risk

by @affaan-m (Verified Source)

4.1229 installs | v1.0.0 | Updated May 25, 2026

How to Use

Run in the Claude Code terminal:

Step 1: Add Marketplace

/plugin marketplace add affaan-m/ECC

Step 2: Install Plugin

/plugin install ecc@ecc

About

GAN-inspired Generator-Evaluator agent harness for building high-quality applications autonomously. Based on Anthropic's March 2026 harness design paper.

name: gan-style-harness
description: "GAN-inspired Generator-Evaluator agent harness for building high-quality applications autonomously. Based on Anthropic's March 2026 harness design paper."
origin: ECC-community
tools: Read, Write, Edit, Bash, Grep, Glob, Task

GAN-Style Harness Skill

Inspired by Anthropic's "Harness Design for Long-Running Application Development" (March 24, 2026)

A multi-agent harness that separates generation from evaluation, creating an adversarial feedback loop that pushes quality beyond what a single agent typically reaches on its own.

Core Insight

When asked to evaluate their own work, agents are pathological optimists: they praise mediocre output and talk themselves out of flagging legitimate issues. Engineering a separate evaluator to be ruthlessly strict, however, is far more tractable than teaching a generator to critique itself.

This is the same dynamic as GANs (Generative Adversarial Networks): the Generator produces, the Evaluator critiques, and that feedback drives the next iteration.

When to Use

Building complete applications from a one-line prompt
Frontend design tasks requiring high visual quality
Full-stack projects that need working features, not just code
Any task where "AI slop" aesthetics are unacceptable
Projects where you want to invest $50-200 for production-quality output

When NOT to Use

Quick single-file fixes (use standard claude -p)
Tasks with tight budget constraints (<$10)
Simple refactoring (use the de-sloppify pattern instead)
Tasks that are already well-specified with tests (use the TDD workflow)

Architecture

                    ┌─────────────┐
                    │   PLANNER   │
                    │  (Opus 4.6) │
                    └──────┬──────┘
                           │ Product Spec
                           │ (features, sprints, design direction)
                           ▼
              ┌────────────────────────┐
              │                        │
              │   GENERATOR-EVALUATOR  │
              │      FEEDBACK LOOP     │
              │                        │
              │  ┌───────────┐         │
              │  │ GENERATOR │-build-->│──┐
              │  │(Opus 4.6) │         │  │
              │  └────▲──────┘         │  │
              │       │                │  │ live app
              │    feedback            │  │
              │       │                │  │
              │  ┌────┴──────┐         │  │
              │  │ EVALUATOR │<-test---│──┘
              │  │(Opus 4.6) │         │
              │  │+Playwright│         │
              │  └───────────┘         │
              │                        │
              │   5-15 iterations      │
              └────────────────────────┘
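The control flow in the diagram can be sketched in a few lines of Python. This is a minimal illustration, not the skill's implementation: `plan`, `build`, and `evaluate` are hypothetical stand-ins for the Planner, Generator, and Evaluator agents, and `Verdict` is an assumed shape for the Evaluator's result. The default cap of 15 iterations mirrors the upper end of the 5-15 range in the diagram.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Verdict:
    """Hypothetical Evaluator result: pass/fail plus concrete issues."""
    passed: bool
    feedback: List[str] = field(default_factory=list)


def run_harness(
    prompt: str,
    plan: Callable[[str], str],
    build: Callable[[str, List[str]], str],
    evaluate: Callable[[str, str], Verdict],
    max_iterations: int = 15,
) -> Tuple[str, int, List[Verdict]]:
    """Run the Planner once, then alternate Generator and Evaluator until the app passes."""
    spec = plan(prompt)  # Planner: one-line prompt -> product spec
    feedback: List[str] = []
    history: List[Verdict] = []
    app = ""
    for iteration in range(1, max_iterations + 1):
        app = build(spec, feedback)  # Generator: spec + latest critique -> new build
        verdict = evaluate(spec, app)  # Evaluator: tests the build against the spec
        history.append(verdict)
        if verdict.passed:
            return app, iteration, history
        feedback = verdict.feedback  # the critique drives the next iteration
    return app, max_iterations, history


if __name__ == "__main__":
    builds = []

    def toy_build(spec: str, feedback: List[str]) -> str:
        builds.append(list(feedback))
        return f"{spec} build {len(builds)}"

    def toy_evaluate(spec: str, app: str) -> Verdict:
        # Toy rule: the third build passes; earlier builds get a critique.
        done = app.endswith("build 3")
        return Verdict(done, [] if done else ["navigation is broken"])

    app, iterations, _ = run_harness("retro game maker", lambda p: f"spec({p})", toy_build, toy_evaluate)
    print(app, iterations)  # prints: spec(retro game maker) build 3 3
```

Keeping the Evaluator behind its own callable is the point of the design: the Generator never scores its own output, it only ever sees the critique.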

The Three Agents

1. Planner Agent

Role: Product manager — expands a brief prompt into a full product specification.

Key behaviors:

Takes a one-line prompt and produces a 16-feature, multi-sprint specification
Defines user stories, technical requirements, and visual design direction
Is deliberately ambitious — conservative planning leads to underwhelming results
Produces evaluation criteria that the Evaluator will use later

Model: Opus 4.6 (needs deep reasoning for spec expansion)
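One way to make the Planner's output machine-checkable is to give the spec a fixed shape. The sketch below assumes a spec made of features (each with a user story and a sprint number), a design-direction string, and the evaluation criteria the Evaluator will reuse; all class and field names are hypothetical and not part of the skill.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Feature:
    name: str
    user_story: str
    sprint: int  # sprint in which the Generator should build this feature


@dataclass
class ProductSpec:
    title: str
    design_direction: str
    features: List[Feature] = field(default_factory=list)
    evaluation_criteria: List[str] = field(default_factory=list)

    def sprints(self) -> Dict[int, List[str]]:
        """Group feature names by sprint, in ascending sprint order."""
        grouped: Dict[int, List[str]] = {}
        for feature in sorted(self.features, key=lambda f: f.sprint):
            grouped.setdefault(feature.sprint, []).append(feature.name)
        return grouped

    def to_markdown(self) -> str:
        """Render the spec as a markdown document the other agents can read."""
        lines = [f"# {self.title}", "", f"Design direction: {self.design_direction}", ""]
        for sprint, names in self.sprints().items():
            lines.append(f"## Sprint {sprint}")
            lines.extend(f"- {name}" for name in names)
        lines += ["", "## Evaluation criteria"]
        lines.extend(f"- {criterion}" for criterion in self.evaluation_criteria)
        return "\n".join(lines)


if __name__ == "__main__":
    spec = ProductSpec(
        title="Habit Tracker",
        design_direction="Calm, paper-like palette with generous whitespace",
        features=[
            Feature("Daily check-in", "As a user, I can mark a habit done today", 1),
            Feature("Streak chart", "As a user, I can see my streak history", 2),
            Feature("Reminders", "As a user, I get a nudge at a chosen time", 1),
        ],
        evaluation_criteria=["Design Quality", "Originality", "Craft", "Functionality"],
    )
    print(spec.to_markdown())
```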

2. Generator Agent

Role: Developer — implements features according to the spec.

Key behaviors:

Works in structured sprints (or continuous mode with newer models)
Negotiates a "sprint contract" with the Evaluator before writing code
Uses full-stack tooling: React, FastAPI/Express, databases, CSS
Uses git to version the codebase between iterations
Reads the Evaluator's feedback and incorporates it into the next iteration

Model: Opus 4.6 (needs strong coding capability)
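The "sprint contract" step can be pictured as a short negotiation that ends only when the Evaluator has no objections. This is a minimal sketch under stated assumptions: `propose` and `review` are hypothetical callables wrapping the two agents, and a contract is modeled simply as a list of testable acceptance criteria. The skill does not document this protocol.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class SprintContract:
    sprint: int
    criteria: List[str]  # testable "done" conditions both agents accepted


def negotiate_contract(
    sprint: int,
    propose: Callable[[int, List[str]], List[str]],
    review: Callable[[List[str]], List[str]],
    max_rounds: int = 3,
) -> Optional[SprintContract]:
    """Generator proposes criteria; Evaluator objects; repeat until no objections remain."""
    objections: List[str] = []
    for _ in range(max_rounds):
        criteria = propose(sprint, objections)  # Generator revises in light of objections
        objections = review(criteria)  # Evaluator flags criteria it cannot test
        if not objections:
            return SprintContract(sprint, criteria)
    return None  # no agreement: do not start coding against a vague target


if __name__ == "__main__":
    def propose(sprint: int, objections: List[str]) -> List[str]:
        if not objections:
            return ["check-in works"]
        return ["click 'Check in' -> today's cell turns green"]

    def review(criteria: List[str]) -> List[str]:
        # Toy rule: a criterion is testable only if it states "action -> observable result".
        return [f"not testable: {c}" for c in criteria if "->" not in c]

    print(negotiate_contract(1, propose, review))
```

Returning `None` rather than a weak contract is a choice made in this sketch: it stops the Generator from writing code against criteria the Evaluator could never verify.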

3. Evaluator Agent

Role: QA engineer — tests the live running application, not just code.

Key behaviors:

Uses Playwright MCP to interact with the live application
Clicks through features, fills forms, tests API endpoints
Scores against four criteria (configurable):
1. Design Quality — Does it feel like a coherent whole?
2. Originality — Custom decisions vs. template/AI patterns?
3. Craft — Typography, spacing, animations, micro-interactions?
4. Functionality — Do all features actually work?
Returns structured feedback with scores and specific issues
Is engineered to be ruthlessly strict — never praises mediocre work

Model: Opus 4.6 (needs strong judgment + tool use)
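Structured feedback is easiest to act on when the harness validates it before handing it to the Generator. The sketch below assumes the Evaluator replies with JSON containing a `scores` object keyed by the four criteria and an `issues` list. The key names, and the rule that any score below 10 must cite at least one issue (a way to enforce "never praises mediocre work"), are illustrative assumptions, not the skill's documented format.

```python
import json
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical JSON keys for the four default criteria.
CRITERIA = ("design_quality", "originality", "craft", "functionality")


@dataclass
class EvaluatorReport:
    scores: Dict[str, int]
    issues: List[str]


def parse_report(raw: str) -> EvaluatorReport:
    """Validate the Evaluator's JSON reply before it is forwarded to the Generator."""
    data = json.loads(raw)
    scores = data.get("scores", {})
    missing = [name for name in CRITERIA if name not in scores]
    if missing:
        raise ValueError(f"missing scores: {missing}")
    for name in CRITERIA:
        value = scores[name]
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or not 1 <= value <= 10:
            raise ValueError(f"{name} must be an integer from 1 to 10, got {value!r}")
    issues = [str(issue) for issue in data.get("issues", [])]
    # Strictness guard (an assumption of this sketch): anything short of 10 needs a concrete reason.
    if any(scores[name] < 10 for name in CRITERIA) and not issues:
        raise ValueError("scores below 10 must cite at least one specific issue")
    return EvaluatorReport({name: scores[name] for name in CRITERIA}, issues)


if __name__ == "__main__":
    reply = json.dumps({
        "scores": {"design_quality": 6, "originality": 4, "craft": 7, "functionality": 5},
        "issues": ["Save button does nothing", "Hero section uses a stock gradient"],
    })
    print(parse_report(reply))
```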

Evaluation Criteria

The default four criteria, each scored 1-10:

## Evaluation Rubric

### Design Quality (weight: 0.3)
- 1-3: Generic, template-like, "AI slop" aesthetics
- 4-6: Competent but unremarkable, follows conventions
- 7-8: Distinctive, cohesive visual identity
- 9-10: Could pass for a professional designer's work

### Originality (weight: 0.2)
- 1-3: Default colors, stock layouts, no
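The rubric's per-criterion weights can be folded into a single overall score. This is a minimal sketch assuming a weighted mean normalized by the weight sum. Only the Design Quality (0.3) and Originality (0.2) weights appear in the rubric above; the Craft and Functionality weights in the usage lines are placeholders, not values from the skill.

```python
from typing import Dict

# Weights stated in the rubric above; Craft and Functionality weights are cut off in this copy.
KNOWN_WEIGHTS = {"design_quality": 0.3, "originality": 0.2}


def weighted_score(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Combine per-criterion 1-10 scores into one 1-10 score (weighted mean)."""
    if set(scores) != set(weights):
        raise ValueError("scores and weights must cover the same criteria")
    for name, value in scores.items():
        if not 1 <= value <= 10:
            raise ValueError(f"{name} score {value} is outside 1-10")
    if any(w < 0 for w in weights.values()):
        raise ValueError("weights must be non-negative")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    # Dividing by the weight sum keeps the result on the 1-10 scale even if weights don't sum to 1.
    return sum(scores[name] * weights[name] for name in scores) / total_weight


if __name__ == "__main__":
    # PLACEHOLDER weights for craft and functionality; not taken from the skill.
    weights = {**KNOWN_WEIGHTS, "craft": 0.25, "functionality": 0.25}
    scores = {"design_quality": 6, "originality": 4, "craft": 7, "functionality": 8}
    print(f"{weighted_score(scores, weights):.2f}")  # prints 6.35
```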

Compatible Tools

Claude Code, Cursor
