
How to Add Screenshot Capabilities to Your AI Agent with OpenClaw

Install the SnapRender screenshot skill from ClawHub and give your OpenClaw agent the ability to see any webpage. Covers setup, three real use cases, and building custom skills.

SnapRender Team


OpenClaw agents can capture website screenshots by installing the screenshot skill from ClawHub. One line in your config, and your agent can see any webpage, analyze visual layouts, monitor changes, and act on what it finds. The whole setup takes about five minutes. Here's how to wire it up, plus three real use cases I've built with it.

What Is OpenClaw?

OpenClaw is an open-source framework for building autonomous AI agents. You define an agent's personality, goals, and schedule in config files, point it at an LLM, and let it run. The agent executes tasks on a cron schedule, maintains its own memory, and can call external tools through a skill system.

The architecture is straightforward:

  • AGENTS.md defines your agent's tasks, prompts, and schedules
  • SOUL.md sets the agent's personality and behavioral boundaries
  • openclaw.json configures the runtime: model, skills, timeouts, environment variables
  • Skills are pluggable capabilities: web search, file operations, API calls, browser control, screenshots

ClawHub is the community marketplace where developers publish and discover skills. Think of it like npm, but for agent capabilities. You browse the registry, find a skill that does what you need, install it, and your agent gains that ability immediately.

This is what makes OpenClaw different from writing a Python script that calls APIs. Skills are declarative. The agent decides when and how to use them based on its task context. You don't write the orchestration logic. You just give the agent the tools and describe the goal.
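In practice, that means a task is just a goal description. A minimal AGENTS.md entry (hypothetical task name and URLs, following the same format used in the examples later in this post) might look like:

```markdown
## uptime-watcher

Every morning, check that https://example.com loads correctly and
message me on Telegram if it doesn't. Use whatever skills you need.

schedule: "0 9 * * *"
model: sonnet
timeout: 300
```

Notice there's no code here: the agent picks the tools (a screenshot, a web request, a Telegram message) at runtime based on the goal.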

Installing the Screenshot Skill

Browse ClawHub at clawhub.ai and search for "snaprender." The official skill is snaprender, published by the SnapRender team. Install it from the command line:

clawhub install snaprender

This downloads the skill package and registers it in your openclaw.json. You can also add it manually:

{
  "name": "my-agent",
  "model": "sonnet",
  "skills": [
    {
      "name": "screenshot",
      "version": "^1.0.0",
      "config": {
        "api_key_env": "SNAPRENDER_API_KEY",
        "default_format": "png",
        "default_width": 1280,
        "default_height": 720,
        "cache_ttl": 86400
      }
    }
  ],
  "env": {
    "SNAPRENDER_API_KEY": "${SNAPRENDER_API_KEY}"
  }
}

Set the API key in your environment:

export SNAPRENDER_API_KEY="sk_live_your_key_here"

That's it. Your agent now has access to a capture_screenshot tool. Whenever the agent needs visual information about a webpage, it calls this tool, and the skill handles the API request to SnapRender behind the scenes.
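Before handing the key to an agent, you can sanity-check it with a one-off Node script. This sketch builds the same request the skill sends (the `https://app.snap-render.com/v1/screenshot` endpoint and `X-API-Key` header are the ones used by the skill implementation shown later in this post; the example URL is a placeholder). The fetch is left commented out so nothing is sent until you're ready:

```javascript
// Build the screenshot request the same way the skill does.
const params = new URLSearchParams({
  url: "https://example.com",
  format: "png",
  width: "1280",
  height: "720",
});

const requestUrl = `https://app.snap-render.com/v1/screenshot?${params}`;
console.log(requestUrl);

// Uncomment to actually send it (requires SNAPRENDER_API_KEY in the env):
// const res = await fetch(requestUrl, {
//   headers: { "X-API-Key": process.env.SNAPRENDER_API_KEY },
// });
// console.log(res.status, res.headers.get("content-type"));
```

If the response status is 200 and the content type is image/png, the key is live and the agent will be able to capture.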

How the Skill Works Under the Hood

The screenshot skill exposes a tool interface to the LLM. When the agent's task requires visual context, the LLM generates a tool call, and the OpenClaw runtime dispatches it to the skill.

The flow:

  1. Agent receives a task prompt (e.g., "Check if the homepage hero section has changed")
  2. The LLM decides it needs to see the page and generates a tool call:
    {
      "tool": "capture_screenshot",
      "parameters": {
        "url": "https://example.com",
        "format": "png",
        "width": 1440,
        "height": 900,
        "full_page": false
      }
    }
    
  3. The screenshot skill receives the call, constructs the SnapRender API request, and sends it
  4. SnapRender renders the page in a real Chromium browser and returns the image bytes
  5. The skill passes the image back to the agent as a base64-encoded attachment
  6. The agent's vision model analyzes the screenshot and continues its task

The skill supports all SnapRender parameters:

  • url (string, required): Page to capture
  • format (string, default png): png, jpeg, webp, or pdf
  • width (number, default 1280): Viewport width
  • height (number, default 720): Viewport height
  • full_page (boolean, default false): Capture entire scrollable page
  • block_ads (boolean, default true): Remove ads before capture
  • no_cookie_banners (boolean, default true): Dismiss consent dialogs
  • dark_mode (boolean, default false): Force dark color scheme
  • device (string): Device preset (e.g., iPhone 15 Pro)
  • device_scale_factor (number, default 1): Rendered-pixel multiplier (2 doubles the output resolution)
  • hide_selectors (string): CSS selectors to hide elements
  • cache_ttl (number, default 86400): Cache duration in seconds

Because the skill wraps a managed API, there's no Chromium to install, no browser pool to manage, no memory leaks to debug. The agent just says "I need to see this page" and gets an image back.

Use Case 1: Web Monitoring Agent

This is the most practical starting point. Build an agent that monitors competitor websites daily and alerts you when something changes.

Create the agent configuration:

<!-- AGENTS.md -->

## web-monitor

You monitor competitor websites for visual changes. Every day, capture screenshots of each URL in your watch list. Compare today's screenshot with yesterday's by describing what you see in each. If you notice significant changes (new hero section, pricing changes, new product announcements, layout redesigns), write a summary and send a Telegram alert.

Watch list:
- https://competitor-one.com (homepage + pricing page)
- https://competitor-two.com/features
- https://competitor-three.com

For each URL:
1. Capture a full-page screenshot at 1440x900
2. Save a description of what you see to ~/logs/monitor/{domain}/{date}.md
3. Read yesterday's description from the same directory
4. If anything significant changed, send a Telegram message with the details

Ignore minor changes: different ad content, time-based greetings, rotating testimonials. Focus on structural changes, pricing changes, and new feature announcements.

schedule: "0 8 * * 1-5"
model: sonnet
timeout: 900

The cron runs Monday through Friday at 8 AM. The agent captures each page, writes a natural-language description of what it sees, diffs against yesterday's description, and only alerts on meaningful changes.
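The saved descriptions are just prose the model writes for itself. A hypothetical ~/logs/monitor/competitor-one.com/2026-03-11.md might read:

```markdown
# competitor-one.com — 2026-03-11

Hero: full-width product screenshot with headline and demo CTA.
Primary CTA: "Start Free" (nav and hero).
Nav: Product, Pricing, Docs, Blog. No banner or announcement bar.
Below fold: three feature cards, footer with newsletter signup.
```

The next day's run reads this file, looks at the fresh screenshot, and decides whether the differences clear the "significant change" bar set in the prompt.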

Here's what the agent's log output looks like after a run:

[2026-03-12 08:01:23] Task: web-monitor started
[2026-03-12 08:01:25] Tool call: capture_screenshot(url="https://competitor-one.com", width=1440, height=900, full_page=true)
[2026-03-12 08:01:29] Screenshot captured: 1440x4200px, 2.3MB
[2026-03-12 08:01:34] Analysis: Homepage has new hero section with "AI-powered" messaging. CTA changed from "Start Free" to "Get Started Free". Social proof section added below fold with 4 customer logos.
[2026-03-12 08:01:35] Compared with 2026-03-11: Hero section is NEW (was product screenshot, now AI messaging). CTA text changed. Social proof section is NEW.
[2026-03-12 08:01:36] Alert sent: Competitor One redesigned their homepage with AI positioning.

The key insight: you don't write the comparison logic. The vision model handles it. You describe what you care about in the prompt, and the agent figures out how to detect it.

Use Case 2: Visual QA Agent

After every deploy to staging, this agent screenshots your key pages and checks for visual regressions. It doesn't need pixel-diff tooling. The vision model catches broken layouts, missing images, overlapping text, and styling issues the same way a human QA engineer would.

<!-- AGENTS.md -->

## visual-qa

You are a visual QA engineer. When triggered, screenshot the following pages on the staging environment and check for issues:

Pages:
- https://staging.yourapp.com/ (homepage)
- https://staging.yourapp.com/pricing
- https://staging.yourapp.com/features
- https://staging.yourapp.com/docs/getting-started
- https://staging.yourapp.com/dashboard (use auth cookie from env)

For each page, capture at three viewports:
- Desktop: 1440x900
- Tablet: 768x1024
- Mobile: 375x812

Check for:
- Broken layouts (overlapping elements, content overflow, missing sections)
- Missing images (broken image icons, empty spaces where images should be)
- Text issues (truncation, unreadable contrast, wrong fonts)
- Interactive elements that look broken (buttons without text, empty dropdowns)
- Responsive issues (content that doesn't adapt to viewport)

Write your findings to ~/reports/qa/{date}.md with screenshots referenced for each issue found. If you find critical issues (broken layout, missing sections, unreadable text), send a Telegram alert.

If everything looks good, log "All pages passed visual QA" and send no alert.

schedule: manual
model: opus
timeout: 1800

Trigger it from your CI/CD pipeline after a staging deploy:

openclaw run visual-qa --trigger deploy

The agent captures 15 screenshots (5 pages x 3 viewports), analyzes each one, and produces a QA report. Opus is the right model choice here because visual analysis benefits from the strongest reasoning capability.

For the screenshots themselves, the skill calls look like:

{
  "tool": "capture_screenshot",
  "parameters": {
    "url": "https://staging.yourapp.com/pricing",
    "width": 375,
    "height": 812,
    "format": "png",
    "device": "iPhone 15 Pro",
    "block_ads": true,
    "no_cookie_banners": true,
    "cache_ttl": 0
  }
}

Notice cache_ttl: 0. For QA, you always want a fresh render, never a cached version from a previous deploy.

Use Case 3: Research Agent

Build an agent that researches topics by capturing and visually analyzing web pages. Unlike text scraping, the agent can understand charts, diagrams, pricing grids, and visual hierarchies.

<!-- AGENTS.md -->

## researcher

You research topics by reading web pages visually. When given a research topic:

1. Search the web for relevant pages (use the web-search skill)
2. For the top 5-8 results, capture full-page screenshots
3. Analyze each page visually: read the text, interpret charts/graphs, note the page structure
4. Write a research summary to ~/research/{topic-slug}.md

Focus on extracting data points and visual information that wouldn't be available from text-only scraping (chart data, comparison tables, architecture diagrams).

When you encounter pages with important charts, capture at 2x device scale factor for readability.

schedule: manual
model: sonnet
timeout: 1200

The device_scale_factor: 2 trick produces a 2880x1800 image from a 1440x900 viewport, making fine details readable by the vision model.
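Assuming the skill forwards device_scale_factor like its other parameters, the tool call for a chart-heavy page (placeholder URL) would look like:

```json
{
  "tool": "capture_screenshot",
  "parameters": {
    "url": "https://example.com/benchmarks",
    "width": 1440,
    "height": 900,
    "device_scale_factor": 2,
    "full_page": true,
    "cache_ttl": 0
  }
}
```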

Building Your Own Screenshot Skill for ClawHub

The built-in screenshot skill covers most cases, but you might want custom behavior: automatic before/after comparison, archival to S3, annotation overlays, or combined capture-and-analyze in a single tool call.

ClawHub skills follow a standard structure:

my-screenshot-skill/
  manifest.json
  index.js
  README.md

The manifest defines the skill's metadata and tool interface:

{
  "name": "screenshot-archive",
  "version": "1.0.0",
  "description": "Captures screenshots and archives them to S3 with metadata",
  "author": "your-github-handle",
  "tools": [
    {
      "name": "capture_and_archive",
      "description": "Screenshot a URL and save it to S3 with timestamp and metadata",
      "parameters": {
        "type": "object",
        "properties": {
          "url": {
            "type": "string",
            "description": "URL to capture"
          },
          "tags": {
            "type": "array",
            "items": { "type": "string" },
            "description": "Tags for organizing archived screenshots"
          },
          "note": {
            "type": "string",
            "description": "Human-readable note about why this capture was taken"
          }
        },
        "required": ["url"]
      }
    },
    {
      "name": "compare_with_previous",
      "description": "Capture a URL and return it alongside the most recent archived version for comparison",
      "parameters": {
        "type": "object",
        "properties": {
          "url": {
            "type": "string",
            "description": "URL to capture and compare"
          }
        },
        "required": ["url"]
      }
    }
  ],
  "config": {
    "api_key_env": "SNAPRENDER_API_KEY",
    "s3_bucket_env": "SCREENSHOT_ARCHIVE_BUCKET",
    "s3_region_env": "AWS_REGION"
  }
}

The implementation handles the tool calls:

// index.js
import { S3Client, PutObjectCommand, ListObjectsV2Command, GetObjectCommand } from "@aws-sdk/client-s3";

export default class ScreenshotArchiveSkill {
  constructor(config) {
    this.apiKey = process.env[config.api_key_env];
    this.s3 = new S3Client({ region: process.env[config.s3_region_env] });
    this.bucket = process.env[config.s3_bucket_env];
  }

  async captureScreenshot(url, options = {}) {
    const params = new URLSearchParams({
      url,
      format: options.format || "png",
      width: String(options.width || 1440),
      height: String(options.height || 900),
      full_page: "true",
      block_ads: "true",
      no_cookie_banners: "true",
      cache_ttl: String(options.cache_ttl ?? 3600),
    });

    const response = await fetch(
      `https://app.snap-render.com/v1/screenshot?${params}`,
      { headers: { "X-API-Key": this.apiKey } }
    );

    if (!response.ok) {
      throw new Error(`Screenshot failed: ${response.status} ${await response.text()}`);
    }

    return Buffer.from(await response.arrayBuffer());
  }

  async handleToolCall(toolName, parameters) {
    if (toolName === "capture_and_archive") {
      return this.captureAndArchive(parameters);
    }
    if (toolName === "compare_with_previous") {
      return this.compareWithPrevious(parameters);
    }
    throw new Error(`Unknown tool: ${toolName}`);
  }

  async captureAndArchive({ url, tags = [], note = "" }) {
    const imageBuffer = await this.captureScreenshot(url);
    const domain = new URL(url).hostname;
    const timestamp = new Date().toISOString().replace(/[:.]/g, "-");
    const key = `screenshots/${domain}/${timestamp}.png`;

    await this.s3.send(new PutObjectCommand({
      Bucket: this.bucket,
      Key: key,
      Body: imageBuffer,
      ContentType: "image/png",
      Metadata: {
        url,
        tags: tags.join(","),
        note,
        captured_at: new Date().toISOString(),
      },
    }));

    return {
      image: imageBuffer.toString("base64"),
      archived_to: `s3://${this.bucket}/${key}`,
      message: `Screenshot captured and archived. Tags: ${tags.join(", ")}`,
    };
  }

  async compareWithPrevious({ url }) {
    const currentBuffer = await this.captureScreenshot(url, { cache_ttl: 0 });

    const domain = new URL(url).hostname;
    const prefix = `screenshots/${domain}/`;

    // S3 lists keys in lexicographic order. Because the archive keys embed an
    // ISO timestamp, the last key in the listing is the most recent capture.
    // (ListObjectsV2 returns up to 1000 keys per call; paginate beyond that.)
    const listResult = await this.s3.send(new ListObjectsV2Command({
      Bucket: this.bucket,
      Prefix: prefix,
    }));

    let previousBuffer = null;
    if (listResult.Contents && listResult.Contents.length > 0) {
      const latestKey = listResult.Contents[listResult.Contents.length - 1].Key;
      const getResult = await this.s3.send(new GetObjectCommand({
        Bucket: this.bucket,
        Key: latestKey,
      }));
      previousBuffer = Buffer.from(await getResult.Body.transformToByteArray());
    }

    const result = {
      current_image: currentBuffer.toString("base64"),
      message: previousBuffer
        ? "Current and previous screenshots attached for comparison."
        : "No previous screenshot found. This is the first capture for this URL.",
    };

    if (previousBuffer) {
      result.previous_image = previousBuffer.toString("base64");
    }

    return result;
  }
}
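One detail worth noting: the timestamped key layout means S3's lexicographic listing order matches chronological order, which is what lets compare_with_previous treat the last listed key as the latest capture. Extracted as a pure helper (illustrative only, not part of the skill interface):

```javascript
// The S3 key layout captureAndArchive writes. Colons and dots in the ISO
// timestamp are replaced because they're awkward in object keys; the
// resulting string still sorts chronologically.
function archiveKey(url, date = new Date()) {
  const domain = new URL(url).hostname;
  const timestamp = date.toISOString().replace(/[:.]/g, "-");
  return `screenshots/${domain}/${timestamp}.png`;
}

console.log(archiveKey("https://example.com/pricing", new Date("2026-03-12T08:01:29Z")));
// screenshots/example.com/2026-03-12T08-01-29-000Z.png
```

Because keys under one domain prefix sort oldest-to-newest, "most recent previous capture" is just the final entry in the listing, with no extra metadata lookups.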

Register the skill in your agent's config:

{
  "skills": [
    {
      "name": "screenshot-archive",
      "path": "./skills/screenshot-archive",
      "config": {
        "api_key_env": "SNAPRENDER_API_KEY",
        "s3_bucket_env": "SCREENSHOT_ARCHIVE_BUCKET",
        "s3_region_env": "AWS_REGION"
      }
    }
  ]
}

The agent now has two specialized tools instead of one generic screenshot tool. It can archive captures with tags and notes, and it can pull the previous version for side-by-side comparison without you writing comparison logic in the prompt.

Publishing to ClawHub

Once your skill works locally, publish it so other OpenClaw users can install it:

# Log in to ClawhHub
openclaw auth login

# Validate the manifest
openclaw skill validate ./skills/screenshot-archive

# Publish
openclaw skill publish ./skills/screenshot-archive

ClawHub validates the manifest, checks that all declared tools have implementations, and runs a basic test if you include a test script. After publishing, anyone can install it with clawhub install screenshot-archive.

The registry page shows download counts, version history, and compatibility info. Good skills get starred by the community, which helps them surface in search. Some screenshot skill ideas worth building for ClawHub:

  • screenshot-diff: Pixel-level comparison using Sharp, returns a highlighted diff image
  • screenshot-pdf-report: Captures multiple pages and compiles them into a single annotated PDF
  • screenshot-accessibility: Captures a page and runs axe-core, returns both the image and violations

Why OpenClaw and SnapRender Work Well Together

The pairing makes sense when you think about what each does.

OpenClaw handles agent orchestration: scheduling, memory, tool dispatch, multi-step reasoning, and autonomous execution. It gives agents the ability to think and act.

SnapRender handles visual capture: real Chromium rendering, device emulation, ad blocking, cookie banner removal, caching. It gives agents the ability to see.

ClawHub connects them: the screenshot skill is a thin adapter that exposes SnapRender's API as an agent-callable tool. Install once, use everywhere.

The cost model works too. A monitoring agent capturing 5 pages daily across 3 viewports generates about 450 screenshots per month. That fits inside SnapRender's free tier (500/month). A more active agent doing research and QA might hit 1,000-2,000 per month, covered by the Starter plan at $9. Check pricing at snap-render.com.

Getting Started

Five steps, five minutes:

  1. Install OpenClaw: npm install -g openclaw
  2. Initialize a project: openclaw init my-agent
  3. Install the screenshot skill: clawhub install snaprender
  4. Get a free SnapRender API key at snap-render.com
  5. Add the key to your environment: export SNAPRENDER_API_KEY="sk_live_..."

Write your first agent task in AGENTS.md, run openclaw start, and your agent can see the web. Browse ClawHub for other skills to combine with screenshots: web search, file storage, notification systems, database access. The agents you can build get interesting fast when they have multiple skills working together.

Try SnapRender Free

500 free screenshots/month, no credit card required.