2026 OpenClaw in practice: Ollama batch inference on a rented remote Mac

Mar 31, 2026 · ~8 min · MacCompute Team · Guide

This runbook shows how to run repeatable batch inference with Ollama on a rented remote Mac while keeping OpenClaw nearby for agents and automation. You will install both services with sane defaults, route APIs through a small reverse proxy instead of exposing raw ports, cap concurrency at the queue and server layers, and add degradation plus retries so overnight jobs survive transient overload. For MacCompute capacity and access patterns, start from Home, the notes index, or help.

Goal and recommended layout

Batches fail when parallelism overshoots VRAM, models thrash, or clients quit on the first timeout. On a remote Mac, use:

  • Ollama — 127.0.0.1:11434 (inference only).
  • OpenClaw Gateway — 127.0.0.1:18789 (agents; see official Docker / CLI docs).
  • Edge — Caddy/Nginx for one TLS hostname on a VPN, or SSH -L from your laptop.
  • Queue worker — caps in-flight jobs and adds backoff.
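
Once both services are running, a quick way to confirm they really are loopback-only (a sketch; the grep pattern just matches the two ports above):

lsof -nP -iTCP -sTCP:LISTEN | grep -E '11434|18789'
# Each matching line should show 127.0.0.1 (or localhost), not *: or 0.0.0.0.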

Install Ollama on macOS (remote Mac)

Install the server and pull the model you will batch (the ollama.com install script targets Linux; on macOS use Homebrew or the Ollama.app download):

brew install ollama           # or install Ollama.app from ollama.com/download
brew services start ollama    # keep the API up across logins and reboots
ollama pull llama3.2

Keep the default loopback bind unless you intentionally set OLLAMA_HOST. For reboot-safe settings, put OLLAMA_NUM_PARALLEL in a profile or launchd plist. Smoke test: curl -fsS http://127.0.0.1:11434/api/tags. OpenAI-style clients can use /v1/chat/completions on the same port.
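
For the profile or plist route, the relevant lines are just environment variables; a minimal sketch (the values are examples, and ollama serve must be restarted to pick them up):

# In ~/.zprofile, or as EnvironmentVariables in the launchd plist that starts ollama serve:
export OLLAMA_NUM_PARALLEL=2        # cap simultaneous generations server-side
export OLLAMA_MAX_LOADED_MODELS=1   # optional: avoid churn if you swap models often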

OpenClaw Gateway: install points that matter for co-hosting

Two supported paths: the global CLI (Node 24, or 22.16+) or Docker via ./scripts/docker/setup.sh on a clone of openclaw/openclaw. A typical Mac setup: npm install -g openclaw@latest, then openclaw onboard --install-daemon, then openclaw gateway --port 18789 --verbose; or Docker with a bind-mounted config/workspace. Keep Ollama native so it gets Metal acceleration, and let OpenClaw tools call 127.0.0.1:11434. Smoke test: curl -fsS http://127.0.0.1:18789/healthz. More hardening: OpenClaw deploy & remote Mac workflows.
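
Collected into one place, the CLI path from those steps looks roughly like this (a sketch; the flags are the ones named above, and onboarding prompts may differ between versions):

npm install -g openclaw@latest
openclaw onboard --install-daemon         # one-time onboarding, installs the background daemon
openclaw gateway --port 18789 --verbose   # keep the Gateway on loopback:18789
curl -fsS http://127.0.0.1:18789/healthz  # confirm the control plane answers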

API routing: one hostname, two upstreams

For VPN clients that need HTTPS to both stacks, terminate TLS once and split by path (conceptual Caddy):

inference.internal.example.com {
  route /v1/* {
    reverse_proxy 127.0.0.1:11434
  }
  route /openclaw/* {
    reverse_proxy 127.0.0.1:18789
  }
}

Add proxy rate limits (limit_req or Caddy rate-limit) to absorb client bursts. No public hostname? Use ssh -L for 11434 and 18789 instead of opening the firewall.
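
If you skip the hostname entirely, the tunnel from your laptop is one command (user@remote-mac is a placeholder; -N just holds the forwards open without a remote shell):

ssh -N -L 11434:127.0.0.1:11434 -L 18789:127.0.0.1:18789 user@remote-mac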

Routing decisions at a glance

  • Batch traffic (/api/generate or the OpenAI shim) — route to 127.0.0.1:11434; lowest latency, so keep it behind loopback or an authenticated proxy.
  • Gateway UI and WS control plane — route to 127.0.0.1:18789; OpenClaw health on /healthz, tunnel when debugging.
  • Untrusted internet — no route by default; require VPN, SSH, or mutual TLS before publishing either port.

Queue script: concurrency cap and JSON-safe payloads

Store one prompt per line in prompts.txt. The worker below uses python3 only to build JSON safely. Parallelism uses a small bash job pool (works on stock macOS bash) instead of GNU xargs -P.
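
prompts.txt itself is nothing special: plain UTF-8, one complete prompt per line (the lines below are placeholders):

Summarize yesterday's build log in three bullet points.
Rewrite the onboarding email for a non-technical audience.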

#!/usr/bin/env bash
set -euo pipefail
OLLAMA_URL="${OLLAMA_URL:-http://127.0.0.1:11434}"
MODEL="${MODEL:-llama3.2}"
MAX_JOBS="${MAX_JOBS:-2}"
PROMPTS="${1:?path to prompts.txt}"
mkdir -p out failed

run_one() {
  local i="$1" line="$2"
  local body try=0 delay=1
  # Build the request body with python3 so quotes and backslashes in the prompt stay valid JSON.
  body="$(python3 -c 'import json,sys; print(json.dumps({"model":sys.argv[1],"prompt":sys.argv[2],"stream":False}))' "$MODEL" "$line")"
  # Up to four attempts, doubling the backoff after each failure.
  while (( try < 4 )); do
    if curl -fsS --max-time 600 -H 'Content-Type: application/json' \
      -d "$body" "$OLLAMA_URL/api/generate" -o "out/resp-$i.json"; then
      return 0
    fi
    sleep "$delay"
    delay=$(( delay * 2 ))
    try=$(( try + 1 ))
  done
  printf '%s\n' "$line" >> "failed/prompts-$i.txt"
  return 1
}
export -f run_one
export OLLAMA_URL MODEL

# Throttled job pool: never more than MAX_JOBS requests in flight at once.
i=0
while IFS= read -r line || [ -n "${line-}" ]; do
  i=$((i+1))
  while (( $(jobs -rp | wc -l | tr -d ' ') >= MAX_JOBS )); do
    sleep 0.2
  done
  ( run_one "$i" "$line" || true ) &   # background the job so the pool actually fills
done < "$PROMPTS"
wait

Keep MAX_JOBS at or below OLLAMA_NUM_PARALLEL and within what unified memory can hold; on 16 GB Macs start at 1 for 7B–8B models, then tune while watching memory pressure.
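
Saved as batch.sh (the filename is just an example), an overnight run looks like this; watch memory pressure in Activity Monitor or with memory_pressure while you tune:

chmod +x batch.sh
MAX_JOBS=2 MODEL=llama3.2 ./batch.sh prompts.txt
ls out/ failed/    # responses land in out/, prompts that exhausted retries in failed/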

Resource limits: Ollama, macOS, and the batch driver

  • Ollama — OLLAMA_NUM_PARALLEL; optional OLLAMA_MAX_LOADED_MODELS if you swap models often.
  • Queue — MAX_JOBS ≤ server-side parallelism after reserving headroom for cron and agents.
  • macOS — optional launchd SoftResourceLimits / HardResourceLimits on RAM.
  • OpenClaw — stagger automations so agent bursts do not align with Ollama peaks.

Degradation and retries

The worker retries HTTP errors with backoff. Add model steps: primary → smaller fallback → truncated prompt → failed/ dead letter. Cap attempts (four in the script) so one bad line cannot block the run.
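
For the first rung of that ladder, a minimal sketch of how the per-prompt call in the main loop could wrap run_one (FALLBACK_MODEL and the llama3.2:1b tag are examples, not part of the script above; the primary failure still leaves its marker under failed/):

FALLBACK_MODEL="${FALLBACK_MODEL:-llama3.2:1b}"   # a smaller model you have already pulled
if ! run_one "$i" "$line"; then
  # one more pass on the smaller model before the prompt counts as a dead letter
  MODEL="$FALLBACK_MODEL" run_one "$i" "$line" || true
fi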

FAQ

Should Ollama listen on all interfaces? Only if you fully understand the exposure. Prefer loopback plus SSH or VPN, and put auth in front of any routable address.

Where should concurrency limits live? At Ollama (OLLAMA_NUM_PARALLEL), the reverse proxy (rate limits), and the queue (MAX_JOBS). All three together prevent silent overload.

How does OpenClaw relate to Ollama here? OpenClaw runs the Gateway and agents; Ollama serves local LLM HTTP. They coexist on one Mac but should not share one process namespace.

What if jobs time out? Use curl --max-time, backoff, optional smaller model, and a dead-letter file—never infinite loops.

How do I verify both services? curl to /api/tags and /healthz; from off-box, tunnel ports with SSH instead of opening the firewall.

Summary

Co-host OpenClaw and Ollama on loopback, add a proxy only for internal TLS, and cap load at Ollama, the edge, and the queue. Retries plus fallbacks turn overnight batches into bounded, inspectable outcomes. For always-on Mac capacity, see pricing and purchase; help covers access.

Batch on hardware that stays up. Remote Mac mini tiers fit long SSH sessions, local inference, and assistant-style automation without pinning your laptop.
