This runbook shows how to run repeatable batch inference with Ollama on a rented remote Mac while keeping OpenClaw nearby for agents and automation. You will install both services with sane defaults, route APIs through a small reverse proxy instead of exposing raw ports, cap concurrency at the queue and server layers, and add degradation plus retries so overnight jobs survive transient overload. For MacCompute capacity and access patterns, start from Home, the notes index, or help.
## Goal and recommended layout
Batches fail when parallelism overshoots VRAM, models thrash, or clients quit on the first timeout. On a remote Mac, use:
- Ollama — 127.0.0.1:11434 (inference only).
- OpenClaw Gateway — 127.0.0.1:18789 (agents; see official Docker / CLI docs).
- Edge — Caddy/Nginx for one TLS hostname on a VPN, or ssh -L from your laptop.
- Queue worker — caps in-flight jobs and adds backoff.
## Install Ollama on macOS (remote Mac)
Install Ollama and pull the model you will batch. On macOS, use Homebrew or the app from ollama.com; the curl install.sh script targets Linux:

```bash
brew install ollama      # or download the macOS app from ollama.com
ollama pull llama3.2
```
Keep the default loopback bind unless you intentionally set OLLAMA_HOST. For reboot-safe settings, put OLLAMA_NUM_PARALLEL in a profile or launchd plist. Smoke test: curl -fsS http://127.0.0.1:11434/api/tags. OpenAI-style clients can use /v1/chat/completions on the same port.
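For a headless CLI install, one reboot-safe shape for those settings is a launchd user agent. A sketch, assuming a Homebrew binary at /opt/homebrew/bin/ollama and a label invented here; the Ollama desktop app manages its own agent, so only use this when running the CLI build:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
  "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>Label</key><string>local.ollama.serve</string>
  <key>ProgramArguments</key>
  <array>
    <string>/opt/homebrew/bin/ollama</string>
    <string>serve</string>
  </array>
  <key>EnvironmentVariables</key>
  <dict>
    <!-- Cap concurrent decodes; keep the queue's MAX_JOBS at or below this. -->
    <key>OLLAMA_NUM_PARALLEL</key><string>2</string>
  </dict>
  <key>RunAtLoad</key><true/>
  <key>KeepAlive</key><true/>
</dict>
</plist>
```

Save as ~/Library/LaunchAgents/local.ollama.serve.plist and load with launchctl bootstrap gui/$(id -u) plus the file path; the service then survives reboots with the same parallelism cap.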
## OpenClaw Gateway: install points that matter for co-hosting
Two supported paths: a global CLI install (Node 24, or 22.16+) or Docker via ./scripts/docker/setup.sh on a clone of openclaw/openclaw. A typical Mac setup: npm install -g openclaw@latest, then openclaw onboard --install-daemon, then openclaw gateway --port 18789 --verbose; or run the Docker path with bind-mounted config and workspace. Keep Ollama native so inference uses Metal, and let OpenClaw tools call 127.0.0.1:11434. Verify with curl -fsS http://127.0.0.1:18789/healthz. More hardening: OpenClaw deploy & remote Mac workflows.
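Before a batch starts, the two health checks can be folded into one gate. A sketch; check_stack and its injectable fetch command are names coined here (the fetcher defaults to curl but can be swapped out for dry runs):

```shell
# Gate a batch on both loopback services answering.
check_stack() {
  local fetch="${1:-curl}"
  "$fetch" -fsS http://127.0.0.1:11434/api/tags >/dev/null &&
    "$fetch" -fsS http://127.0.0.1:18789/healthz >/dev/null &&
    echo "stack-ok"
}
```

Typical use: check_stack && ./batch.sh prompts.txt, so the queue never launches against a half-up stack.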
## API routing: one hostname, two upstreams
For VPN clients that need HTTPS to both stacks, terminate TLS once and split by path (conceptual Caddy):
```caddy
inference.internal.example.com {
    route /v1/* {
        reverse_proxy 127.0.0.1:11434
    }
    route /openclaw/* {
        reverse_proxy 127.0.0.1:18789
    }
}
```
Add proxy rate limits (limit_req or Caddy rate-limit) to absorb client bursts. No public hostname? Use ssh -L for 11434 and 18789 instead of opening the firewall.
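On the Nginx side, the limit_req idea looks like this (conceptual; the zone name, rate, and burst are placeholders to tune against your batch profile):

```nginx
# Shared-memory zone keyed by client address; 5 req/s steady state.
limit_req_zone $binary_remote_addr zone=ollama_api:10m rate=5r/s;

server {
    listen 443 ssl;
    server_name inference.internal.example.com;

    location /v1/ {
        # Queue up to 20 burst requests instead of erroring immediately.
        limit_req zone=ollama_api burst=20 nodelay;
        proxy_pass http://127.0.0.1:11434;
    }
    location /openclaw/ {
        limit_req zone=ollama_api burst=20 nodelay;
        proxy_pass http://127.0.0.1:18789;
    }
}
```

The proxy layer absorbs client bursts so they never reach Ollama's decode queue at full force.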
## Routing decisions at a glance
| Traffic | Target | Why |
|---|---|---|
| Batch /api/generate or OpenAI shim | 127.0.0.1:11434 | Lowest latency; keep behind loopback or authenticated proxy. |
| Gateway UI and WS control plane | 127.0.0.1:18789 | OpenClaw health on /healthz; tunnel when debugging. |
| Untrusted internet | None by default | Require VPN, SSH, or mutual-TLS before publishing either port. |
## Queue script: concurrency cap and JSON-safe payloads
Store one prompt per line in prompts.txt. The worker below uses python3 only to build JSON safely. Parallelism uses a small bash job pool (works on stock macOS bash) instead of GNU xargs -P.
```bash
#!/usr/bin/env bash
set -euo pipefail

OLLAMA_URL="${OLLAMA_URL:-http://127.0.0.1:11434}"
MODEL="${MODEL:-llama3.2}"
MAX_JOBS="${MAX_JOBS:-2}"
PROMPTS="${1:?path to prompts.txt}"

mkdir -p out failed

run_one() {
  local i="$1" line="$2"
  local body try=0 delay=1
  # Build the payload with python3 so quotes and newlines stay JSON-safe.
  body="$(python3 -c 'import json,sys; print(json.dumps({"model":sys.argv[1],"prompt":sys.argv[2],"stream":False}))' "$MODEL" "$line")"
  while (( try < 4 )); do
    if curl -fsS --max-time 600 -H 'Content-Type: application/json' \
        -d "$body" "$OLLAMA_URL/api/generate" -o "out/resp-$i.json"; then
      return 0
    fi
    sleep "$delay"
    delay=$(( delay * 2 ))   # exponential backoff: 1s, 2s, 4s
    try=$(( try + 1 ))
  done
  printf '%s\n' "$line" >> "failed/prompts-$i.txt"
  return 1
}

i=0
while IFS= read -r line || [ -n "${line-}" ]; do
  i=$((i+1))
  # Job pool: block until a slot frees up.
  while (( $(jobs -rp | wc -l | tr -d ' ') >= MAX_JOBS )); do
    sleep 0.2
  done
  ( run_one "$i" "$line" || true ) &   # background, so the pool cap applies
done < "$PROMPTS"
wait
```
Keep MAX_JOBS within OLLAMA_NUM_PARALLEL and unified memory; on 16 GB Macs start at 1 for 7B–8B models, then tune while watching memory pressure.
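After a run, a quick tally of out/ versus failed/ shows whether the current MAX_JOBS held up. A sketch; batch_tally is a helper name coined here, and it assumes the out/ and failed/ layout the worker writes:

```shell
# Count responses written vs. prompts dead-lettered by the queue worker.
batch_tally() {
  local ok bad
  ok=$(ls out/resp-*.json 2>/dev/null | wc -l | tr -d ' ')
  bad=$(cat failed/prompts-*.txt 2>/dev/null | wc -l | tr -d ' ')
  echo "ok=$ok failed=$bad"
}
batch_tally
```

A rising failed count under a fixed workload is the signal to lower MAX_JOBS or move to a smaller model before tuning anything else.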
## Resource limits: Ollama, macOS, and the batch driver
- Ollama — OLLAMA_NUM_PARALLEL; optional OLLAMA_MAX_LOADED_MODELS if you swap models often.
- Queue — MAX_JOBS ≤ server parallelism after reserving headroom for cron and agents.
- macOS — optional launchd SoftResourceLimits/HardResourceLimits on RAM.
- OpenClaw — stagger automations so agent bursts do not align with Ollama peaks.
## Degradation and retries
The worker retries HTTP errors with exponential backoff. Add model-level steps on top: primary → smaller fallback → truncated prompt → a failed/ dead-letter file. Cap attempts (four in the script) so one bad line cannot block the run.
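That fallback chain can be sketched as a small wrapper. degrade and its injectable runner are names coined here; in a real batch the runner would be run_one bound to a given model, kept injectable so the chain is testable without a live server:

```shell
# Walk a fallback chain until one attempt succeeds.
degrade() {
  local runner="$1"; shift
  local model
  for model in "$@"; do
    if "$runner" "$model"; then
      echo "served-by=$model"
      return 0
    fi
  done
  return 1   # chain exhausted: caller appends the prompt to failed/
}
```

Typical use: degrade call_model llama3.2 llama3.2:1b || printf '%s\n' "$line" >> failed/prompts.txt, so the dead letter only happens after every model has had its capped retries.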
## FAQ
Should Ollama listen on all interfaces? Only if you fully understand the exposure. Prefer loopback plus SSH or VPN, and put auth in front of any routable address.
Where should concurrency limits live? At Ollama (OLLAMA_NUM_PARALLEL), the reverse proxy (rate limits), and the queue (MAX_JOBS). All three together prevent silent overload.
How does OpenClaw relate to Ollama here? OpenClaw runs the Gateway and agents; Ollama serves local LLM HTTP. They coexist on one Mac but should not share one process namespace.
What if jobs time out? Use curl --max-time, backoff, optional smaller model, and a dead-letter file—never infinite loops.
How do I verify both services? curl to /api/tags and /healthz; from off-box, tunnel ports with SSH instead of opening the firewall.
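The off-box tunnel in that last answer looks like this in practice. A sketch; the hostname is a placeholder for your rented Mac, and the echo is left in so the command can be inspected before running:

```shell
# Forward both loopback services to the laptop; local clients then use
# http://127.0.0.1:11434 and http://127.0.0.1:18789 exactly as on the Mac.
REMOTE="${REMOTE:-user@remote-mac.example.com}"   # placeholder host
FORWARDS="-L 11434:127.0.0.1:11434 -L 18789:127.0.0.1:18789"
echo "ssh -N $FORWARDS $REMOTE"   # drop the echo to open the tunnel
```

With the tunnel up, the same curl checks against /api/tags and /healthz work from the laptop with no firewall change on the Mac.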