Private LLM API Docs

OpenAI-compatible. Drop-in replacement for GPT-4o. Your data stays on our hardware.

Base URL

https://llm.ro2-labs.ai

Authentication

Endpoints marked "Auth required" expect a Bearer token in the Authorization header:

Authorization: Bearer ro2_...

Request a free key to get started. No credit card required.
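A minimal Python sketch of attaching the key, assuming you keep it in an environment variable. The variable name `RO2_API_KEY` and the helper `auth_headers` are hypothetical. Rejecting keys without the `ro2_` prefix is also an assumption, based only on the key format shown above.

```python
import os

# Hypothetical environment variable name; the docs only define the header format.
API_KEY_ENV = "RO2_API_KEY"


def auth_headers(api_key=None):
    """Return the Authorization header for an authenticated endpoint.

    Falls back to the (assumed) RO2_API_KEY environment variable when no
    key is passed explicitly.
    """
    key = api_key or os.environ.get(API_KEY_ENV)
    if not key:
        raise RuntimeError(f"No API key: pass one or set {API_KEY_ENV}")
    # Keys are shown as "ro2_..." in these docs; treating the prefix as
    # mandatory is an assumption that catches pasted OpenAI keys early.
    if not key.startswith("ro2_"):
        raise ValueError("Expected a key starting with 'ro2_'")
    return {"Authorization": f"Bearer {key}"}
```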

Endpoints

Method  Path                   Access         Description
GET     /health                Public         API status and uptime
GET     /v1/models             Auth required  List available models
POST    /v1/chat/completions   Auth required  Chat completion (OpenAI-compatible)
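To see which models a key can use without the OpenAI SDK, the standard library is enough. This sketch assumes `/v1/models` returns OpenAI's list shape (`{"object": "list", "data": [{"id": ...}]}`), which these docs do not show; the helper names are hypothetical. It builds requests without sending them.

```python
import urllib.request

BASE_URL = "https://llm.ro2-labs.ai"


def health_request():
    """GET /health is public, so no Authorization header is attached."""
    return urllib.request.Request(f"{BASE_URL}/health", method="GET")


def models_request(api_key):
    """Build (but do not send) an authenticated GET /v1/models request."""
    return urllib.request.Request(
        f"{BASE_URL}/v1/models",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def model_ids(payload):
    """Extract model IDs from an OpenAI-style list response (assumed shape)."""
    return [entry["id"] for entry in payload.get("data", [])]
```

To send one, pass it to `urllib.request.urlopen`, parse the body with `json.load`, and hand the result to `model_ids`.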

Quick Start

curl
curl https://llm.ro2-labs.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ro2_..." \
  -d '{
    "model": "llama3:70b",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Summarize GLBA Section 501(b)."}
    ],
    "temperature": 0.7,
    "max_tokens": 2048
  }'
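The same request can be made from Python with only the standard library, which helps where the OpenAI SDK is not installed. In this sketch `chat_request` and `send` are hypothetical helpers, the defaults mirror the curl example, and the 120-second timeout is an arbitrary choice.

```python
import json
import urllib.request

BASE_URL = "https://llm.ro2-labs.ai"


def chat_request(api_key, messages, model="llama3:70b",
                 temperature=0.7, max_tokens=2048):
    """Build the same POST /v1/chat/completions call as the curl example."""
    body = {
        "model": model,  # model ID taken from the Python SDK example
        "messages": messages,
        "temperature": temperature,
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        f"{BASE_URL}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def send(request):
    """Send a prepared request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=120) as resp:
        return json.load(resp)
```

Keeping request construction separate from `send` makes the payload easy to inspect or log before anything leaves the machine.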

OpenAI SDK (Python)

from openai import OpenAI

client = OpenAI(
    base_url="https://llm.ro2-labs.ai/v1",
    api_key="ro2_...",
)

response = client.chat.completions.create(
    model="llama3:70b",
    messages=[
        {"role": "user", "content": "Summarize ITAR 22 CFR 120."}
    ],
)
print(response.choices[0].message.content)
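The Free plan lists streaming support, but these docs do not show the stream format. Assuming it follows OpenAI's server-sent events (each event a `data: {...}` line with text in `choices[0].delta.content`, ending with `data: [DONE]`), a client can reassemble the reply as sketched below; `collect_stream` is a hypothetical helper.

```python
import json


def collect_stream(lines):
    """Concatenate assistant text from OpenAI-style SSE lines (assumed format)."""
    parts = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            # A chunk may carry only the role, so content can be absent or null.
            text = choice.get("delta", {}).get("content")
            if text:
                parts.append(text)
    return "".join(parts)
```

With the OpenAI SDK, the equivalent is passing `stream=True` to `client.chat.completions.create` and reading `chunk.choices[0].delta.content` from each chunk. With raw HTTP, decode each response line (`line.decode("utf-8")`) before passing it in.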

Response Format

Responses include data residency metadata (the x-ro2-* values below) confirming on-prem processing:

response.json
{
  "id": "chatcmpl-ro2-a7f3c2b1",
  "model": "llama3:70b",
  "choices": [{
    "message": {
      "role": "assistant",
      "content": "..."
    },
    "finish_reason": "stop"
  }],
  "x-ro2-data-residency": "on-prem-austin-tx",
  "x-ro2-third-party-routing": "none"
}
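A client that must demonstrate on-prem handling can check these values on every response. In this sketch `check_residency` is a hypothetical helper, and accepting any value that starts with `on-prem` is an assumption drawn from the single example value; pin the exact site string if your policy requires one. It accepts either the parsed JSON body or an HTTP header mapping and compares keys case-insensitively, since HTTP header names are case-insensitive.

```python
def check_residency(meta, expected_prefix="on-prem"):
    """Return True if the x-ro2-* metadata reports on-prem, unrouted processing.

    `meta` may be the parsed JSON body or a mapping of HTTP response headers.
    """
    lowered = {str(k).lower(): v for k, v in meta.items()}
    residency = lowered.get("x-ro2-data-residency", "")
    routing = lowered.get("x-ro2-third-party-routing")
    return residency.startswith(expected_prefix) and routing == "none"
```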

Pricing

Free

$0

10 req/hr · 100 req/mo

  • Full 70B model access
  • Streaming + batch
  • Data residency headers
Get Free Key

Pro

$49 /mo

300 req/hr · 15,000 req/mo

  • Everything in Free
  • Priority inference queue
  • Email support
Subscribe on RapidAPI

B2B Pilot

Custom

Tailored to your workload

  • Everything in Pro
  • Dedicated capacity windows
  • Pilot agreement + SLA
  • Direct Slack/email support
Contact Sales
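To stay inside a plan's quota, a client can refuse to send once its budget is spent. This sketch encodes the Free and Pro limits listed above. How the service actually counts them (rolling or calendar windows, and what it returns when a limit is hit) is not documented here, so rolling one-hour and 30-day windows are assumptions, as are the names `RateBudget` and `TIER_LIMITS`.

```python
import time
from collections import deque

# Published limits per plan: (requests per hour, requests per month).
TIER_LIMITS = {"free": (10, 100), "pro": (300, 15_000)}

HOUR = 3600
MONTH = 30 * 24 * 3600  # assumption: a rolling 30-day window


class RateBudget:
    """Client-side guard that refuses calls once a plan's quota is spent."""

    def __init__(self, tier, clock=time.time):
        self.per_hour, self.per_month = TIER_LIMITS[tier]
        self.clock = clock
        self.sent = deque()  # timestamps of accepted requests, oldest first

    def try_acquire(self):
        """Record and allow a request if both windows have room, else refuse."""
        now = self.clock()
        # Forget requests that have aged out of the monthly window.
        while self.sent and now - self.sent[0] >= MONTH:
            self.sent.popleft()
        last_hour = sum(1 for t in self.sent if now - t < HOUR)
        if last_hour >= self.per_hour or len(self.sent) >= self.per_month:
            return False
        self.sent.append(now)
        return True
```

Injecting `clock` keeps the class testable. Because the timestamps live in memory, the budget resets whenever the process restarts.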

Infrastructure Controls

The architectural properties below support teams operating under regulatory constraints. Compliance remains a shared responsibility between provider and customer.

Defense

Single-tenant hardware under U.S. jurisdiction. No foreign-national data access. No offshore sub-processors.

Healthcare

Data never leaves the inference environment. No third-party processors in the request path.

Finance

Dedicated hardware. No shared GPU pools. No third-party sub-processors handling customer data.

Legal

Documents processed on isolated, single-tenant hardware with no external data routing.

Technical Brief

Want the full picture?

Our technical brief covers the complete architecture, data flow, hardware specs, and how our infrastructure supports regulated workloads.

Read the Technical Brief