Private LLM API Reference
OpenAI-compatible. Drop-in replacement for GPT-4o. Your data stays on our hardware.
Base URL
https://llm.ro2-labs.ai

Authentication
All authenticated endpoints require a Bearer token:
Authorization: Bearer ro2_...

Request a free key to get started. No credit card required.
Endpoints
/health                 Public          API status and uptime
/v1/models              Auth required   List available models
/v1/chat/completions    Auth required   Chat completion (OpenAI-compatible)
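A quick way to verify connectivity and a key before wiring up an application is to hit /health anonymously and then list models with the SDK. A minimal sketch; the shape of the /health response body is not specified above, so it is printed raw:

import requests
from openai import OpenAI

BASE = "https://llm.ro2-labs.ai"

# /health is public -- no token needed. The body format isn't
# documented here, so just print the raw text.
print(requests.get(f"{BASE}/health", timeout=10).text)

# /v1/models requires a Bearer token; the OpenAI SDK sends it for us.
client = OpenAI(base_url=f"{BASE}/v1", api_key="ro2_...")
for model in client.models.list():
    print(model.id)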
Quick Start

curl https://llm.ro2-labs.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ro2_..." \
  -d '{
    "model": "llama3:70b",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Summarize GLBA Section 501(b)."}
    ],
    "temperature": 0.7,
    "max_tokens": 2048
  }'

OpenAI SDK (Python)
from openai import OpenAI
client = OpenAI(
    base_url="https://llm.ro2-labs.ai/v1",
    api_key="ro2_...",
)

response = client.chat.completions.create(
    model="llama3:70b",
    messages=[
        {"role": "user", "content": "Summarize ITAR 22 CFR 120."}
    ],
)

print(response.choices[0].message.content)
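Streaming is listed as supported under the Free tier below. Assuming the OpenAI-compatibility claim extends to streaming semantics (a reasonable but unconfirmed assumption), the standard stream=True pattern applies to the same call:

from openai import OpenAI

client = OpenAI(base_url="https://llm.ro2-labs.ai/v1", api_key="ro2_...")

# stream=True yields chunks as tokens are generated, instead of one
# final response object.
stream = client.chat.completions.create(
    model="llama3:70b",
    messages=[{"role": "user", "content": "Summarize ITAR 22 CFR 120."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()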
Response Format

Responses include data residency fields in the JSON body confirming on-prem processing:
{
  "id": "chatcmpl-ro2-a7f3c2b1",
  "model": "llama3:70b",
  "choices": [{
    "message": {
      "role": "assistant",
      "content": "..."
    },
    "finish_reason": "stop"
  }],
  "x-ro2-data-residency": "on-prem-austin-tx",
  "x-ro2-third-party-routing": "none"
}
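If compliance tooling needs to assert on these fields, a plain HTTP call makes them easy to inspect. A sketch assuming the x-ro2-* fields arrive in the JSON body exactly as shown above (not as HTTP response headers):

import requests

resp = requests.post(
    "https://llm.ro2-labs.ai/v1/chat/completions",
    headers={"Authorization": "Bearer ro2_..."},
    json={
        "model": "llama3:70b",
        "messages": [{"role": "user", "content": "ping"}],
    },
    timeout=60,
)
body = resp.json()

# Fail loudly if the response doesn't carry the expected residency markers.
assert body.get("x-ro2-third-party-routing") == "none"
print(body["x-ro2-data-residency"])  # e.g. "on-prem-austin-tx"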
Pricing

Free
10 req/hr · 100 req/mo
- Full 70B model access
- Streaming + batch
- Data residency headers
Pro
300 req/hr · 15,000 req/mo
- Everything in Free
- Priority inference queue
- Email support
B2B Pilot
Tailored to your workload
- Everything in Pro
- Dedicated capacity windows
- Pilot agreement + SLA
- Direct Slack/email support
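The per-hour caps above mean clients should expect throttling. Assuming the service returns standard HTTP 429 responses when a cap is hit (not stated above), the OpenAI SDK's built-in exponential-backoff retry covers it without extra code:

from openai import OpenAI

client = OpenAI(
    base_url="https://llm.ro2-labs.ai/v1",
    api_key="ro2_...",
    max_retries=5,  # default is 2; retries 429s and transient errors
)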
Infrastructure Controls
Our architecture provides properties that support teams operating under regulatory constraints. Compliance is a shared responsibility between provider and customer.
- Single-tenant, dedicated hardware under U.S. jurisdiction; no shared GPU pools.
- No foreign-national data access and no offshore sub-processors.
- Data never leaves the inference environment; no third-party processors or sub-processors in the request path.