API Reference

Endpoint

POST /vitallens-v3/file

  • Base URL: https://api.rouast.com
  • Content-Type: application/json

Request

Headers

Header | Value | Description
x-api-key | string | Required. Your unique API key.
Content-Type | application/json | Required.

Body Parameters

The body must be a JSON object.

Parameter | Type | Required | Description
video | string | Yes | Base64-encoded string of raw RGB24 video bytes. Resolution: 40x40. Shape: (frames, 40, 40, 3).
fps | float | Conditional | Frames per second (e.g., 30.0). Required if process_signals is true. Used to detrend signals and calculate rates.
process_signals | boolean | No | Default: false. true: calculates global vitals (HR, RR, HRV) based on video duration. false: returns raw waveforms only.
state | string | No | Base64-encoded string of float32 values representing the LSTM internal state. Used for streaming long videos in chunks (see State Management below).
model | string | No | Specific model version to use. If omitted, the API selects the best model available for your plan.
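
As a rough illustration of the parameters above, the following Python sketch builds a request body from raw RGB24 bytes and shows one way to send it. The helper names `build_payload` and `post_payload` are hypothetical, the full URL is assumed to be the base URL joined with the endpoint path, and the local checks only mirror the constraints stated in this reference.

```python
import base64
import json
import urllib.request

# Assumed: base URL joined with the endpoint path from this page.
API_URL = "https://api.rouast.com/vitallens-v3/file"
FRAME_BYTES = 40 * 40 * 3  # one RGB24 frame at 40x40


def build_payload(raw_rgb24, fps=None, process_signals=False, state=None, model=None):
    """Build the JSON body from raw RGB24 bytes shaped (frames, 40, 40, 3)."""
    if len(raw_rgb24) == 0 or len(raw_rgb24) % FRAME_BYTES != 0:
        raise ValueError("byte length must be a positive multiple of 40 * 40 * 3")
    if process_signals and fps is None:
        raise ValueError("fps is required when process_signals is true")
    payload = {
        "video": base64.b64encode(raw_rgb24).decode("ascii"),
        "process_signals": process_signals,
    }
    # Optional parameters are only included when the caller supplies them.
    if fps is not None:
        payload["fps"] = float(fps)
    if state is not None:
        payload["state"] = state
    if model is not None:
        payload["model"] = model
    return payload


def post_payload(payload, api_key):
    """Send the request and return the decoded JSON response (hypothetical helper)."""
    request = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# 30 all-black frames stand in for real preprocessed video.
demo = build_payload(bytes(30 * FRAME_BYTES), fps=30.0, process_signals=True)
```

Validating the byte length before sending avoids a round trip that would otherwise likely end in a 422 response.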

Available Models

Capabilities depend on the model version (configurable via model parameter).

Model | Capabilities | Notes
vitallens-2.0 | HR, RR, HRV (SDNN, RMSSD, LF/HF) | Recommended. Best accuracy.
vitallens-1.1 | HR, RR | Legacy support.
vitallens-1.0 | HR, RR | Legacy support.

HTTP Status Codes

The API uses standard HTTP status codes to indicate the success or failure of the request parsing.

Code | Meaning | Common Causes
200 | OK | Processing successful. Note: This does not guarantee that a valid face was found or that the vital signs are reliable. You must check the processing_status object in the response body to validate the result.
400 | Bad Request | fps is missing when process_signals=true, or video is missing, too short, or too long.
403 | Forbidden | Invalid API key, or plan does not support the requested model.
422 | Unprocessable | Video formatting error. The byte string length does not match frames * 40 * 40 * 3.
429 | Quota Exceeded | You have run out of frame credits for your billing cycle.
500 | Server Error | Unexpected internal error.
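
One way a client might branch on these codes is sketched below. The codes come from the table above, but the action labels, the `classify_status` helper, and the choice to treat only 500 as retryable are assumptions made for illustration, not documented API behavior.

```python
# Status codes follow the table above; the action labels and the
# retry flag are assumptions made for this sketch.
STATUS_ACTIONS = {
    200: ("inspect_body", False),       # still check processing_status
    400: ("fix_request", False),        # missing fps or bad video length
    403: ("check_key_or_plan", False),  # invalid key or unsupported model
    422: ("fix_video_bytes", False),    # length != frames * 40 * 40 * 3
    429: ("wait_for_quota", False),     # frame credits exhausted
    500: ("retry_later", True),         # assumed safe to retry
}


def classify_status(status_code):
    """Return (action, retryable) for an HTTP status code."""
    return STATUS_ACTIONS.get(status_code, ("unknown", False))
```

A 200 maps to "inspect_body" rather than "ok" because a successful response can still carry unreliable vitals.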

Response Structure

The API returns a JSON object containing estimates, confidence scores, and processing metadata.

Example Response

Actual values from quickstart example:

{
  "vital_signs": {
    "ppg_waveform": {
      "data": [0.2094, 0.2895, 0.3912, "..."],
      "unit": "unitless",
      "confidence": [1.0, 1.0, 1.0, "..."],
      "note": "Processed estimate of the PPG waveform..."
    },
    "respiratory_waveform": {
      "data": [-0.2876, -0.2961, -0.2975, "..."],
      "unit": "unitless",
      "confidence": [0.9934, 0.9819, 0.9959, "..."],
      "note": "Processed estimate of the respiratory waveform..."
    },
    "heart_rate": {
      "value": 75.5167,
      "unit": "bpm",
      "confidence": 0.8885,
      "note": "Global estimate of heart rate..."
    },
    "respiratory_rate": {
      "value": 8.0018,
      "unit": "bpm",
      "confidence": 0.9681,
      "note": "Global estimate of respiratory rate..."
    },
    "hrv_sdnn": {
      "value": null,
      "unit": "ms",
      "confidence": null,
      "note": "Did not estimate HRV (SDNN) because provided video was too short..."
    },
    "hrv_rmssd": {
      "value": null,
      "unit": "ms",
      "confidence": null,
      "note": "Did not estimate HRV (RMSSD) because provided video was too short..."
    }
  },
  "processing_status": {
    "face_detected": true,
    "avg_face_confidence": 0.9773,
    "signal_quality": "optimal",
    "issues": []
  },
  "face": {
    "confidence": [0.8499, 0.9562, 0.9536, "..."],
    "note": "Confidence whether a live face is present in the provided video."
  },
  "state": {
    "data": "...",
    "note": "Provide in the next call if continuing with the same video."
  },
  "model_used": "vitallens-2.0",
  "message": "The provided values are estimates..."
}

Data Availability

The API returns different data points depending on the video length and whether process_signals is enabled.

Vital Sign | JSON Key | Type | Required Duration | Returned If
PPG Waveform | ppg_waveform | Continuous waveform | N/A (Always) | Always
Respiratory Waveform | respiratory_waveform | Continuous waveform | N/A (Always) | Always
Heart Rate | heart_rate | Global value | ≥ 5s | process_signals=true
Respiratory Rate | respiratory_rate | Global value | ≥ 10s | process_signals=true
HRV (SDNN) | hrv_sdnn | Global value | ≥ 20s | process_signals=true & Model 2.0
HRV (RMSSD) | hrv_rmssd | Global value | ≥ 20s | process_signals=true & Model 2.0
HRV (LF/HF) | hrv_lfhf | Global value | ≥ 55s | process_signals=true & Model 2.0
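
The table can be turned into a small lookup that predicts which keys should carry an estimate for a given clip. This is a sketch only: `expected_estimates` is a hypothetical helper, and it assumes duration is simply frames divided by fps. As the example response shows, a key may still be present with a null value when the clip is too short.

```python
# Minimum durations in seconds, taken from the table above.
MIN_SECONDS = {
    "heart_rate": 5,
    "respiratory_rate": 10,
    "hrv_sdnn": 20,
    "hrv_rmssd": 20,
    "hrv_lfhf": 55,
}
HRV_KEYS = {"hrv_sdnn", "hrv_rmssd", "hrv_lfhf"}


def expected_estimates(n_frames, fps, process_signals, model="vitallens-2.0"):
    """List the JSON keys expected to hold an estimate (assumes duration = frames / fps)."""
    keys = ["ppg_waveform", "respiratory_waveform"]  # always returned
    if not process_signals:
        return keys
    duration = n_frames / fps
    for key, min_seconds in MIN_SECONDS.items():
        if key in HRV_KEYS and model != "vitallens-2.0":
            continue  # HRV is listed for model 2.0 only
        if duration >= min_seconds:
            keys.append(key)
    return keys
```

For instance, a 10-second clip at 30 fps with process_signals enabled should yield both waveforms plus heart_rate and respiratory_rate, but no HRV values.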

Null Values

If the video meets the Required Duration but the signal quality is too low (e.g., excessive motion), these values will be returned as null or NaN. Always check the processing_status object.

Processing Status & Quality

When process_signals=true, the response includes a processing_status object. You should check this before displaying any vital signs to the user.

  • face_detected: true if average face confidence > 50%. If false, all vitals are invalid.
  • signal_quality:
    • optimal: High confidence in both PPG and Respiratory signals.
    • suboptimal: Face detected, but one signal is weak (e.g., good HR but noisy RR). Check issues.
    • low: Face detected, but overall signal quality is too low to be reliable.
    • unusable: No face detected or extreme noise. This is often an indication that preprocessing is faulty (see Preprocessing Guide).
  • issues: Array of specific warnings, e.g., ["no_face_detected"], ["low_ppg_quality"], ["low_respiratory_quality"].
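
A minimal sketch of gating on processing_status before showing results follows. The `usable_vitals` helper is hypothetical, and accepting only "optimal" by default is a policy assumption; an application could reasonably pass a wider set and inspect issues itself.

```python
import math


def usable_vitals(response, accepted=("optimal",)):
    """Return {name: value} for global vitals that pass the quality checks."""
    status = response.get("processing_status")
    if not status or not status.get("face_detected"):
        return {}  # without a detected face, all vitals are invalid
    if status.get("signal_quality") not in accepted:
        return {}
    result = {}
    for name, entry in response.get("vital_signs", {}).items():
        value = entry.get("value")  # waveforms carry "data" instead and are skipped
        if isinstance(value, (int, float)) and not math.isnan(value):
            result[name] = value  # drops null and NaN estimates
    return result


# Trimmed-down response using values from the example above.
sample = {
    "vital_signs": {
        "ppg_waveform": {"data": [0.2094, 0.2895, 0.3912]},
        "heart_rate": {"value": 75.5167, "unit": "bpm", "confidence": 0.8885},
        "respiratory_rate": {"value": 8.0018, "unit": "bpm", "confidence": 0.9681},
        "hrv_sdnn": {"value": None, "unit": "ms", "confidence": None},
    },
    "processing_status": {
        "face_detected": True,
        "avg_face_confidence": 0.9773,
        "signal_quality": "optimal",
        "issues": [],
    },
}
vitals = usable_vitals(sample)
```

Passing `accepted=("optimal", "suboptimal")` would keep results where only one signal is weak, at the cost of having to check issues per vital.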

State Management (Streaming)

The VitalLens model uses a stateful architecture that maintains temporal context across frames. To process a long video (e.g., 60 seconds) in small chunks (e.g., 1 second at a time) without losing this context:

  1. Request 1: Send the first chunk of video.
  2. Response 1: API returns a state string.
  3. Request 2: Send the second chunk of video AND the state string from Response 1.
  4. Repeat: Pass the state from each response into the next request, so that the chunks are processed like one continuous video stream.

Constraint

For stateless requests (no state provided), the video must contain at least 16 frames. For stateful requests (with state), a video chunk can be as small as 5 frames.
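
The chunked flow and the frame minimums can be sketched as below. `plan_chunks` and `stream_video` are hypothetical helpers, `send` stands for any caller-supplied function that posts a payload and returns the decoded response, and reading the next state from `response["state"]["data"]` is an assumption based on the example response.

```python
import base64

FRAME_BYTES = 40 * 40 * 3  # one RGB24 frame at 40x40
MIN_FIRST = 16             # stateless request minimum
MIN_NEXT = 5               # stateful request minimum


def plan_chunks(n_frames, chunk_frames):
    """Return (start, stop) frame ranges that respect both minimums."""
    if n_frames < MIN_FIRST:
        raise ValueError("a stateless first request needs at least 16 frames")
    if chunk_frames < MIN_NEXT:
        raise ValueError("stateful chunks need at least 5 frames")
    bounds = []
    start = 0
    size = max(chunk_frames, MIN_FIRST)  # the first chunk has no state yet
    while start < n_frames:
        stop = min(start + size, n_frames)
        if n_frames - stop < MIN_NEXT:
            stop = n_frames  # absorb a tail too short to send alone
        bounds.append((start, stop))
        start = stop
        size = chunk_frames
    return bounds


def stream_video(raw_rgb24, chunk_frames, send):
    """Send chunks in order, threading the returned state into the next call."""
    n_frames = len(raw_rgb24) // FRAME_BYTES
    state = None
    responses = []
    for start, stop in plan_chunks(n_frames, chunk_frames):
        chunk = raw_rgb24[start * FRAME_BYTES:stop * FRAME_BYTES]
        payload = {"video": base64.b64encode(chunk).decode("ascii")}
        if state is not None:
            payload["state"] = state
        response = send(payload)
        state = response["state"]["data"]  # assumed location of the state string
        responses.append(response)
    return responses
```

Merging a short tail into the previous chunk is one design choice; dropping the trailing frames would also satisfy the constraint but loses data.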