Direct API Calls

The VitalLens API is a web-based service that estimates a person's vital signs from a video of their face. To get your API Key or learn more, please visit the API page.

You can directly call the VitalLens API, which uses the same inference engine as our free iOS app VitalLens.

  • Expects a base64-encoded string of raw RGB24 video of a person's face and upper chest, spatially downsampled to 40x40 pixels.
  • Supports heart rate, respiratory rate, heart rate variability (HRV), pulse waveform, and respiratory waveform estimation.
  • Returns an estimation confidence for each vital.

Endpoint

POST /vitallens-v3/file

This endpoint is used to submit a video for processing and receive vital sign estimates in return.

Base URL:

https://api.rouast.com/vitallens-v3/file

Request

Headers

  • x-api-key: Your unique API key (required).
  • Content-Type: application/json (required).

Body

The body of the request should be a JSON object that includes the following fields:

  • video: (string) Required. The base64-encoded (cropped) video file containing a person's face and upper chest. This video must be formatted as raw RGB24 and downsampled to 40x40 pixels. The number of video frames must be between 16 and 900.
  • fps: (float or string) Required if process_signals is true. The frames per second of the video (e.g., 30.0 or "30.0").
  • process_signals: (string or boolean) Required to get global vital signs. Set to "1", "true", or true.
    • If omitted or set to false, the API will only return the ppg_waveform and respiratory_waveform.
    • If set to true, the API will calculate and return global values for heart_rate, respiratory_rate, and HRV metrics (if video duration is sufficient).
  • model: (string) Optional. Specify which model to use (e.g., "vitallens-2.0"). If omitted, the API automatically selects the best model available for your plan.
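
For reference, a request body might look like this (illustrative values; the base64 string is truncated):

{
  "video": "AAECAwQF...",
  "fps": "30.0",
  "process_signals": "1",
  "model": "vitallens-2.0"
}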

Example Assets

To run the example request below, first download the sample video file and place it in your working directory.

For this specific sample video, the recommended crop Region of Interest (ROI) is:

  • Width (W): 250
  • Height (H): 400
  • X-coordinate (X): 335
  • Y-coordinate (Y): 60
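
If you have ffplay available, you can visually sanity-check this ROI before processing (optional; uses the crop values above):

ffplay -vf "crop=250:400:335:60" sample_video_1.mp4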

Example Request

This example assumes ffmpeg, ffprobe, curl, and standard shell utilities (bc, base64) are installed.

input_video="sample_video_1.mp4"
api_key="YOUR_API_KEY"

# Crop ROI
CROP_W=250
CROP_H=400
CROP_X=335
CROP_Y=60

# Extract the frames per second (fps) from the video
fps=$(ffprobe -v error -select_streams v:0 -show_entries stream=avg_frame_rate -of default=nw=1:nk=1 "$input_video" | xargs -I {} sh -c 'echo "scale=2; {}" | bc')

# Crop, scale, convert to raw RGB24 pixel format, and Base64-encode
# (strip newlines, since GNU base64 wraps lines and would break the JSON)
video=$(ffmpeg -i "$input_video" -vf "crop=$CROP_W:$CROP_H:$CROP_X:$CROP_Y,scale=40:40" -pix_fmt rgb24 -f rawvideo - | base64 | tr -d '\n')

# Prepare the JSON payload
echo "{\"video\": \"$video\", \"fps\": \"$fps\", \"process_signals\": \"1\"}" > payload.json

# Send the request to the API
curl -X POST -H "x-api-key: $api_key" -H "Content-Type: application/json" --data-binary @payload.json https://api.rouast.com/vitallens-v3/file
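
The response is a JSON document (see Returned Estimation Results below). Assuming jq is available, one way to save the response and pull out the key fields is:

# Save the response, then extract the global estimates and processing status
curl -s -X POST -H "x-api-key: $api_key" -H "Content-Type: application/json" --data-binary @payload.json https://api.rouast.com/vitallens-v3/file > response.json
jq '{heart_rate: .vital_signs.heart_rate.value, respiratory_rate: .vital_signs.respiratory_rate.value, status: .processing_status}' response.json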

Guidelines for Accurate Estimation

To ensure optimal performance and accuracy, especially for global vital signs like HR, RR, and HRV:

  • Minimal Movement: Both the camera and the subject should be as still as possible.
  • Good Lighting: Ensure the face is well and evenly illuminated with a steady, non-flashing light source.
  • Input Cropping: The video frames should be cropped to the subject's face and upper chest before downsampling to 40x40 and encoding. Including the upper chest/shoulders provides motion cues that significantly improve respiratory rate accuracy.

The ideal input video should contain the full face and the upper chest area, as shown below with the crop used for our sample video.

Recommended cropping for sample video

Video Preprocessing Requirements

The API requires strictly formatted video input. Incorrect preprocessing is the most common cause of inaccurate results or "hallucinated" high confidence scores.

  1. Scaling vs. Tiling: When reducing your crop to 40x40 pixels, you must scale (resize) the image using an interpolation method (e.g., bilinear or bicubic). Do not tile a smaller crop to fill the 40x40 buffer, and do not pad with black pixels.
  2. Color Space: The input must be RGB (3 channels). Ensure you are not sending Grayscale (1 channel) or BGRA (4 channels).
  3. Pixel Format: The data must be raw byte values (0-255). Do not normalize to 0.0-1.0 floats.
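
A quick way to catch format mistakes is to check the decoded byte count. Each 40x40 RGB24 frame is 40 × 40 × 3 = 4,800 bytes, so the decoded video must be an exact multiple of 4,800 with 16-900 frames in total. For example (assuming jq and the payload.json from above):

# The frame count should be 16-900 and the leftover bytes should be 0
bytes=$(jq -r .video payload.json | base64 -d | wc -c)
echo "frames: $((bytes / 4800)), leftover bytes: $((bytes % 4800))"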

Verifying Your Payload (Sanity Check)

Before sending your request, we strongly recommend inspecting the actual video data inside your JSON payload. Many client-side libraries introduce artifacts during resizing or encoding.

This is an illustrative one-line command to extract the base64 video from our payload.json and save the first frame as a visible PNG:

jq -r .video payload.json | base64 -d | ffmpeg -f rawvideo -pixel_format rgb24 -video_size 40x40 -i - -frames:v 1 -vf "scale=400:400:flags=neighbor" -y sanity_check.png

Below is the result for our correctly formatted sample request payload. The image shows a single, pixelated frame.

Sanity check result for our sample request

What to look for:

  • Correct: A single, pixelated face and upper chest filling the square. Colors should look natural (RGB).
  • Incorrect: A grid of multiple small faces (tiling error), unnatural blue skin (BGR error), or black bars (padding error).

Response & Status Codes

HTTP Status Codes

The API uses standard HTTP status codes to indicate whether the request was parsed and processed successfully.

  • 200 OK (Processing Successful): The API successfully parsed the video and ran inference. Note: this does not guarantee that a valid face was found or that the vital signs are reliable. You must check the processing_status object in the response body to validate the result.
  • 400 Bad Request (Invalid Parameters): Required parameters (e.g., video or fps) are missing, or the video format/length is invalid.
  • 403 Forbidden: The API key provided is missing, invalid, or does not have access to the requested model.
  • 422 Unprocessable Entity (Parsing Error): The video string could not be decoded into the expected shape (e.g., wrong resolution or byte count).
  • 429 Too Many Requests (Quota Exceeded): You have exceeded the rate limit or frame quota for your plan.
  • 500 Internal Server Error (Server Error): An unexpected error occurred on the server.
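
Since even a 200 response still requires validation, it can help to capture the status code separately from the body. One way to do this with curl (illustrative):

# Write the body to response.json and capture the status code
status=$(curl -s -o response.json -w "%{http_code}" -X POST -H "x-api-key: $api_key" -H "Content-Type: application/json" --data-binary @payload.json https://api.rouast.com/vitallens-v3/file)
echo "HTTP status: $status"   # inspect response.json only if this is 200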

Returned Estimation Results

When called directly, the API returns estimates for the following vital signs:

  • ppg_waveform (continuous waveform): always returned.
  • respiratory_waveform (continuous waveform): always returned.
  • heart_rate (global value): returned if process_signals=1, fps is provided, and the video is ≥ 5 s.
  • respiratory_rate (global value): returned if process_signals=1, fps is provided, and the video is ≥ 10 s.
  • hrv_sdnn (global value): returned if process_signals=1, fps is provided, the video is ≥ 20 s, and the model supports HRV.
  • hrv_rmssd (global value): returned if process_signals=1, fps is provided, the video is ≥ 20 s, and the model supports HRV.
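
In frame terms: at 30 fps, the 16-900 frame limit corresponds to roughly 0.5-30 seconds of video, so heart rate requires at least ~150 frames, respiratory rate ~300 frames, and the HRV metrics ~600 frames.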

Validation: When process_signals=true, the response includes a processing_status object. Check it before displaying any vital signs to the user; a minimal check is sketched after this list.

  • face_detected (boolean): true if the average face confidence > 50%. If false, all data should be discarded.
  • signal_quality (string):
    • "optimal": Face detected and good signal quality.
    • "suboptimal": Face detected, but one signal is weak (e.g., good HR but noisy RR). Check issues.
    • "low": Face detected, but overall signal quality is too low to be reliable.
    • "unusable": No face detected.
  • issues (array): Specific quality warnings, e.g., ["no_face_detected"], ["low_ppg_quality"], ["low_respiratory_quality"], or ["low_signal_quality"].
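
A minimal validation sketch along these lines, assuming jq and a saved response.json:

# Discard results when no face was found; flag weak signals otherwise
if [ "$(jq -r .processing_status.face_detected response.json)" != "true" ]; then
  echo "No face detected - discarding all results" >&2
elif [ "$(jq -r .processing_status.signal_quality response.json)" = "low" ]; then
  echo "Signal quality too low to be reliable" >&2
else
  jq '.vital_signs.heart_rate' response.json
fi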

The estimation results are returned with the following structure (actual values from the example above):

{
  "vital_signs": {
    "ppg_waveform": {
      "data": [0.5657663258622856, 0.4904857244706095, "..." ],
      "unit": "unitless",
      "confidence": [0.9840503931045532, 0.9109493494033813, "..." ],
      "note": "Processed estimate of the PPG waveform..."
    },
    "respiratory_waveform": {
      "data": [0.04170521843334242, 0.03956666101315539, "..." ],
      "unit": "unitless",
      "confidence": [0.26131314039230347, 0.2909419536590576, "..." ],
      "note": "Processed estimate of the respiratory waveform..."
    },
    "heart_rate": {
      "value": 76.5,
      "unit": "bpm",
      "confidence": 0.96,
      "note": "Global estimate of heart rate..."
    },
    "respiratory_rate": {
      "value": 16.5,
      "unit": "bpm",
      "confidence": 0.97,
      "note": "Global estimate of respiratory rate..."
    }
  },
  "processing_status": {
    "face_detected": true,
    "avg_face_confidence": 0.98,
    "signal_quality": "optimal",
    "issues": []
  },
  "face": {
    "confidence": [ 0.99, 0.99, 0.98, "..." ],
    "note": "Confidence whether a live face is present in the provided video."
  },
  "state": {
    "data": "...",
    "note": "Provide in the next call if continuing with the same video."
  },
  "model_used": "vitallens-1.0",
  "message": "The provided values are estimates..."
}