VibeVoice API - AI Text to Speech APIs

by Microsoft

With the VibeVoice API, developers can convert text into realistic speech using multiple voice options and speaking styles. The API includes real-time capabilities for interactive applications, making it suitable for virtual assistants, accessibility features, and any application that needs natural-sounding voice output.

VibeVoice v1 Text to Speech API Documentation

Base URL: https://gateway.pixazo.ai/vibevoice/v1

Authentication

All requests require an API key passed via header.

Header | Type | Required | Description
Ocp-Apim-Subscription-Key | string | Yes | Your API subscription key
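
As a minimal sketch, the header can be built once and reused for every call. The environment variable name `PIXAZO_API_KEY` is an assumption made for this example; the API itself only requires that the key arrives in the `Ocp-Apim-Subscription-Key` header.

```python
import os


def auth_headers(env_var: str = "PIXAZO_API_KEY") -> dict:
    """Build request headers carrying the subscription key.

    The environment variable name is hypothetical; store the key
    wherever your deployment keeps secrets.
    """
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"Set {env_var} to your API subscription key")
    return {
        "Content-Type": "application/json",
        "Ocp-Apim-Subscription-Key": key,
    }
```

Keeping the key out of source code avoids committing it by accident; the returned dictionary can be passed as `headers=` to `requests.post`.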

Text to Speech - Vibe Voice API

Request Code

POST https://gateway.pixazo.ai/vibevoice/v1/vibevoice/generateRequest
Content-Type: application/json
Cache-Control: no-cache
Ocp-Apim-Subscription-Key: YOUR_SUBSCRIPTION_KEY

{
  "script": "Speaker 0: Hello, this is a test of the VibeVoice API.",
  "speakers": [
    {
      "preset": "Alice [EN]"
    }
  ]
}

Python

import requests

url = "https://gateway.pixazo.ai/vibevoice/v1/vibevoice/generateRequest"
headers = {
    "Content-Type": "application/json",
    "Cache-Control": "no-cache",
    "Ocp-Apim-Subscription-Key": "YOUR_SUBSCRIPTION_KEY"
}
data = {
    "script": "Speaker 0: Hello, this is a test of the VibeVoice API.",
    "speakers": [
        {
            "preset": "Alice [EN]"
        }
    ]
}

response = requests.post(url, json=data, headers=headers)
print(response.json())

JavaScript

const url = "https://gateway.pixazo.ai/vibevoice/v1/vibevoice/generateRequest";
const headers = {
    "Content-Type": "application/json",
    "Cache-Control": "no-cache",
    "Ocp-Apim-Subscription-Key": "YOUR_SUBSCRIPTION_KEY"
};
const data = {
    script: "Speaker 0: Hello, this is a test of the VibeVoice API.",
    speakers: [
        {
            preset: "Alice [EN]"
        }
    ]
};

fetch(url, {
    method: "POST",
    headers: headers,
    body: JSON.stringify(data)
})
.then(response => response.json())
.then(data => console.log(data))
.catch(error => console.error("Error:", error));

cURL

curl -v -X POST "https://gateway.pixazo.ai/vibevoice/v1/vibevoice/generateRequest" \
-H "Content-Type: application/json" \
-H "Cache-Control: no-cache" \
-H "Ocp-Apim-Subscription-Key: YOUR_SUBSCRIPTION_KEY" \
--data-raw '{
  "script": "Speaker 0: Hello, this is a test of the VibeVoice API.",
  "speakers": [
    {
      "preset": "Alice [EN]"
    }
  ]
}'

Output

{
  "request_id": "vibevoice_019dxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
  "status": "QUEUED",
  "polling_url": "https://gateway.pixazo.ai/v2/requests/status/vibevoice_019dxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
}

Webhook (Optional)

Add the X-Webhook-URL header to your generate request to receive a POST callback instead of polling.

X-Webhook-URL: https://your-server.com/webhook/callback
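
A hedged sketch of handling the callback body on your server. It assumes the callback JSON has the same shape as the status response (`request_id`, `status`, `error`, `output`); this page does not spell out the callback payload, so treat the field names as an assumption to verify against a real callback.

```python
import json


def handle_callback(raw_body: bytes) -> str:
    """Summarise a webhook callback body.

    Assumes the payload mirrors the status response shape; that is an
    assumption of this sketch, not confirmed behaviour.
    """
    payload = json.loads(raw_body)
    status = payload.get("status")
    request_id = payload.get("request_id", "<unknown>")
    if status == "COMPLETED":
        # output may be null on errors, so guard before reading media_url
        urls = (payload.get("output") or {}).get("media_url") or []
        return f"{request_id}: done, {len(urls)} file(s)"
    if status in ("FAILED", "ERROR"):
        return f"{request_id}: {status}: {payload.get('error')}"
    return f"{request_id}: {status}"
```

The function is independent of any web framework: pass it the raw POST body from whichever server receives the callback.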

Request Parameters - Text to Speech

Field | Type | Required | Default | Description
script | string | Yes | - | The script to convert to speech. Can be formatted with `Speaker X:` prefixes for multi-speaker dialogues. This is the main text content that will be converted to audio.
speakers | array<object> | No | [] | List of speakers to use for the script. Each speaker object can contain a preset voice or a custom audio URL. If not provided, speakers will be inferred from the script or voice samples.
speakers[].preset | string | No | Alice [EN] | Default voice preset to use for the speaker. Not used if `audio_url` is provided. Available presets: Alice [EN], Carter [EN], Frank [EN], Mary [EN] (Background Music), Maya [EN], Anchen [ZH] (Background Music), Bowen [ZH], Xinran [ZH].
speakers[].audio_url | string (URI) | No | - | URL to a voice sample audio file for voice cloning. If provided, the preset will be ignored and the AI will clone the voice from this sample.
seed | integer | No | - | Random seed for reproducible generation. Use the same seed with the same script to get consistent results across multiple generations.
cfg_scale | float | No | 1.3 | CFG (Classifier-Free Guidance) scale for generation. Higher values increase adherence to text but may reduce naturalness. Range typically 1.0–2.0.
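
The parameters above can be assembled with a small helper that rejects unknown presets before a request is sent. This is a sketch: `build_payload` is a hypothetical name, and the preset strings are copied as listed in the table, on the assumption that the "(Background Music)" suffix is part of the preset name.

```python
from typing import Optional

# Preset names as listed in the parameter table (assumed to be exact strings).
PRESETS = {
    "Alice [EN]", "Carter [EN]", "Frank [EN]", "Mary [EN] (Background Music)",
    "Maya [EN]", "Anchen [ZH] (Background Music)", "Bowen [ZH]", "Xinran [ZH]",
}


def build_payload(script: str,
                  speakers: Optional[list] = None,
                  seed: Optional[int] = None,
                  cfg_scale: Optional[float] = None) -> dict:
    """Build the JSON body for a generate request, omitting unset options."""
    if not script.strip():
        raise ValueError("script is required")
    payload = {"script": script}
    if speakers:
        for speaker in speakers:
            # audio_url overrides preset, so only validate preset-only entries
            if "audio_url" not in speaker:
                preset = speaker.get("preset", "Alice [EN]")
                if preset not in PRESETS:
                    raise ValueError(f"unknown preset: {preset!r}")
        payload["speakers"] = speakers
    if seed is not None:
        payload["seed"] = int(seed)
    if cfg_scale is not None:
        payload["cfg_scale"] = float(cfg_scale)
    return payload
```

Leaving optional fields out of the body, rather than sending nulls, lets the server apply its documented defaults.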

Minimum Request

{
  "script": "Speaker 0: Hello, this is a test of the VibeVoice API.",
  "speakers": [
    {
      "preset": "Alice [EN]"
    }
  ]
}

Full Request (all options)

{
  "script": "Speaker 0: VibeVoice is now available on Pixazo. Isn't that right, Carter?\nSpeaker 1: That's right Frank, and it supports up to four speakers at once. Try it now!",
  "speakers": [
    {
      "preset": "Frank [EN]"
    },
    {
      "preset": "Carter [EN]"
    }
  ],
  "cfg_scale": 1.3,
  "seed": 42
}

Response

{
  "request_id": "vibevoice_019dxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
  "status": "QUEUED",
  "polling_url": "https://gateway.pixazo.ai/v2/requests/status/vibevoice_019dxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
}

Request Headers

Header | Value
Content-Type | application/json
Cache-Control | no-cache
Ocp-Apim-Subscription-Key | Your API subscription key

Response Handling

Common status codes for Text to Speech.

Code | Meaning
202 | Accepted (request queued)
400 | Bad Request
401 | Unauthorized
402 | Insufficient Balance
403 | Forbidden
404 | Not Found
429 | Too Many Requests
500 | Internal Server Error
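
One way a client might react to these codes is sketched below. The mapping is an assumption of this example, not documented behaviour: it treats 429 (the standard code for Too Many Requests) and 500 as retryable and everything else as needing a change on the caller's side. `next_action` and its return labels are hypothetical names.

```python
def next_action(status_code: int) -> str:
    """Map an HTTP status code from the generate endpoint to a client action."""
    if status_code == 202:
        return "poll"         # queued: start polling or wait for the webhook
    if status_code in (429, 500):
        return "retry"        # likely transient: back off and try again
    if status_code == 402:
        return "top_up"       # wallet balance too low
    if status_code in (401, 403):
        return "check_key"    # key missing, invalid, or not permitted
    return "fix_request"      # 400, 404, and anything unexpected
```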

Error Responses

Queue system errors and model validation errors.

Queue System Errors

// 402 — Insufficient balance
{
  "error": "Insufficient Balance",
  "message": "Your wallet does not have enough balance."
}
// 400 — Model not found
{
  "error": "Model not found",
  "message": "Model 'vibevoice' not found or is disabled"
}

Error via Status/Webhook

{
  "request_id": "vibevoice_019dxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
  "status": "ERROR",
  "model_id": "vibevoice",
  "error": "Description of the error",
  "output": null
}

Retrieving Results

Poll the universal status endpoint to check progress and retrieve results.

Endpoint

GET https://gateway.pixazo.ai/v2/requests/status/{request_id}
Ocp-Apim-Subscription-Key: YOUR_API_KEY

cURL Example

curl -H "Ocp-Apim-Subscription-Key: YOUR_API_KEY" \
  "https://gateway.pixazo.ai/v2/requests/status/vibevoice_019dxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"

Response (Completed)

{
  "request_id": "vibevoice_019dxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
  "status": "COMPLETED",
  "model_id": "vibevoice",
  "error": null,
  "output": {
    "media_url": [
      "https://pub-582b7213209642b9b995c96c95a30381.r2.dev/v1/vibevoice_019dxxxx-xxxx/output.ext"
    ],
    "media_type": "application/octet-stream"
  },
  "created_at": "2026-03-31T10:00:00.000Z",
  "updated_at": "2026-03-31T10:00:15.000Z",
  "completed_at": "2026-03-31T10:00:15.000Z"
}
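
A small sketch of reading the result out of a status response like the one above. `media_urls` is a hypothetical helper; it relies only on the `status`, `error`, and `output.media_url` fields shown in the sample.

```python
def media_urls(status_response: dict) -> list:
    """Return the output URLs from a status response.

    Raises ValueError if the request has not completed successfully.
    """
    status = status_response.get("status")
    if status != "COMPLETED":
        error = status_response.get("error")
        raise ValueError(f"request not completed (status={status!r}, error={error!r})")
    # output is null on errors, so fall back to an empty dict
    output = status_response.get("output") or {}
    return list(output.get("media_url") or [])
```

Because the sample reports `media_type` as `application/octet-stream`, the audio format may need to be inferred from the URL or the downloaded bytes rather than from this field.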

Response Fields

Field | Type | Description
request_id | string | Unique request identifier
status | string | QUEUED, PROCESSING, COMPLETED, FAILED, or ERROR
model_id | string | Model that processed the request
error | string or null | Error message if failed
output.media_url | array | URLs to generated media (R2 CDN)
output.media_type | string | MIME type of the output
created_at | string | When request was created
completed_at | string or null | When request completed
polling_url | string | Status URL (initial response only)

Status Values

Status | Description
QUEUED | Request accepted, waiting to be processed
PROCESSING | Being processed by the model
COMPLETED | Done; output contains the result
FAILED | Failed; check error field
ERROR | System error; not charged

Status Flow

QUEUED → PROCESSING → COMPLETED
                    → FAILED
                    → ERROR

Typical Workflow

  1. Send a generate request to the API endpoint
  2. Save the request_id from the response
  3. Poll every 5-10 seconds: GET /v2/requests/status/{request_id}
  4. When status is "COMPLETED", download from output.media_url

Tip: Use X-Webhook-URL header to get a callback instead of polling.
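
The polling step of the workflow above can be sketched as a loop that stops on any terminal status. The status fetch is passed in as a callable so the loop stays independent of the HTTP library; `wait_for_result`, the 5-second default interval, and the 300-second timeout are choices made for this example rather than documented limits.

```python
import time
from typing import Callable

# Statuses after which the request will not change again.
TERMINAL = {"COMPLETED", "FAILED", "ERROR"}


def wait_for_result(fetch_status: Callable[[], dict],
                    interval: float = 5.0,
                    timeout: float = 300.0,
                    sleep: Callable[[float], None] = time.sleep) -> dict:
    """Call fetch_status repeatedly until a terminal status or the timeout."""
    deadline = time.monotonic() + timeout
    while True:
        result = fetch_status()
        if result.get("status") in TERMINAL:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("request did not finish in time")
        sleep(interval)
```

With the `requests` library, `fetch_status` could be `lambda: requests.get(polling_url, headers=headers).json()`, where `polling_url` comes from the initial response.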

VibeVoice v1 Text to Speech API Pricing

Resolution | Price (USD)
480p | $0.75
580p | $1
720p | $1.25

VibeVoice v1 Text to Speech (Realtime) API Documentation

Base URL: https://gateway.pixazo.ai/vibevoice-realtime-0-5b-135/v1

Authentication

All requests require an API key passed via header.

Header | Type | Required | Description
Ocp-Apim-Subscription-Key | string | Yes | Your API subscription key

VibeVoice-Realtime-0.5B generate request - Vibe Voice-Realtime-0.5B

Request Code

POST https://gateway.pixazo.ai/vibevoice-realtime-0-5b-135/v1/vibevoice-realtime-0-5b-request
Content-Type: application/json
Cache-Control: no-cache
Ocp-Apim-Subscription-Key: YOUR_SUBSCRIPTION_KEY

{
  "script": "Speaker 0: Hello, this is Frank.\nSpeaker 1: And I am Carter.",
  "speakers": [
    {
      "preset": "Frank [EN]"
    },
    {
      "preset": "Carter [EN]"
    }
  ]
}

Python

import requests

url = "https://gateway.pixazo.ai/vibevoice-realtime-0-5b-135/v1/vibevoice-realtime-0-5b-request"
headers = {
    "Content-Type": "application/json",
    "Cache-Control": "no-cache",
    "Ocp-Apim-Subscription-Key": "YOUR_SUBSCRIPTION_KEY"
}
data = {
    "script": "Speaker 0: Hello, this is Frank.\nSpeaker 1: And I am Carter.",
    "speakers": [
        {
            "preset": "Frank [EN]"
        },
        {
            "preset": "Carter [EN]"
        }
    ]
}

response = requests.post(url, json=data, headers=headers)
print(response.json())

JavaScript

const url = "https://gateway.pixazo.ai/vibevoice-realtime-0-5b-135/v1/vibevoice-realtime-0-5b-request";
const headers = {
  "Content-Type": "application/json",
  "Cache-Control": "no-cache",
  "Ocp-Apim-Subscription-Key": "YOUR_SUBSCRIPTION_KEY"
};
const data = {
  "script": "Speaker 0: Hello, this is Frank.\nSpeaker 1: And I am Carter.",
  "speakers": [
    {
      "preset": "Frank [EN]"
    },
    {
      "preset": "Carter [EN]"
    }
  ]
};

fetch(url, {
  method: "POST",
  headers: headers,
  body: JSON.stringify(data)
})
.then(response => response.json())
.then(data => console.log(data))
.catch(error => console.error("Error:", error));

cURL

curl -X POST "https://gateway.pixazo.ai/vibevoice-realtime-0-5b-135/v1/vibevoice-realtime-0-5b-request" \
  -H "Content-Type: application/json" \
  -H "Cache-Control: no-cache" \
  -H "Ocp-Apim-Subscription-Key: YOUR_SUBSCRIPTION_KEY" \
  --data-raw '{
    "script": "Speaker 0: Hello, this is Frank.\nSpeaker 1: And I am Carter.",
    "speakers": [
      {
        "preset": "Frank [EN]"
      },
      {
        "preset": "Carter [EN]"
      }
    ]
  }'

Output

{
  "request_id": "vibevoice-realtime-0-5b-135_019dxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
  "status": "QUEUED",
  "polling_url": "https://gateway.pixazo.ai/v2/requests/status/vibevoice-realtime-0-5b-135_019dxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
}

Webhook (Optional)

Add the X-Webhook-URL header to your generate request to receive a POST callback instead of polling.

X-Webhook-URL: https://your-server.com/webhook/callback

Request Parameters - VibeVoice-Realtime-0.5B generate request

Field | Type | Required | Default | Description
script | string | Yes | - | The dialogue script with speaker labels (e.g., "Speaker 0: Hello..."). Each line must begin with "Speaker X:" where X is a zero-based index matching the speakers array.
speakers | array of objects | Yes | - | Array of speaker configurations. Each object must contain a preset field identifying the voice.
speakers[].preset | string | Yes | - | The predefined voice preset to use for each speaker. Supported presets include: "Frank [EN]", "Carter [EN]", and other Microsoft TTS voices. Must match available voice identifiers.
cfg_scale | number | No | 1.3 | Classifier-free guidance scale, controls how closely the output adheres to the input script and speaker intent. Higher values increase fidelity but may reduce naturalness.
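
The script rule above (every line starts with "Speaker X:" and X indexes into the speakers array) can be checked locally before sending a request. This is a sketch under that reading of the rule; `check_script` is a hypothetical helper, and skipping blank lines is an assumption, since the page does not say how the API treats them.

```python
import re

# Matches a "Speaker N:" label at the start of a line.
LABEL = re.compile(r"^Speaker (\d+):")


def check_script(script: str, speaker_count: int) -> list:
    """Return a list of problems found in a 'Speaker X:' formatted script."""
    problems = []
    for number, line in enumerate(script.split("\n"), start=1):
        if not line.strip():
            continue  # blank lines are skipped (assumption, see lead-in)
        match = LABEL.match(line)
        if match is None:
            problems.append(f"line {number}: missing 'Speaker X:' prefix")
        elif int(match.group(1)) >= speaker_count:
            problems.append(
                f"line {number}: Speaker {match.group(1)} has no entry in speakers"
            )
    return problems
```

An empty list means every line is labelled and every label has a matching entry in `speakers`.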

Minimum Request

{
  "script": "Speaker 0: Hello, this is Frank.\nSpeaker 1: And I am Carter.",
  "speakers": [
    {
      "preset": "Frank [EN]"
    },
    {
      "preset": "Carter [EN]"
    }
  ]
}

Full Request (all options)

{
  "script": "Speaker 0: Hello, this is Frank.\nSpeaker 1: And I am Carter.",
  "speakers": [
    {
      "preset": "Frank [EN]"
    },
    {
      "preset": "Carter [EN]"
    }
  ],
  "cfg_scale": 1.3
}

Response

{
  "request_id": "vibevoice-realtime-0-5b-135_019dxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
  "status": "QUEUED",
  "polling_url": "https://gateway.pixazo.ai/v2/requests/status/vibevoice-realtime-0-5b-135_019dxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
}

Request Headers

Header | Value
Content-Type | application/json
Cache-Control | no-cache
Ocp-Apim-Subscription-Key | Your API subscription key

Response Handling

Common status codes for VibeVoice-Realtime-0.5B generate request.

Code | Meaning
202 | Accepted (request queued)
400 | Bad Request
401 | Unauthorized
402 | Insufficient Balance
403 | Forbidden
404 | Not Found
429 | Too Many Requests
500 | Internal Server Error

Error Responses

Queue system errors and model validation errors.

Queue System Errors

// 402 — Insufficient balance
{
  "error": "Insufficient Balance",
  "message": "Your wallet does not have enough balance."
}
// 400 — Model not found
{
  "error": "Model not found",
  "message": "Model 'vibevoice-realtime-0-5b-135' not found or is disabled"
}

Error via Status/Webhook

{
  "request_id": "vibevoice-realtime-0-5b-135_019dxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
  "status": "ERROR",
  "model_id": "vibevoice-realtime-0-5b-135",
  "error": "Description of the error",
  "output": null
}

Retrieving Results

Poll the universal status endpoint to check progress and retrieve results.

Endpoint

GET https://gateway.pixazo.ai/v2/requests/status/{request_id}
Ocp-Apim-Subscription-Key: YOUR_API_KEY

cURL Example

curl -H "Ocp-Apim-Subscription-Key: YOUR_API_KEY" \
  "https://gateway.pixazo.ai/v2/requests/status/vibevoice-realtime-0-5b-135_019dxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"

Response (Completed)

{
  "request_id": "vibevoice-realtime-0-5b-135_019dxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
  "status": "COMPLETED",
  "model_id": "vibevoice-realtime-0-5b-135",
  "error": null,
  "output": {
    "media_url": [
      "https://pub-582b7213209642b9b995c96c95a30381.r2.dev/v1/vibevoice-realtime-0-5b-135_019dxxxx-xxxx/output.ext"
    ],
    "media_type": "application/octet-stream"
  },
  "created_at": "2026-03-31T10:00:00.000Z",
  "updated_at": "2026-03-31T10:00:15.000Z",
  "completed_at": "2026-03-31T10:00:15.000Z"
}

Response Fields

Field | Type | Description
request_id | string | Unique request identifier
status | string | QUEUED, PROCESSING, COMPLETED, FAILED, or ERROR
model_id | string | Model that processed the request
error | string or null | Error message if failed
output.media_url | array | URLs to generated media (R2 CDN)
output.media_type | string | MIME type of the output
created_at | string | When request was created
completed_at | string or null | When request completed
polling_url | string | Status URL (initial response only)

Status Values

Status | Description
QUEUED | Request accepted, waiting to be processed
PROCESSING | Being processed by the model
COMPLETED | Done; output contains the result
FAILED | Failed; check error field
ERROR | System error; not charged

Status Flow

QUEUED → PROCESSING → COMPLETED
                    → FAILED
                    → ERROR

Typical Workflow

  1. Send a generate request to the API endpoint
  2. Save the request_id from the response
  3. Poll every 5-10 seconds: GET /v2/requests/status/{request_id}
  4. When status is "COMPLETED", download from output.media_url

Tip: Use X-Webhook-URL header to get a callback instead of polling.
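
The final download step can be sketched with the standard library alone. The local file name is taken from the last path segment of the media URL. Both helper names are hypothetical, and the sketch assumes the media URL can be fetched without the subscription key, which this page does not confirm.

```python
import os
import posixpath
import shutil
import urllib.request
from urllib.parse import urlparse


def filename_from_url(url: str, default: str = "output.bin") -> str:
    """Use the last path segment of the URL as the local file name."""
    name = posixpath.basename(urlparse(url).path)
    return name or default


def download(url: str, dest_dir: str = ".") -> str:
    """Stream one media URL to disk and return the local path.

    Assumes the URL is publicly readable (unconfirmed assumption).
    """
    path = os.path.join(dest_dir, filename_from_url(url))
    with urllib.request.urlopen(url) as response, open(path, "wb") as fh:
        shutil.copyfileobj(response, fh)  # copy in chunks, not all at once
    return path
```

Since `output.media_url` is an array, call `download` once per entry.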
