VibeVoice API - AI Text to Speech APIs

With the VibeVoice API, developers can convert text into realistic speech using multiple voice options and speaking styles. The API also offers a real-time model for interactive applications, which makes it suitable for virtual assistants, accessibility features, and other applications that need natural-sounding voice output.

VibeVoice v1 Text to Speech API Documentation

https://gateway.pixazo.ai/vibevoice/v1

Authentication

All requests require an API key passed via header.

Header	Type	Required	Description
Ocp-Apim-Subscription-Key	string	Yes	Your API subscription key

Text to Speech - VibeVoice API

Request Code

POST https://gateway.pixazo.ai/vibevoice/v1/vibevoice/generateRequest
Content-Type: application/json
Cache-Control: no-cache
Ocp-Apim-Subscription-Key: YOUR_SUBSCRIPTION_KEY

{
  "script": "Speaker 0: Hello, this is a test of the VibeVoice API.",
  "speakers": [
    {
      "preset": "Alice [EN]"
    }
  ]
}

import requests

url = "https://gateway.pixazo.ai/vibevoice/v1/vibevoice/generateRequest"
headers = {
    "Content-Type": "application/json",
    "Cache-Control": "no-cache",
    "Ocp-Apim-Subscription-Key": "YOUR_SUBSCRIPTION_KEY"
}
data = {
    "script": "Speaker 0: Hello, this is a test of the VibeVoice API.",
    "speakers": [
        {
            "preset": "Alice [EN]"
        }
    ]
}

response = requests.post(url, json=data, headers=headers)
print(response.json())

const url = "https://gateway.pixazo.ai/vibevoice/v1/vibevoice/generateRequest";
const headers = {
    "Content-Type": "application/json",
    "Cache-Control": "no-cache",
    "Ocp-Apim-Subscription-Key": "YOUR_SUBSCRIPTION_KEY"
};
const data = {
    script: "Speaker 0: Hello, this is a test of the VibeVoice API.",
    speakers: [
        {
            preset: "Alice [EN]"
        }
    ]
};

fetch(url, {
    method: "POST",
    headers: headers,
    body: JSON.stringify(data)
})
.then(response => response.json())
.then(data => console.log(data))
.catch(error => console.error("Error:", error));

curl -v -X POST "https://gateway.pixazo.ai/vibevoice/v1/vibevoice/generateRequest" \
-H "Content-Type: application/json" \
-H "Cache-Control: no-cache" \
-H "Ocp-Apim-Subscription-Key: YOUR_SUBSCRIPTION_KEY" \
--data-raw '{
  "script": "Speaker 0: Hello, this is a test of the VibeVoice API.",
  "speakers": [
    {
      "preset": "Alice [EN]"
    }
  ]
}'

Output

{
  "request_id": "vibevoice_019dxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
  "status": "QUEUED",
  "polling_url": "https://gateway.pixazo.ai/v2/requests/status/vibevoice_019dxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
}

Webhook (Optional)

Add the X-Webhook-URL header to your generate request to receive a POST callback instead of polling.

X-Webhook-URL: https://your-server.com/webhook/callback
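As a minimal sketch of how the optional webhook header could sit alongside the required headers, the helper below builds the header dictionary and adds X-Webhook-URL only when a callback URL is supplied. The function name `build_headers` is hypothetical and not part of the API.

```python
from typing import Dict, Optional


def build_headers(api_key: str, webhook_url: Optional[str] = None) -> Dict[str, str]:
    """Return the request headers; add X-Webhook-URL only when a callback is wanted."""
    headers = {
        "Content-Type": "application/json",
        "Cache-Control": "no-cache",
        "Ocp-Apim-Subscription-Key": api_key,
    }
    if webhook_url:
        # Opt in to a POST callback instead of polling.
        headers["X-Webhook-URL"] = webhook_url
    return headers
```

The result can be passed straight to `requests.post(url, json=data, headers=build_headers(...))`.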

Request Parameters - Text to Speech

Field	Type	Required	Default	Description
script	string	Yes	—	The script to convert to speech. Can be formatted with `Speaker X:` prefixes for multi-speaker dialogues. This is the main text content that will be converted to audio.
speakers	array<object>	No	[]	List of speakers to use for the script. Each speaker object can contain a preset voice or a custom audio URL. If not provided, speakers will be inferred from the script or voice samples.
speakers[].preset	string	No	Alice [EN]	Default voice preset to use for the speaker. Not used if `audio_url` is provided. Available presets: Alice [EN], Carter [EN], Frank [EN], Mary [EN] (Background Music), Maya [EN], Anchen [ZH] (Background Music), Bowen [ZH], Xinran [ZH].
speakers[].audio_url	string (URI)	No	—	URL to a voice sample audio file for voice cloning. If provided, the preset will be ignored and the AI will clone the voice from this sample.
seed	integer	No	—	Random seed for reproducible generation. Use the same seed with the same script to get consistent results across multiple generations.
cfg_scale	float	No	1.3	CFG (Classifier-Free Guidance) scale for generation. Higher values increase adherence to text but may reduce naturalness. Range typically 1.0–2.0.
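The parameter table can be turned into a small payload builder. The sketch below is illustrative only: `build_payload` is a hypothetical helper, and it simply leaves optional fields out of the body when they are not set so that the documented defaults apply on the server side.

```python
from typing import Any, Dict, List, Optional


def build_payload(
    script: str,
    speakers: Optional[List[Dict[str, str]]] = None,
    seed: Optional[int] = None,
    cfg_scale: Optional[float] = None,
) -> Dict[str, Any]:
    """Assemble a request body, omitting optional fields that were not provided."""
    if not script or not script.strip():
        raise ValueError("script is required and must not be empty")
    payload: Dict[str, Any] = {"script": script}
    if speakers:
        payload["speakers"] = speakers
    if seed is not None:
        payload["seed"] = int(seed)
    if cfg_scale is not None:
        payload["cfg_scale"] = float(cfg_scale)
    return payload
```

For example, `build_payload("Speaker 0: Hello.", [{"preset": "Alice [EN]"}], seed=42)` yields a body with `script`, `speakers`, and `seed`, and no `cfg_scale` key.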

Minimum Request

{
  "script": "Speaker 0: Hello, this is a test of the VibeVoice API.",
  "speakers": [
    {
      "preset": "Alice [EN]"
    }
  ]
}

Full Request (all options)

{
  "script": "Speaker 0: VibeVoice is now available on Pixazo. Isn't that right, Carter?\nSpeaker 1: That's right Frank, and it supports up to four speakers at once. Try it now!",
  "speakers": [
    {
      "preset": "Frank [EN]"
    },
    {
      "preset": "Carter [EN]"
    }
  ],
  "cfg_scale": 1.3,
  "seed": 42
}

Response

{
  "request_id": "vibevoice_019dxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
  "status": "QUEUED",
  "polling_url": "https://gateway.pixazo.ai/v2/requests/status/vibevoice_019dxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
}

Request Headers

Header	Value
Content-Type	application/json
Cache-Control	no-cache
Ocp-Apim-Subscription-Key	Your API subscription key

Response Handling

Common status codes for Text to Speech.

Code	Meaning
202	Accepted: request queued
400	Bad Request
401	Unauthorized
402	Insufficient Balance
403	Forbidden
404	Not Found
429	Too Many Requests
500	Internal Server Error
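One way a client might act on these status codes is sketched below. The retry policy is an assumption, not something this documentation specifies: the sketch treats 429 and 500 as worth retrying with exponential backoff and every other non-202 code as a failure to surface to the caller. The names `next_action` and `backoff_delay` are hypothetical.

```python
# Assumption: 429 and 500 are transient; the documentation does not state a retry policy.
RETRYABLE_CODES = {429, 500}


def next_action(status_code: int) -> str:
    """Map an HTTP status code from the generate endpoint to a client action."""
    if status_code == 202:
        return "poll"   # request was queued; follow polling_url
    if status_code in RETRYABLE_CODES:
        return "retry"  # back off, then resend the same request
    return "fail"       # 400, 401, 402, 403, 404: fix the request or account first


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Exponential backoff in seconds: base * 2**attempt, limited to cap."""
    return min(cap, base * (2 ** attempt))
```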

Error Responses

Queue system errors and model validation errors.

Queue System Errors

// 402 — Insufficient balance
{
  "error": "Insufficient Balance",
  "message": "Your wallet does not have enough balance."
}

// 400 — Model not found
{
  "error": "Model not found",
  "message": "Model 'vibevoice' not found or is disabled"
}

Error via Status/Webhook

{
  "request_id": "vibevoice_019dxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
  "status": "ERROR",
  "model_id": "vibevoice",
  "error": "Description of the error",
  "output": null
}
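The two error shapes above can be handled by one small function. This is a sketch based only on the sample bodies shown here; `describe_error` is a hypothetical name, and real responses may carry additional fields.

```python
from typing import Any, Dict, Optional


def describe_error(body: Dict[str, Any]) -> Optional[str]:
    """Return a readable error string, or None if the body does not describe an error."""
    # Shape returned by the status endpoint or a webhook callback.
    if body.get("status") in ("ERROR", "FAILED"):
        return f"{body.get('request_id')}: {body.get('error')}"
    # Shape returned directly by the queue system (for example 402 or 400).
    if body.get("error") and "message" in body:
        return f"{body['error']}: {body['message']}"
    return None
```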

Retrieving Results

Poll the universal status endpoint to check progress and retrieve results.

Endpoint

GET https://gateway.pixazo.ai/v2/requests/status/{request_id}
Ocp-Apim-Subscription-Key: YOUR_API_KEY

cURL Example

curl -H "Ocp-Apim-Subscription-Key: YOUR_API_KEY" \
  "https://gateway.pixazo.ai/v2/requests/status/vibevoice_019dxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"

Response (Completed)

{
  "request_id": "vibevoice_019dxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
  "status": "COMPLETED",
  "model_id": "vibevoice",
  "error": null,
  "output": {
    "media_url": [
      "https://pub-582b7213209642b9b995c96c95a30381.r2.dev/v1/vibevoice_019dxxxx-xxxx/output.ext"
    ],
    "media_type": "application/octet-stream"
  },
  "created_at": "2026-03-31T10:00:00.000Z",
  "updated_at": "2026-03-31T10:00:15.000Z",
  "completed_at": "2026-03-31T10:00:15.000Z"
}
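To pull the download links out of a completed response like the one above, something along these lines may be enough. `media_urls` is a hypothetical helper; it returns an empty list for any status other than COMPLETED.

```python
from typing import Any, Dict, List


def media_urls(status_body: Dict[str, Any]) -> List[str]:
    """Return the generated media URLs, or an empty list if the request is not completed."""
    if status_body.get("status") != "COMPLETED":
        return []
    output = status_body.get("output") or {}
    urls = output.get("media_url") or []
    # The sample shows a list; tolerate a single string defensively.
    return [urls] if isinstance(urls, str) else list(urls)
```

Each URL can then be fetched with an ordinary HTTP GET, for instance `requests.get(url).content`. Because the sample reports `application/octet-stream`, the file type may need to be taken from the URL rather than from `media_type`.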

Response Fields

Field	Type	Description
request_id	string	Unique request identifier
status	string	QUEUED, PROCESSING, COMPLETED, FAILED, or ERROR
model_id	string	Model that processed the request
error	string|null	Error message if failed
output.media_url	array	URLs to generated media (R2 CDN)
output.media_type	string	MIME type of the output
created_at	string	When request was created
updated_at	string	When request was last updated
completed_at	string|null	When request completed
polling_url	string	Status URL (initial response only)

Status Values

Status	Description
QUEUED	Request accepted, waiting to be processed
PROCESSING	Being processed by the model
COMPLETED	Done — output contains the result
FAILED	Failed — check error field
ERROR	System error — not charged

Status Flow

QUEUED → PROCESSING → COMPLETED
                    → FAILED
                    → ERROR

Typical Workflow

1. Send a generate request to the API endpoint
2. Save the request_id from the response
3. Poll every 5-10 seconds: GET /v2/requests/status/{request_id}
4. When status is "COMPLETED", download from output.media_url

Tip: Use X-Webhook-URL header to get a callback instead of polling.
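The polling part of this workflow can be sketched as a loop that stops on any terminal status. The function below is an illustration under stated assumptions: `wait_for_result` is a hypothetical name, the 300-second timeout is arbitrary, and the HTTP call is injected as `fetch_status` so the loop itself stays independent of any HTTP library.

```python
import time
from typing import Any, Callable, Dict

# Terminal statuses taken from the Status Values table.
TERMINAL_STATUSES = {"COMPLETED", "FAILED", "ERROR"}


def wait_for_result(
    fetch_status: Callable[[], Dict[str, Any]],
    interval: float = 5.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Call fetch_status every `interval` seconds until a terminal status is returned."""
    waited = 0.0
    while True:
        body = fetch_status()
        if body.get("status") in TERMINAL_STATUSES:
            return body
        if waited >= timeout:
            raise TimeoutError(f"no terminal status after {waited:.0f}s")
        sleep(interval)
        waited += interval
```

With the `requests` library, `fetch_status` could be `lambda: requests.get(polling_url, headers={"Ocp-Apim-Subscription-Key": key}).json()`, where `polling_url` comes from the initial response.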

VibeVoice v1 Text to Speech API Pricing

Resolution	Price (USD)
480p	$0.75
580p	$1
720p	$1.25

VibeVoice v1 Text to Speech (Realtime) API Documentation

https://gateway.pixazo.ai/vibevoice-realtime-0-5b-135/v1

Authentication

All requests require an API key passed via header.

Header	Type	Required	Description
Ocp-Apim-Subscription-Key	string	Yes	Your API subscription key

Generate Request - VibeVoice-Realtime-0.5B

Request Code

POST https://gateway.pixazo.ai/vibevoice-realtime-0-5b-135/v1/vibevoice-realtime-0-5b-request
Content-Type: application/json
Cache-Control: no-cache
Ocp-Apim-Subscription-Key: YOUR_SUBSCRIPTION_KEY

{
  "script": "Speaker 0: Hello, this is Frank.\nSpeaker 1: And I am Carter.",
  "speakers": [
    {
      "preset": "Frank [EN]"
    },
    {
      "preset": "Carter [EN]"
    }
  ]
}

import requests

url = "https://gateway.pixazo.ai/vibevoice-realtime-0-5b-135/v1/vibevoice-realtime-0-5b-request"
headers = {
    "Content-Type": "application/json",
    "Cache-Control": "no-cache",
    "Ocp-Apim-Subscription-Key": "YOUR_SUBSCRIPTION_KEY"
}
data = {
    "script": "Speaker 0: Hello, this is Frank.\nSpeaker 1: And I am Carter.",
    "speakers": [
        {
            "preset": "Frank [EN]"
        },
        {
            "preset": "Carter [EN]"
        }
    ]
}

response = requests.post(url, json=data, headers=headers)
print(response.json())

const url = "https://gateway.pixazo.ai/vibevoice-realtime-0-5b-135/v1/vibevoice-realtime-0-5b-request";
const headers = {
  "Content-Type": "application/json",
  "Cache-Control": "no-cache",
  "Ocp-Apim-Subscription-Key": "YOUR_SUBSCRIPTION_KEY"
};
const data = {
  "script": "Speaker 0: Hello, this is Frank.\nSpeaker 1: And I am Carter.",
  "speakers": [
    {
      "preset": "Frank [EN]"
    },
    {
      "preset": "Carter [EN]"
    }
  ]
};

fetch(url, {
  method: "POST",
  headers: headers,
  body: JSON.stringify(data)
})
.then(response => response.json())
.then(data => console.log(data))
.catch(error => console.error("Error:", error));

curl -X POST "https://gateway.pixazo.ai/vibevoice-realtime-0-5b-135/v1/vibevoice-realtime-0-5b-request" \
  -H "Content-Type: application/json" \
  -H "Cache-Control: no-cache" \
  -H "Ocp-Apim-Subscription-Key: YOUR_SUBSCRIPTION_KEY" \
  --data-raw '{
    "script": "Speaker 0: Hello, this is Frank.\nSpeaker 1: And I am Carter.",
    "speakers": [
      {
        "preset": "Frank [EN]"
      },
      {
        "preset": "Carter [EN]"
      }
    ]
  }'

Output

{
  "request_id": "vibevoice-realtime-0-5b-135_019dxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
  "status": "QUEUED",
  "polling_url": "https://gateway.pixazo.ai/v2/requests/status/vibevoice-realtime-0-5b-135_019dxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
}

Webhook (Optional)

Add the X-Webhook-URL header to your generate request to receive a POST callback instead of polling.

X-Webhook-URL: https://your-server.com/webhook/callback
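On the receiving side, a callback handler might look roughly like the sketch below. It assumes the POST body is JSON with the same shape as the status endpoint response (`request_id`, `status`, `output.media_url`); the documentation does not spell out the callback schema, and `handle_callback` is a hypothetical name.

```python
import json
from typing import List, Tuple


def handle_callback(raw_body: bytes) -> Tuple[str, str, List[str]]:
    """Parse a webhook POST body into (request_id, outcome, media_urls)."""
    payload = json.loads(raw_body)
    request_id = payload.get("request_id", "")
    status = payload.get("status")
    if status == "COMPLETED":
        urls = (payload.get("output") or {}).get("media_url") or []
        return request_id, "done", list(urls)
    if status in ("FAILED", "ERROR"):
        return request_id, "failed", []
    # Any other status is treated as not finished yet.
    return request_id, "pending", []
```

The function takes raw bytes so it can be called from whichever web framework serves the callback URL.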

Request Parameters - VibeVoice-Realtime-0.5B generate request

Field	Type	Required	Default	Description
script	string	Yes	—	The dialogue script with speaker labels (e.g., "Speaker 0: Hello..."). Each line must begin with "Speaker X:" where X is a zero-based index matching the speakers array.
speakers	array of objects	Yes	—	Array of speaker configurations. Each object must contain a preset field identifying the voice.
speakers[].preset	string	Yes	—	The predefined voice preset to use for each speaker. Supported presets include: "Frank [EN]", "Carter [EN]", and other Microsoft TTS voices. Must match available voice identifiers.
cfg_scale	number	No	1.3	Classifier-free guidance scale, controls how closely the output adheres to the input script and speaker intent. Higher values increase fidelity but may reduce naturalness.
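The script rule in the table (every line starts with "Speaker X:" and X indexes into the speakers array) can be checked on the client before sending. The validator below is a sketch of that rule only; `validate_script` is a hypothetical helper and the server may apply further checks.

```python
import re
from typing import List

# A line must start with "Speaker <index>:" where index is a non-negative integer.
_SPEAKER_LINE = re.compile(r"^Speaker (\d+):")


def validate_script(script: str, speaker_count: int) -> List[str]:
    """Return a list of problems; an empty list means the script passes the rule."""
    problems = []
    for number, line in enumerate(script.split("\n"), start=1):
        match = _SPEAKER_LINE.match(line)
        if match is None:
            problems.append(f"line {number}: must begin with 'Speaker X:'")
        elif int(match.group(1)) >= speaker_count:
            problems.append(
                f"line {number}: Speaker {match.group(1)} has no entry in speakers"
            )
    return problems
```

For the sample request, `validate_script("Speaker 0: Hello, this is Frank.\nSpeaker 1: And I am Carter.", 2)` returns an empty list.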

Minimum Request

{
  "script": "Speaker 0: Hello, this is Frank.\nSpeaker 1: And I am Carter.",
  "speakers": [
    {
      "preset": "Frank [EN]"
    },
    {
      "preset": "Carter [EN]"
    }
  ]
}

Full Request (all options)

{
  "script": "Speaker 0: Hello, this is Frank.\nSpeaker 1: And I am Carter.",
  "speakers": [
    {
      "preset": "Frank [EN]"
    },
    {
      "preset": "Carter [EN]"
    }
  ],
  "cfg_scale": 1.3
}

Response

{
  "request_id": "vibevoice-realtime-0-5b-135_019dxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
  "status": "QUEUED",
  "polling_url": "https://gateway.pixazo.ai/v2/requests/status/vibevoice-realtime-0-5b-135_019dxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
}

Request Headers

Header	Value
Content-Type	application/json
Cache-Control	no-cache
Ocp-Apim-Subscription-Key	Your API subscription key

Response Handling

Common status codes for VibeVoice-Realtime-0.5B generate request.

Code	Meaning
202	Accepted: request queued
400	Bad Request
401	Unauthorized
402	Insufficient Balance
403	Forbidden
404	Not Found
429	Too Many Requests
500	Internal Server Error

Error Responses

Queue system errors and model validation errors.

Queue System Errors

// 402 — Insufficient balance
{
  "error": "Insufficient Balance",
  "message": "Your wallet does not have enough balance."
}

// 400 — Model not found
{
  "error": "Model not found",
  "message": "Model 'vibevoice-realtime-0-5b-135' not found or is disabled"
}

Error via Status/Webhook

{
  "request_id": "vibevoice-realtime-0-5b-135_019dxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
  "status": "ERROR",
  "model_id": "vibevoice-realtime-0-5b-135",
  "error": "Description of the error",
  "output": null
}

Retrieving Results

Poll the universal status endpoint to check progress and retrieve results.

Endpoint

GET https://gateway.pixazo.ai/v2/requests/status/{request_id}
Ocp-Apim-Subscription-Key: YOUR_API_KEY

cURL Example

curl -H "Ocp-Apim-Subscription-Key: YOUR_API_KEY" \
  "https://gateway.pixazo.ai/v2/requests/status/vibevoice-realtime-0-5b-135_019dxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"

Response (Completed)

{
  "request_id": "vibevoice-realtime-0-5b-135_019dxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
  "status": "COMPLETED",
  "model_id": "vibevoice-realtime-0-5b-135",
  "error": null,
  "output": {
    "media_url": [
      "https://pub-582b7213209642b9b995c96c95a30381.r2.dev/v1/vibevoice-realtime-0-5b-135_019dxxxx-xxxx/output.ext"
    ],
    "media_type": "application/octet-stream"
  },
  "created_at": "2026-03-31T10:00:00.000Z",
  "updated_at": "2026-03-31T10:00:15.000Z",
  "completed_at": "2026-03-31T10:00:15.000Z"
}
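The timestamps in a completed response can be used to measure how long a request took from creation to completion, queue time included. The sketch below assumes the ISO 8601 format with a trailing "Z" shown in the sample; `elapsed_seconds` is a hypothetical helper.

```python
from datetime import datetime
from typing import Any, Dict, Optional


def _parse_timestamp(value: str) -> datetime:
    # Replace the trailing "Z" so older Python versions can parse the UTC offset.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def elapsed_seconds(status_body: Dict[str, Any]) -> Optional[float]:
    """Seconds between created_at and completed_at, or None if either is missing."""
    created = status_body.get("created_at")
    completed = status_body.get("completed_at")
    if not created or not completed:
        return None
    return (_parse_timestamp(completed) - _parse_timestamp(created)).total_seconds()
```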

Response Fields

Field	Type	Description
request_id	string	Unique request identifier
status	string	QUEUED, PROCESSING, COMPLETED, FAILED, or ERROR
model_id	string	Model that processed the request
error	string|null	Error message if failed
output.media_url	array	URLs to generated media (R2 CDN)
output.media_type	string	MIME type of the output
created_at	string	When request was created
updated_at	string	When request was last updated
completed_at	string|null	When request completed
polling_url	string	Status URL (initial response only)

Status Values

Status	Description
QUEUED	Request accepted, waiting to be processed
PROCESSING	Being processed by the model
COMPLETED	Done — output contains the result
FAILED	Failed — check error field
ERROR	System error — not charged

Status Flow

QUEUED → PROCESSING → COMPLETED
                    → FAILED
                    → ERROR

Typical Workflow

1. Send a generate request to the API endpoint
2. Save the request_id from the response
3. Poll every 5-10 seconds: GET /v2/requests/status/{request_id}
4. When status is "COMPLETED", download from output.media_url

Tip: Use X-Webhook-URL header to get a callback instead of polling.
