VibeVoice API - AI Text to Speech APIs
by Microsoft
With the VibeVoice API, developers can convert text into realistic speech with multiple voice options and speaking styles. The API includes real-time capabilities for interactive applications, making it suitable for virtual assistants, accessibility features, and any application requiring natural-sounding voice output.

VibeVoice v1 Text to Speech API Documentation
https://gateway.pixazo.ai/vibevoice/v1
Authentication
All requests require an API key passed via header.
| Header | Type | Required | Description |
|---|---|---|---|
| Ocp-Apim-Subscription-Key | string | Yes | Your API subscription key |
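As a minimal sketch, the header can be built once and reused for every call. The helper name `auth_headers` and the environment variable `PIXAZO_API_KEY` are assumptions made for illustration; only the `Ocp-Apim-Subscription-Key` header name comes from the table above.

```python
import os


def auth_headers(api_key=None):
    """Return the headers sent with every request.

    PIXAZO_API_KEY is a hypothetical environment variable name,
    used here only so the key does not have to be hard-coded.
    """
    key = api_key or os.environ.get("PIXAZO_API_KEY")
    if not key:
        raise ValueError("missing API subscription key")
    return {
        "Content-Type": "application/json",
        "Ocp-Apim-Subscription-Key": key,
    }
```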
Text to Speech - VibeVoice API
Request Code
POST https://gateway.pixazo.ai/vibevoice/v1/vibevoice/generateRequest
Content-Type: application/json
Cache-Control: no-cache
Ocp-Apim-Subscription-Key: YOUR_SUBSCRIPTION_KEY
{
"script": "Speaker 0: Hello, this is a test of the VibeVoice API.",
"speakers": [
{
"preset": "Alice [EN]"
}
]
}
import requests
url = "https://gateway.pixazo.ai/vibevoice/v1/vibevoice/generateRequest"
headers = {
"Content-Type": "application/json",
"Cache-Control": "no-cache",
"Ocp-Apim-Subscription-Key": "YOUR_SUBSCRIPTION_KEY"
}
data = {
"script": "Speaker 0: Hello, this is a test of the VibeVoice API.",
"speakers": [
{
"preset": "Alice [EN]"
}
]
}
response = requests.post(url, json=data, headers=headers)
print(response.json())
const url = "https://gateway.pixazo.ai/vibevoice/v1/vibevoice/generateRequest";
const headers = {
"Content-Type": "application/json",
"Cache-Control": "no-cache",
"Ocp-Apim-Subscription-Key": "YOUR_SUBSCRIPTION_KEY"
};
const data = {
script: "Speaker 0: Hello, this is a test of the VibeVoice API.",
speakers: [
{
preset: "Alice [EN]"
}
]
};
fetch(url, {
method: "POST",
headers: headers,
body: JSON.stringify(data)
})
.then(response => response.json())
.then(data => console.log(data))
.catch(error => console.error("Error:", error));
curl -v -X POST "https://gateway.pixazo.ai/vibevoice/v1/vibevoice/generateRequest" \
-H "Content-Type: application/json" \
-H "Cache-Control: no-cache" \
-H "Ocp-Apim-Subscription-Key: YOUR_SUBSCRIPTION_KEY" \
--data-raw '{
"script": "Speaker 0: Hello, this is a test of the VibeVoice API.",
"speakers": [
{
"preset": "Alice [EN]"
}
]
}'
Output
{
"request_id": "vibevoice_019dxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
"status": "QUEUED",
"polling_url": "https://gateway.pixazo.ai/v2/requests/status/vibevoice_019dxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
}
Webhook (Optional)
Add the X-Webhook-URL header to your generate request to receive a POST callback instead of polling.
X-Webhook-URL: https://your-server.com/webhook/callback
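The sketch below shows one way to attach the optional webhook header to a generate request using only the Python standard library. The function name `build_generate_request` is hypothetical; the URL and header names are taken from the request example above. The function only builds the request object and does not send it.

```python
import json
import urllib.request

GENERATE_URL = "https://gateway.pixazo.ai/vibevoice/v1/vibevoice/generateRequest"


def build_generate_request(api_key, payload, webhook_url=None):
    """Build (but do not send) a POST request for the generate endpoint."""
    headers = {
        "Content-Type": "application/json",
        "Cache-Control": "no-cache",
        "Ocp-Apim-Subscription-Key": api_key,
    }
    # The webhook header is optional; leave it out to poll instead.
    if webhook_url:
        headers["X-Webhook-URL"] = webhook_url
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        GENERATE_URL, data=body, headers=headers, method="POST"
    )
```

Passing the returned object to `urllib.request.urlopen` sends the request.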
Request Parameters - Text to Speech
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| script | string | Yes | — | The script to convert to speech. Can be formatted with `Speaker X:` prefixes for multi-speaker dialogues. This is the main text content that will be converted to audio. |
| speakers | array<object> | No | [] | List of speakers to use for the script. Each speaker object can contain a preset voice or a custom audio URL. If not provided, speakers will be inferred from the script or voice samples. |
| speakers[].preset | string | No | Alice [EN] | Default voice preset to use for the speaker. Not used if `audio_url` is provided. Available presets: Alice [EN], Carter [EN], Frank [EN], Mary [EN] (Background Music), Maya [EN], Anchen [ZH] (Background Music), Bowen [ZH], Xinran [ZH]. |
| speakers[].audio_url | string (URI) | No | — | URL to a voice sample audio file for voice cloning. If provided, the preset will be ignored and the AI will clone the voice from this sample. |
| seed | integer | No | — | Random seed for reproducible generation. Use the same seed with the same script to get consistent results across multiple generations. |
| cfg_scale | float | No | 1.3 | CFG (Classifier-Free Guidance) scale for generation. Higher values increase adherence to text but may reduce naturalness. Range typically 1.0–2.0. |
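The parameter table can be mirrored by a small payload builder. This is a sketch under stated assumptions: `speaker` and `build_payload` are hypothetical helper names, and dropping `preset` whenever `audio_url` is given simply reflects the table's note that the preset is not used in that case.

```python
def speaker(preset=None, audio_url=None):
    """Build one entry for the speakers array."""
    if audio_url:
        # Per the table, the preset is ignored when a voice sample is given,
        # so it is left out of the object entirely.
        return {"audio_url": audio_url}
    return {"preset": preset or "Alice [EN]"}


def build_payload(script, speakers=None, seed=None, cfg_scale=None):
    """Assemble the JSON body, omitting optional fields that are not set."""
    if not script:
        raise ValueError("script is required")
    payload = {"script": script}
    if speakers:
        payload["speakers"] = speakers
    if seed is not None:
        payload["seed"] = int(seed)
    if cfg_scale is not None:
        payload["cfg_scale"] = float(cfg_scale)
    return payload
```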
Minimum Request
{
"script": "Speaker 0: Hello, this is a test of the VibeVoice API.",
"speakers": [
{
"preset": "Alice [EN]"
}
]
}
Full Request (all options)
{
"script": "Speaker 0: VibeVoice is now available on Pixazo. Isn't that right, Carter?\nSpeaker 1: That's right Frank, and it supports up to four speakers at once. Try it now!",
"speakers": [
{
"preset": "Frank [EN]"
},
{
"preset": "Carter [EN]"
}
],
"cfg_scale": 1.3,
"seed": 42
}
Response
{
"request_id": "vibevoice_019dxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
"status": "QUEUED",
"polling_url": "https://gateway.pixazo.ai/v2/requests/status/vibevoice_019dxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
}
Request Headers
| Header | Value |
|---|---|
| Content-Type | application/json |
| Cache-Control | no-cache |
| Ocp-Apim-Subscription-Key | Your API subscription key |
Response Handling
Common status codes for Text to Speech.
| Code | Meaning |
|---|---|
| 202 | Accepted (request queued) |
| 400 | Bad Request |
| 401 | Unauthorized |
| 402 | Insufficient Balance |
| 403 | Forbidden |
| 404 | Not Found |
| 429 | Too Many Requests |
| 500 | Internal Server Error |
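One possible way for a client to act on these codes is sketched below. The mapping is an assumption, not documented behaviour: treating 429 and 500 as retryable, and the exponential backoff values, are common client-side conventions rather than anything this API specifies. The names `classify_status` and `backoff_delay` are hypothetical.

```python
RETRYABLE = {429, 500}  # assumption: transient, worth retrying


def classify_status(code):
    """Map an HTTP status code to a suggested client action."""
    if code == 202:
        return "queued"
    if code in RETRYABLE:
        return "retry"
    if code == 402:
        return "add_funds"
    if code in (401, 403):
        return "check_key"
    return "fix_request"


def backoff_delay(attempt, base=1.0, cap=30.0):
    """Seconds to wait before retry number `attempt` (0-based), capped."""
    return min(cap, base * (2 ** attempt))
```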
Error Responses
Queue system errors and model validation errors.
Queue System Errors
// 402 — Insufficient balance
{
"error": "Insufficient Balance",
"message": "Your wallet does not have enough balance."
}
// 400 — Model not found
{
"error": "Model not found",
"message": "Model 'vibevoice' not found or is disabled"
}
Error via Status/Webhook
{
"request_id": "vibevoice_019dxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
"status": "ERROR",
"model_id": "vibevoice",
"error": "Description of the error",
"output": null
}
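The two error shapes above can be handled by one small function. This is a sketch, assuming the bodies look exactly like the samples shown; `describe_error` is a hypothetical helper name.

```python
def describe_error(body):
    """Return a one-line description of an error body, or None if there is none."""
    # Shape 2: an error reported through the status endpoint or a webhook.
    if body.get("status") in ("ERROR", "FAILED"):
        return f"{body.get('request_id')}: {body.get('error')}"
    # Shape 1: a queue system error returned directly by the generate call.
    if body.get("error") and "message" in body:
        return f"{body['error']}: {body['message']}"
    return None
```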
Retrieving Results
Poll the universal status endpoint to check progress and retrieve results.
Endpoint
GET https://gateway.pixazo.ai/v2/requests/status/{request_id}
Ocp-Apim-Subscription-Key: YOUR_API_KEY
cURL Example
curl -H "Ocp-Apim-Subscription-Key: YOUR_API_KEY" \
"https://gateway.pixazo.ai/v2/requests/status/vibevoice_019dxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
Response (Completed)
{
"request_id": "vibevoice_019dxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
"status": "COMPLETED",
"model_id": "vibevoice",
"error": null,
"output": {
"media_url": [
"https://pub-582b7213209642b9b995c96c95a30381.r2.dev/v1/vibevoice_019dxxxx-xxxx/output.ext"
],
"media_type": "application/octet-stream"
},
"created_at": "2026-03-31T10:00:00.000Z",
"updated_at": "2026-03-31T10:00:15.000Z",
"completed_at": "2026-03-31T10:00:15.000Z"
}
Response Fields
| Field | Type | Description |
|---|---|---|
| request_id | string | Unique request identifier |
| status | string | QUEUED, PROCESSING, COMPLETED, FAILED, or ERROR |
| model_id | string | Model that processed the request |
| error | string or null | Error message if failed |
| output.media_url | array | URLs to generated media (R2 CDN) |
| output.media_type | string | MIME type of the output |
| created_at | string | When request was created |
| completed_at | string or null | When request completed |
| polling_url | string | Status URL (initial response only) |
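Given a parsed status response, the media URLs can be pulled out defensively, since `output` is null on errors. A minimal sketch, with `extract_media_urls` as a hypothetical helper name:

```python
def extract_media_urls(status_body):
    """Return the list of media URLs, or an empty list if not completed."""
    if status_body.get("status") != "COMPLETED":
        return []
    # output is null in error responses, so fall back to an empty dict.
    output = status_body.get("output") or {}
    return list(output.get("media_url") or [])
```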
Status Values
| Status | Description |
|---|---|
| QUEUED | Request accepted, waiting to be processed |
| PROCESSING | Being processed by the model |
| COMPLETED | Done — output contains the result |
| FAILED | Failed — check error field |
| ERROR | System error — not charged |
Status Flow
QUEUED → PROCESSING → COMPLETED
→ FAILED
→ ERROR
Typical Workflow
- Send a generate request to the API endpoint
- Save the `request_id` from the response
- Poll every 5-10 seconds: `GET /v2/requests/status/{request_id}`
- When `status` is `"COMPLETED"`, download from `output.media_url`
Tip: Use X-Webhook-URL header to get a callback instead of polling.
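The polling workflow can be sketched as a loop that stops on any terminal status. The 5-second interval follows the guidance above; the 300-second overall timeout and the names `fetch_status` and `wait_for_result` are assumptions. The `fetch` and `sleep` parameters exist only so the loop can be exercised without network access.

```python
import json
import time
import urllib.request

STATUS_URL = "https://gateway.pixazo.ai/v2/requests/status/{request_id}"
TERMINAL = {"COMPLETED", "FAILED", "ERROR"}


def fetch_status(request_id, api_key):
    """GET the status endpoint and return the parsed JSON body."""
    req = urllib.request.Request(
        STATUS_URL.format(request_id=request_id),
        headers={"Ocp-Apim-Subscription-Key": api_key},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def wait_for_result(request_id, api_key, fetch=fetch_status,
                    interval=5.0, timeout=300.0, sleep=time.sleep):
    """Poll until the request reaches a terminal status or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        body = fetch(request_id, api_key)
        if body.get("status") in TERMINAL:
            return body
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{request_id} still {body.get('status')}")
        sleep(interval)
```

The caller still has to check whether the returned status is `COMPLETED` before reading `output.media_url`.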
VibeVoice v1 Text to Speech API Pricing
| Resolution | Price (USD) |
|---|---|
| 480p | $0.75 |
| 580p | $1 |
| 720p | $1.25 |
VibeVoice v1 Text to Speech (Realtime) API Documentation
https://gateway.pixazo.ai/vibevoice-realtime-0-5b-135/v1
Authentication
All requests require an API key passed via header.
| Header | Type | Required | Description |
|---|---|---|---|
| Ocp-Apim-Subscription-Key | string | Yes | Your API subscription key |
VibeVoice-Realtime-0.5B generate request - VibeVoice-Realtime-0.5B
Request Code
POST https://gateway.pixazo.ai/vibevoice-realtime-0-5b-135/v1/vibevoice-realtime-0-5b-request
Content-Type: application/json
Cache-Control: no-cache
Ocp-Apim-Subscription-Key: YOUR_SUBSCRIPTION_KEY
{
"script": "Speaker 0: Hello, this is Frank.\nSpeaker 1: And I am Carter.",
"speakers": [
{
"preset": "Frank [EN]"
},
{
"preset": "Carter [EN]"
}
]
}
import requests
url = "https://gateway.pixazo.ai/vibevoice-realtime-0-5b-135/v1/vibevoice-realtime-0-5b-request"
headers = {
"Content-Type": "application/json",
"Cache-Control": "no-cache",
"Ocp-Apim-Subscription-Key": "YOUR_SUBSCRIPTION_KEY"
}
data = {
"script": "Speaker 0: Hello, this is Frank.\nSpeaker 1: And I am Carter.",
"speakers": [
{
"preset": "Frank [EN]"
},
{
"preset": "Carter [EN]"
}
]
}
response = requests.post(url, json=data, headers=headers)
print(response.json())
const url = "https://gateway.pixazo.ai/vibevoice-realtime-0-5b-135/v1/vibevoice-realtime-0-5b-request";
const headers = {
"Content-Type": "application/json",
"Cache-Control": "no-cache",
"Ocp-Apim-Subscription-Key": "YOUR_SUBSCRIPTION_KEY"
};
const data = {
"script": "Speaker 0: Hello, this is Frank.\nSpeaker 1: And I am Carter.",
"speakers": [
{
"preset": "Frank [EN]"
},
{
"preset": "Carter [EN]"
}
]
};
fetch(url, {
method: "POST",
headers: headers,
body: JSON.stringify(data)
})
.then(response => response.json())
.then(data => console.log(data))
.catch(error => console.error("Error:", error));
curl -X POST "https://gateway.pixazo.ai/vibevoice-realtime-0-5b-135/v1/vibevoice-realtime-0-5b-request" \
-H "Content-Type: application/json" \
-H "Cache-Control: no-cache" \
-H "Ocp-Apim-Subscription-Key: YOUR_SUBSCRIPTION_KEY" \
--data-raw '{
"script": "Speaker 0: Hello, this is Frank.\nSpeaker 1: And I am Carter.",
"speakers": [
{
"preset": "Frank [EN]"
},
{
"preset": "Carter [EN]"
}
]
}'
Output
{
"request_id": "vibevoice-realtime-0-5b-135_019dxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
"status": "QUEUED",
"polling_url": "https://gateway.pixazo.ai/v2/requests/status/vibevoice-realtime-0-5b-135_019dxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
}
Webhook (Optional)
Add the X-Webhook-URL header to your generate request to receive a POST callback instead of polling.
X-Webhook-URL: https://your-server.com/webhook/callback
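On the receiving side, a callback endpoint could look like the sketch below, built on Python's standard `http.server`. Two things are assumptions: that the callback body has the same shape as the status response (`request_id`, `status`, and so on), and that replying with 204 is an acceptable acknowledgement. Neither is stated in this documentation. `summarize_callback`, `CallbackHandler`, and `serve` are hypothetical names.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def summarize_callback(raw):
    """Parse a callback body and return (request_id, status).

    Assumes the body mirrors the status response shape.
    """
    body = json.loads(raw)
    return body.get("request_id"), body.get("status")


class CallbackHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        request_id, status = summarize_callback(self.rfile.read(length))
        print(f"{request_id} -> {status}")
        # Assumption: any 2xx reply acknowledges the callback.
        self.send_response(204)
        self.end_headers()


def serve(port=8000):
    """Run the receiver until interrupted."""
    HTTPServer(("", port), CallbackHandler).serve_forever()
```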
Request Parameters - VibeVoice-Realtime-0.5B generate request
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| script | string | Yes | — | The dialogue script with speaker labels (e.g., "Speaker 0: Hello..."). Each line must begin with "Speaker X:" where X is a zero-based index matching the speakers array. |
| speakers | array of objects | Yes | — | Array of speaker configurations. Each object must contain a preset field identifying the voice. |
| speakers[].preset | string | Yes | — | The predefined voice preset to use for each speaker. Supported presets include: "Frank [EN]", "Carter [EN]", and other Microsoft TTS voices. Must match available voice identifiers. |
| cfg_scale | number | No | 1.3 | Classifier-free guidance scale, controls how closely the output adheres to the input script and speaker intent. Higher values increase fidelity but may reduce naturalness. |
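Because each script line must start with `Speaker X:` and X must index into `speakers`, a client-side check before sending can catch mistakes early. This is a sketch: `validate_script` is a hypothetical helper, and skipping blank lines is an assumption, since the table does not say how they are treated.

```python
import re

# "Speaker <index>:" followed by at least one non-space character.
_LINE = re.compile(r"^Speaker (\d+):\s*\S")


def validate_script(script, speakers):
    """Return a list of problems found in the script (empty if none)."""
    errors = []
    for n, line in enumerate(script.split("\n"), start=1):
        if not line.strip():
            continue  # assumption: blank lines are harmless
        match = _LINE.match(line)
        if not match:
            errors.append(f"line {n}: must begin with 'Speaker X:'")
        elif int(match.group(1)) >= len(speakers):
            errors.append(f"line {n}: Speaker {match.group(1)} has no entry in speakers")
    return errors
```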
Minimum Request
{
"script": "Speaker 0: Hello, this is Frank.\nSpeaker 1: And I am Carter.",
"speakers": [
{
"preset": "Frank [EN]"
},
{
"preset": "Carter [EN]"
}
]
}
Full Request (all options)
{
"script": "Speaker 0: Hello, this is Frank.\nSpeaker 1: And I am Carter.",
"speakers": [
{
"preset": "Frank [EN]"
},
{
"preset": "Carter [EN]"
}
],
"cfg_scale": 1.3
}
Response
{
"request_id": "vibevoice-realtime-0-5b-135_019dxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
"status": "QUEUED",
"polling_url": "https://gateway.pixazo.ai/v2/requests/status/vibevoice-realtime-0-5b-135_019dxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
}
Request Headers
| Header | Value |
|---|---|
| Content-Type | application/json |
| Cache-Control | no-cache |
| Ocp-Apim-Subscription-Key | Your API subscription key |
Response Handling
Common status codes for VibeVoice-Realtime-0.5B generate requests.
| Code | Meaning |
|---|---|
| 202 | Accepted (request queued) |
| 400 | Bad Request |
| 401 | Unauthorized |
| 402 | Insufficient Balance |
| 403 | Forbidden |
| 404 | Not Found |
| 429 | Too Many Requests |
| 500 | Internal Server Error |
Error Responses
Queue system errors and model validation errors.
Queue System Errors
// 402 — Insufficient balance
{
"error": "Insufficient Balance",
"message": "Your wallet does not have enough balance."
}
// 400 — Model not found
{
"error": "Model not found",
"message": "Model 'vibevoice-realtime-0-5b-135' not found or is disabled"
}
Error via Status/Webhook
{
"request_id": "vibevoice-realtime-0-5b-135_019dxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
"status": "ERROR",
"model_id": "vibevoice-realtime-0-5b-135",
"error": "Description of the error",
"output": null
}
Retrieving Results
Poll the universal status endpoint to check progress and retrieve results.
Endpoint
GET https://gateway.pixazo.ai/v2/requests/status/{request_id}
Ocp-Apim-Subscription-Key: YOUR_API_KEY
cURL Example
curl -H "Ocp-Apim-Subscription-Key: YOUR_API_KEY" \
"https://gateway.pixazo.ai/v2/requests/status/vibevoice-realtime-0-5b-135_019dxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
Response (Completed)
{
"request_id": "vibevoice-realtime-0-5b-135_019dxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
"status": "COMPLETED",
"model_id": "vibevoice-realtime-0-5b-135",
"error": null,
"output": {
"media_url": [
"https://pub-582b7213209642b9b995c96c95a30381.r2.dev/v1/vibevoice-realtime-0-5b-135_019dxxxx-xxxx/output.ext"
],
"media_type": "application/octet-stream"
},
"created_at": "2026-03-31T10:00:00.000Z",
"updated_at": "2026-03-31T10:00:15.000Z",
"completed_at": "2026-03-31T10:00:15.000Z"
}
Response Fields
| Field | Type | Description |
|---|---|---|
| request_id | string | Unique request identifier |
| status | string | QUEUED, PROCESSING, COMPLETED, FAILED, or ERROR |
| model_id | string | Model that processed the request |
| error | string or null | Error message if failed |
| output.media_url | array | URLs to generated media (R2 CDN) |
| output.media_type | string | MIME type of the output |
| created_at | string | When request was created |
| completed_at | string or null | When request completed |
| polling_url | string | Status URL (initial response only) |
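The timestamp fields make it easy to measure how long a request took. A minimal sketch, assuming the timestamps always use the ISO 8601 form shown in the sample response (UTC with a trailing `Z`); `parse_ts` and `processing_seconds` are hypothetical helper names.

```python
from datetime import datetime


def parse_ts(value):
    """Parse a timestamp such as 2026-03-31T10:00:00.000Z."""
    # Replace the trailing Z so older Python versions can parse it too.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def processing_seconds(status_body):
    """Seconds between created_at and completed_at, or None if not finished."""
    done = status_body.get("completed_at")
    if not done:
        return None
    return (parse_ts(done) - parse_ts(status_body["created_at"])).total_seconds()
```

For the sample response above, this yields 15.0 seconds.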
Status Values
| Status | Description |
|---|---|
| QUEUED | Request accepted, waiting to be processed |
| PROCESSING | Being processed by the model |
| COMPLETED | Done — output contains the result |
| FAILED | Failed — check error field |
| ERROR | System error — not charged |
Status Flow
QUEUED → PROCESSING → COMPLETED
→ FAILED
→ ERROR
Typical Workflow
- Send a generate request to the API endpoint
- Save the `request_id` from the response
- Poll every 5-10 seconds: `GET /v2/requests/status/{request_id}`
- When `status` is `"COMPLETED"`, download from `output.media_url`
Tip: Use X-Webhook-URL header to get a callback instead of polling.