# VibeVoice TTS API

> Provider: **Microsoft**
> Source: https://www.pixazo.ai/models/vibevoice

Text-to-speech capabilities by Microsoft.

## VibeVoice v1

### Text to Speech

## Base URL

```
https://gateway.pixazo.ai/vibevoice/v1
```

## Authentication

All requests require an API key passed via header.

| Header | Type | Required | Description |
| --- | --- | --- | --- |
| `Ocp-Apim-Subscription-Key` | string | Yes | Your API subscription key |

## Text to Speech - Vibe Voice API

## Request Code

The same request is shown below as raw HTTP, Python, JavaScript, and cURL, in that order.

```
POST https://gateway.pixazo.ai/vibevoice/v1/vibevoice/generateRequest
Content-Type: application/json
Cache-Control: no-cache
Ocp-Apim-Subscription-Key: YOUR_SUBSCRIPTION_KEY

{
  "script": "Speaker 0: Hello, this is a test of the VibeVoice API.",
  "speakers": [
    {
      "preset": "Alice [EN]"
    }
  ]
}
```

```
import requests

url = "https://gateway.pixazo.ai/vibevoice/v1/vibevoice/generateRequest"
headers = {
    "Content-Type": "application/json",
    "Cache-Control": "no-cache",
    "Ocp-Apim-Subscription-Key": "YOUR_SUBSCRIPTION_KEY"
}
data = {
    "script": "Speaker 0: Hello, this is a test of the VibeVoice API.",
    "speakers": [
        {
            "preset": "Alice [EN]"
        }
    ]
}

response = requests.post(url, json=data, headers=headers)
print(response.json())
```

```
const url = "https://gateway.pixazo.ai/vibevoice/v1/vibevoice/generateRequest";
const headers = {
    "Content-Type": "application/json",
    "Cache-Control": "no-cache",
    "Ocp-Apim-Subscription-Key": "YOUR_SUBSCRIPTION_KEY"
};
const data = {
    script: "Speaker 0: Hello, this is a test of the VibeVoice API.",
    speakers: [
        {
            preset: "Alice [EN]"
        }
    ]
};

fetch(url, {
    method: "POST",
    headers: headers,
    body: JSON.stringify(data)
})
.then(response => response.json())
.then(data => console.log(data))
.catch(error => console.error("Error:", error));
```

```
curl -v -X POST "https://gateway.pixazo.ai/vibevoice/v1/vibevoice/generateRequest" \
-H "Content-Type: application/json" \
-H "Cache-Control: no-cache" \
-H "Ocp-Apim-Subscription-Key: YOUR_SUBSCRIPTION_KEY" \
--data-raw '{
  "script": "Speaker 0: Hello, this is a test of the VibeVoice API.",
  "speakers": [
    {
      "preset": "Alice [EN]"
    }
  ]
}'
```

## Output

```
{
  "request_id": "vibevoice_019dxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
  "status": "QUEUED",
  "polling_url": "https://gateway.pixazo.ai/v2/requests/status/vibevoice_019dxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
}
```

[Try Now](https://api.pixazo.ai/api-details#api=vibevoice&operation=text-to-speech)

## Webhook (Optional)

Add the `X-Webhook-URL` header to your generate request to receive a POST callback instead of polling.

```
X-Webhook-URL: https://your-server.com/webhook/callback
```
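
As a rough sketch of how the optional header fits into a request, the helper below builds the headers used in the request examples and adds `X-Webhook-URL` only when a callback URL is supplied. The function name `build_headers` is illustrative and not part of the API.

```python
from typing import Optional


def build_headers(api_key: str, webhook_url: Optional[str] = None) -> dict:
    """Return request headers (build_headers is a hypothetical helper name)."""
    headers = {
        "Content-Type": "application/json",
        "Cache-Control": "no-cache",
        "Ocp-Apim-Subscription-Key": api_key,
    }
    if webhook_url:
        # Optional: ask the gateway to POST the result to this URL.
        headers["X-Webhook-URL"] = webhook_url
    return headers


# Without a webhook you poll; with one you wait for the callback.
polling_headers = build_headers("YOUR_SUBSCRIPTION_KEY")
webhook_headers = build_headers(
    "YOUR_SUBSCRIPTION_KEY",
    webhook_url="https://your-server.com/webhook/callback",
)
```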

## Request Parameters - Text to Speech

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| `script` | string | Yes | - | The text to convert to speech. Use `Speaker X:` prefixes for multi-speaker dialogues. |
| `speakers` | array&lt;object&gt; | No | `[]` | List of speakers to use for the script. Each speaker object can contain a preset voice or a custom audio URL. If not provided, speakers are inferred from the script or voice samples. |
| `speakers[].preset` | string | No | `Alice [EN]` | Voice preset for the speaker. Not used if `audio_url` is provided. Available presets: `Alice [EN]`, `Carter [EN]`, `Frank [EN]`, `Mary [EN] (Background Music)`, `Maya [EN]`, `Anchen [ZH] (Background Music)`, `Bowen [ZH]`, `Xinran [ZH]`. |
| `speakers[].audio_url` | string (URI) | No | - | URL of a voice sample audio file for voice cloning. If provided, the preset is ignored and the voice is cloned from this sample. |
| `seed` | integer | No | - | Random seed for reproducible generation. Use the same seed with the same script to get consistent results across generations. |
| `cfg_scale` | float | No | 1.3 | Classifier-Free Guidance (CFG) scale. Higher values increase adherence to the text but may reduce naturalness. Typical range: 1.0 to 2.0. |
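
The sketch below shows one way to assemble a request body from these fields, leaving out optional fields that were not set so the server defaults apply. The helper names `speaker` and `build_payload` and the `example.com` sample URL are assumptions made for illustration; only the field names come from the parameter list.

```python
import json
from typing import Optional


def speaker(preset: Optional[str] = None, audio_url: Optional[str] = None) -> dict:
    """One entry for the speakers array (hypothetical helper).

    The preset is ignored when audio_url is provided, so only one is sent.
    """
    if audio_url:
        return {"audio_url": audio_url}
    return {"preset": preset or "Alice [EN]"}  # documented default preset


def build_payload(script: str, speakers=None, seed=None, cfg_scale=None) -> dict:
    """Build the JSON body, omitting optional fields that were not set."""
    if not script.strip():
        raise ValueError("script is required and must not be empty")
    payload = {"script": script}
    if speakers:
        payload["speakers"] = list(speakers)
    if seed is not None:
        payload["seed"] = int(seed)
    if cfg_scale is not None:
        payload["cfg_scale"] = float(cfg_scale)
    return payload


payload = build_payload(
    "Speaker 0: Hello.\nSpeaker 1: Hi there.",
    speakers=[
        speaker("Frank [EN]"),
        speaker(audio_url="https://example.com/voice.wav"),  # placeholder URL
    ],
    seed=42,
)
print(json.dumps(payload, indent=2))
```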

## Minimum Request

```
{
  "script": "Speaker 0: Hello, this is a test of the VibeVoice API.",
  "speakers": [
    {
      "preset": "Alice [EN]"
    }
  ]
}
```

## Full Request (all options)

```
{
  "script": "Speaker 0: VibeVoice is now available on Pixazo. Isn't that right, Carter?\nSpeaker 1: That's right Frank, and it supports up to four speakers at once. Try it now!",
  "speakers": [
    {
      "preset": "Frank [EN]"
    },
    {
      "preset": "Carter [EN]"
    }
  ],
  "cfg_scale": 1.3,
  "seed": 42
}
```

## Response

```
{
  "request_id": "vibevoice_019dxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
  "status": "QUEUED",
  "polling_url": "https://gateway.pixazo.ai/v2/requests/status/vibevoice_019dxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
}
```

## Request Headers

| Header | Value |
| --- | --- |
| `Content-Type` | `application/json` |
| `Cache-Control` | `no-cache` |
| `Ocp-Apim-Subscription-Key` | Your API subscription key |

## Response Handling

Common status codes for Text to Speech.

| Code | Meaning |
| --- | --- |
| 202 | Accepted (request queued) |
| 400 | Bad Request |
| 401 | Unauthorized |
| 402 | Insufficient Balance |
| 403 | Forbidden |
| 404 | Not Found |
| 429 | Too Many Requests |
| 500 | Internal Server Error |
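
A client typically branches on these codes. The following is a minimal sketch, assuming that 429 and 500 are transient and worth retrying; that retry policy is an assumption and is not stated by the API. The name `next_action` is hypothetical.

```python
# Codes and meanings copied from the status code list above.
MEANINGS = {
    202: "Accepted (request queued)",
    400: "Bad Request",
    401: "Unauthorized",
    402: "Insufficient Balance",
    403: "Forbidden",
    404: "Not Found",
    429: "Too Many Requests",
    500: "Internal Server Error",
}

# Assumption: these two are transient and worth retrying after a pause.
RETRYABLE = {429, 500}


def next_action(status_code: int) -> str:
    """Map an HTTP status code to 'poll', 'retry', or 'fail'."""
    if status_code == 202:
        return "poll"  # request queued: start polling or wait for the webhook
    if status_code in RETRYABLE:
        return "retry"
    return "fail"  # fix the request, key, or balance before sending again


for code in (202, 402, 429):
    print(code, MEANINGS[code], "->", next_action(code))
```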

## Error Responses

Queue system errors and model validation errors.

### Queue System Errors

```
// 402 — Insufficient balance
{
  "error": "Insufficient Balance",
  "message": "Your wallet does not have enough balance."
}
```

```
// 400 — Model not found
{
  "error": "Model not found",
  "message": "Model 'vibevoice' not found or is disabled"
}
```

### Error via Status/Webhook

```
{
  "request_id": "vibevoice_019dxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
  "status": "ERROR",
  "model_id": "vibevoice",
  "error": "Description of the error",
  "output": null
}
```

## Retrieving Results

Poll the universal status endpoint to check progress and retrieve results.

### Endpoint

```
GET https://gateway.pixazo.ai/v2/requests/status/{request_id}
Ocp-Apim-Subscription-Key: YOUR_API_KEY
```

## cURL Example

```
curl -H "Ocp-Apim-Subscription-Key: YOUR_API_KEY" \
  "https://gateway.pixazo.ai/v2/requests/status/vibevoice_019dxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
```

## Response (Completed)

```
{
  "request_id": "vibevoice_019dxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
  "status": "COMPLETED",
  "model_id": "vibevoice",
  "error": null,
  "output": {
    "media_url": [
      "https://pub-582b7213209642b9b995c96c95a30381.r2.dev/v1/vibevoice_019dxxxx-xxxx/output.ext"
    ],
    "media_type": "application/octet-stream"
  },
  "created_at": "2026-03-31T10:00:00.000Z",
  "updated_at": "2026-03-31T10:00:15.000Z",
  "completed_at": "2026-03-31T10:00:15.000Z"
}
```
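
A small sketch of reading this payload: it returns the media URLs once the request has completed, and raises if the status is `FAILED` or `ERROR`. The function name `media_urls` is illustrative; the field names are taken from the sample responses in this section.

```python
def media_urls(status_payload: dict) -> list:
    """Return output URLs for a completed request (hypothetical helper)."""
    status = status_payload.get("status")
    if status in ("FAILED", "ERROR"):
        raise RuntimeError(f"{status}: {status_payload.get('error')}")
    if status != "COMPLETED":
        return []  # still QUEUED or PROCESSING
    output = status_payload.get("output") or {}
    return list(output.get("media_url") or [])


completed = {
    "request_id": "vibevoice_019dxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
    "status": "COMPLETED",
    "model_id": "vibevoice",
    "error": None,
    "output": {
        "media_url": ["https://pub-582b7213209642b9b995c96c95a30381.r2.dev/v1/vibevoice_019dxxxx-xxxx/output.ext"],
        "media_type": "application/octet-stream",
    },
}
urls = media_urls(completed)
print(urls)
```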

## Response Fields

| Field | Type | Description |
| --- | --- | --- |
| `request_id` | string | Unique request identifier |
| `status` | string | `QUEUED`, `PROCESSING`, `COMPLETED`, `FAILED`, or `ERROR` |
| `model_id` | string | Model that processed the request |
| `error` | string or null | Error message if failed |
| `output.media_url` | array | URLs to generated media (R2 CDN) |
| `output.media_type` | string | MIME type of the output |
| `created_at` | string | When the request was created |
| `completed_at` | string or null | When the request completed |
| `polling_url` | string | Status URL (initial response only) |

## Status Values

| Status | Description |
| --- | --- |
| `QUEUED` | Request accepted, waiting to be processed |
| `PROCESSING` | Being processed by the model |
| `COMPLETED` | Done; `output` contains the result |
| `FAILED` | Failed; check the `error` field |
| `ERROR` | System error; not charged |

## Status Flow

```
QUEUED → PROCESSING → COMPLETED
                    → FAILED
                    → ERROR
```

## Typical Workflow

1.  **Send a generate request** to the API endpoint
2.  **Save the `request_id`** from the response
3.  **Poll** every 5-10 seconds: `GET /v2/requests/status/{request_id}`
4.  **When `status` is `"COMPLETED"`**, download from `output.media_url`

**Tip:** Use `X-Webhook-URL` header to get a callback instead of polling.
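
The workflow above can be sketched as a polling loop. This is a minimal illustration using only the standard library, not an official client: `fetch_status` and `wait_for_result` are hypothetical names, and the 5 second interval and 300 second timeout are assumed defaults chosen to match the suggested polling cadence.

```python
import json
import time
import urllib.request

STATUS_URL = "https://gateway.pixazo.ai/v2/requests/status/{request_id}"
TERMINAL = {"COMPLETED", "FAILED", "ERROR"}


def fetch_status(request_id: str, api_key: str) -> dict:
    """GET the status endpoint and return the parsed JSON body."""
    req = urllib.request.Request(
        STATUS_URL.format(request_id=request_id),
        headers={"Ocp-Apim-Subscription-Key": api_key},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def wait_for_result(request_id, api_key, fetch=fetch_status,
                    interval=5.0, timeout=300.0, sleep=time.sleep) -> dict:
    """Poll until the request reaches a terminal status or the timeout passes."""
    deadline = time.monotonic() + timeout
    while True:
        payload = fetch(request_id, api_key)
        if payload.get("status") in TERMINAL:
            return payload
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{request_id} still {payload.get('status')}")
        sleep(interval)


# Usage (performs real HTTP calls, so it is left commented out):
# result = wait_for_result("vibevoice_019d...", "YOUR_API_KEY")
# print(result["output"]["media_url"])
```

The `fetch` and `sleep` parameters are injectable so the loop can be exercised without network access or real delays.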

### Text to Speech (Realtime)

## Base URL

```
https://gateway.pixazo.ai/vibevoice-realtime-0-5b-135/v1
```

## Authentication

All requests require an API key passed via header.

| Header | Type | Required | Description |
| --- | --- | --- | --- |
| `Ocp-Apim-Subscription-Key` | string | Yes | Your API subscription key |

## Generate Request - VibeVoice-Realtime-0.5B

## Request Code

The same request is shown below as raw HTTP, Python, JavaScript, and cURL, in that order.

```
POST https://gateway.pixazo.ai/vibevoice-realtime-0-5b-135/v1/vibevoice-realtime-0-5b-request
Content-Type: application/json
Cache-Control: no-cache
Ocp-Apim-Subscription-Key: YOUR_SUBSCRIPTION_KEY

{
  "script": "Speaker 0: Hello, this is Frank.\nSpeaker 1: And I am Carter.",
  "speakers": [
    {
      "preset": "Frank [EN]"
    },
    {
      "preset": "Carter [EN]"
    }
  ]
}
```

```
import requests

url = "https://gateway.pixazo.ai/vibevoice-realtime-0-5b-135/v1/vibevoice-realtime-0-5b-request"
headers = {
    "Content-Type": "application/json",
    "Cache-Control": "no-cache",
    "Ocp-Apim-Subscription-Key": "YOUR_SUBSCRIPTION_KEY"
}
data = {
    "script": "Speaker 0: Hello, this is Frank.\nSpeaker 1: And I am Carter.",
    "speakers": [
        {
            "preset": "Frank [EN]"
        },
        {
            "preset": "Carter [EN]"
        }
    ]
}

response = requests.post(url, json=data, headers=headers)
print(response.json())
```

```
const url = "https://gateway.pixazo.ai/vibevoice-realtime-0-5b-135/v1/vibevoice-realtime-0-5b-request";
const headers = {
  "Content-Type": "application/json",
  "Cache-Control": "no-cache",
  "Ocp-Apim-Subscription-Key": "YOUR_SUBSCRIPTION_KEY"
};
const data = {
  "script": "Speaker 0: Hello, this is Frank.\nSpeaker 1: And I am Carter.",
  "speakers": [
    {
      "preset": "Frank [EN]"
    },
    {
      "preset": "Carter [EN]"
    }
  ]
};

fetch(url, {
  method: "POST",
  headers: headers,
  body: JSON.stringify(data)
})
.then(response => response.json())
.then(data => console.log(data))
.catch(error => console.error("Error:", error));
```

```
curl -X POST "https://gateway.pixazo.ai/vibevoice-realtime-0-5b-135/v1/vibevoice-realtime-0-5b-request" \
  -H "Content-Type: application/json" \
  -H "Cache-Control: no-cache" \
  -H "Ocp-Apim-Subscription-Key: YOUR_SUBSCRIPTION_KEY" \
  --data-raw '{
    "script": "Speaker 0: Hello, this is Frank.\nSpeaker 1: And I am Carter.",
    "speakers": [
      {
        "preset": "Frank [EN]"
      },
      {
        "preset": "Carter [EN]"
      }
    ]
  }'
```

## Output

```
{
  "request_id": "vibevoice-realtime-0-5b-135_019dxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
  "status": "QUEUED",
  "polling_url": "https://gateway.pixazo.ai/v2/requests/status/vibevoice-realtime-0-5b-135_019dxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
}
```

[Try Now](https://api.pixazo.ai/api-details#api=vibevoice-realtime-0-5b-135&operation=vibevoice-realtime-0-5b-request)

## Webhook (Optional)

Add the `X-Webhook-URL` header to your generate request to receive a POST callback instead of polling.

```
X-Webhook-URL: https://your-server.com/webhook/callback
```
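
On the receiving side, a callback endpoint could look roughly like the sketch below. It assumes the callback body has the same JSON shape as the status response (suggested by the "Error via Status/Webhook" example, but not confirmed here), and replying with 204 is likewise an assumption. `summarize_callback`, `CallbackHandler`, and `serve` are hypothetical names.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def summarize_callback(body: bytes) -> str:
    """Turn a callback body into a one-line summary.

    Assumes the body mirrors the status response shape.
    """
    payload = json.loads(body)
    request_id = payload.get("request_id")
    status = payload.get("status")
    if status == "COMPLETED":
        urls = (payload.get("output") or {}).get("media_url") or []
        return f"{request_id} done: {', '.join(urls)}"
    return f"{request_id} {status}: {payload.get('error')}"


class CallbackHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        print(summarize_callback(self.rfile.read(length)))
        self.send_response(204)  # assumption: any 2xx acknowledges the callback
        self.end_headers()


def serve(port: int = 8000) -> None:
    """Run the receiver until interrupted; call this from your own entry point."""
    HTTPServer(("", port), CallbackHandler).serve_forever()
```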

## Request Parameters - VibeVoice-Realtime-0.5B generate request

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| `script` | string | Yes | - | The dialogue script with speaker labels (e.g., "Speaker 0: Hello..."). Each line must begin with "Speaker X:" where X is a zero-based index matching the `speakers` array. |
| `speakers` | array of objects | Yes | - | Array of speaker configurations. Each object must contain a `preset` field identifying the voice. |
| `speakers[].preset` | string | Yes | - | The predefined voice preset to use for each speaker. Supported presets include: "Frank [EN]", "Carter [EN]", and other Microsoft TTS voices. Must match available voice identifiers. |
| `cfg_scale` | number | No | 1.3 | Classifier-free guidance scale. Controls how closely the output adheres to the input script and speaker intent. Higher values increase fidelity but may reduce naturalness. |
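
Because each script line must carry a `Speaker X:` label that indexes into `speakers`, it can help to check a script locally before sending it. The following is a minimal sketch of such a check; `check_script` is a hypothetical helper, and skipping blank lines is an assumption rather than documented behavior.

```python
import re

LABEL = re.compile(r"^Speaker (\d+):")


def check_script(script: str, speakers: list) -> list:
    """Return a list of problems; an empty list means the labels line up."""
    problems = []
    for number, line in enumerate(script.splitlines(), start=1):
        if not line.strip():
            continue  # assumption: blank lines are tolerated
        match = LABEL.match(line)
        if match is None:
            problems.append(f"line {number}: missing 'Speaker X:' prefix")
        elif int(match.group(1)) >= len(speakers):
            problems.append(
                f"line {number}: Speaker {match.group(1)} has no entry in speakers"
            )
    return problems


speakers = [{"preset": "Frank [EN]"}, {"preset": "Carter [EN]"}]
ok = check_script(
    "Speaker 0: Hello, this is Frank.\nSpeaker 1: And I am Carter.", speakers
)
bad = check_script("Hello there.\nSpeaker 2: Who am I?", speakers)
print(ok)
print(bad)
```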

## Minimum Request

```
{
  "script": "Speaker 0: Hello, this is Frank.\nSpeaker 1: And I am Carter.",
  "speakers": [
    {
      "preset": "Frank [EN]"
    },
    {
      "preset": "Carter [EN]"
    }
  ]
}
```

## Full Request (all options)

```
{
  "script": "Speaker 0: Hello, this is Frank.\nSpeaker 1: And I am Carter.",
  "speakers": [
    {
      "preset": "Frank [EN]"
    },
    {
      "preset": "Carter [EN]"
    }
  ],
  "cfg_scale": 1.3
}
```

## Response

```
{
  "request_id": "vibevoice-realtime-0-5b-135_019dxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
  "status": "QUEUED",
  "polling_url": "https://gateway.pixazo.ai/v2/requests/status/vibevoice-realtime-0-5b-135_019dxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
}
```

## Request Headers

| Header | Value |
| --- | --- |
| `Content-Type` | `application/json` |
| `Cache-Control` | `no-cache` |
| `Ocp-Apim-Subscription-Key` | Your API subscription key |

## Response Handling

Common status codes for the VibeVoice-Realtime-0.5B generate request.

| Code | Meaning |
| --- | --- |
| 202 | Accepted (request queued) |
| 400 | Bad Request |
| 401 | Unauthorized |
| 402 | Insufficient Balance |
| 403 | Forbidden |
| 404 | Not Found |
| 429 | Too Many Requests |
| 500 | Internal Server Error |

## Error Responses

Queue system errors and model validation errors.

### Queue System Errors

```
// 402 — Insufficient balance
{
  "error": "Insufficient Balance",
  "message": "Your wallet does not have enough balance."
}
```

```
// 400 — Model not found
{
  "error": "Model not found",
  "message": "Model 'vibevoice-realtime-0-5b-135' not found or is disabled"
}
```

### Error via Status/Webhook

```
{
  "request_id": "vibevoice-realtime-0-5b-135_019dxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
  "status": "ERROR",
  "model_id": "vibevoice-realtime-0-5b-135",
  "error": "Description of the error",
  "output": null
}
```

## Retrieving Results

Poll the universal status endpoint to check progress and retrieve results.

### Endpoint

```
GET https://gateway.pixazo.ai/v2/requests/status/{request_id}
Ocp-Apim-Subscription-Key: YOUR_API_KEY
```

## cURL Example

```
curl -H "Ocp-Apim-Subscription-Key: YOUR_API_KEY" \
  "https://gateway.pixazo.ai/v2/requests/status/vibevoice-realtime-0-5b-135_019dxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
```

## Response (Completed)

```
{
  "request_id": "vibevoice-realtime-0-5b-135_019dxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
  "status": "COMPLETED",
  "model_id": "vibevoice-realtime-0-5b-135",
  "error": null,
  "output": {
    "media_url": [
      "https://pub-582b7213209642b9b995c96c95a30381.r2.dev/v1/vibevoice-realtime-0-5b-135_019dxxxx-xxxx/output.ext"
    ],
    "media_type": "application/octet-stream"
  },
  "created_at": "2026-03-31T10:00:00.000Z",
  "updated_at": "2026-03-31T10:00:15.000Z",
  "completed_at": "2026-03-31T10:00:15.000Z"
}
```

## Response Fields

| Field | Type | Description |
| --- | --- | --- |
| `request_id` | string | Unique request identifier |
| `status` | string | `QUEUED`, `PROCESSING`, `COMPLETED`, `FAILED`, or `ERROR` |
| `model_id` | string | Model that processed the request |
| `error` | string or null | Error message if failed |
| `output.media_url` | array | URLs to generated media (R2 CDN) |
| `output.media_type` | string | MIME type of the output |
| `created_at` | string | When the request was created |
| `completed_at` | string or null | When the request completed |
| `polling_url` | string | Status URL (initial response only) |

## Status Values

| Status | Description |
| --- | --- |
| `QUEUED` | Request accepted, waiting to be processed |
| `PROCESSING` | Being processed by the model |
| `COMPLETED` | Done; `output` contains the result |
| `FAILED` | Failed; check the `error` field |
| `ERROR` | System error; not charged |

## Status Flow

```
QUEUED → PROCESSING → COMPLETED
                    → FAILED
                    → ERROR
```

## Typical Workflow

1.  **Send a generate request** to the API endpoint
2.  **Save the `request_id`** from the response
3.  **Poll** every 5-10 seconds: `GET /v2/requests/status/{request_id}`
4.  **When `status` is `"COMPLETED"`**, download from `output.media_url`

**Tip:** Use `X-Webhook-URL` header to get a callback instead of polling.
