XTTS API - AI Voice Cloning & Text to Speech APIs
by Xtts
With the XTTS API, developers can clone voices and generate speech in multiple languages while preserving the characteristics of the cloned voice. The API suits content localization, personalized voice experiences, and applications that need custom voice generation across language barriers.

v2 Text to Speech API Documentation
Base URL: https://gateway.pixazo.ai/voice-clone/v1
Authentication
All requests require an API key, passed in the following request header.
| Header | Type | Required | Description |
|---|---|---|---|
| Ocp-Apim-Subscription-Key | string | Yes | Your API subscription key |
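A small helper can keep the key out of source code. This is a minimal sketch. It assumes the key is stored in an environment variable named `PIXAZO_API_KEY`. That name and the function name are hypothetical; the API does not define them.

```python
import os


def auth_headers(env_var: str = "PIXAZO_API_KEY") -> dict[str, str]:
    """Return the authentication header, reading the key from an environment variable.

    PIXAZO_API_KEY is a hypothetical variable name chosen for this example.
    """
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"set {env_var} to your API subscription key")
    return {"Ocp-Apim-Subscription-Key": key}
```

To combine it with the other headers, use `headers = {"Content-Type": "application/json", **auth_headers()}`.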
Text to Speech Request - XTTS V2 API
Request Code
POST https://gateway.pixazo.ai/voice-clone/v1/xtts-v2/generate
Content-Type: application/json
Cache-Control: no-cache
Ocp-Apim-Subscription-Key: YOUR_SUBSCRIPTION_KEY
{
"speaker": "https://pub-582b7213209642b9b995c96c95a30381.r2.dev/male.wav",
"text": "Hello! Welcome to our voice cloning service.",
"language": "en"
}
import requests
url = "https://gateway.pixazo.ai/voice-clone/v1/xtts-v2/generate"
headers = {
    "Content-Type": "application/json",
    "Cache-Control": "no-cache",
    "Ocp-Apim-Subscription-Key": "YOUR_SUBSCRIPTION_KEY"
}
# Reference audio of the voice to clone, plus the text to speak
data = {
    "speaker": "https://pub-582b7213209642b9b995c96c95a30381.r2.dev/male.wav",
    "text": "Hello! Welcome to our voice cloning service.",
    "language": "en"
}
response = requests.post(url, json=data, headers=headers, timeout=30)
response.raise_for_status()  # raise on 4xx/5xx responses (e.g. 401, 402, 429)
print(response.json())  # request_id, status ("QUEUED") and polling_url
const url = 'https://gateway.pixazo.ai/voice-clone/v1/xtts-v2/generate';
const data = {
speaker: 'https://pub-582b7213209642b9b995c96c95a30381.r2.dev/male.wav',
text: 'Hello! Welcome to our voice cloning service.',
language: 'en'
};
fetch(url, {
method: 'POST',
headers: {
'Content-Type': 'application/json',
'Cache-Control': 'no-cache',
'Ocp-Apim-Subscription-Key': 'YOUR_SUBSCRIPTION_KEY'
},
body: JSON.stringify(data)
})
.then(response => response.json())
.then(data => console.log(data))
.catch(error => console.error('Error:', error));
curl -v -X POST "https://gateway.pixazo.ai/voice-clone/v1/xtts-v2/generate" \
-H "Content-Type: application/json" \
-H "Cache-Control: no-cache" \
-H "Ocp-Apim-Subscription-Key: YOUR_SUBSCRIPTION_KEY" \
--data-raw '{
"speaker": "https://pub-582b7213209642b9b995c96c95a30381.r2.dev/male.wav",
"text": "Hello! Welcome to our voice cloning service.",
"language": "en"
}'
Output
{
"request_id": "xtts-v2-api_019dxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
"status": "QUEUED",
"polling_url": "https://gateway.pixazo.ai/v2/requests/status/xtts-v2-api_019dxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
}
Webhook (Optional)
Add the X-Webhook-URL header to your generate request to receive a POST callback instead of polling.
X-Webhook-URL: https://your-server.com/webhook/callback
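A minimal receiver for this callback might look like the following sketch, built on the standard library. It assumes the callback body is JSON with the same `request_id`, `status`, `error` and `output` fields as the status response. The "Error via Status/Webhook" example suggests this, but the exact payload is not specified here. The port and function names are hypothetical. A production endpoint should also verify where requests come from.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def summarize_callback(body: bytes) -> tuple[str, str, list[str]]:
    """Parse a callback body into (request_id, status, media URLs).

    Assumes the payload mirrors the status response; `output` is null on errors.
    """
    payload = json.loads(body)
    output = payload.get("output") or {}
    return payload["request_id"], payload["status"], list(output.get("media_url", []))


class WebhookHandler(BaseHTTPRequestHandler):
    """Accepts POST callbacks and logs the outcome of each request."""

    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        try:
            request_id, status, urls = summarize_callback(self.rfile.read(length))
        except (ValueError, KeyError, AttributeError, TypeError):
            self.send_response(400)  # malformed or unexpected payload
            self.end_headers()
            return
        print(f"{request_id}: {status} {urls}")
        # Respond promptly; start downloads or other slow work elsewhere.
        self.send_response(200)
        self.end_headers()


def serve(port: int = 8000) -> None:
    """Run the receiver; expose it publicly at the URL passed in X-Webhook-URL."""
    HTTPServer(("0.0.0.0", port), WebhookHandler).serve_forever()
```

Calling `serve()` blocks and handles callbacks until the process is stopped.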
Request Parameters - Text to Speech Request
| Parameter | Required | Type | Description |
|---|---|---|---|
| speaker | Yes | string | URL of the speaker audio file (wav, mp3, m4a, ogg, or flv). 3-10 seconds of clear speech recommended. |
| text | No | string | Text to synthesize (max 500 characters recommended). Default: "Hi there, I'm your new voice clone. Try your best to upload quality audio" |
| language | No | string | Output language code. Supported: en, es, fr, de, it, pt, pl, tr, ru, nl, cs, ar, zh, hu, ko, hi. Default: "en" |
| cleanup_voice | No | boolean | Apply denoising to the speaker audio; use for microphone recordings with background noise. Default: false |
| webhook | No | string | Callback URL for completion notification; a POST request with the results is sent when the request completes. Default: null |
| webhook_events_filter | No | array | Events that trigger the webhook: ["*"] (all) or ["completed"] (success/failure only). Default: ["*"] |
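The documented constraints can be checked on the client before a request is spent. This is a sketch and the function name is hypothetical. The file-type check only inspects the URL path, so it is reported as a warning rather than an error. The 500-character limit is a recommendation, not a hard limit, so it is also a warning.

```python
from urllib.parse import urlparse

# Values copied from the parameter table above.
SUPPORTED_LANGUAGES = {
    "en", "es", "fr", "de", "it", "pt", "pl", "tr",
    "ru", "nl", "cs", "ar", "zh", "hu", "ko", "hi",
}
SPEAKER_EXTENSIONS = (".wav", ".mp3", ".m4a", ".ogg", ".flv")
RECOMMENDED_MAX_TEXT = 500


def check_payload(payload: dict) -> list[str]:
    """Raise ValueError for invalid fields; return a list of non-fatal warnings."""
    speaker = payload.get("speaker")
    if not speaker:
        raise ValueError("'speaker' is required")
    language = payload.get("language", "en")
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language: {language!r}")

    warnings = []
    # Only the URL path is inspected; the server may still accept other URLs.
    if not urlparse(speaker).path.lower().endswith(SPEAKER_EXTENSIONS):
        warnings.append("speaker URL does not end in wav, mp3, m4a, ogg, or flv")
    if len(payload.get("text", "")) > RECOMMENDED_MAX_TEXT:
        warnings.append("text is longer than the recommended 500 characters")
    return warnings
```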
Request Headers
| Header | Value |
|---|---|
| Content-Type | application/json |
| Cache-Control | no-cache |
| Ocp-Apim-Subscription-Key | YOUR_SUBSCRIPTION_KEY |
Response Handling
Common HTTP status codes returned by the API.
| Code | Meaning |
|---|---|
| 202 | Accepted — Request queued |
| 400 | Bad Request |
| 401 | Unauthorized |
| 402 | Insufficient Balance |
| 403 | Forbidden |
| 429 | Too Many Requests |
| 500 | Internal Server Error |
Error Responses
Queue system errors and model validation errors.
Queue System Errors
// 402 — Insufficient balance
{
"error": "Insufficient Balance",
"message": "Your wallet does not have enough balance."
}
// 400 — Model not found
{
"error": "Model not found",
"message": "Model 'xtts-v2-api' not found or is disabled"
}
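One way to act on these responses in Python is to turn non-2xx responses into an exception that records whether a retry makes sense. This sketch treats 429 and 500 as retryable and every other code as permanent. That split is an assumption, not documented API behavior. The class and function names are hypothetical.

```python
import json

# Assumption: rate limiting and server errors are transient; other codes are not.
RETRYABLE_CODES = {429, 500}


class ApiError(Exception):
    """A non-2xx response from the API."""

    def __init__(self, status_code: int, error: str, message: str = "") -> None:
        super().__init__(f"{status_code} {error}: {message}")
        self.status_code = status_code
        self.error = error
        self.message = message

    @property
    def retryable(self) -> bool:
        return self.status_code in RETRYABLE_CODES


def parse_response(status_code: int, body: str) -> dict:
    """Return the decoded JSON body for 2xx codes; raise ApiError otherwise."""
    try:
        payload = json.loads(body) if body else {}
    except json.JSONDecodeError:
        payload = {"error": body}  # e.g. an HTML error page
    if not isinstance(payload, dict):
        payload = {"error": str(payload)}
    if 200 <= status_code < 300:
        return payload
    raise ApiError(
        status_code,
        payload.get("error", "Unknown error"),
        payload.get("message", ""),
    )
```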
Error via Status/Webhook
{
"request_id": "xtts-v2-api_019dxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
"status": "ERROR",
"model_id": "xtts-v2-api",
"error": "Description of the error",
"output": null
}
Retrieving Results
Poll the universal status endpoint to check progress and retrieve results.
Endpoint
GET https://gateway.pixazo.ai/v2/requests/status/{request_id}
Ocp-Apim-Subscription-Key: YOUR_SUBSCRIPTION_KEY
cURL Example
curl -H "Ocp-Apim-Subscription-Key: YOUR_SUBSCRIPTION_KEY" \
"https://gateway.pixazo.ai/v2/requests/status/xtts-v2-api_019dxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
Response (Completed)
{
"request_id": "xtts-v2-api_019dxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
"status": "COMPLETED",
"model_id": "xtts-v2-api",
"error": null,
"output": {
"media_url": [
"https://pub-582b7213209642b9b995c96c95a30381.r2.dev/v1/xtts-v2-api_019dxxxx-xxxx/output.ext"
],
"media_type": "application/octet-stream"
},
"created_at": "2026-03-31T10:00:00.000Z",
"updated_at": "2026-03-31T10:00:15.000Z",
"completed_at": "2026-03-31T10:00:15.000Z"
}
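The status check can also be made from Python using only the standard library, with no `requests` dependency. The function names are hypothetical. `status_request` only builds the GET request, while `get_status` performs a live call to the status endpoint.

```python
import json
import urllib.request
from urllib.parse import quote

STATUS_URL = "https://gateway.pixazo.ai/v2/requests/status/"


def status_request(request_id: str, api_key: str) -> urllib.request.Request:
    """Build the GET request for the universal status endpoint."""
    return urllib.request.Request(
        STATUS_URL + quote(request_id, safe=""),
        headers={"Ocp-Apim-Subscription-Key": api_key},
        method="GET",
    )


def get_status(request_id: str, api_key: str) -> dict:
    """Send the request and decode the JSON status document."""
    with urllib.request.urlopen(status_request(request_id, api_key), timeout=30) as resp:
        return json.load(resp)
```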
Response Fields
| Field | Type | Description |
|---|---|---|
| request_id | string | Unique request identifier |
| status | string | QUEUED, PROCESSING, COMPLETED, FAILED, or ERROR |
| model_id | string | Model that processed the request |
| error | string or null | Error message if the request failed |
| output.media_url | array | URLs to generated media (R2 CDN) |
| output.media_type | string | MIME type of the output |
| created_at | string | When the request was created (ISO 8601) |
| updated_at | string | When the request was last updated (ISO 8601) |
| completed_at | string or null | When the request completed (ISO 8601) |
| polling_url | string | Status URL (initial response only) |
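Each entry in `output.media_url` can be downloaded to save the generated audio. This standard-library sketch keeps whatever extension the real URL carries; the `output.ext` in the example response is a placeholder. If the URL has no extension, the sketch falls back to `.bin`. The file-naming scheme and function names are hypothetical choices.

```python
import os
import shutil
import urllib.request
from urllib.parse import urlparse


def output_filename(media_url: str, request_id: str, index: int) -> str:
    """Pick a local filename: request ID plus index, keeping the URL's extension."""
    ext = os.path.splitext(urlparse(media_url).path)[1] or ".bin"
    return f"{request_id}_{index}{ext}"


def download_outputs(status_doc: dict, dest_dir: str = ".") -> list[str]:
    """Save every URL in output.media_url of a COMPLETED status document."""
    if status_doc.get("status") != "COMPLETED":
        raise ValueError(f"request is {status_doc.get('status')}, not COMPLETED")
    paths = []
    for i, url in enumerate(status_doc["output"]["media_url"]):
        path = os.path.join(dest_dir, output_filename(url, status_doc["request_id"], i))
        # Stream the response body straight to disk.
        with urllib.request.urlopen(url, timeout=60) as resp, open(path, "wb") as f:
            shutil.copyfileobj(resp, f)
        paths.append(path)
    return paths
```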
Status Values
| Status | Description |
|---|---|
| QUEUED | Request accepted, waiting to be processed |
| PROCESSING | Being processed by the model |
| COMPLETED | Done — output contains the result |
| FAILED | Failed — check error field |
| ERROR | System error — not charged |
Status Flow
QUEUED → PROCESSING → COMPLETED, FAILED, or ERROR
Typical Workflow
- Send a generate request to the API endpoint.
- Save the request_id from the response.
- Poll GET /v2/requests/status/{request_id} every 5-10 seconds.
- When status is "COMPLETED", download the result from output.media_url.
Tip: Use the X-Webhook-URL header to receive a callback instead of polling.
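The polling workflow can be sketched as a loop. To keep it testable, the HTTP call is passed in as a function. `fetch_status` is a hypothetical parameter that could wrap the GET request shown in Retrieving Results. The interval defaults to 5 seconds, following the guidance above. The 10-minute timeout is an arbitrary choice, not an API limit. Terminal states follow the status values table.

```python
import time
from typing import Callable


def wait_for_result(
    request_id: str,
    fetch_status: Callable[[str], dict],
    interval: float = 5.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> list[str]:
    """Poll until the request reaches a terminal status; return output.media_url."""
    deadline = time.monotonic() + timeout
    while True:
        doc = fetch_status(request_id)
        status = doc.get("status")
        if status == "COMPLETED":
            return list(doc["output"]["media_url"])
        if status in ("FAILED", "ERROR"):
            raise RuntimeError(f"{request_id} ended with {status}: {doc.get('error')}")
        # Still QUEUED or PROCESSING: give up after the deadline, else wait and retry.
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{request_id} still {status} after {timeout} s")
        sleep(interval)
```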
v2 Text to Speech API Pricing
| Resolution | Price (USD) |
|---|---|
| All | $0.015 |
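The listed price can be multiplied out for budgeting. The table does not state the billing unit. This sketch assumes $0.015 is charged per generate request and, per the status table, that requests ending in ERROR are not charged. Both the constant and the function name are hypothetical. `Decimal` avoids floating-point rounding in currency sums.

```python
from decimal import Decimal

PRICE_PER_REQUEST = Decimal("0.015")  # assumption: the listed price is per request


def estimate_cost(requests: int, errored: int = 0) -> Decimal:
    """Estimate spend in USD, excluding requests that ended in ERROR (not charged)."""
    if not 0 <= errored <= requests:
        raise ValueError("errored must be between 0 and requests")
    return PRICE_PER_REQUEST * (requests - errored)
```

For example, `estimate_cost(1000, errored=10)` gives `Decimal('14.850')`.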