XTTS API - AI Voice Cloning & Text to Speech APIs

by Xtts

With the XTTS API, developers can clone voices and generate speech in multiple languages while preserving the characteristics of the cloned voice. The API suits content localization, personalized voice experiences, and applications that need custom voice generation across languages.

v2 Text to Speech API Documentation

https://gateway.pixazo.ai/voice-clone/v1

Authentication

All requests require an API key passed via header.

Header	Type	Required	Description
Ocp-Apim-Subscription-Key	string	Yes	Your API subscription key
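
As a minimal sketch of the header requirement above, the helper below builds the request headers and reads the key from an environment variable when none is passed. The variable name `PIXAZO_API_KEY` and the function name `build_headers` are assumptions made for this example, not part of the API.

```python
import os


def build_headers(api_key=None):
    """Return the headers used by the request examples in this document."""
    # PIXAZO_API_KEY is a hypothetical variable name chosen for this sketch.
    key = api_key or os.environ.get("PIXAZO_API_KEY")
    if not key:
        raise ValueError("No API subscription key provided")
    return {
        "Content-Type": "application/json",
        "Cache-Control": "no-cache",
        "Ocp-Apim-Subscription-Key": key,
    }
```

Keeping the key out of source code this way avoids committing it by accident.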

Text to Speech Request - XTTS V2 API

Request Code

POST https://gateway.pixazo.ai/voice-clone/v1/xtts-v2/generate
Content-Type: application/json
Cache-Control: no-cache
Ocp-Apim-Subscription-Key: YOUR_SUBSCRIPTION_KEY

{
  "speaker": "https://pub-582b7213209642b9b995c96c95a30381.r2.dev/male.wav",
  "text": "Hello! Welcome to our voice cloning service.",
  "language": "en"
}

import requests

url = "https://gateway.pixazo.ai/voice-clone/v1/xtts-v2/generate"
headers = {
    "Content-Type": "application/json",
    "Cache-Control": "no-cache",
    "Ocp-Apim-Subscription-Key": "YOUR_SUBSCRIPTION_KEY"
}
data = {
    "speaker": "https://pub-582b7213209642b9b995c96c95a30381.r2.dev/male.wav",
    "text": "Hello! Welcome to our voice cloning service.",
    "language": "en"
}

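# Submit the job; a timeout keeps the call from hanging indefinitely
response = requests.post(url, json=data, headers=headers, timeout=30)
response.raise_for_status()  # surface 4xx/5xx responses as exceptions
print(response.json())  # contains request_id, status, and polling_url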

const url = 'https://gateway.pixazo.ai/voice-clone/v1/xtts-v2/generate';

const data = {
  speaker: 'https://pub-582b7213209642b9b995c96c95a30381.r2.dev/male.wav',
  text: 'Hello! Welcome to our voice cloning service.',
  language: 'en'
};

fetch(url, {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'Cache-Control': 'no-cache',
    'Ocp-Apim-Subscription-Key': 'YOUR_SUBSCRIPTION_KEY'
  },
  body: JSON.stringify(data)
})
.then(response => response.json())
.then(data => console.log(data))
.catch(error => console.error('Error:', error));

curl -v -X POST "https://gateway.pixazo.ai/voice-clone/v1/xtts-v2/generate" \
  -H "Content-Type: application/json" \
  -H "Cache-Control: no-cache" \
  -H "Ocp-Apim-Subscription-Key: YOUR_SUBSCRIPTION_KEY" \
  --data-raw '{
    "speaker": "https://pub-582b7213209642b9b995c96c95a30381.r2.dev/male.wav",
    "text": "Hello! Welcome to our voice cloning service.",
    "language": "en"
  }'

Output

{
  "request_id": "xtts-v2-api_019dxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
  "status": "QUEUED",
  "polling_url": "https://gateway.pixazo.ai/v2/requests/status/xtts-v2-api_019dxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
}
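
The generate call returns immediately with a queue record rather than audio, so a client typically keeps `request_id` and `polling_url` for later. The sketch below shows one way to do that with only the Python standard library; the function names are illustrative, and it assumes the response body always has the shape shown above.

```python
import json
import urllib.request

GENERATE_URL = "https://gateway.pixazo.ai/voice-clone/v1/xtts-v2/generate"


def build_generate_request(api_key, speaker, text, language="en"):
    """Build (but do not send) the POST request for the generate endpoint."""
    body = json.dumps(
        {"speaker": speaker, "text": text, "language": language}
    ).encode("utf-8")
    return urllib.request.Request(
        GENERATE_URL,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Cache-Control": "no-cache",
            "Ocp-Apim-Subscription-Key": api_key,
        },
    )


def submit(api_key, speaker, text, language="en", timeout=30):
    """Send the request and return (request_id, polling_url)."""
    request = build_generate_request(api_key, speaker, text, language)
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        queued = json.loads(response.read().decode("utf-8"))
    return queued["request_id"], queued["polling_url"]
```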

Webhook (Optional)

Add the X-Webhook-URL header to your generate request to receive a POST callback instead of polling.

X-Webhook-URL: https://your-server.com/webhook/callback
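
A rough sketch of both sides of the webhook flow follows: adding the header to a request, and a small receiver built on Python's standard `http.server`. It assumes the callback body has the same JSON shape as the status response documented under "Retrieving Results"; the handler, function names, and port are hypothetical.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def with_webhook(headers, callback_url):
    """Return a copy of the request headers with X-Webhook-URL added."""
    updated = dict(headers)
    updated["X-Webhook-URL"] = callback_url
    return updated


def summarize_callback(raw_body):
    """Extract (request_id, status, media_urls) from a callback body.

    Assumes the callback mirrors the status response shape.
    """
    payload = json.loads(raw_body)
    output = payload.get("output") or {}
    return payload.get("request_id"), payload.get("status"), output.get("media_url", [])


class CallbackHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        print(summarize_callback(self.rfile.read(length)))
        # Acknowledge receipt so the sender does not treat the call as failed.
        self.send_response(200)
        self.end_headers()


def serve(port=8000):
    """Run the receiver until interrupted."""
    HTTPServer(("0.0.0.0", port), CallbackHandler).serve_forever()
```

Calling `serve()` starts the receiver; the URL passed in `X-Webhook-URL` must be reachable from the public internet.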

Request Parameters - Text to Speech Request

Parameter	Required	Type	Description
speaker	Yes	string	URL of the speaker audio file (wav, mp3, m4a, ogg, or flv). 3-10 seconds of clear speech recommended.
text	No	string	Text to synthesize (max 500 characters recommended). Default: "Hi there, I'm your new voice clone. Try your best to upload quality audio"
language	No	string	Output language code. Supported: en, es, fr, de, it, pt, pl, tr, ru, nl, cs, ar, zh, hu, ko, hi. Default: "en"
cleanup_voice	No	boolean	Apply denoising to the speaker audio. Use for microphone recordings with background noise. Default: false
webhook	No	string	Callback URL for completion notification. A POST request with the results is sent when the job completes. Default: null
webhook_events_filter	No	array	Events that trigger the webhook. Values: [""] (all), ["completed"] (success/failure only). Default: [""]
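
The constraints in the table can be checked on the client before a request is sent. The sketch below is one possible validator; treating the 500-character recommendation as a hard limit is a choice made for this example, since the API only recommends it, and `build_payload` is a hypothetical helper name.

```python
SUPPORTED_LANGUAGES = {
    "en", "es", "fr", "de", "it", "pt", "pl", "tr",
    "ru", "nl", "cs", "ar", "zh", "hu", "ko", "hi",
}
MAX_TEXT_CHARS = 500  # recommended maximum from the parameter table


def build_payload(speaker, text=None, language="en", cleanup_voice=False):
    """Validate the inputs and return a JSON-serializable request body."""
    if not speaker.startswith(("http://", "https://")):
        raise ValueError("speaker must be a URL to an audio file")
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language code: {language!r}")
    payload = {"speaker": speaker, "language": language}
    if text is not None:
        if len(text) > MAX_TEXT_CHARS:
            raise ValueError(f"text exceeds {MAX_TEXT_CHARS} characters")
        payload["text"] = text
    if cleanup_voice:
        # Only sent when enabled; the server default is false.
        payload["cleanup_voice"] = True
    return payload
```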

Request Headers

Header	Value
Content-Type	application/json
Cache-Control	no-cache
Ocp-Apim-Subscription-Key	YOUR_SUBSCRIPTION_KEY

Response Handling

Common status codes.

Code	Meaning
202	Accepted — Request queued
400	Bad Request
401	Unauthorized
402	Insufficient Balance
403	Forbidden
429	Too Many Requests
500	Internal Server Error
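
One way a client might act on these codes is sketched below. Retrying only on 429 and 500, and using exponential backoff, is a common convention assumed here; the documentation does not prescribe a retry policy.

```python
STATUS_MEANINGS = {
    202: "Accepted, request queued",
    400: "Bad Request",
    401: "Unauthorized",
    402: "Insufficient Balance",
    403: "Forbidden",
    429: "Too Many Requests",
    500: "Internal Server Error",
}
RETRYABLE = {429, 500}  # assumption: only these are worth retrying


def should_retry(status_code):
    """Return True when resending the same request may succeed later."""
    return status_code in RETRYABLE


def backoff_delay(attempt, base=1.0, cap=30.0):
    """Seconds to wait before retry number `attempt` (0-based), doubling each time."""
    return min(cap, base * (2 ** attempt))
```

Codes such as 401 or 402 need a corrected key or a topped-up balance, so resending the same request would not help.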

Error Responses

Queue system errors and model validation errors.

Queue System Errors

// 402 — Insufficient balance
{
  "error": "Insufficient Balance",
  "message": "Your wallet does not have enough balance."
}

// 400 — Model not found
{
  "error": "Model not found",
  "message": "Model 'xtts-v2-api' not found or is disabled"
}

Error via Status/Webhook

{
  "request_id": "xtts-v2-api_019dxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
  "status": "ERROR",
  "model_id": "xtts-v2-api",
  "error": "Description of the error",
  "output": null
}
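
The two error shapes above differ: immediate errors carry `error` and `message`, while errors reported through the status endpoint or webhook carry `request_id` and `status`. A small helper, sketched here under the assumption that those are the only two shapes, can turn either into a readable line.

```python
def describe_error(body):
    """Format either error shape as a single human-readable line."""
    if "request_id" in body:
        # Shape returned by the status endpoint or a webhook callback.
        return f"[{body.get('status')}] {body['request_id']}: {body.get('error')}"
    # Shape returned directly by the generate call.
    return f"{body.get('error')}: {body.get('message')}"
```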

Retrieving Results

Poll the universal status endpoint to check progress and retrieve results.

Endpoint

GET https://gateway.pixazo.ai/v2/requests/status/{request_id}
Ocp-Apim-Subscription-Key: YOUR_SUBSCRIPTION_KEY

cURL Example

curl -H "Ocp-Apim-Subscription-Key: YOUR_SUBSCRIPTION_KEY" \
  "https://gateway.pixazo.ai/v2/requests/status/xtts-v2-api_019dxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"

Response (Completed)

{
  "request_id": "xtts-v2-api_019dxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
  "status": "COMPLETED",
  "model_id": "xtts-v2-api",
  "error": null,
  "output": {
    "media_url": [
      "https://pub-582b7213209642b9b995c96c95a30381.r2.dev/v1/xtts-v2-api_019dxxxx-xxxx/output.ext"
    ],
    "media_type": "application/octet-stream"
  },
  "created_at": "2026-03-31T10:00:00.000Z",
  "updated_at": "2026-03-31T10:00:15.000Z",
  "completed_at": "2026-03-31T10:00:15.000Z"
}
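
Once a response like the one above arrives, the audio still has to be fetched from `output.media_url`. The following sketch saves each URL to disk with the standard library. The helper names are illustrative, and the local file name is simply taken from the last path segment of the URL, so files with the same name would overwrite each other.

```python
import posixpath
import shutil
import urllib.request
from pathlib import Path
from urllib.parse import urlparse


def local_name(media_url):
    """Derive a file name from the URL path, with a fallback for empty paths."""
    return posixpath.basename(urlparse(media_url).path) or "output.bin"


def save_media(status_body, dest_dir="."):
    """Download every URL in output.media_url and return the saved paths."""
    urls = (status_body.get("output") or {}).get("media_url", [])
    saved = []
    for url in urls:
        target = Path(dest_dir) / local_name(url)
        # Stream the body to disk instead of loading it fully into memory.
        with urllib.request.urlopen(url, timeout=60) as response, open(target, "wb") as fh:
            shutil.copyfileobj(response, fh)
        saved.append(target)
    return saved
```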

Response Fields

Field	Type	Description
request_id	string	Unique request identifier
status	string	QUEUED, PROCESSING, COMPLETED, FAILED, or ERROR
model_id	string	Model that processed the request
error	string|null	Error message if failed
output.media_url	array	URLs to generated media (R2 CDN)
output.media_type	string	MIME type of the output
created_at	string	When request was created
updated_at	string	When request was last updated
completed_at	string|null	When request completed
polling_url	string	Status URL (initial response only)
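
To illustrate how these fields might be consumed, the sketch below maps a status response onto a small dataclass and computes the processing time. The class and function names are hypothetical, and the timestamp parsing assumes the ISO 8601 UTC format with a trailing `Z` seen in the sample response.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class StatusResult:
    request_id: str
    status: str
    model_id: Optional[str] = None
    error: Optional[str] = None
    media_urls: List[str] = field(default_factory=list)
    media_type: Optional[str] = None
    created_at: Optional[datetime] = None
    completed_at: Optional[datetime] = None


def _parse_ts(value):
    """Parse an ISO 8601 timestamp ending in 'Z'; pass None through."""
    if value is None:
        return None
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def parse_status(body):
    """Build a StatusResult, tolerating a missing or null output object."""
    output = body.get("output") or {}
    return StatusResult(
        request_id=body["request_id"],
        status=body["status"],
        model_id=body.get("model_id"),
        error=body.get("error"),
        media_urls=list(output.get("media_url", [])),
        media_type=output.get("media_type"),
        created_at=_parse_ts(body.get("created_at")),
        completed_at=_parse_ts(body.get("completed_at")),
    )


def elapsed_seconds(result):
    """Seconds between creation and completion, or None if either is missing."""
    if result.created_at and result.completed_at:
        return (result.completed_at - result.created_at).total_seconds()
    return None
```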

Status Values

Status	Description
QUEUED	Request accepted, waiting to be processed
PROCESSING	Being processed by the model
COMPLETED	Done — output contains the result
FAILED	Failed — check error field
ERROR	System error — not charged

Status Flow

QUEUED → PROCESSING → COMPLETED
                    → FAILED
                    → ERROR
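
The flow above can be written as a small transition table, which is handy for deciding when to stop polling. This sketch encodes only the arrows drawn in the diagram; whether the service can skip PROCESSING (for example, moving straight from QUEUED to ERROR) is not stated, so the table should be treated as an assumption.

```python
TRANSITIONS = {
    "QUEUED": {"PROCESSING"},
    "PROCESSING": {"COMPLETED", "FAILED", "ERROR"},
    "COMPLETED": set(),
    "FAILED": set(),
    "ERROR": set(),
}


def is_terminal(status):
    """True for known statuses that have no outgoing transition."""
    return status in TRANSITIONS and not TRANSITIONS[status]


def is_expected_transition(current, following):
    """True when the diagram contains an arrow from `current` to `following`."""
    return following in TRANSITIONS.get(current, set())
```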

Typical Workflow

1. Send a generate request to the API endpoint
2. Save the request_id from the response
3. Poll every 5-10 seconds: GET /v2/requests/status/{request_id}
4. When status is "COMPLETED", download from output.media_url

Tip: Use the X-Webhook-URL header to receive a callback instead of polling.
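
The polling part of the workflow above could be sketched as follows, using only the standard library. The 5-second interval follows the recommendation in the steps; the 300-second ceiling, the function names, and the injectable `fetch` and `sleep` parameters are assumptions added for this example so the loop can be exercised without network access.

```python
import json
import time
import urllib.request

STATUS_URL = "https://gateway.pixazo.ai/v2/requests/status/{request_id}"
TERMINAL_STATUSES = {"COMPLETED", "FAILED", "ERROR"}


def fetch_status(request_id, api_key, timeout=30):
    """GET the status endpoint and return the decoded JSON body."""
    request = urllib.request.Request(
        STATUS_URL.format(request_id=request_id),
        headers={"Ocp-Apim-Subscription-Key": api_key},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def wait_for_result(request_id, api_key, interval=5.0, max_wait=300.0,
                    fetch=fetch_status, sleep=time.sleep):
    """Poll until the request reaches a terminal status or max_wait elapses."""
    waited = 0.0
    while True:
        body = fetch(request_id, api_key)
        if body.get("status") in TERMINAL_STATUSES:
            return body
        if waited >= max_wait:
            raise TimeoutError(f"{request_id} still {body.get('status')} after {waited:.0f}s")
        sleep(interval)
        waited += interval
```

The caller should inspect `status` on the returned body, since FAILED and ERROR also end the loop.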

v2 Text to Speech API Pricing

Resolution	Price (USD)
All	$0.015
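
The table does not state the billing unit. Purely as an illustration, the sketch below assumes the $0.015 price applies per generate request and estimates cost and request count for a given budget; if billing is per character or per second of audio instead, the arithmetic would differ.

```python
from decimal import Decimal

# Assumption: the listed price is charged once per generate request.
PRICE_PER_REQUEST = Decimal("0.015")


def estimate_cost(request_count):
    """Total cost in USD for the given number of requests."""
    return PRICE_PER_REQUEST * request_count


def requests_within(budget_usd):
    """Number of whole requests that fit in the budget."""
    return int(Decimal(budget_usd) // PRICE_PER_REQUEST)
```

`Decimal` is used instead of floats so that small per-request prices add up without rounding drift.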