Xray Blog

Creating Your Perfect AI Voice Clone with ElevenLabs
Products and Demos
October 29, 2025

Content creators: how many hours did you spend last month recording voiceovers? 

How many takes did you need to get a clean read without stumbles or background noise? 

How much time did you waste editing out breaths, clicks, and mistakes?

Here's a better question: what if you could generate professional narration in minutes instead of hours, with no recording booth required?

AI voice cloning has reached the point where quality synthetic voices are virtually indistinguishable from human recordings. In the video embedded below, you can hear the voice clone we’ve created for our CEO and YouTube host, Tom. 

In this video, every bit of voiceover you hear is actually Tom’s voice clone. 

The voice clone is trained on real recordings to replicate Tom's voice with remarkable accuracy. Most people can't tell the difference.

Instead of recording, re-recording, and editing hours of audio, you can generate professional narration in minutes. 

Here's how to do it right.

Choosing the right tool: ElevenLabs

The tool we recommend for creating a voice clone is ElevenLabs. While other voice cloning platforms exist, ElevenLabs consistently delivers the highest quality results across multiple capabilities, including text-to-speech generation, audio transcription, voice transformation, and voice isolation.

Several AI voice features on the ElevenLabs dashboard

To get started with ElevenLabs, go to their site and create an account.

Professional vs. Instant voice clone

ElevenLabs offers both “Instant” and “Professional” voice clones. We’d recommend creating a Professional voice clone – that’s the type of voice we used in our video above. 

To create a professional voice clone, you'll need to subscribe to at least the Creator plan. 

Summary of ElevenLabs pricing

The lower-tier Starter plan only provides access to instant voice cloning, which produces faster but lower-fidelity results.

Creating your professional voice clone in ElevenLabs

Once you’ve signed in to ElevenLabs, you can create your voice clone by clicking on “Voices” and adding a new voice.

Clicking on "Voices" to create a new voice clone

ElevenLabs will walk you through the simple process step by step, but there are a few things you should know beforehand.

Preparing training data for your voice clone

The most critical requirement is training data: you’ll need at least 30 minutes of clean, high-quality audio. The quality of your reference recordings directly impacts the quality of your voice clone, so it's worth taking the time to prepare good material.

You can also upload more than 30 minutes of audio – we uploaded over 2 hours of voiceover recordings to make Tom’s clone.

Uploading audio samples to create a voice clone.

Your training audio should meet these requirements:
• No background noise or ambient sound
• No music or sound effects
• No additional voices that might confuse the AI model
• Clear, natural speech in your normal tone

For anyone who already creates content, assembling this material is straightforward – just compile raw audio clips from your recording sessions until you have the required amount.
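
If your raw clips are spread across many files, a short script can tell you whether you've reached the 30-minute mark. The following is a minimal sketch, not part of ElevenLabs: it assumes your clips are uncompressed PCM WAV files sitting in a single folder, and the function names are our own.

```python
import wave
from pathlib import Path

MIN_MINUTES = 30  # minimum amount of training audio mentioned above


def wav_duration_seconds(path):
    """Return the length of one PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as clip:
        return clip.getnframes() / clip.getframerate()


def total_minutes(folder):
    """Sum the durations of every .wav file directly inside `folder`."""
    clips = sorted(Path(folder).glob("*.wav"))
    return sum(wav_duration_seconds(clip) for clip in clips) / 60


def has_enough_audio(folder, minimum=MIN_MINUTES):
    """True when the folder holds at least `minimum` minutes of audio."""
    return total_minutes(folder) >= minimum
```

Calling `has_enough_audio("raw_clips")` on a hypothetical `raw_clips` folder returns `True` once the clips add up to 30 minutes or more. The script only measures length; it can't judge noise, music, or extra voices, so you'll still need to listen to the clips.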

The authorization step

One important safeguard: ElevenLabs requires you to record yourself reading a brief authorization message before creating your voice clone. This prevents misuse by ensuring that no one can create unauthorized clones of other people's voices.

Wait for ElevenLabs to generate your voice clone

After you upload your samples, the voice clone typically takes a few hours to generate. 

After uploading audio samples, the voice clone is generated within 2-6 hours

Once it's ready, you can use it across multiple ElevenLabs tools, including the text-to-speech panel, the voice changer, and even the Audio Native plugin for website narration.

Selecting a voice clone to use in the Text to Speech tool

Getting professional results: three essential techniques

Like any AI tool, ElevenLabs’ output will vary in quality, style, and tone. 

But just as with OpenAI's models, Claude, and other LLMs, you can adjust several settings and options to get more consistent results.

The difference between mediocre AI audio and reliable professional-quality narration comes down to technique. These three strategies will help you generate output that sounds genuinely human.

You can watch the video embedded at the beginning of the article to see and hear examples of all these strategies in action.

1. Adjust your settings strategically

Don't accept the default settings without reviewing your options, which can all be accessed in the panel on the right. 

Text-to-speech settings are on the right

Start by selecting your voice clone and your preferred model – different models have different costs and capabilities, and the summaries in ElevenLabs explain the trade-offs clearly.

Picking an AI model to use for text to speech

Underneath the model selection, you’ll see a short list of finer technical settings: speed, stability, similarity, and style exaggeration.

You can adjust these however you’d like for your desired style and circumstances. 

As a general rule, we’d recommend setting style exaggeration to about 3-5%. It’s a subtle change that produces significant results, making the narration sound noticeably more lively and human. 

Adjusting style exaggeration for text to speech

In the YouTube video embedded at the beginning of this article, all of the voiceover audio was generated using this setting.
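
If you ever generate audio through the ElevenLabs API instead of the dashboard, the same settings can be kept in one place in code. The sketch below only builds a request body and doesn't send anything. It rests on a few assumptions: that the API's `voice_settings` fields (`stability`, `similarity_boost`, `style`) take values from 0 to 1 that correspond to the dashboard percentages, and that the stability and similarity defaults shown here are placeholders rather than recommendations. Check the current API reference before relying on any of it.

```python
def percent_to_unit(percent):
    """Convert a 0-100 slider percentage to a 0.0-1.0 value."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return percent / 100


def build_tts_request(text, model_id, style_percent=4,
                      stability_percent=50, similarity_percent=75):
    """Build a text-to-speech request body (assumed field names, see above)."""
    return {
        "text": text,
        "model_id": model_id,
        "voice_settings": {
            # Placeholder values: tune these for your own voice.
            "stability": percent_to_unit(stability_percent),
            "similarity_boost": percent_to_unit(similarity_percent),
            # 3-5% style exaggeration, as recommended above.
            "style": percent_to_unit(style_percent),
        },
    }
```

For example, `build_tts_request("Welcome back to the channel.", "your-model-id")` produces a body with `style` set to 0.04. The model ID here is a stand-in for whichever model you picked.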

2. Always provide context

Avoid generating a single line of text in isolation. 

When the AI only has a few words to work from, the output tends to sound flat and lifeless, like an actor performing without direction. Instead, include some surrounding sentences when generating audio.

Giving the AI a full paragraph or section helps it understand the emotional tone and emphasis patterns.

Even if you only need to replace one line in an existing recording, generate the entire surrounding section and extract the portion you need. The improved quality is worth the extra step.
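
If your script is already split into sentences, pulling in the surrounding context can be automated. This is a small illustrative helper of our own, not an ElevenLabs feature: it returns the line you want to replace along with a couple of neighbouring sentences, ready to paste into the text-to-speech panel.

```python
def with_context(sentences, target_index, before=2, after=2):
    """Return the target sentence together with its neighbours as one block."""
    if not 0 <= target_index < len(sentences):
        raise IndexError("target_index is out of range")
    start = max(0, target_index - before)
    end = min(len(sentences), target_index + after + 1)
    return " ".join(sentences[start:end])


script = [
    "Welcome back.",
    "Today we're cloning a voice.",
    "It only takes a few steps.",
    "First, gather your audio.",
    "Then upload it.",
]

# Regenerate the third sentence along with one neighbour on each side.
print(with_context(script, 2, before=1, after=1))
```

After generating the audio for the whole block, you'd still trim out the one line you need in your audio editor.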

3. Add performance cues to your text

You can guide the AI's performance by adding simple cues directly in your text. Type a word in ALL CAPS to add emphasis, or add ellipses (...) and line breaks to create pauses. You can even adjust spelling to influence pronunciation.

For example, ElevenLabs sometimes mispronounces "Zapier" to rhyme with "rapier" instead of "happier." To fix this, you can just spell it with an extra P: "Zappier." The adjusted spelling clarifies the pronunciation without requiring any complex setup. 
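
If you reuse the same respellings across many scripts, you can apply them automatically before pasting your text into ElevenLabs. Here's a minimal sketch of that idea. The helper names are hypothetical, and the only respelling included is the "Zappier" example from above.

```python
import re

# Written form -> spelling that nudges the pronunciation.
RESPELLINGS = {"Zapier": "Zappier"}


def apply_respellings(text, respellings=RESPELLINGS):
    """Replace whole-word matches with their pronunciation-friendly spelling."""
    for word, spoken in respellings.items():
        pattern = r"\b" + re.escape(word) + r"\b"
        text = re.sub(pattern, lambda _match, s=spoken: s, text)
    return text


def emphasize(text, word):
    """Put whole-word occurrences of `word` in ALL CAPS to cue emphasis."""
    pattern = r"\b" + re.escape(word) + r"\b"
    return re.sub(pattern, lambda match: match.group(0).upper(), text)


line = "Connect Zapier to Slack, and you are really done."
print(emphasize(apply_respellings(line), "really"))
```

Keep the original spelling in your published script or captions, and use the respelled version only as input for audio generation.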

If you’d like, you can also create a “pronunciation dictionary” to describe the exact phonemes used in a specific word, using a standard such as the International Phonetic Alphabet (IPA).

An example pronunciation dictionary for the brand name "Zapier"

You can learn more about pronunciation dictionaries in the ElevenLabs documentation. However, in most cases, we find that just adjusting the spelling of a word is easier and more effective.
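
For readers who do want a dictionary file, the sketch below generates one in the W3C Pronunciation Lexicon Specification (PLS) format, which, to our understanding, is the kind of file ElevenLabs accepts for pronunciation dictionaries. Treat the details as assumptions: the IPA transcription for "Zapier" is our own approximation of "rhymes with happier", and the function name is hypothetical. Confirm the expected file format in the ElevenLabs documentation before uploading.

```python
from xml.sax.saxutils import escape

PLS_TEMPLATE = """<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0"
    xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
    alphabet="ipa" xml:lang="en-US">
{lexemes}
</lexicon>
"""


def build_lexicon(entries):
    """Build a PLS lexicon string from {written word: IPA transcription} pairs."""
    lexemes = "\n".join(
        "  <lexeme>\n"
        f"    <grapheme>{escape(word)}</grapheme>\n"
        f"    <phoneme>{escape(ipa)}</phoneme>\n"
        "  </lexeme>"
        for word, ipa in entries.items()
    )
    return PLS_TEMPLATE.format(lexemes=lexemes)


# Assumed transcription: "Zapier" rhyming with "happier".
print(build_lexicon({"Zapier": "ˈzæpiər"}))
```

You could save the printed text to a `.pls` file and add more words to the dictionary as you find mispronunciations.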

These techniques are intuitive once you start experimenting. Try different approaches until the output matches your vision.

Bonus tip: generating multiple takes

Here's one final piece of advice: always generate all three takes that ElevenLabs offers for each text segment, meaning the initial generation plus the two free regenerations.

Each text-to-speech use includes two free regenerations

Each take will have slight variations in emphasis and emotion, and you can cut and paste the best parts together to create your final audio. Just make sure your text is finalized before generating, because changing even a single character requires a fresh generation that will consume additional credits.
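
Because even a one-character change calls for a fresh generation, it can help to track exactly which text each set of takes was generated from. The following sketch is one way to do that under our own assumptions: it fingerprints each script segment so you can see which ones changed since you last generated audio. It doesn't talk to ElevenLabs, and the names are hypothetical.

```python
import hashlib


def fingerprint(text):
    """Return a short, stable fingerprint of the exact script text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]


def segments_needing_regeneration(segments, generated):
    """List segment names whose text differs from what was last generated.

    `segments` maps a segment name to its current text. `generated` maps a
    segment name to the fingerprint recorded when its takes were created.
    """
    return [
        name
        for name, text in segments.items()
        if generated.get(name) != fingerprint(text)
    ]


segments = {"intro": "Welcome back.", "outro": "See you next time."}
generated = {name: fingerprint(text) for name, text in segments.items()}

segments["intro"] = "Welcome back!"  # a one-character edit
print(segments_needing_regeneration(segments, generated))
```

Only the edited segment is reported, so you know which audio is stale before spending credits on anything else.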

The bigger picture: orchestrate, don't execute

Voice cloning represents just one example of how AI and automation can transform your workflow. 

At XRay, we believe that none of your tasks should be fully manual anymore. Your role is to orchestrate these tools, not to spend hours in front of a microphone repeating the same script until you get a clean take.

This is how we work, how we help our clients work, and how we're teaching others to work. 

If you're ready to design a better way for your entire team to operate, reach out to learn more about our professional services. 

We offer hourly support for quick projects and education, as well as long-term retainers for complete workflow transformation. 

Schedule a free call today – we've helped organizations of all sizes create more meaningful workdays.
