Deepgram Speak (Text-to-Speech) Examples
Mix.install([
  {:deepgram, "~> 0.1"},
  {:kino, "~> 0.9"} # Kino provides the inputs and audio playback used below
])
Introduction
This notebook demonstrates how to use Deepgram’s Text-to-Speech (Speak) API through the Elixir SDK. We’ll explore:
- Basic speech synthesis
- Voice customization options
- Streaming text-to-speech
- Saving audio to files
Setup
First, let’s set up our Deepgram client with our API key. In Livebook, render the password input in one cell, then read it in the next:
api_key_input = Kino.Input.password("Deepgram API Key")
api_key = Kino.Input.read(api_key_input)
client = Deepgram.new(api_key: api_key)
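If you prefer not to paste the key on every run, you can read it from an environment variable instead. A minimal alternative sketch, assuming you exported DEEPGRAM_API_KEY before launching Livebook (Livebook secrets are also exposed as LB_-prefixed environment variables):

# Assumes DEEPGRAM_API_KEY is set in the environment; raises if missing
api_key = System.fetch_env!("DEEPGRAM_API_KEY")
client = Deepgram.new(api_key: api_key)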
Basic Text-to-Speech Synthesis
Let’s start with a simple example of converting text to speech:
text_input = Kino.Input.textarea("Enter text to synthesize", default: "Hello! This is Deepgram's text-to-speech API. It sounds very natural.")
text = Kino.Input.read(text_input)
{:ok, audio_data} = Deepgram.Speak.synthesize(
client,
%{text: text},
%{
model: "aura-2-thalia-en", # Thalia voice
encoding: "linear16", # Linear PCM format
sample_rate: 24000 # 24kHz sample rate
}
)
# Save the audio to a temporary file for later use
temp_file = "/tmp/deepgram_speech.wav"
File.write!(temp_file, audio_data)

# Create an audio player to listen to the result
# (Kino.Audio.new expects the audio binary and its type, not a file path)
Kino.Audio.new(audio_data, :wav)
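Every example below follows the same synthesize-then-play pattern, so it can be handy to wrap it in a small helper. This is just a convenience sketch (speak_and_play is our own name, not part of the SDK), assuming linear16/WAV output as above:

# Hypothetical helper: synthesize text and return an audio player widget
speak_and_play = fn text, opts ->
  {:ok, audio} = Deepgram.Speak.synthesize(client, %{text: text}, opts)
  Kino.Audio.new(audio, :wav)
end

# Example usage:
speak_and_play.("Testing the helper.", %{model: "aura-2-thalia-en", encoding: "linear16", sample_rate: 24000})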
Different Voices
Deepgram offers multiple voices. Let’s try different ones:
voice_options = [
"aura-2-thalia-en",
"aura-2-zeus-en",
"aura-2-athena-en",
"aura-2-apollo-en"
]
# Kino.Input.select expects {value, label} tuples, so pair each voice with itself
voice_input = Kino.Input.select("Select a voice", Enum.map(voice_options, &{&1, &1}))
selected_voice = Kino.Input.read(voice_input)
demo_text = "This is what my voice sounds like. I can read any text you provide in a natural-sounding way."
{:ok, voice_audio} = Deepgram.Speak.synthesize(
client,
%{text: demo_text},
%{
model: selected_voice,
encoding: "linear16",
sample_rate: 24000
}
)
voice_file = "/tmp/deepgram_voice_demo.wav"
File.write!(voice_file, voice_audio)
Kino.Audio.new(voice_audio, :wav)
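To compare voices side by side, you can synthesize the same text with every model in the list and lay the players out in a grid. A sketch using Kino.Layout (note this makes one API call per voice):

voice_players =
  for voice <- voice_options do
    {:ok, audio} =
      Deepgram.Speak.synthesize(
        client,
        %{text: demo_text},
        %{model: voice, encoding: "linear16", sample_rate: 24000}
      )

    # Label each player with its model name
    Kino.Layout.grid([Kino.Markdown.new("**#{voice}**"), Kino.Audio.new(audio, :wav)])
  end

Kino.Layout.grid(voice_players, columns: 2)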
Advanced Speech Customization
You can customize several aspects of the synthesized audio, such as the encoding and bit rate of the output:

custom_text = "The same voice can be delivered in different audio formats. Let me demonstrate that for you now."

{:ok, custom_audio} = Deepgram.Speak.synthesize(
  client,
  %{text: custom_text},
  %{
    model: "aura-2-thalia-en",
    encoding: "mp3", # MP3 output instead of linear PCM
    bit_rate: 48000  # 48 kbps bit rate
  }
)

custom_file = "/tmp/deepgram_custom.mp3"
File.write!(custom_file, custom_audio)
Kino.Audio.new(custom_audio, :mp3)
Saving Speech to Files
Instead of handling the audio binary directly, you can save it to a file in one step:
file_path = "/tmp/saved_speech.wav"
{:ok, response} = Deepgram.Speak.save_to_file(
client,
file_path,
%{text: "This audio has been saved directly to a file by the Deepgram SDK."},
%{model: "aura-2-zeus-en"}
)
# Display metadata about the generated audio
response
# Play the saved file
Kino.Audio.new(File.read!(file_path), :wav)
Live Speech Synthesis (Streaming)
For applications requiring real-time speech synthesis, you can use streaming TTS:
# Note: This example is for illustration; in a real application,
# you would implement proper handlers
streaming_example = """
In a real application, you would set up a WebSocket connection like this:
{:ok, websocket} = Deepgram.Speak.live_synthesis(
client,
%{
model: "aura-2-thalia-en",
encoding: "linear16",
sample_rate: 24000
}
)
# Send text chunks as they become available
Deepgram.Speak.WebSocket.send_text(websocket, "Hello, ")
Deepgram.Speak.WebSocket.send_text(websocket, "this is streaming ")
Deepgram.Speak.WebSocket.send_text(websocket, "text-to-speech!")
# Handle audio chunks as they arrive
receive do
{:deepgram_audio, audio_chunk} ->
# Play or save the audio chunk. With linear16 encoding these chunks
# are raw PCM, so an appended file is headerless audio rather than
# a playable WAV container.
File.write("stream_output.raw", audio_chunk, [:append])
end
"""
IO.puts(streaming_example)
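To make the receive pattern above concrete, here is a self-contained sketch of a loop that accumulates chunks into a single binary. The {:deepgram_audio, chunk} message shape is assumed from the illustration above, not a confirmed SDK contract:

defmodule StreamCollector do
  # Accumulate audio chunks until no message arrives for two seconds,
  # then return all received bytes as one binary.
  def collect(acc \\ []) do
    receive do
      {:deepgram_audio, chunk} -> collect([chunk | acc])
    after
      2_000 -> acc |> Enum.reverse() |> IO.iodata_to_binary()
    end
  end
end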
Speech Synthesis with SSML
Speech Synthesis Markup Language (SSML) provides more control over pronunciation and timing:
ssml_text = """
Hello, I am your virtual assistant.
I can pronounce complex words like anthropomorphic.
I can also speak more slowly or speak more quickly.
And I can raise my pitch or lower my pitch.
"""
{:ok, ssml_audio} = Deepgram.Speak.synthesize(
client,
%{text: ssml_text},
%{
model: "aura-2-apollo-en",
encoding: "linear16",
sample_rate: 24000,
ssml: true # Enable SSML processing
}
)
ssml_file = "/tmp/deepgram_ssml.wav"
File.write!(ssml_file, ssml_audio)
Kino.Audio.new(ssml_audio, :wav)
Multilingual Text-to-Speech
Deepgram supports multiple languages for speech synthesis:
# Spanish example
{:ok, spanish_audio} = Deepgram.Speak.synthesize(
client,
%{text: "Hola, ¿cómo estás? Espero que tengas un buen día."},
%{
model: "aura-2-carmen-es", # Spanish voice model
encoding: "linear16",
sample_rate: 24000
}
)
spanish_file = "/tmp/deepgram_spanish.wav"
File.write!(spanish_file, spanish_audio)
Kino.Audio.new(spanish_audio, :wav)
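Since both examples follow the same pattern, you can synthesize several languages in one pass and show the players together. A sketch reusing the two models shown above:

samples = [
  {"aura-2-thalia-en", "Hello, how are you today?"},
  {"aura-2-carmen-es", "Hola, ¿cómo estás hoy?"}
]

players =
  for {model, text} <- samples do
    {:ok, audio} =
      Deepgram.Speak.synthesize(
        client,
        %{text: text},
        %{model: model, encoding: "linear16", sample_rate: 24000}
      )

    Kino.Audio.new(audio, :wav)
  end

Kino.Layout.grid(players, columns: 2)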
Conclusion
These examples demonstrate the capabilities of Deepgram’s Text-to-Speech API through the Elixir SDK. You can create natural-sounding speech for a wide range of applications, from virtual assistants to accessibility features.
For more information, refer to:
- The Deepgram Text-to-Speech documentation: https://developers.deepgram.com/docs/text-to-speech
- The Deepgram Elixir SDK documentation