Convert Text Into Audio With Python (text to Speech)

In this instructable, we are going to make a simple text-to-speech app that converts any plain text you type into speech. It also saves the speech as an MP3 file, which you can use in your other projects.

You can use it to make memes to send to your friends, or even to make YouTube videos without actually speaking. It is handy anywhere you need a voice but would rather not record one.

There is a special kind of fun in making your first project like this. And in a human-friendly programming language like Python, it is even easier, thanks to the open source libraries already available on the web.

So let's get started.

Supplies

What do you need?

A computer (laptop)
Speakers
Python installed

Note: If you do not have Python installed already, do not worry. I will show you how to install Python and set it up correctly for this project.

What will you learn from this?

How to use text to speech in Python
Using the 'tkinter' library to make labels, buttons and text boxes
Playing MP3 files in Python

Setting Up Python and Libraries

If you do not have Python installed already, you will need to install it. First, download the Python installer from the following link: Download Python | Python.org

Open the installer after downloading it. The remaining steps are simple: keep clicking Next until the installation finishes.

Note: For this project, I am using Python 3.9.7, so it would be best to get that version from the website.
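
If you are not sure which version ended up installed, a quick check from Python itself can confirm it. This is only a sketch: the 3.9 threshold below simply mirrors the version used in this instructable and is an assumption, not a hard requirement of the libraries.

```python
import sys

# Version of the interpreter that is running this script.
major, minor = sys.version_info[:2]
print(f"Running Python {major}.{minor}")

# Assumption: 3.9 mirrors the version used in this instructable.
if (major, minor) < (3, 9):
    print("This is older than the version used here; consider updating.")
```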

Next, we need to install the following libraries for this project:

gtts (for text to speech conversion)
playsound (for playing the mp3 file)

To install the libraries, open a command prompt window and type:

pip3 install gtts playsound

Once the packages are installed, the command window will look something like the one in the image. Don't worry if you see a warning similar to the one in the picture.
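
To double-check that both libraries are visible to Python, something like the following could be run in IDLE. It is a minimal sketch: missing_packages is a hypothetical helper name, and it only checks that the packages can be found, not that they work.

```python
import importlib.util

def missing_packages(names):
    """Return the names that Python cannot find for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]

# Import names of the two libraries installed with pip above.
missing = missing_packages(["gtts", "playsound"])
if missing:
    print("Not installed yet:", ", ".join(missing))
else:
    print("gtts and playsound are ready to use.")
```

If a name is reported as missing, running the pip3 command again is the first thing to try.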

Importing Libraries

Now open IDLE, which comes installed with Python, and click File > New File. This will open another window, where we will be writing our code.

First of all, we need to import all the libraries we need for the app. Put the following at the very top of the script.

from tkinter import * 
from tkinter.ttk import *
from gtts import gTTS
from playsound import playsound

In the first line, we import everything (*) from the tkinter library, which we are using to make the graphical user interface of our app.

In the next line, we import everything (*) from tkinter.ttk, a submodule of tkinter that provides themed versions of widgets such as Label, Entry and Button. Because this import comes second, those names refer to the ttk versions.

The last two lines import the gTTS class from the gtts library and the playsound function from the playsound library.
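
Before building the interface, it can help to see gTTS on its own. The sketch below wraps it in a hypothetical helper, save_speech (the helper name and the file name used afterwards are illustrative, not part of this project). It assumes an internet connection, since gTTS sends the text to Google's text-to-speech service.

```python
def save_speech(text, path, lang="en", slow=False):
    """Convert `text` to speech and save it as an MP3 file at `path`."""
    # Imported here so that defining the helper works even before gtts is installed.
    from gtts import gTTS

    # lang and slow are optional gTTS settings (language code, slower reading).
    tts = gTTS(text, lang=lang, slow=slow)
    tts.save(path)
    return path
```

Calling save_speech("Hello from Python", "hello.mp3") should leave hello.mp3 in the folder the script runs from, and passing slow=True asks for a slower reading.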

Creating a Window

Now, copy the path of the folder where you will be saving the Python script and add the file name speech.mp3 to the end of it. In my case it is "C:\\Users\\Aparajita\\Documents\\speech.mp3". Do not forget to write two backslashes instead of one: inside a normal Python string, a single backslash starts an escape sequence (for example, the \U in \Users), so a path written with single backslashes will throw an error.

Now, we define a constant "PATH" which will store the path for our speech.mp3 file. Add the following code in the script:

PATH = "C:\\Users\\Aparajita\\Documents\\speech.mp3"
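
If the doubled backslashes feel awkward, Python offers other ways to write the same path. The sketch below compares the doubled form with a raw string, and also builds a path from your home folder with pathlib. The home_based name is illustrative, and it assumes a Documents folder exists in your home directory.

```python
from pathlib import Path

# Two spellings of the same Windows path.
escaped = "C:\\Users\\Aparajita\\Documents\\speech.mp3"  # doubled backslashes
raw = r"C:\Users\Aparajita\Documents\speech.mp3"         # raw string, single backslashes
print(escaped == raw)  # True

# Assumption: a Documents folder exists in your home directory.
home_based = str(Path.home() / "Documents" / "speech.mp3")
print(home_based)
```

Either spelling can be assigned to PATH. The pathlib version has the advantage of not hard-coding a user name.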

Now, we create the window with a Tk() object and set its title, size and background colour. Add the following code:

root = Tk()
root.title("Text to Speech")
root.geometry("400x200")
root.config(bg='#4fe3a5')

Now, add the last line of code for this step.

root.mainloop()

This runs the window and keeps it from closing immediately.

You can test this code by running the script with the F5 key.

Adding a Label, Textbox and Button

Now, let's make our application useful by adding some content. Add the following code to the script just before root.mainloop().

# .place() returns None, so create the widget first and position it on a separate line
label = Label(root,
              text="Enter text:",
              font='Helvetica 15 bold')
label.place(relx=0.5, rely=0.1, anchor=CENTER)

Here, we are defining a Label object that displays "Enter text:" on the window. We bind it to root, i.e. our main window. The text and font options tell the Label what to show and how to style it. Finally, we place the centre of the label at the relative position (0.5, 0.1), that is, halfway across the window and one tenth of the way down, using the .place() method.
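
To get a feel for what relative positions mean, the arithmetic can be sketched in a few lines. The helper name relative_to_pixels is hypothetical, and the result is only approximate: it assumes the window keeps the 400x200 size set earlier and ignores any extra offsets.

```python
def relative_to_pixels(relx, rely, width, height):
    """Approximate pixel position of a point given as fractions of the parent's size."""
    return round(relx * width), round(rely * height)

# The label's position in the 400x200 window from this project.
print(relative_to_pixels(0.5, 0.1, 400, 200))  # (200, 20)

# The exact centre of the same window.
print(relative_to_pixels(0.5, 0.5, 400, 200))  # (200, 100)
```

Because the positions are fractions rather than pixels, the widgets keep their relative layout if the window is resized.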

Similarly, we add the textbox and the speak button by adding the following code to the script after the code for label and just before root.mainloop()

# txt and make_speech are defined in the next step
speech_entry = Entry(root,
                     textvariable=txt,
                     font=('calibre', 10, 'normal'))
speech_entry.place(relx=0.5, rely=0.3, anchor=CENTER)
speak_btn = Button(root, text='Speak', command=make_speech)
speak_btn.place(relx=0.5, rely=0.5, anchor=CENTER)

The code so far should look like this. It will not run yet, because txt and make_speech are only defined in the next step:

from tkinter import * 
from tkinter.ttk import *
from gtts import gTTS
from playsound import playsound

#will be used for storing speech.mp3 
PATH = "C:\\Users\\Aparajita\\Documents\\speech.mp3"

#make the window object
root = Tk()
root.title("Text to Speech")
root.geometry("400x200")
root.config(bg='#4fe3a5')

#create a label (.place() returns None, so it is called on a separate line)
label = Label(root,
              text="Enter text:",
              font='Helvetica 15 bold')
label.place(relx=0.5, rely=0.1, anchor=CENTER)
#create a textbox
speech_entry = Entry(root,
                     textvariable=txt,
                     font=('calibre', 10, 'normal'))
speech_entry.place(relx=0.5, rely=0.3, anchor=CENTER)
#create a button
speak_btn = Button(root, text='Speak', command=make_speech)
speak_btn.place(relx=0.5, rely=0.5, anchor=CENTER)

#keep the window running
root.mainloop()

Function for Speech

Now, for the final step, add the following code just after the lines where we created the window using the Tk() object.

txt = StringVar()

This creates a StringVar() object that holds the text typed into the textbox. It has to come after root = Tk(), because a StringVar needs an existing window.

In the previous step, you may have noticed the option 'command = make_speech' in the Button() object. That is the function that gets called when we press the button, but we haven't defined it yet. We will define it just after the lines where we import all the libraries.

def make_speech():
  # convert the text from the textbox into speech
  tts = gTTS(txt.get())
  # save the speech as an mp3 file, then play it
  tts.save(PATH)
  playsound(PATH)

Here, we define a function called make_speech that converts the text into speech. To do this, we create a gTTS() object, pass it the value of txt using the txt.get() method, and store the returned object in tts. Next, we use the tts.save() method to save the speech as an mp3 file at our PATH. Finally, we play the mp3 using the playsound function.

This step completes the project. You can test the code by running it with the F5 key.
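
One case the function above does not cover is an empty textbox: gTTS is expected to reject empty text with an error rather than produce a file. A small, hypothetical helper such as prepare_text (not part of the original code) could be used to check the input first:

```python
def prepare_text(raw):
    """Strip surrounding whitespace; return None if nothing is left to speak."""
    text = raw.strip()
    return text if text else None

print(prepare_text("  Hello there  "))  # Hello there
print(prepare_text("   "))              # None
```

Inside make_speech, you could call text = prepare_text(txt.get()) and return early when the result is None, so that gTTS and playsound only run when there is something to say.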

I have given the complete code as a file below for reference.

Downloads

app.py