I’ve been playing around more with AI tools, and recently discovered a fun way to use your voice to talk to a “conversational AI”. The cool thing about it is that it can respond to you using its own voice. This uses natural language processing (NLP), which is the ability of computers to understand meaning from spoken language.
I’ve been using the “text-generation-webui” by “oobabooga”, which is a “gradio” web UI for running “Large Language Models”. Once installed and configured, it’s a really fun way to experiment with AI voice chat, and the interface comes with a selection of built-in extensions to use.
If you’re serious about doing any kind of AI-related work on your home PC, you’ll want to make sure your PC is up to it. For anything larger than the models I’m using in this guide, you will need a fast CPU and a graphics card with at least 12 GB of VRAM.
If, like me, you are still rocking that trusty old PC which has been chugging along with the same hardware since 2015, this guide may help you dip your toe into the ocean of AI chat, so long as it has an NVIDIA 9xx series graphics card or newer.
For this example, I’m using:
- CPU: Intel 4790K
- RAM: 16 GB of DDR3
- GFX: ASUS NVIDIA GTX 970 graphics card with 4 GB of VRAM
- Storage: 20 TB (SSD/HD)
This is hardly the ideal system hardware to do any kind of AI related work - as it just isn’t powerful enough. If you’re using newer hardware and, more importantly, a graphics card with oodles of VRAM, you’ll have a far better experience.
The “oobabooga/text-generation-webui” can be quite “fiddly” to set up, and, on top of that, depending on your particular OS and hardware, you may have to adjust your installation from the one given in my example.
Some people have had success using the one-click-installer scripts to get things going. Sadly, I ran into some issues with these, especially when it comes to Python packages being installed. Therefore, I chose to install this manually using Miniconda to manage its dependencies.
To avoid dependency hell when installing all the necessary Python packages, I’ll be using Miniconda to manage the Python packages, dependencies and environments.
To install Miniconda, open a terminal and enter:
curl -sL "https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh" > "Miniconda3.sh"
bash Miniconda3.sh
Welcome to Miniconda3 py311_23.5.2-0

In order to continue the installation process, please review the license
agreement.
Please, press ENTER to continue
Press Enter at the prompt, scroll down to the end of the licence text, type yes and press Return, then answer yes to the further prompts to download and complete the installation of Miniconda.
This will create a folder named miniconda3 in your $HOME folder. Don’t delete this, as it contains the files needed to manage your environments.
📝 Close your current terminal window. Then open a new terminal window.
Creating a conda environment
After you have opened a new terminal window, you’ll see that your shell prompt shows the base conda environment.
You can check which environments are available by typing:
conda env list
# conda environments:
#
base                  *  /home/supa/miniconda3
We need to create a new conda environment for the text-generation-webui. This will allow for any Python packages or dependencies we install to be isolated for that project, and not part of the base environment.
Create the textgen environment, install Python version 3.10, and activate the environment:
conda create -n textgen python=3.10.9
conda activate textgen
You should now see that the command prompt has changed to show that we are in the conda environment we just created, named “textgen”.
The conda activate <name> command is important to keep in mind when installing or upgrading dependencies later on. Check your shell prompt to ensure you are in the correctly named conda environment before running pip.
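If you’d rather not rely on eyeballing the prompt, here is a minimal Python sketch of the same check. It assumes conda exports the CONDA_DEFAULT_ENV variable when an environment is activated; the helper names are my own and not part of conda or the web UI.

```python
import os


def active_conda_env(environ=None):
    """Return the name of the active conda environment, or None."""
    env = os.environ if environ is None else environ
    # Assumption: conda sets CONDA_DEFAULT_ENV on "conda activate <name>".
    return env.get("CONDA_DEFAULT_ENV")


def in_expected_env(expected="textgen", environ=None):
    """True when the active environment matches the expected name."""
    return active_conda_env(environ) == expected


print("Active conda environment:", active_conda_env())
print("In textgen:", in_expected_env("textgen"))
```

If the second line prints False, run conda activate textgen before installing anything with pip.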
Next we need to install PyTorch. Since I’m using Arch Linux and an NVIDIA graphics card, I will be installing PyTorch with the following command:
# for Linux plus NVIDIA only
pip3 install torch torchvision torchaudio
This will take a few seconds to install, so be patient.
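Once the install has finished, it may be worth checking that PyTorch can actually see your graphics card. The sketch below is my own helper, not part of the web UI. It reports whether torch is importable, whether CUDA is available and, if so, the name and VRAM of the first GPU.

```python
def describe_gpu():
    """Summarise what PyTorch can see; returns a dict instead of raising."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in the currently active environment.
        return {"torch_installed": False, "cuda_available": False}

    info = {"torch_installed": True, "cuda_available": torch.cuda.is_available()}
    if info["cuda_available"]:
        props = torch.cuda.get_device_properties(0)  # first GPU only
        info["name"] = props.name
        info["vram_gib"] = round(props.total_memory / 1024**3, 2)
    return info


print(describe_gpu())
```

If cuda_available comes back False on a machine with an NVIDIA card, the usual suspects are the graphics driver or a CPU-only build of PyTorch.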
Installing the web UI
We now need to download the oobabooga/text-generation-webui repository using git, then change directory to that project, and install the Python packages required for it to run.
git clone https://github.com/oobabooga/text-generation-webui
cd text-generation-webui
pip install -r requirements.txt
Install Voice Extensions
The “whisper_stt” extension allows you to talk to the bot using your microphone. It converts your spoken words into text and sends it to the AI bot to respond to.
To install its dependencies, type the following from within your text-generation-webui/ folder:
cd extensions/whisper_stt/
pip install -r requirements.txt
The SileroTTS extension allows your bot to talk to you using voice. Let’s change directory to the silero_tts folder and install its dependencies too:
cd ../silero_tts/
pip install -r requirements.txt
I’ve installed these two extensions to get voice working for now. However, there are more extensions that you may wish to check out. If you are interested, you can find them here.
Once pip has finished installing the packages from requirements.txt, go back to the project’s main directory:
cd ../..
Downloading a model
In order for the text-generation-webui to work, you will need to install a model to the models folder. Some trained chat models can be over 5 GB in size, so ensure you have plenty of free disk space to save any additional ones at a later date.
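To see whether you have room before downloading, here is a small sketch using Python’s standard library. The 1 GB safety margin is an arbitrary choice of mine, not something the web UI requires, and the helper names are hypothetical.

```python
import shutil


def free_gb(path="."):
    """Free space on the filesystem containing `path`, in GB (10**9 bytes)."""
    return shutil.disk_usage(path).free / 10**9


def has_room_for(model_gb, path=".", margin_gb=1.0):
    """True if `model_gb` plus a safety margin fits in the free space."""
    # margin_gb is an arbitrary buffer chosen for this sketch.
    return free_gb(path) >= model_gb + margin_gb


print(f"Free space here: {free_gb():.1f} GB")
print("Room for a 5 GB model:", has_room_for(5))
```

Run it from the text-generation-webui folder so that it checks the disk the models will actually be saved to.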
Use the download-model.py script to download new models. Although there is a section in the web UI to do this, you can download multiple models from your terminal at the same time, which is way faster.
Once you find a model you’d like to test on the Hugging Face website, left-click on the copy icon next to the model’s name and middle-mouse-button paste it into your terminal after the download-model.py command.
The full command then looks like this:
python download-model.py facebook/opt-1.3b
This model is about 2.6 GB in size, so it will take a minute or two to download. For those of you with lots of VRAM and a superpowered PC, check out larger models on the Hugging Face website.
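As a rough rule of thumb, you can estimate how big a model’s weights are from its parameter count. The sketch below assumes the weights are stored at 16 bits each unless told otherwise, and it ignores everything else that needs memory at run time, so treat the result as a lower bound rather than a promise.

```python
def weights_size_gb(n_params, bits_per_weight=16):
    """Approximate size of the weights alone, in GB (10**9 bytes)."""
    return n_params * bits_per_weight / 8 / 10**9


# A 1.3 billion parameter model, assuming 16-bit weights.
print(weights_size_gb(1.3e9))

# A 7 billion parameter model loaded with 8-bit weights.
print(weights_size_gb(7e9, bits_per_weight=8))
```

The first figure comes out at roughly 2.6 GB, which lines up with the download size of facebook/opt-1.3b. The second is around 7 GB, which is why a 7B model is already a squeeze on a 4 GB card.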
Starting the web UI
Every time you want to use the text generation webui, ensure you activate the conda environment first, then run it by typing:
conda activate textgen
cd text-generation-webui
python server.py --chat --extensions silero_tts whisper_stt
Once the model has loaded, you’ll be provided with a URL in your terminal which you can left-click to open the UI in your web browser.
➜ python server.py
2023-07-16 20:44:22 INFO:Loading facebook_opt-1.3b...
2023-07-16 20:44:32 INFO:Loaded the model in 10.47 seconds.
# CLICK THIS URL TO START THE WEB UI
Running on local URL: http://127.0.0.1:7860
To create a public link, set `share=True` in `launch()`.
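If the page doesn’t load in your browser, a quick way to tell whether the server is actually up is to try a TCP connection to the address shown in the log. This is a minimal sketch of my own, assuming the default 127.0.0.1:7860 from the output above.

```python
import socket


def is_listening(host="127.0.0.1", port=7860, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


print("Web UI reachable:", is_listening())
```

A result of True only tells you that something is listening on that port, not that it is the web UI, but it is usually enough to separate “server not started” from “browser problem”.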
If you started the web UI without the --chat flag, you can switch to Chat mode by clicking on the Session tab and changing the Mode to chat.
Now, press the “Apply and restart” button. If you enter some text into the chat prompt, you’ll see that the bot will respond to you.
Using your voice
Scroll down on the chat page until you see the Whisper STT section. There will be a button there which says “Record from microphone”. Press this and talk to your AI bot. Once you have finished talking, press the same button and your voice will be sent to the chat.
The first time you talk, the software will download a file in the background, but after a few seconds, your AI bot should reply to you using its synthesised voice.
You can explore more models over at the Hugging Face website. There is an overwhelming number of models to choose from, suited to every use case and hardware. I normally look for “Conversational” models to try. Just be aware that some models are very large, and your system may not be able to run them, as some need huge amounts of VRAM.
Some chat models are NSFW, and are intended for role playing, so be aware of this, otherwise conversations with NSFW or uncensored models can quickly turn, well, interesting! 😉😘
On my low-powered PC, I can run the “TheBloke/guanaco-7B-GGML” model, and although it is a little slow, it does provide a more interactive experience than the facebook/opt-1.3b model, so you might want to check it out.
I’ve also tried the TheBloke/Guanaco-7B-SuperHOT-8K-GGML model, and it does work with the launch parameters below, although responses are slow, taking 20 to 35 seconds each on my system:
python server.py --chat --max_seq_len 8192 --compress_pos_emb 4 --loader exllama_hf --load-in-8bit --auto-devices --no-stream --gpu-memory 3500MiB --extensions silero_tts whisper_stt
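The --max_seq_len 8192 and --compress_pos_emb 4 values in that command are related. As far as I understand the web UI’s documentation, the compression factor is normally the extended context length divided by the context length the base model was trained with. The sketch below assumes a 2048-token base context, which is my assumption for this model family, and it only handles whole multiples.

```python
def compress_factor(max_seq_len, base_context=2048):
    """Positional-embedding compression factor for an extended context."""
    # Assumption: the base model was trained with `base_context` tokens.
    if max_seq_len % base_context != 0:
        # Restriction of this sketch only, to keep the result a whole number.
        raise ValueError("max_seq_len should be a multiple of base_context")
    return max_seq_len // base_context


print(compress_factor(8192))
```

So if you drop --max_seq_len to 4096 to save memory, the matching --compress_pos_emb value under the same assumption would be 2.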
You can change to a different model by clicking on the Model tab and selecting one which you have installed in the models folder.
If you have added a new model while the web UI is running, simply push the blue recycle icon to refresh the list of available models.
Out of Memory Errors
If you are on a low-end PC with low VRAM, you can try launching the web UI with this set of arguments:
python server.py --load-in-8bit --auto-devices --gpu-memory 3500MiB --extensions silero_tts whisper_stt
A list of what these arguments do can be found on the documentation page.
This should reduce CUDA out of memory errors and crashes on low-VRAM GPUs.
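If you find yourself juggling different combinations of these flags, a small helper can assemble the command for you. This is just a sketch: build_launch_command is a hypothetical function of mine, and it only uses flags that already appear in this guide. Each extension name is passed as its own argument.

```python
import shlex


def build_launch_command(extensions, gpu_memory_mib=None,
                         load_in_8bit=False, auto_devices=False, chat=False):
    """Assemble a server.py command line as a list of arguments."""
    cmd = ["python", "server.py"]
    if chat:
        cmd.append("--chat")
    if load_in_8bit:
        cmd.append("--load-in-8bit")
    if auto_devices:
        cmd.append("--auto-devices")
    if gpu_memory_mib is not None:
        cmd += ["--gpu-memory", f"{gpu_memory_mib}MiB"]
    if extensions:
        cmd += ["--extensions", *extensions]
    return cmd


cmd = build_launch_command(["silero_tts", "whisper_stt"], gpu_memory_mib=3500,
                           load_in_8bit=True, auto_devices=True)
print(shlex.join(cmd))
```

You can paste the printed line into your terminal, or pass the list to subprocess.run from inside the text-generation-webui folder.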
This little project has been really fun to play with, and I hope it works for you too and that you are having some interesting conversations with the AI bot.
Whenever you want to use the oobabooga/text-generation-webui, remember to do the following:
From a terminal:
conda activate textgen
cd /path/to/your/text-generation-webui/
python server.py --chat --extensions silero_tts whisper_stt
Have you found a good chat model to use on a local PC? If so, drop me an email to let me know, and I’ll be sure to check it out. 👍