This basic AI chat-bot experiment uses Speech-to-Text technology
and some clever prompting of an OpenAI LLM to generate characterized
responses, which are then piped out to a Text-to-Speech engine,
with a lip-sync visualization.
I've found it to be an entertaining distraction to start conversations
with the AI, change up characters, and build out fake "Podcasts" from
the exported audio. Check the Examples tab for links to
a few of them
Grant the Microphone Permission when the page loads. Without it, there
is no other way to interface with the AI.
Check the Input tab to make sure your microphone and language settings
are correct. The input language doesn't matter too much, but it's helpful
in the context I originally wrote this demo: building conversation
practice tools for learning foreign languages.
Use the Model tab to select which of OpenAI's Large Language Models
to use for text generation. There is also an Additional prompt field
in which you can provide extra instructions on the conversation, like character
background notes, or situational details.
Use the Output tab to change characters and output language.
Click the Start button to begin recording speech. The app
attempts to detect quiet spaces around your utterances, so you don't
need to click Stop in between each of your prompts.
However, if you're in a noisy environment, or your speakers are turned
up too loud and your microphone ends up hearing the generated speech
and interprets it as your own speech, you can use the "Start/stop"
button to pause recording when you're done talking to avoid erroneous
prompt recordings
The Reprompt button will force another reply from the AI without
requiring your verbal input.
The Export button will concatenate all of the audio clips, both
your own and the AI's generated clips, into a single audio file,
which you can then do with as you wish.