The ChatGPT wave of innovation continues to roll along. OpenAI has released a new version of ChatGPT, version 4.0. The latest version features a host of back-end improvements and several key front-line upgrades, including real-time multi-lingual support, improved intuitive contextual understanding, and an adjustable personality slider that allows users to adjust the tone and communication style of the artificial intelligence (AI).
It is also multi-modal, which is a fancy way of saying that we can submit images to it as part of a chat or request. So, for example, we can draw a picture of a website diagram and ask it to make the website. It will then give us instructions and sample code for creating that website – all based on a drawing.
Along with this upgrade, OpenAI has upgraded the public API offerings. Unfortunately, the ChatGPT 4.0 API is currently in an invitation-only beta phase, with a long waitlist. However, OpenAI has released two new models for developers: ChatGPT 3.5 and Whisper. The ChatGPT 3.5 model features many improvements available through the web user interface – improved language and chat capabilities. Whisper allows for various speech recognition services based on audio files. For example, we can upload an audio file, and Whisper will transcribe or translate the speech in the file into text.
Chatbot, Image, and Transcription Integrations
What does all of this mean in terms of how to use ChatGPT and FileMaker? Basically, it means we have an even better version of the text version of ChatGPT that we can integrate with our custom applications. We’ve included a ChatGPT 3.5 Integration sample file with three features: chatbot, image generator, and audio transcription.
The chatbot is similar to our initial sample file, but we’ve updated the code to use the new ChatGPT 3.5 model, which will provide better results. The image generator is a different AI model from OpenAI that allows users to generate an image based on a text prompt. Finally, the audio transcription service will take a digital audio file and transcribe the speech to text.
To use the integrations, you’ll need your own API key from OpenAI – please see our previous post on how to create one for yourself. Once you have a key, enter it here:
Type your question, task, or instructions into the prompt field to generate a text response, and click submit. The chatbot will respond in the response field.
To generate an image, click the “Create Image” tab.
You can then enter a description of what you want your image to be. “A sunset on the beach in the style of Monet,” for example. We’ve given two output options: a URL or a container. The URL option will open up a browser with the image. The container option will generate and store the image in a container field. Similar to the text prompt, submitting a new prompt will generate a new image.
To transcribe an audio file, click the Transcribe Audio tab, and then click the “Select and Transcribe” button. There is a filter only to allow valid audio file formats, and make sure the file is under 25 MB. Then, you’ll find your transcription in the Response field. In the example below, I read a bit of the first chapter of The Hobbit by J.R.R. Tolkien and recorded it as an m4a. It transcribed what I read word for word.
The sample file is fully open, so you can look at the scripts and see how we’ve formatted the cURL requests. The audio integration was a bit tricky, but our wonderful FileMaker community pointed us in the right direction on how to format the cURL request to upload an audio file from FileMaker to the API.
This technology is creating a lot of excitement within the industry. Generative AI tools like ChatGPT and others continue to innovate, making new and more exciting tools for us to use – and to integrate with FileMaker.