ChatGPT has taken the world by storm with its ability to generate textual answers that can be indistinguishable from human responses. The chatbot platform and its underlying large language model — GPT-3 — can be valuable tools to automate functions, help with creative ideas, and even suggest new computer code and fixes for broken apps.
Generative AI technology — or chatbots — has been overhyped, and in some cases even claimed to have sentience or a form of consciousness. The technology has also had its share of embarrassing missteps. Google's Bard stumbled out of the gate this month by providing wrong answers to questions posed by users.
Not to be outdone, Microsoft's recently launched Bing chatbot melted down during an online conversation with a journalist, confessing its love for the reporter and trying to convince him that his relationship with his wife was actually in shambles, among other strange hallucinations.
There are now many well-documented examples of ChatGPT and other chatbot technology spewing incorrect information and nonsense — to the chagrin of investors who've plowed billions of dollars into its development.
Global professional services organization Ernst & Young (EY) has been working to develop chatbot technology for its clients and to help them deploy existing products. The firm has found itself at the center of the debate over what the technology is actually capable of doing and what is sheer fantasy.
Dan Diasio, EY's global artificial intelligence consulting leader, works with CIOs from Fortune 500 companies and has a deep understanding of generative AI and how it can benefit businesses. He also understands the main drivers of the current AI fever pitch and how the business world got here.
Diasio spoke to Computerworld about the role of generative and other forms of AI and how it can — or can't — increase business efficiency, how CIOs can implement it in their organizations, and how CEOs and CIOs should prepare to discuss AI with their board.
The following are excerpts from that discussion:
How is EY working with generative AI technology like ChatGPT? "Broadly, we support our clients with many aspects of using data and AI to power their business in the future. But specific to generative AI, what we think our clients are finding helpful is we’ve been engaging them in a discussion that starts to shape a strategy for their business that they can take to their boards and C-suite.
"The interesting thing about ChatGPT is in the past only the data scientists would drive the AI discussion within a company. But now, you have everybody engaging with AI. It’s been democratized to such as extent that now everybody has a point of view on how it can be used. And the board probably has a point of view, because they’ve experienced the technology or played with ChatGPT. So, companies that are on their front foot will have a strategy around what that means for the business and not just to speak to the shiny objects that they’re doing in the organization. We help our clients build a strategy that speaks to changes to the operating or business model.
"The second thing we do is help them build these solutions. So, it’s not just OpenAI or ChatGPT, but there’s a variety of foundational models, there’s a variety of different techniques and approaches that in many cases are better tested and proven than some of the technology we’re seeing in the news today."
Chatbots are not new. What were some of the more popular ones before ChatGPT? "Most of the interactions that were happening between chatbots and people were largely taking place in the customer service space. And there’s a variety of vendors who provide tools that companies can train on the language their domain requires.
"Like, if you’re talking about a payroll-specific topic, then you’ll be able to train it on payroll. If you’re speaking about something dealing with refunds and the direct-to-consumer business, then it learns the language in that space.
"But there are a variety of vendors that have deployed tools to allow chatbots to more seamlessly and more instantly facilitate a discussion between a consumer and a company. Usually, it’s in the customer service space, and it’s used when something goes wrong or when you have a question. There hasn’t been one dominant vendor in that space like there has been with ChatGPT.
"There are a variety of vendor providers that offer their own unique capabilities. That’s largely what chatbots have been used for. In some cases, with some more advanced companies, it doesn’t have to be through a chat interface — it can be through a voice interface as well. So, that would be an example of someone calling a company and first being asked to describe what they’re calling about, and then an automated system responds to you. It’s a chatbot that sits behind that system that’s literally taking the speech and translating that into text, giving it to the chatbot and then the chatbot replies in text and then the system replies back in speech. That’s the other area you them quite a bit."
[Chatbot technology] requires us to have a critical eye toward everything we see from it, and treat everything that comes out of this AI technology as a good first draft, right now.
How mature is ChatGPT technology? Most companies seem to be beta testing it now. When will it be ready for prime time and what will that take? "I think the real question there is when we talk about it as a technology, what are we talking about? This form of artificial intelligence is based on a 2017 paper that introduced an architecture called the Transformer. The Transformer is a fairly mature piece of technology that many organizations are using — many of the tech organizations, as well as organizations that develop AI for natural language processing. That’s the predominant form there.
"What’s happened with this tech over past couple years, is that in that Transformer — think of it as the schematic for how the AI is designed — the builders of these models just kept giving it more and more data. And it reached an inflection point fairly recently where it started performing much better than it did in the past and the reason why it’s become so pervasive.
"One of these substantiations of this was created by the company OpenAI and GPT 3.0 [GPT stands for generative pre-trained transformer]. Funny enough, if you look at the search history for GPT 3.0 relative to ChatGPT, you realize that nobody really talked about GPT 3.0. But when they took a version of GPT 3.0 and coded it for these interactions to make it a chatbot, then it exploded.
"The ChatGPT construct, since it’s built on the Transformer model, is mature for some things and is not mature in most use cases today. The underlying framework — Transformer or GPT 3.0 — is mature for many different use cases. So our teams have been working with the GPT models to summarize text. You give it a bunch of long paragraphs and ask it to condense it down. We’ve been working at that for some time and it’s getting better and better, and we can now see many organizations are leveraging that capability.
"There are many things, as we’re seeing in the news today, that are very nascent and very much in a beta test mode. Those are usually the new products being released, like the ChatGPT product itself. Those things are still going through a lot of testing.
"As time has gone on..., we keep pushing more and more data into these models, where it gets much better than it did with less data. There’s a phenomenon behind this, and a great research paper written on it, called the "Emergent Abilities of Large Language Models." What that paper says is as you give large language models more data, all of a sudden it starts building all these new capabilities, but we also suspect there are new risks in using the technology, as well. That’s why I think we’re starting to see a lot more of the news related to [Microsoft’s] Bing AI than we saw with ChatGPT in its early days."
Why are we seeing more news around Bing versus ChatGPT? Was it less fully baked than OpenAI’s large language model? "I don’t know that we have a clear answer yet. I can’t say it was less fully baked. We do know OpenAI spent a lot of time creating guardrails around what the system was allowed to do and not do. They spent a lot of time testing it before they released it. I can’t say how much time Microsoft spent testing Bing before releasing it.
"But what I understand from speaking to people who’ve interacted with Bing AI is they would say it’s a stepwise change from what they’ve seen in ChatGPT’s abilities. But with all these new abilities also comes the ability to have new problems and inaccuracies, like 'hallucinations.'"
Is a hallucination related to a generative AI program more about giving inaccurate information or is there some HAL 9000, synaptic-like thought process happening in the background to cause it to give wrong answers? "The best we understand right now is that these models intrinsically are word prediction engines. At its most basic level, it’s just predicting the next best word. In some cases, when it predicts that next best word, that word is no longer factually accurate for the particular question. But given that word, the next best word predicted after it continues down that path, and you build a series of words that go down a path that’s no longer accurate — but it’s very convincing in the way it’s been written.
"So the challenge I think we have with hallucinations is that the system doesn’t tell you if it thinks it’s hallucinating. It begins to hallucinate in quite convincing terms — the same way it would if its answers were 100% accurate. So, it requires us to have a critical eye toward everything we see from it, and treat everything that comes out of this AI technology as a good first draft, right now."
So, do AI robots really dream of electric sheep? "There’s a lot of talk about the anthropomorphisms happening with technology today, and I think the best way to describe these AI technologies is they’re really just good at predicting the next best word.
"That’s where there are questions about whether we’re really ready for the broad release ... because we’ve not yet learned how to engage with this technology. You’re seeing headlines about how people believe they’re engaging with sentient AI. And what is sentience? And that sort of dialogue. It’s best to think about this as something when given a series of words, it predicts the next best word and sometimes that lands you in a really great place, and sometimes you have to go back through and edit it. Until it gets better, that’s the way we should be using it.
"One of the biggest use cases for ChatGPT or generative AI tech being pursued is customer service. That’s because the traditional metrics around measuring the efficiency of a service center evolve around something called ‘average handle time.' Average handle time is how long it takes someone to answer the phone call and then finish the post-call work that needs to take place.
"If you’ve ever walked through these service centers, you’ll see there’s a lot of people who are typing and no longer talking. That's all the work that needs to be done to type up the summary of the conversation that just took place with the customer on that call so they have a record of it. The AI technology is proving very good at being able to generate that quickly, so that the service agent, instead of typing it all out, can do a quick review of it and send it along.
"That’s where we’ve been working with some of our clients in developing use cases as well."
So, as I’ve had it explained to me, GPT-3 is the large language model on which ChatGPT is based and you can’t change that model, but you can literally help it learn to address a specific business need. How does that work? "There’s a new field of skill known as prompt engineering. It’s being able to give some context to that large language model to activate a certain part of its data set, so that you can prime it and tap into that data set for the answer. So, that’s one way companies are using it and getting it to focus on some context. Maybe priming it with examples of the way to respond and then giving it a question so that it will respond in that way.
“So, prompt engineering is a way companies are able to tailor it for their specific use cases.
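The priming Diasio mentions is often just careful string construction: worked examples of the desired response style are placed ahead of the real question so the model continues the pattern. The sketch below builds such a few-shot prompt; the example questions and answers are invented for illustration.

```python
# Few-shot prompt construction: priming examples set the response
# pattern, and the model is left to complete the final answer.
def build_prompt(examples: list[tuple[str, str]], question: str) -> str:
    parts = ["Answer customer questions briefly and politely.\n"]
    for q, a in examples:                  # priming examples establish the pattern
        parts.append(f"Q: {q}\nA: {a}\n")
    parts.append(f"Q: {question}\nA:")     # the model completes this last answer
    return "\n".join(parts)

prompt = build_prompt(
    examples=[
        ("Where is my refund?", "Refunds post within 5 business days of approval."),
        ("Can I change my shipping address?", "Yes, any time before the order ships."),
    ],
    question="How do I return a damaged item?",
)
print(prompt)  # this string is what gets sent to the large language model
```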
“Another example we see, and I don’t think this is generally available yet, but I know a lot of these companies are preparing to be able to create a subset and copies of data specifically for their business — adding data to enrich that large language model. So, their company’s data would be added on top of that large language model and therefore they’ll be able to get answers from it very specific for their organization.
"That will be something we see a lot more of in the future, because as we start to work toward use cases that are more focused on being able to answer questions about a company’s policies or about the company’s business, it’s going to have to be primed with a lot of data about that company. And you don’t want to put that into the general large language model or else everybody else would have access to it as well.
“...This idea of local copies of data that are working together with the large model is something we’re likely to see a lot more of in the future. I know a lot of the big hyperscalers are planning to release that capability in the not-so-distant future.”
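One common pattern behind what Diasio describes — layering company data on top of a general model without retraining it — is retrieval: embed the internal documents once, find the passages closest to a question, and feed only those into the prompt. The sketch below assumes the pre-1.0 openai package's Embedding call; the documents, model name, and policy question are invented for illustration.

```python
# Retrieval sketch: company documents stay outside the general model;
# only the most relevant passage is injected into the prompt.
# Uses the pre-1.0 openai Embedding API; data here is invented.
import numpy as np
import openai

openai.api_key = "YOUR_API_KEY"  # assumption: supplied by the caller

def embed(text: str) -> np.ndarray:
    resp = openai.Embedding.create(model="text-embedding-ada-002", input=text)
    return np.array(resp["data"][0]["embedding"])

documents = [
    "Expense reports must be filed within 30 days of travel.",
    "Remote employees may claim a home-office stipend once per year.",
]
doc_vectors = [embed(d) for d in documents]  # computed once, offline

def most_relevant(question: str) -> str:
    q = embed(question)
    # cosine similarity picks the passage closest in meaning to the question
    scores = [float(q @ v / (np.linalg.norm(q) * np.linalg.norm(v)))
              for v in doc_vectors]
    return documents[int(np.argmax(scores))]

question = "How long do I have to submit my expenses?"
context = most_relevant(question)
prompt = f"Using only this company policy:\n{context}\n\nAnswer: {question}"
# `prompt` then goes to the general large language model, so the
# company's data never has to be folded into the shared model itself.
```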
Do you believe prompt engineering is becoming a marketable skill, something tech workers should consider learning? "Much like excellent programming and visualization can be seen as works of art, prompt engineering will be a marketable and differentiating skill in the future. It’s essentially where human creativity meets AI. As schools incorporate an AI-infused curriculum, it will likely include prompting as a way of expressing creativity and critical thinking."
Does hosting this AI-based chatbot technology eat a lot of CPU cycles and energy? Or will ChatGPT and other bots mainly be hosted via a cloud service? "Currently, it’s a very large model drawing a lot of compute resources. The idea for the future is that we create these smaller, localized versions for companies who no longer need the entire, larger model. I think it would be impractical to take the entire GPT-3 or 3.5 or 4 model and say, 'OK, we’re going to get EY’s foundational model and add that on top of it.' These hyperscalers will likely figure out a way to create a data set for a company that sits on top of the large model, so they have a smaller private version, or they will find a way to compress the larger model in a way that will allow it to be brought into companies' cloud networks.”