There's no doubt generative AI models such as ChatGPT, Bing Chat, or Google Bard can deliver massive efficiency benefits — but they bring with them major cybersecurity and privacy concerns, along with worries about accuracy.
It's already known that these programs — especially ChatGPT itself — make up facts and repeatedly lie. Far more troubling, no one seems to understand why and how these lies, coyly dubbed "hallucinations," are happening.
In a recent 60 Minutes interview, Google CEO Sundar Pichai explained: “There is an aspect of this which we call — all of us in the field — call it as a ‘black box.’ You don’t fully understand. And you can’t quite tell why it said this.”
The fact that OpenAI, which created ChatGPT and the foundation for various other generative models, refuses to detail how it trained these models adds to the confusion.
Even so, enterprises are experimenting with these models for almost everything, even though the systems lie repeatedly, no one knows why, and there's no fix anywhere in sight. That's an enormous problem.
Consider something as mundane as summarizing lengthy documents. If you can’t trust that the summary is accurate, what’s the point? Where is the value?
How about when these systems do coding? How comfortable are you riding in an electric vehicle with a brain designed by ChatGPT? What if it hallucinates that the road is clear when it isn’t? What about the guidance system on an airplane, or a smart pacemaker, or the manufacturing procedures for pharmaceuticals or even breakfast cereals?
In a frighteningly on-point pop-culture reference from 1983, the film WarGames depicted a generative AI system used by the Pentagon to counterstrike more effectively in a nuclear war. It was housed at NORAD. At one point, the system decides to run its own test and fabricates a large number of imminent incoming nuclear missile strikes from Russia.
The developer of the system argues the attacks are fictitious, that the system made them up. In an eerily predictive moment, the developer says that the system was “hallucinating” — decades before the term was coined in the AI community. (The first reference to hallucinations appears to be from Google in 2018.)
In the movie, NORAD officials decide to ride out the "attack," prompting the system to try to take over command so it can retaliate on its own. That was sci-fi fantasy 40 years ago; today, not so much.
In short, using generative AI to code is dangerous, but its efficiencies are so great that it will be extremely tempting for corporate executives to use it anyway. Bratin Saha, vice president for AI and ML Services at AWS, argues the decision doesn’t have to be one or the other.
How so? Saha maintains that the efficiency benefits of coding with generative AI are so sky-high that there will be plenty of dollars in the budget for post-development repairs. That could mean enough dollars to pay for extensive security and functionality testing in a sandbox — both with automated software and expensive human talent — and still deliver a very attractive ROI on the spreadsheet.
Software development can be executed 57% more efficiently with generative AI — at least the AWS flavor — but that efficiency gets even better if it replaces less experienced coders, Saha said in a Computerworld interview. “We have trained it on lots of high-quality code, but the efficiency depends on the task you are doing and the proficiency level,” Saha said, adding that a coder “who has just started programming won’t know the libraries and the coding.”
Another security concern is that sensitive data poured into generative AI can pour out somewhere else. Some enterprises have discovered that data fed into the system for summaries, for example, can be revealed to a different company later in the form of an answer. In essence, the questions and data fed into the system become part of its learning process.
Saha said generative AI systems will get safeguards to minimize data leakage. The AWS version, he said, will allow users to “constrain the output to what it has been given,” which should minimize hallucinations. “There are ways of using the model to just generate answers from specific content given it. And you can contain where the model gets its information from.”
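Saha didn't describe the mechanism in detail, but the general pattern he's alluding to is grounded prompting: the model is instructed to answer only from the content it has explicitly been given and to refuse otherwise. Below is a minimal sketch of that idea in Python, assuming a generic text-completion client; the call_llm callable and the prompt wording are illustrative stand-ins, not AWS's actual API.

```python
# Minimal sketch of "grounded" prompting: the model is told to answer only
# from the supplied context and to refuse otherwise. The model client here
# (call_llm) is a hypothetical placeholder, not a real AWS or OpenAI API.

def build_grounded_prompt(context: str, question: str) -> str:
    # The instructions confine the model to the provided content.
    return (
        "Answer the question using ONLY the context below. "
        "If the context does not contain the answer, reply exactly: "
        "'Not found in the provided content.'\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )

def answer_from_content(call_llm, context: str, question: str) -> str:
    # call_llm is any function that takes a prompt string and returns text.
    return call_llm(build_grounded_prompt(context, question))

if __name__ == "__main__":
    # Stand-in for a real model call, so the sketch runs as-is.
    def fake_llm(prompt: str) -> str:
        return "Not found in the provided content."

    doc = "Quarterly revenue was $4.2M, up 8% year over year."
    print(answer_from_content(fake_llm, doc, "Who is the CEO?"))
```

Constraining answers to supplied content doesn't eliminate hallucinations, but it narrows the space in which they can occur, which is the containment Saha describes.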
As for the issue of hallucinations, Saha said his team has come up with ways to minimize that, noting also that the code-generation engine from AWS, called CodeWhisperer, uses machine learning to check for security bugs.
But Saha’s key argument is that the efficiency is so high that enterprises can pour lots of additional resources into the post-coding analysis and still deliver an ROI strong enough to make even a CFO smile.
Is that bargain worth the risk? It reminds me of a classic scene in The Godfather. Don Corleone is explaining to the heads of other organized crime families why he opposes selling drugs. Another family head says that he originally thought that way, but he had to bow to the huge profits.
“I also don’t believe in drugs. For years, I paid my people extra so they wouldn’t do that kind of business. But somebody comes to them and says ‘I have powders. You put up $3,000-$4,000 investment, we can make $50,000 distributing.’ So they can’t resist,” the chief said. “I want to control it as a business to keep it respectable. I don’t want it near schools. I don’t want it sold to children.”
In other words, CISOs and even CIOs might find the security tradeoff dangerous and unacceptable, but line-of-business chiefs will find the savings so powerful they won’t be able to resist. So CISOs might as well at least put safeguards in place.
Dirk Hodgson, director of cybersecurity for NTT Ltd., said he would urge caution about using generative AI for coding.
“There is a real risk for software development and you are going to have to explain how it generated the wrong answers rather than the right answers,” Hodgson said. Much depends on the nature of the business — and the nature of the task being coded.
“I would argue that if you look at every discipline where AI has been highly successful, in all cases it had a low cost of failure,” Hodgson said, meaning that if something went wrong, the damage would be limited.
One example of a low-risk effort would be an entertainment company using generative AI to devise ideas for shows or perhaps dialogue. In that scenario, no harm would come from the system making stuff up because that's the actual task at hand. Then again, there's danger in plagiarizing an idea or dialogue from a copyrighted source.
Another major programming risk is unintended security holes. Although security lapses can happen within one application, they can also easily happen when two clean apps interact and create a security hole; that's a scenario that would never have been tested because no one anticipated the apps interacting. Add in some API coding and the potential for problems is orders of magnitude higher.
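For a concrete, purely illustrative example of how such a hole slips in unnoticed: generated data-access code that builds SQL by string concatenation opens a classic injection vector, while a parameterized query closes it. The sketch below is not drawn from any AWS or NTT system; the table, data, and function names are invented for illustration.

```python
# Illustrative only: an insecure query built by string concatenation (a
# pattern generated code can reproduce without anyone noticing) versus a
# parameterized query that the database driver escapes safely.
import sqlite3

def find_user_insecure(conn: sqlite3.Connection, username: str):
    # Vulnerable: a crafted username like "x' OR '1'='1" returns every row.
    query = f"SELECT id, email FROM users WHERE name = '{username}'"
    return conn.execute(query).fetchall()

def find_user_safe(conn: sqlite3.Connection, username: str):
    # Parameterized: the value is bound, not spliced into the SQL text.
    return conn.execute(
        "SELECT id, email FROM users WHERE name = ?", (username,)
    ).fetchall()

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER, name TEXT, email TEXT)")
    conn.execute("INSERT INTO users VALUES (1, 'alice', 'a@example.com')")
    conn.execute("INSERT INTO users VALUES (2, 'bob', 'b@example.com')")
    payload = "nobody' OR '1'='1"
    print("insecure:", find_user_insecure(conn, payload))  # leaks both rows
    print("safe:    ", find_user_safe(conn, payload))      # returns []
```

Both functions pass a casual review and a happy-path test; only the injection payload reveals the difference, which is exactly why this class of bug survives into production.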
“It could accidentally introduce new vulnerabilities at the time of coding, such as a new way to exploit some underlying databases. With AI, you don’t know what holes you may be introducing into that code,” Hodgson said. “That said, AI coding is coming and it does have benefits. We absolutely have to try to take advantage of those benefits. Still, do we really know the liability it will create? I don’t think we know that yet. Our policy at this stage is that we don’t use it.”
Hodgson noted Saha's comments about AI efficiencies being highest when replacing junior coders. But he resisted the suggestion that he take programming tasks away from junior programmers and give them to AI. “If I don’t develop those juniors, I won’t ever make them seniors. They have to learn the skills to make them good seniors.”