
Artificial Intelligence in Evaluation, Validation, Testing and Certification

Everybody seems to jump on the AI bandwagon these days, “enhancing” their products and services with “AI.” It sounds, however, a bit like the IoT hype from the last decade, when your coffee machine desperately needed Internet access. This time, though, there is also an Armageddon undertone: claims that AI will make our jobs obsolete and completely transform all sorts of businesses, including ours.

So, it comes as no surprise that atsec gets asked by customers, government agencies, and almost everybody else communicating with us how we position ourselves on the use of AI in our work and how we deal with AI being used in our customers’ IT security environments and in all sorts of other areas.

First answer: Unfortunately, we don’t yet use it for authoring blog entries, so musing about the benefits and drawbacks of AI in our work can still ruin your weekend. 🙁

Second answer: For an excellent overview of how we deal with AI and what we expect from this technology, there is a brilliant interview with Rasma Araby, Managing Director of atsec AB Sweden:

Of course, AI is discussed within atsec frequently, as we are a tech company by nature. We analyze IT technologies for impacts on IT security and are eager to deploy new technologies for ourselves or introduce them to our customers if we believe they will be beneficial.

atsec’s AI policy foundation

Recently, we defined some basic policies on the use of AI within atsec. Those policies have two cornerstones:

First and foremost, we are committed to protecting all sensitive information we deal with, especially any information entrusted to us by our customers. We will not share such information and data with third parties, and thus we will not feed any such information into publicly available AI tools.

There are several reasons for this: Obviously, we would violate our NDAs with our customers if we sent their information to a public server. Also, there is currently no robust way to establish trust in these tools, and nobody can tell you how such information would be handled. So, we must assume that we would be pushing that information directly into the public domain. Even if we tried to “sanitize” some of the information, I would be skeptical that an AI engine could not determine which customer and product our chat was about. The only way to find out would be to risk disaster, and that is not a risk we are willing to take. Furthermore, sanitizing the information would probably require more effort than writing up the information ourselves.

The second cornerstone is no different from our use of any other technology: any technology is only a tool supporting our work. It will not take responsibility for the results.

We use many tools to help with our work, for example, to author our evaluation reports and to keep track of our work results, evidence database, etc. Such tools could easily be marketed as AI, but as the saying goes: “A fool with a tool is still a fool.” Our evaluators take responsibility for their work products, and our quality assurance will not accept errors being blamed on a tool. Tools are always treated with a healthy dose of mistrust. We always have humans verify that our reports are correct and assume responsibility for their contents. This will be no different with an AI tool. At atsec, our evaluators and testers will always be in ultimate control of our work.

With this framework, we are in a good position to embrace AI tools where they make sense and do not violate our policies. We are aware that we cannot completely avoid AI anyway, for example, when it “creeps” into standard software tools like word processors. AI-based tools that help our techies rephrase their texts for readability and better understanding might sometimes be an improvement cherished by our customers. 😀

We expect AI tools to help, for example, with code reviews and with defining meaningful penetration tests in the foreseeable future. However, we have not yet encountered such tools that can be run in controlled, isolated environments to fulfill our AI policy requirements.
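To make that requirement concrete, here is a minimal sketch of what a policy-compliant setup could look like: a model hosted entirely on an isolated machine inside the lab network, queried locally so that no evaluation evidence ever leaves our environment. The endpoint, model name, and response format below are purely illustrative assumptions, not a tool we actually run.

```python
import json
import urllib.request

# Hypothetical endpoint of a model hosted entirely inside an isolated lab network.
# The host has no Internet access, so no customer data can leave the environment.
LOCAL_MODEL_URL = "http://ai-review.lab.internal:8080/v1/review"  # placeholder


def review_code_locally(source_code: str) -> str:
    """Ask a locally hosted model for review comments on a piece of source code."""
    payload = json.dumps({
        "model": "local-code-review-model",  # placeholder model name
        "prompt": "Review this code for input validation and error handling issues:\n"
                  + source_code,
    }).encode("utf-8")

    request = urllib.request.Request(
        LOCAL_MODEL_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        # "comments" is an assumed field in the hypothetical response format.
        return json.loads(response.read())["comments"]


if __name__ == "__main__":
    with open("example_module.c", encoding="utf-8") as handle:
        print(review_code_locally(handle.read()))
    # The output is treated as a hint only: an evaluator still reads the code
    # and takes responsibility for every finding in the report.
```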

Correctness of AI, trust in AI engines

As already stated, we do not treat current AI engines as trusted tools we can blindly rely upon. This is because the “intelligence” displayed in the communication by these engines comes mostly from their vast input, which is absorbed into a massive network with billions, or even trillions, of parameters. Most of the large language models behind the popular AI engines are fed by the Common Crawl database of Internet content (refined into Google’s Colossal Clean Crawled Corpus), which grows by about 20 terabytes per month. This implies that the input for training the engines cannot be fully curated (i.e., fact-checked) by humans, and it leaves lots of loopholes for injecting disinformation into the models. I guess that every troll farm on the planet is busy doing exactly that.

The developers of these AI engines try to fight this, but filtering out documents containing “dirty, naughty, obscene, and otherwise bad words” won’t do the trick. If your favorite AI engine doesn’t have quotes from Leslie Nielsen’s “The Naked Gun” handy, that’s probably why. Checking the AI’s “Ground Truths” against Wikipedia has its shortcomings, too.
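As a toy illustration of why such word-list filtering falls short (our own sketch, not any vendor’s actual pipeline): a document is dropped as soon as it contains a listed word, regardless of context, so a harmless film quote disappears while politely worded disinformation passes untouched.

```python
# Minimal sketch of blocklist-based corpus filtering (illustration only).
# A document is dropped if it contains any listed word, no matter the context.
BLOCKLIST = {"naughty", "obscene"}  # stand-in for the real, much longer word list


def keep_document(text: str) -> bool:
    """Return True if the document survives the blocklist filter."""
    words = {word.strip(".,!?\"'").lower() for word in text.split()}
    return BLOCKLIST.isdisjoint(words)


documents = [
    "An innocuous movie review quoting a 'naughty' joke.",          # dropped
    "A perfectly polite paragraph repeating a fabricated claim.",   # kept, still wrong
]
for doc in documents:
    print(keep_document(doc), "->", doc)
```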

Therefore, the AI engine companies use different benchmarks to test the AI engines’ output, with many of those outputs checked by humans. However, the working conditions of those “clickworkers” are often at sweatshop level, which does not help to establish our trust in the accuracy and truthfulness of the results.

Therefore, if atsec were to use such engines in its core business of assessing IT products and technology, we would not be able to put a reasonable amount of trust in their output, and we would have to fact-check every statement made by the AI. This might easily result in more effort than writing the reports ourselves and trusting our own judgment.
Note that the accuracy of AI answers, which ranges between 60 and 80 percent in benchmarks depending on the subject tested, together with the problems of input poisoning, the question of how to establish the “truthfulness” of an AI, and the ethical and philosophical questions about which information to provide, are all topics in the EU and US efforts to regulate and possibly certify AI engines. Unfortunately, while the problems are well known, their solutions mostly are not. AI researchers across the globe are busily working on these subjects, but my guess is that those issues may be intrinsic to today’s large language models and cannot be solved in the near future.

Offensive AI

A common Armageddon scenario pushed by AI skeptics is that big AI engines like the ones from OpenAI, Microsoft, Google, Meta, and others will help the bad guys find vulnerabilities and mount attacks against IT infrastructures far more easily than ever before. After almost 40 years in IT security, that doesn’t scare me anymore. IT security has been an arms race between the good guys and the bad guys from the very beginning, with the bad guys having the advantage that they only need to find one hole in a product, while the good guys have the task of plugging all of them.

As history teaches us, the tools used by the bad guys can and will be used by the good guys too. Tools searching for flaws have been used by hackers and developers alike, although developers were at times more reluctant to adopt them. AI will be no different, and it may even help developers write more robust code, for example, by taking on the tedious tasks of thorough input and error checking, which are still among the most prominent causes of software flaws. Will atsec deploy such tools for its evaluations and testing as well? While we will certainly familiarize ourselves with them and might add them to our arsenal, it is far more beneficial for developers to integrate such tools into their development and test processes, subjecting all of their code to that scrutiny as soon as it is written or modified, rather than having a lab like atsec apply them when the product may already be in use by customers.
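To illustrate the kind of “tedious” checking meant here (a generic example of ours, not the output of any particular AI assistant), the difference between code that trusts its input and code that validates it is often just a few unglamorous lines, exactly the kind of routine work an assistant could draft and a developer would then review:

```python
def parse_record_length(raw: bytes) -> int:
    """Parse a 2-byte big-endian length field with the checks a reviewer would expect."""
    # Without these checks, malformed input could push an out-of-range length
    # into later buffer handling, a classic source of software flaws.
    if not isinstance(raw, (bytes, bytearray)):
        raise TypeError("expected bytes")
    if len(raw) != 2:
        raise ValueError("length field must be exactly 2 bytes")
    length = int.from_bytes(raw, "big")
    if length == 0 or length > 16384:  # 16384: assumed protocol maximum for this sketch
        raise ValueError(f"length {length} outside allowed range")
    return length
```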

We have always advocated, in standards bodies and other organizations creating security criteria, that the search for flaws should be conducted within the developer’s processes and that the lab should verify that these searches for flaws and vulnerabilities are performed effectively in the development environment. This is also true for AI tools.

Summary

The hype about AI tools that started with the public availability of ChatGPT less than a year ago has already reached its “Peak of Inflated Expectations” (according to Gartner’s “hype cycle” model) and is on its way into the “Trough of Disillusionment.” The yet-to-come “Slope of Enlightenment” will lead to the “Plateau of Productivity,” when we finally have robust AI tools at our disposal, hopefully combined with a certification that provides sufficient trust for their efficient deployment. In any case, atsec will monitor the development closely and offer to participate in the standardization and certification efforts. AI will become an integral part of our lives, and atsec is committed to helping make this experience as secure as possible.