Let’s talk about the security of AI and ML Systems
Let’s talk about the security of AI and ML Systems.
Yes, there are tons of benefits to using outputs from AI, whether that’s in terms of helping to write code, identifying new scenarios you may not have thought of, or simply to check if your script is along the right lines.
But, there is a dark side.
To use it correctly you should be asking yourself questions like:
- What are the security implications of using A I?
- How do we use it securely?
- Am I actively removing any sensitive information when using it?
But also, you may now be wondering how you secure these applications, what exactly are the risks and how can I protect my business from them?
A new world of risks. Are you ready?
If you are analyzing your systems from a design perspective before you go and use AI and, and deploy AI systems, you are ahead of the pack and you can essentially get that business benefit while reducing your security risk. Machine learning isn’t going anywhere, and regulation is now following it as it's recognized that it is here to stay. It brings with it new types of risks - and this is why we all need to care about these risks.
Why are ML risks different?
The whole idea is for the machine to be trained up on some training data to become the data.
And so there are a number of risks associated with that alone.
- For example, where did that data come from?
- Are they biased?
- Do they have confidential information stored in them?
- What if somebody, a bad person can attack them and change them or poison them?
- What if we produce bad information and we feed it back into the machine?
Four top risks to be aware of in Large Language Models (LLMs).
All of these risks are somewhat related to things that we've been thinking about in security for many years, but they're also novel in a sense that the machine in machine learning becomes the data. The Berryville Institute of Machine Learning (BIML) has published an open source Creative Commons licensed threat model for AI and ML systems. And so here are the top 4 risks when using Large Language Models (LLMs):
- Recursive pollution - And that's when you have a machine learning model that creates something incorrect or wrong. And it puts it out there on the internet or into your data pile. And then later it consumes the same thing that it created, making a feedback loop of wrongness that's called recursive pollution.
- Data trustworthiness and curation - Where does data come from to build these large language models? For example, ChatGPT has 14 trillion data points, not billion, not million, not hundreds, not thousands but trillions - 14 trillion. That means there's some stuff in there that it's just basically too big to check or even understand. And the data sets are often not really disclosed for some of these foundation models.
- Transfer learning - Often when you're building an LLM, you start with a foundation model like ChatGPT or Google Bard and everything that the foundation model has with regard to risks built in. You drag along with you when you build an application number four, nobody is really sure how these things work. So there's a lot of black box opacity involved. You can think of using an LLM foundation model as using an undocumented unstable API that has arbitrary behavior. So we get that kind of stochastic behavior and what that means is sometimes large language models can just be completely wrong.
- Prompt manipulation - You can have malicious input and, and it causes all sorts of hilarious, unstable behavior with large language models. For example, you can tell ChatGPT to pretend to be Google Bard, something that it's not allowed to do when it's ChatGPT, but it's happy to do when it's pretending to be Bard. This is an example of prompt manipulation.
And now with what's happening in the context of transformer architectures and LLMs, it's really this realization that you're managing and working with many, many models, you're going to need some way of standardizing the risks that you're concerned about. You need a way of interfacing or interacting with the underlying base model in a way that takes into account the risks associated with prompt injection attacks and the risks associated with data leakage and PI I and, and multimodal management.
Regulation, at record speed.
If ChatGPT set the records for the number of users that signed up in a record amount of time, the regulators that jumped on AI set another record for how quickly they responded to AI in creating the Guidelines for secure AI system development.
Is Europe leading the way?
There are two, macroeconomic and societal reasons why the EU wants to build cyber laws. First of all, they want to build economic resilience within the industry, within the economy in order to make us as a group of nations more resilient against potential cyber warfare. And secondly, the EU wants to protect citizens' rights when it comes to threats from technology.
The GDPR was the first example of this which goes quite a lot broader than, for instance, the private legislation within the United States. With the exception of some sector specific legislation such as HIPAA for instance. It looks like the EU is trying to take a more ethical stance when it comes to technology legislation.
And there's also quite a lot of parallels between the AI Act and the GDPR, for instance, because they both are talking about security by design, they are demanding that AI is built in a secure way. But because legislation can never follow the technology, the words remain somewhat vague and they state that security should be taken into the design. Which is where threat modeling can provide value.
What about other countries and continents?
The Guidelines for secure AI system development, has contributions from many countries outside of Europe, such as Japan, Canada and Singapore, to name a few. So the content isn’t just EU-specific or with a European view. The United States of America doesn’t currently have its own version of the AI guidelines, but many very big technology companies are asking Congress to create a version of it. However, these big technology companies may also come with their own version of how this law should be implemented within the United States.
However, when President Biden released the Executive Order on Improving the Nation's Cybersecurity, the US was arguably the leader on this narrative in 2021. And therefore, we may see a version of USA AI guidelines before we know it.
Can threat modeling really help?
If you do threat modeling of machine learning and you do things right from a design and security perspective, you're going to be leading and getting ahead of the regulation, especially in the United States. Organizations should do security by design, and to be truly secure, ML and AI applications must be threat modeled too, to identify these new and unique risks. Giving greater confidence in your software.
How to identify risks from ML and AI systems.
In October 2023, we released our first Ml and AI library as part of the IriusRisk platform which can even be used in our free Community Edition. It lets you build a threat model using AI concepts and helps users to understand what risks they are faced with in the system they are building.
An important aspect of doing threat modeling is, as we always say, you're going to get the most benefits by doing it early on because you'll avoid these problems later on in the rest of the development process.
This is doubly important for AI and ML systems because if you get it wrong at design, it costs you a lot of money to retrain those models with new data sets that have been properly sanitized and then go and rebuild the model at the, at the other end of it, which is quite different from, from regular software security.
Want to hear more?
Watch the fireside discussion between several experts in AI, ML and compliance, in this recent webinar for more ideas and arguments on how we can use LLMs but should also be aware of their flaws. https://www.youtube.com/watch?v=RI0pNGH9bgA&t=0s