This article was originally published on the Red Hat Customer Portal. The information may no longer be current.
Java is a very popular programming language. Two key reasons for its popularity are security and the availability of a huge ecosystem of libraries and components. Since most Java applications use a wide range of libraries, which in turn depend on other libraries, it is difficult to ensure the integrity of these applications from a security perspective. A recent study by Aspect Security revealed the significance of this problem: 26% of the dependencies being downloaded, used, and deployed contained known vulnerabilities. The study looked at 1,261 versions of 31 different libraries downloaded from the Central Repository via Maven. One of the key mitigation strategies the paper suggests is to "Enforce scans of dependencies against a known vulnerability database". At the time the paper was written, no such tool or database existed.
While researching this problem, we found that a prototype database of known-vulnerable JAR files, called victims, had been created by Red Hat's own Steve Milner. The victims database maps JAR file SHA-512 hashes to CVE IDs, identifying JAR files known to be vulnerable to the corresponding flaws. The database was functional as a prototype, but it was seeded with only a small data set: enough to prove the concept, but not enough to effectively catch vulnerable dependencies in most applications. The Red Hat Security Response Team (SRT) has now populated the victims database with JAR file hashes for all flaws with CVE IDs that are known to have affected the JBoss middleware product line, and it continues to add hashes as it handles new flaws affecting JBoss products. At the time of writing, the victims database holds 363 hashes of known-vulnerable JARs and catches vulnerable dependencies in many test scenarios.
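To make the lookup key concrete, the following minimal Java sketch computes the SHA-512 hex digest of a JAR file, which is the value the victims database associates with CVE IDs. This is our own illustrative helper, not part of the victims tooling:

```java
import java.io.*;
import java.security.MessageDigest;

public class JarHash {
    // Compute the SHA-512 hex digest of a JAR file -- the key the
    // victims database uses to map archives to CVE IDs.
    static String sha512(File jar) throws Exception {
        MessageDigest md = MessageDigest.getInstance("SHA-512");
        try (InputStream in = new FileInputStream(jar)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) > 0) md.update(buf, 0, n);
        }
        StringBuilder sb = new StringBuilder();
        for (byte b : md.digest()) sb.append(String.format("%02x", b));
        return sb.toString();
    }
}
```

A scanner only needs to compute this digest for each dependency and look it up in the database; a hit means the exact binary is known to be vulnerable.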
Enforce-victims-rule Maven Plugin
The victims database allows a developer to scan a build of their application and identify any known-vulnerable JARs it includes. However, the optimal time to discover vulnerable dependencies is at build time, when the cost of updating a dependency reference to a fixed version is minimal. To support this, the Red Hat Product Security Team (PST) has produced a Maven plugin called enforce-victims-rule. It is a new rule for the Maven Enforcer Plugin that checks a Maven project's dependencies against the victims database at build time. Checks are based both on JAR file hashes recorded in the victims database and on JAR file metadata (artifact name and version). The plugin can be configured to trigger either warnings or fatal errors when vulnerable dependencies are detected.
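Wiring such a rule into a build typically looks like the following pom.xml fragment. Treat this as a hedged sketch only: the implementation class, option names, and version shown are assumptions on our part, so consult the enforce-victims-rule documentation for the exact configuration.

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-enforcer-plugin</artifactId>
  <dependencies>
    <!-- Puts the victims rule on the enforcer's classpath -->
    <dependency>
      <groupId>com.redhat.victims</groupId>
      <artifactId>enforce-victims-rule</artifactId>
      <version>1.3.4</version> <!-- illustrative version -->
    </dependency>
  </dependencies>
  <executions>
    <execution>
      <id>enforce-victims-rule</id>
      <goals>
        <goal>enforce</goal>
      </goals>
      <configuration>
        <rules>
          <!-- Rule class name is an assumption -->
          <rule implementation="com.redhat.victims.VictimsRule">
            <!-- Each check can warn or fail the build -->
            <metadata>warning</metadata>
            <fingerprint>fatal</fingerprint>
          </rule>
        </rules>
      </configuration>
    </execution>
  </executions>
</plugin>
```

With a configuration along these lines, a metadata match (artifact name and version) would print a warning, while a hash match would fail the build.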
Limitations & Future Work
Simple JAR hashing, as used in the victims database, is prone to false negatives: recompiling a JAR changes its checksum, so it is impossible to maintain a comprehensive database of all possible binary JARs compiled from the same source. Relying on metadata from the JAR file name, META-INF/MANIFEST.MF, and other sources is prone to both false positives and false negatives, because there is no reliable, mandatory metadata that identifies known-vulnerable versions of a JAR file. An ideal solution would identify the individual method or methods exposing a vulnerability and look for their fingerprint in candidate JARs. Unfortunately, this approach is not practical: insufficient information is available to do this for many vulnerabilities, and it would introduce heavy overhead to keep the database up to date. After working through several ideas, we are currently looking at improving the JAR file hashing with the following algorithm:
- Unpack the JAR file
- For each .class file, remove the JDK compiler mark from the file and hash the file
- Combine the hashes for all class files into a single hash
- Use diffing of the individual class files inside a JAR to handle JARs that are supersets of known-vulnerable JARs
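The unpack, hash, and combine steps above can be sketched in Java as follows. This is our own illustrative code, not the victims implementation, and it treats the "JDK compiler mark" as the minor/major version bytes at offsets 4-7 of the class file header (an assumption about what is stripped):

```java
import java.io.*;
import java.security.MessageDigest;
import java.util.*;
import java.util.zip.*;

public class JarFingerprint {

    // Hex-encode the SHA-512 digest of a byte array.
    static String sha512Hex(byte[] data) throws Exception {
        MessageDigest md = MessageDigest.getInstance("SHA-512");
        StringBuilder sb = new StringBuilder();
        for (byte b : md.digest(data)) sb.append(String.format("%02x", b));
        return sb.toString();
    }

    // Fingerprint a JAR: hash every .class entry with its compiler
    // version field (bytes 4-7 of the class file header) zeroed out,
    // then combine the sorted per-class hashes into a single digest.
    static String fingerprint(InputStream jarStream) throws Exception {
        List<String> classHashes = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(jarStream)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (!entry.getName().endsWith(".class")) continue;
                ByteArrayOutputStream buf = new ByteArrayOutputStream();
                byte[] chunk = new byte[8192];
                int n;
                while ((n = zip.read(chunk)) > 0) buf.write(chunk, 0, n);
                byte[] bytes = buf.toByteArray();
                // Blank the minor/major version fields so the same
                // classes built by a different JDK release still match.
                for (int i = 4; i < 8 && i < bytes.length; i++) bytes[i] = 0;
                classHashes.add(sha512Hex(bytes));
            }
        }
        // Sorting makes the result independent of archive entry order.
        Collections.sort(classHashes);
        return sha512Hex(String.join("", classHashes).getBytes("UTF-8"));
    }
}
```

The per-class hashes computed on the way to the combined digest are also exactly what the final diffing step would compare.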
We see several advantages to this approach over the simple hashing currently used:
- Recompiled JARs built with the same JDK will be matched
- Recompiled JARs built with a different JDK will be matched, provided that JDK compiles the source into the same byte code as the JDK used to build the JAR that was hashed
- JARs that are combined together will be matched if they contain known-vulnerable class files and they are a complete superset of the JAR that was hashed
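Under the same assumptions as the fingerprinting sketch, the superset matching in the last point reduces to a set-containment test over per-class hashes. The class and method names here are our own:

```java
import java.util.*;

public class SupersetCheck {
    // A candidate JAR is flagged when every per-class hash recorded
    // for a known-vulnerable JAR also appears among the candidate's
    // per-class hashes -- the "complete superset" condition.
    static boolean containsVulnerable(Set<String> candidateClassHashes,
                                      Set<String> vulnerableClassHashes) {
        return candidateClassHashes.containsAll(vulnerableClassHashes);
    }
}
```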
There is still some scope for false negatives. For example, if a JAR is recompiled by a JDK that differs significantly from the JDK used to build the JAR that was hashed, the byte code may not match and the JAR will not be flagged. Likewise, a JAR that contains known-vulnerable class files but is not a complete superset of the JAR that was hashed will not be matched. Despite these limitations, this is the best approach we have found so far. That said, this is still an open problem, and we would love to hear input from the community - please reply using the comments system!
About the author
Red Hat is the world’s leading provider of enterprise open source software solutions, using a community-powered approach to deliver reliable and high-performing Linux, hybrid cloud, container, and Kubernetes technologies.