Dangerous hole in Apache Commons Text – like Log4Shell all over again – Naked Security


Java programmers love string interpolation features.

If you’re not a coder, you’re probably confused by the word “interpolation” here, because it’s been borrowed as programming jargon where it’s not a very good linguistic fit…

…but the idea is simple, very powerful, and sometimes spectacularly dangerous.

In other programming ecosystems it’s often known simply as string substitution, where string is shorthand for a bunch of characters, usually meant for displaying or printing out, and substitution means exactly what it says.

For example, in the Bash command shell, if you run the command:

$ echo USER

…you will get the output:

USER

But if you write:

$ echo ${USER}

…you will get something like this instead:

duck

…because the magic character sequence ${USER} means to look in the environment (a memory-based collection of data values typically storing the computer name, current username, TEMP directory, command path and so on), retrieve the value of the variable USER (by convention, the current user’s login name), and use that instead.

Similarly, the command:

$ echo cat /etc/passwd

…prints out exactly what’s on the command line, thus producing:

cat /etc/passwd

…whereas the very similar-looking command:

$ echo $(cat /etc/passwd)

…contains a magic $(...) sequence, with round brackets instead of squiggly ones, which means to execute the text inside the brackets as a system command, collect up the output, and write that out as a continuous chunk of text instead.

In this case, you’ll get back a slightly garbled dump of the username file (despite the name, no password data is stored in /etc/passwd any more), something like this:

root:x:0:0::/root:/bin/bash bin:x:1:1:bin:/bin:/bin/false daemon:x:2:2:daemon:
/sbin:/bin/false adm:x:3:4:adm:/var/log:/bin/false lp:x:4:
7:lp:/var/spool/lpd:/bin/false [...TRUNCATED...]

The risks of untrusted input

As you can imagine, allowing untrusted input, such as data submitted in a web form or content extracted from an email, to be processed by a part of your program that performs substitution or interpolation can be a cybersecurity nightmare.

If you aren’t careful, simply preparing a text message to be printed out to a logfile could trigger a whole load of unwanted side-effects in your app.

These could include, at increasing levels of danger:

  • Accidentally leaking data that was only ever supposed to be in memory. Any string interpolation that extracts data from environment variables and then writes it to disk without permission could put you in trouble with your local data protection regulators. In the Log4Shell incident, for example, attackers made a habit of trying to access environment variables such as AWS_ACCESS_KEY_ID, which contain cryptographic secrets that aren’t supposed to get logged or sent anywhere except to specific servers as a proof of authentication.
  • Triggering internet connections to external servers and services. Even if all an attacker can do is to trick you into looking up the IP number of a servername using DNS, you’ve nevertheless just been coerced into “calling home” to a DNS server that the attacker controls, thus potentially leaking information about the internal structure of your network.
  • Executing arbitrary system commands picked by someone outside your network. If the string interpolation lets attackers trick your server into running a command of their choice, then you have created an RCE hole, short for remote code execution, which typically means the attackers can exfiltrate data, implant malware or otherwise mess with the cybersecurity configuration on your server at will.

As you no doubt remember from Log4Shell, unnecessary “features” in an Apache programming library called Log4J (Logging For Java) suddenly made all these scenarios possible on any server where an unpatched version of Log4J was installed.


Not just internet-facing servers

Worse, problems such as the Log4Shell bug aren’t neatly confined only to servers that are directly at your network edge, such as your web servers.

When Log4Shell hit, the initial reaction from lots of organisations was to say, “We don’t have any Java-based web servers, because we only use Java in our internal business logic, so we think we’re immune to this one.”

But any server to which user data was eventually forwarded for processing – even secure servers that were off-limits to connections from outsiders – could be affected if that server [A] had an unpatched version of Log4J installed, and [B] kept logs of data that originated from outside.

A user who pretended their name was ${env:USER}, for example, would typically get logged by the Log4J code under the name of the server account doing the processing, if the app didn’t take the precaution of checking for dangerous characters in the input data first.
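To see how a made-up username can end up logged as the server’s own account name, here is a minimal sketch that uses only the standard Java library. The class LogInterpolationDemo, its methods and the variable values are all invented for illustration, and the tiny interpolate() function is a stand-in that understands ${env:NAME} only; it is not Log4J’s code.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LogInterpolationDemo {
    // Matches ${env:NAME}; a deliberately tiny stand-in for a real interpolator.
    private static final Pattern ENV_REF =
        Pattern.compile("\\$\\{env:([A-Za-z_][A-Za-z0-9_]*)\\}");

    // Replaces each ${env:NAME} with its value from the supplied map.
    // References to unknown names are copied through unchanged.
    static String interpolate(String text, Map<String, String> env) {
        Matcher m = ENV_REF.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = env.get(m.group(1));
            String replacement = (value != null) ? value : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Risky: the untrusted name is glued into the message BEFORE interpolation,
    // so any ${env:...} sequence it contains is treated as a command.
    static String logLineUnsafe(String untrustedName, Map<String, String> env) {
        return interpolate("Login by " + untrustedName, env);
    }

    // Safer: only the trusted template is interpolated; the user's data is
    // appended afterwards and is never interpreted.
    static String logLineSafer(String untrustedName, Map<String, String> env) {
        return interpolate("Login on ${env:HOST} by ", env) + untrustedName;
    }

    public static void main(String[] args) {
        // Hypothetical values standing in for the server's real environment.
        Map<String, String> env = Map.of("USER", "svc-backend", "HOST", "app01");
        System.out.println(logLineUnsafe("${env:USER}", env)); // Login by svc-backend
        System.out.println(logLineSafer("${env:USER}", env));  // Login on app01 by ${env:USER}
    }
}
```

The difference between the two methods is simply the order of operations: in the safer version, the interpolator never sees the text that came from outside.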

Sadly, history repeated itself in July 2022, when an open source Java toolkit called Apache Commons Configuration turned out to have similar string interpolation dangers.

Third time unlucky

And history is repeating itself again in October 2022, with a third Java source code library called Apache Commons Text picking up a CVE for reckless string interpolation behaviour.

This time, the bug is denoted as follows:

CVE-2022-42889: Apache Commons Text prior to 1.10.0 allows RCE when applied to untrusted input due to insecure interpolation defaults.

Commons Text is a general-purpose text manipulation toolkit, described simply as “a library focused on algorithms working on strings”.

Even if you are a programmer who hasn’t knowingly chosen to use it yourself, you may have inherited it as a dependency – part of the software supply chain – from other components you are using.

And even if you don’t code in Java, or aren’t a programmer at all, you may have one or more applications on your own computer, or installed on your backend business servers, that include components written in Java.

What went wrong?

The Commons Text toolkit includes a handy Java component known as a StringSubstitutor object, created with a Java command like this:

StringSubstitutor interp = StringSubstitutor.createInterpolator();

Once you’ve created an interpolator, you can use it to rewrite input data in handy ways, such as this:

String str = "You have-> ${java:version}";
String rep = interp.replace(str);

Example output:   You have-> Java version 19

String str = "You are-> ${env:USER}";
String rep = interp.replace(str);

Example output:   You are-> duck

The replace() function processes its input string as if it’s a kind of simple software program in its own right, copying the characters one-by-one except for a variety of special embedded ${...} commands that are very similar to the ones used in Log4J.
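To make that description concrete, here is a minimal sketch of the general technique, written with the standard Java library only. MiniInterpolator and its register() method are our own invented names, not part of Commons Text, and the real StringSubstitutor does considerably more than this; only the lookup name base64Encoder is borrowed from the library’s documentation.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

class MiniInterpolator {
    // Maps a command prefix (the part before the first colon) to its handler.
    private final Map<String, Function<String, String>> lookups = new HashMap<>();

    // Registers a handler for ${prefix:argument} sequences.
    MiniInterpolator register(String prefix, Function<String, String> handler) {
        lookups.put(prefix, handler);
        return this;
    }

    // Copies the input through, replacing each recognised ${prefix:argument}.
    String replace(String input) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < input.length()) {
            int end = input.startsWith("${", i) ? input.indexOf('}', i + 2) : -1;
            if (end < 0) {                      // ordinary character: copy it
                out.append(input.charAt(i++));
                continue;
            }
            String body = input.substring(i + 2, end);
            int colon = body.indexOf(':');
            Function<String, String> handler =
                (colon < 0) ? null : lookups.get(body.substring(0, colon));
            if (handler == null) {              // unknown command: leave as-is
                out.append(input, i, end + 1);
            } else {                            // known command: run it
                out.append(handler.apply(body.substring(colon + 1)));
            }
            i = end + 1;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        MiniInterpolator interp = new MiniInterpolator()
            .register("base64Encoder", s ->
                Base64.getEncoder().encodeToString(s.getBytes(StandardCharsets.UTF_8)));
        // Prints: Encoded-> SGVsbG9Xb3JsZCE=
        System.out.println(interp.replace("Encoded-> ${base64Encoder:HelloWorld!}"));
    }
}
```

Note that in this sketch a command does nothing unless a handler has been registered for it, so the set of things that input text can trigger is exactly the set of handlers the programmer chose to turn on.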

Examples from the documentation (derived directly from the source code file StringSubstitutor.java) include:

Programming function   Example
--------------------   ----------------------------------
Base64 Decoder:        ${base64Decoder:SGVsbG9Xb3JsZCE=}
Base64 Encoder:        ${base64Encoder:HelloWorld!}
Java Constant:         ${const:java.awt.event.KeyEvent.VK_ESCAPE}
Date:                  ${date:yyyy-MM-dd}
DNS:                   ${dns:address|apache.org}
Environment Variable:  ${env:USERNAME}
File Content:          ${file:UTF-8:src/test/resources/document.properties}
Java:                  ${java:version}
Script:                ${script:javascript:3 + 4}
URL Content (HTTP):    ${url:UTF-8:http://www.apache.org}
URL Content (HTTPS):   ${url:UTF-8:https://www.apache.org}

The dns, script and url functions are particularly dangerous, because they could lead to untrusted data, received from outside your network but processed or logged on one of the business logic servers inside your network, doing the following:

dns:     Look up a server name and replace the ${...} string with the value returned. If attackers use a domain name they themselves own and control, then this lookup will terminate at a DNS server of their choosing. (The owner of a domain name is, in fact, obliged to provide what’s known as authoritative DNS data for that domain.)

url:     Look up a server name, connect to it using HTTP or HTTPS, and use what’s sent back instead of the string ${...}. The danger posed by this behaviour depends on what the replacement string is used for.

script:  Run a command of the attacker’s choosing. We were only able to get this function to work with older versions of Java, because there’s no longer a JavaScript engine built into Java itself. But many companies and apps still use old-but-still-supported Java versions such as 1.8 (JDK 8) and 11.0 (JDK 11), on which the dangerous ${script:javascript:...} remote code execution interpolation trick works just fine.


String str = "DNS lookup-> ${dns:address|nakedsecurity.sophos.com}";
String rep = interp.replace(str);

Output:   DNS lookup->


String str = "Stuff sucked from web-> ---BEGIN---${url:UTF8:https://example.com}---END---";
String rep = interp.replace(str);

Output:   Stuff sucked from web-> ---BEGIN---<!doctype html>
    <title>Example Domain</title>
    . . .

    <h1>Example Domain</h1>
    [. . .]


String str = "Run some code-> ${script:javascript:6*7}";
String rep = interp.replace(str);

Output:   Run some code-> 42

What to do?

  • Update to Commons Text 1.10.0. In this version, the dns, url and script functions have been turned off by default. You can enable them again if you want or need them, but they won’t work unless you explicitly turn them on in your code.
  • Sanitise your inputs. Wherever you accept and process untrusted data, especially in Java code, where string interpolation is widely supported and offered as a “feature” in many third-party libraries, make sure you look for and filter out potentially dangerous character sequences from the input first, or take care not to pass that data into string interpolation functions.
  • Search your network for Commons Text software that you didn’t know you had. Searching for files with names that match the pattern commons-text*.jar (the * means “anything can match here”) is a good start. The suffix .jar is short for java archive, which is how Java libraries are delivered and installed; the prefix commons-text denotes the Apache Commons Text software components, and the text in the middle covered by the so-called wildcard * denotes the version number you’ve got. You want commons-text-1.10.0.jar or later.
  • Follow the latest news on this issue. Exploiting this bug on vulnerable servers doesn’t seem to be quite as easy as it was with Log4Shell. But we suspect, if attacks are found that cause trouble for specific Java applications, that the bad news of how to do so will travel fast. You can keep up-to-date by keeping your eye on the @sophosxops Twitter thread about this bug.
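As a rough illustration of the “sanitise your inputs” advice above, here is a minimal sketch of an input check that refuses any text containing the ${ marker. The class and method names are invented for this example, and the check assumes your interpolator uses the default ${ prefix; if your code configures a different prefix, or has other routes by which untrusted text reaches an interpolator, this check alone will not be enough.

```java
class InterpolationInputFilter {
    // True if the text contains the "${" marker that starts an interpolation command.
    static boolean looksDangerous(String untrusted) {
        return untrusted != null && untrusted.contains("${");
    }

    // Rejects suspicious input outright instead of trying to repair it.
    static String requireSafe(String untrusted) {
        if (looksDangerous(untrusted)) {
            throw new IllegalArgumentException("input contains an interpolation marker");
        }
        return untrusted;
    }

    public static void main(String[] args) {
        System.out.println(looksDangerous("duck"));          // false
        System.out.println(looksDangerous("${env:USER}"));   // true
    }
}
```

Rejecting the input is a deliberate choice here: trying to strip out or escape the dangerous parts is easy to get subtly wrong, whereas a flat refusal is simple to reason about.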


Don’t forget that you may find multiple copies of the Commons Text component on each computer you search, because many Java apps bring their own versions of libraries, and of Java itself, in order to keep precise control over what code they actually use.

That’s good for reliability, and avoids what’s known in Windows as DLL hell or dependency disaster, but not quite so good when it comes to updating, because you can’t simply update a single, centrally managed system file and thus patch the entire computer at once.
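If you want to automate the file search described above, a minimal sketch using the standard Java file APIs might look like this. CommonsTextFinder is an invented name, and the program only looks at filenames, so copies bundled inside other archives, or renamed by an installer, will not show up.

```java
import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class CommonsTextFinder {
    // Returns every regular file under root whose name matches commons-text*.jar.
    // Files.walk may fail with an UncheckedIOException if a directory is unreadable.
    static List<Path> find(Path root) throws IOException {
        PathMatcher matcher =
            FileSystems.getDefault().getPathMatcher("glob:commons-text*.jar");
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                .filter(Files::isRegularFile)
                .filter(p -> matcher.matches(p.getFileName()))
                .sorted()
                .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Search the directory given on the command line, or the current one.
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        for (Path jar : find(root)) {
            System.out.println(jar);
        }
    }
}
```

Every path it prints is a separate copy that needs checking, and any with a version number below 1.10.0 in its name is a candidate for updating.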

