Polymorphic Information-flow Guards — Guardies

Table of Contents

Caveat

This page presents a work that is sill in progress, and is not yet self-contained; in particular, some technical terms are not defined.

Presentation

This page presents Guardies, a prototype static analysis tool whose primary goal is to automatically detect leaks of confidential data in object-oriented programs. It is designed to target the specific challenges posed by the programming models often encountered on mobile platforms such as Android: object-oriented programming and message-passing with serialized structured data require the development of dedicated techniques to efficiently and precisely capture the flow of secret information within applications.

Note the actual name of the command-line executable for the tool is grds, which is a shorthand for Guardies.

Context

Guardies is currently developed in the Department of Computer Science of the University of Liverpool by Nicolas Berthier, in collaboration with Narges Khakpour from the Department of Computer Science and Media Technology at Linnaeus University.

This work started as a new approach to Narges' previous work in the PROSSES project.

Working Principles and Techniques

The working principles of Guardies are as follows:

  • Input methods are given in an assembly-style object-oriented language very similar to Soot's Jimple language, that closely reflects Java bytecode. This allows Guardies to handle already compiled and optimized applications (e.g. jar, apk);
  • Each method is considered separately and its security semantics is encoded using a symbolic transition system. This encoding assigns security levels to variables and reachable portions of heap, and encodes the control-flow of the method under consideration. It involves a generic symbolic abstract heap domain to capture relations between reference variables, as well as direct and indirect information flow through object mutations;
  • Information-flow properties (such as data confidentiality or integrity) are then translated into safety properties on the system;
  • Lastly, a co-reachability analysis solver is used to infer a sound polymorphic information-flow guard for the method, that expresses circumstances upon which the method satisfies the in terms of its calling context (i.e. security levels and relations pertaining to its formal arguments).

Guardies relies on ReaTK to solve the symbolic discrete controller synthesis problems.

Download

The tool will be distributed under the terms of the GNU General Public License upon appearance of a first publication about the underlying approach.

Basic Work-flow

grds can receive as input a description of a complete class and interface hierarchy (file .classes), and a set of methods (files .meth) to be analyzed. It then computes one information-flow guard per method (files .secsum), that represents a pre-condition on the security level of the calling context (pc) and the level of each argument, whose satisfaction guarantees the absence of leak.

guardies-basic-workflow.png

Figure 1: Basic workflow; dotted parts are not part of Guardies.

The technique that we employ within Guardies is modular in the sense that, in order to compute a security summary for a method m that calls other methods, Guardies reuses the guards inferred for the latter to construct the model for the former. This way, ensuring data confidentiality in an application boils down to solve multiple small symbolic analysis problems instead of a single one that would consider all potential information flows within the entire application at once.

The prototype supports further use cases. For instance, it also accepts security stubs that are concise descriptions of the behavior of third-party methods provided by the user (see Section Security Stubs for some details). It also features a means to compute (an over-approximation of) an application's callgraph from a given set of methods so as to: (i) compute a guard for each of the methods involved; and (ii) handle recursion.

Preliminary Notations

This Section gives some preliminary notations required for the detailed description of the results computed by Guardies.

In the descriptions below, \(\vec{c}\) denotes an object referenced by a program variable c (local or formal method argument), and as a special case, \(\vec{o}\) denotes the instance object referenced by this in non-static methods.

Security Levels

Security levels belong to the set \(\{⊥,⊤\}\). \(⊥\) denotes low or public level, and \(⊤\) denotes high or private level. \(⊔\) is the least upper bound between two security levels (i.e. in our case, \(a ⊔ b = ⊤\) iff \(a = ⊤\) or \(b = ⊤\)). The inferred security summaries involve the security level of variables (noted level(v) for a primitive or reference variable v) as well as of potentially reachable objects (denoted obj_level(r) for a reference variable r).

Furthermore, pc denotes the security level of the calling context for a method.

Synonyms

For convenience Guardies understands several synonyms for security levels, that can be used in place of \(⊤\) and \(⊥\) whenever a constant security level denotation is expected:

  • \(⊥\) can be substituted with L, l, B, or b;
  • \(⊤\) can be substituted with H, h, T, or t.

Heap-related Relations & Abstract Heap Domains

Guardies makes use of a generic notion of symbolic abstract heap domain to represent heap abstractions as part of the encoding of security semantics. The chosen domain impacts the precision and cost of the analysis, as well as the set of relations that may appear in guards inferred by the tool.

Several kinds of symbolic variables pertaining to reference variables may appear in guards inferred by Guardies. The descriptions bellow assume the domain named "deep" is used, which involves the following Heap-related relations. Given two reference variables a and b, a~b denotes that \(\vec{a}\) and \(\vec{b}\) may be the same object (i.e. a and b may alias). Similarly, a.~>b denotes that a field (or cell) of \(\vec{a}\) may (transitively) reference \(\vec{b}\).

Example Analysis Results

This Section describes guards output by Guardies for simple methods. For clarity, support for exceptions is disabled for all examples unless explicitly stated.

Polymorphic Information-flow Guards

Let's assume given a complete class hierarchy, and a method m for which we want to compute a security summary. Listing 1 presents a description of object types as understood by Guardies, where class A directly extends java.lang.Object and does not declare any field, class B inherits from A and features a primitive field f of type int, and class C are objects that may reference objects of the same class through a field f.

A <: java.lang.Object
B <: A { int f; }
C <: java.lang.Object { C f; }

Silent Method

Listing 2 shows the code of an example method m as given to Guardies. m is defined in class A; it receives as argument a reference b to an object of class B and an integer i, and returns the sum between the field f of \(\vec{b}\) and i.

int A:m (B b, int i) {
      int tmp;
      tmp = b.f;
      tmp = tmp + i;
      return tmp;
}

For the method m, Guardies computes the security summary given in Listing 3. m does not output any value or object on any channel, nor does it call any method, so it cannot leak any confidential data per se. Therefore, its guard always holds (guard = tt).

int A:m (B b, int i) {
  guard = tt;
}

Outputs on Low Channel & Their Impact on Post-conditions

Consider now in Listing 5 the result for method mo given in Listing 4. mo is broadly equivalent to method m except that it additionally outputs the value of the field f of its first argument on a public channel (\(⊥\)).

int A:mo (B b, int i) {
      int tmp;
      tmp = b.f;
      output_⊥ (tmp);
      tmp = tmp + i;
      return tmp;
}
int A:mo (B b, int i) {
  guard = pc ⊔ level(b) ⊔ obj_level(b) = ⊥;
}

In this case, the guard indicates that to prevent (direct and indirect) leaks of confidential data:

  • m should be called in a low-context (pc = ⊥); and
  • reference argument b must not be or point to any confidential data (level(b) ⊔ obj_level(b) = ⊥).

Security Stubs

When an application to analyze makes use of third-party code, or if some of its methods have not been completely implemented, then a set of security stubs can be used to concisely describe their behavior and still permit the inference of information-flow guards.

Syntax & Semantics

Sets of security stubs can be given to Guardies in files with extension .secstubs. Such a file associates method type signatures with possibly empty sequences of statements. For each one of these associations, these statements are used by the tool to construct information-flow guards and effect transformers, which put together form a summary for the method, in the following way:

  • First, a default summary is built based on a few characteristics of the method (whether it is static or not, or whether it is a constructor, whether it returns a reference, and the type of its arguments)—See Section Defaults below;
  • Then, the sequence of statements in the security stub is processed sequentially to iteratively amend the summary—See Section Statements;
  • The resulting summary is then associated with the method and is later treated by Guardies as if it had been computed from an implementation of the method itself.

Defaults

As of Thursday 04 November 2021 (guardies.0.2.10), the default summary for a method corresponds to:

  • a "silent" behavior w.r.t. output channels (i.e. the method produces no output, so its guard always holds);
  • a "distrustful" behavior w.r.t. its arguments (i.e. the method does not assume their integrity);
  • an "evil" behavior w.r.t. object mutations (i.e. the method may mutate every object it has been given access to, in the worst possible way) and returned values and references (i.e. the method may return any object reachable from its argument references whenever their types allow it).

When applicable, the security level of any returned value and object is the least upper bound of that of any of the arguments.

Listing 7 gives default summaries forged by Guardies from stubs given in Listing 6.

static int A:static_default (C, C, int)
int A:default (C, C, int)
C (A, C, int)
static int A:static_default (C a0, C a1, int a2) {
  guard = tt;
  ret(level()) = level(a0) ⊔ level(a1) ⊔ level(a2) ⊔ obj_level(a0) ⊔ obj_level(a1);
  ret(obj_level(a0)) = ⊤;
  ret(obj_level(a1)) = ⊤;
  ret(a0.~>a0) = tt;
  ret(a0.~>a1) = tt;
  ret(a1.~>a0) = tt;
  ret(a1.~>a1) = tt;
}
int A:default (C a0, C a1, int a2) {
  guard = tt;
  ret(level()) = level(a0) ⊔ level(a1) ⊔ level(a2) ⊔ level(this) ⊔ obj_level(a0) ⊔ obj_level(a1) ⊔ obj_level(this);
  ret(obj_level(a0)) = ⊤;
  ret(obj_level(a1)) = ⊤;
  ret(obj_level(this)) = ⊤;
  ret(a0.~>a0) = tt;
  ret(a0.~>a1) = tt;
  ret(a1.~>a0) = tt;
  ret(a1.~>a1) = tt;
}
C (A a0, C a1, int a2) {
  guard = tt;
  ret(obj_level(a0)) = ⊤;
  ret(obj_level(a1)) = ⊤;
  ret(obj_level(this)) = ⊤;
  ret(a1.~>a1) = tt;
  ret(a1.~>this) = tt;
  ret(this.~>a1) = tt;
  ret(this.~>this) = tt;
}

Statements

We describe in Table 1 the various statements that can appear in security stubs.

Table 1: Statements involved in security stubs. In statements, denotes an optional security level ("⊥" or "⊤" or their synonyms), "⊥" if absent, and τ is an optional taint marker ("?"); at least one of them must be present unless a ? is appended. \(α\) is the least upper bound between the security levels of every argument (primitive value, reference, and referenced objects) of the method, including \(\vec o\).
Statement Description: The method… Validity
-<~ … does not perform any object mutation any method
@<~@ … may only mutate an object referenced by any of its arguments (excluding \(\vec o\)) with data of security level at most \(α\) any method
@<~ℓτ … may only mutate an object referenced by any of its arguments (excluding \(\vec o\)) with data of security level at most ℓ any method
.*<~@ … may only mutate \(\vec o\) with data of security level at most \(α\) non-static
.*<~ℓτ … may only mutate \(\vec o\) with data of security level at most ℓ non-static
*<~@ … may only mutate \(\vec o\) or an object referenced by any of its arguments with data of security level at most \(α\) any method
*<~ℓτ … may only mutate \(\vec o\) or an object referenced by any of its arguments with data of security level at most ℓ any method
return @ … returns one of its arguments (including \(\vec o\)), of security level at most \(α\) non-void
return new_ℓτ … returns a value, or reference and object, that is distinct from its arguments (and that does not depend on any them), of security level at most ℓ non-void
return new … returns a value, or reference and object, that is distinct from its arguments (and that does not depend on any them) non-void
return .*ℓτ? … returns a field of \(\vec o\) of security level at most ℓ \(⊔~\vec o\) non-void-static
return *ℓτ? … returns a field of \(\vec o\) or one of its arguments, of security level at most ℓ \(⊔~α~⊔~\vec o\) non-void
output_⊥ … may output any of its arguments (including \(\vec o\)) on a public channel any method
output_⊤ … may output any of its arguments (including \(\vec o\)) on a private channel any method
-! … does not assume integrity of any of its arguments (including \(\vec o\)) any method
@! … assumes integrity of all its arguments (including \(\vec o\)) any method
.*! … assumes integrity of \(\vec o\) non-static

Examples

Listings 8 and 10 illustrate the construction of summaries from stubs by means of amending the default.

/* None of the following methods perform any mutation */
int A:nice1 (C, C, int) { -<~; return new }
int A:nice2 (C, C, int) { -<~; return @ }
C A:nice_allocC (C, C, int) { -<~; return new }
void A:nice_log (C, int) { -<~; output_⊥ }
int A:nice1 (C a0, C a1, int a2) {
  guard = tt;
  ret(level()) = ⊥;
  ret(obj_level(a0)) = obj_level(a0);
  ret(obj_level(a1)) = obj_level(a1);
  ret(obj_level(this)) = obj_level(this);
  ret(a0.~>a0) = a0.~>a0;
  ret(a0.~>a1) = a0.~>a1;
  ret(a1.~>a0) = a1.~>a0;
  ret(a1.~>a1) = a1.~>a1;
}
int A:nice2 (C a0, C a1, int a2) {
  guard = tt;
  ret(level()) = level(a0) ⊔ level(a1) ⊔ level(a2) ⊔ level(this) ⊔ obj_level(a0) ⊔ obj_level(a1) ⊔ obj_level(this);
  ret(obj_level(a0)) = obj_level(a0);
  ret(obj_level(a1)) = obj_level(a1);
  ret(obj_level(this)) = obj_level(this);
  ret(a0.~>a0) = a0.~>a0;
  ret(a0.~>a1) = a0.~>a1;
  ret(a1.~>a0) = a1.~>a0;
  ret(a1.~>a1) = a1.~>a1;
}
C A:nice_allocC (C a0, C a1, int a2) {
  guard = tt;
  ret(level()) = ⊥;
  ret(obj_level()) = (if ret(~a0) then obj_level(a0) else ⊥) ⊔ (if ret(~a1) then obj_level(a1) else ⊥) ⊔ (if ret(a0.~>) then obj_level(a0) else ⊥) ⊔ (if ret(a1.~>) then obj_level(a1) else ⊥);
  ret(obj_level(a0)) = obj_level(a0);
  ret(obj_level(a1)) = obj_level(a1);
  ret(obj_level(this)) = obj_level(this);
  ret(~a0) = ff;
  ret(~a1) = ff;
  ret(.~>a0) = tt;
  ret(.~>a1) = tt;
  ret(a0.~>) = ff;
  ret(a0.~>a0) = a0.~>a0;
  ret(a0.~>a1) = a0.~>a1;
  ret(a1.~>) = ff;
  ret(a1.~>a0) = a1.~>a0;
  ret(a1.~>a1) = a1.~>a1;
}
void A:nice_log (C a0, int a1) {
  guard = pc ⊑ ⊥ ∧ level(a0) ⊔ level(a1) ⊔ level(this) ⊔ obj_level(a0) ⊔ obj_level(this) ⊑ ⊥;
  ret(obj_level(a0)) = obj_level(a0);
  ret(obj_level(this)) = obj_level(this);
  ret(a0.~>a0) = a0.~>a0;
}

The resulting summaries are given in Listings 9 and 11, respectively.

Observe for instance how the summary for method C A:nice_allocC (C, C, int) indicates that the reference it returns cannot alias any of its arguments of type C.

/* The following method may mutate any of its arguments with public data */
void A:mutate_arg (C, int) { -<~; @<~⊥ }

/* The following method may mutate any of its arguments with any data it
   recieves, and return anything */
A A:mutate_args (A, C, int) { -<~; @<~@; return * }
void A:mutate_arg (C a0, int a1) {
  guard = tt;
  ret(obj_level(a0)) = pc;
  ret(obj_level(this)) = obj_level(this);
  ret(a0.~>a0) = tt;
}
A A:mutate_args (A a0, C a1, int a2) {
  guard = tt;
  ret(level()) = pc ⊔ level(a0) ⊔ level(a1) ⊔ level(a2) ⊔ level(this) ⊔ obj_level(a0) ⊔ obj_level(a1) ⊔ obj_level(this);
  ret(obj_level()) = pc ⊔ level(a0) ⊔ level(a1) ⊔ level(a2) ⊔ level(this) ⊔ obj_level(a0) ⊔ obj_level(a1) ⊔ obj_level(this) ⊔ (if ret(~a0) then obj_level(a0) else ⊥) ⊔ (if ret(~this) then obj_level(this) else ⊥);
  ret(obj_level(a0)) = pc ⊔ level(a0) ⊔ level(a1) ⊔ level(a2) ⊔ level(this) ⊔ obj_level(a0) ⊔ obj_level(a1) ⊔ obj_level(this);
  ret(obj_level(a1)) = pc ⊔ level(a0) ⊔ level(a1) ⊔ level(a2) ⊔ level(this) ⊔ obj_level(a0) ⊔ obj_level(a1) ⊔ obj_level(this);
  ret(obj_level(this)) = obj_level(this);
  ret(~a0) = tt;
  ret(~this) = tt;
  ret(a1.~>a1) = tt;
}

Supporting Exceptions

Note that users wanting to handle exceptions when processing .meth files must give the corresponding translation flag (with `-tf +exceptions' on the command-line) before feeding Guardies with a set of security stubs (similarly for taints…). Doing so for the stubs of Listing 6 produces default-w-exceptions.secsums.

Acknowledgments

  • The visual presentation of this page is highly based on Solarized CSS.

Author: Nicolas Berthier and Narges Khakpour

Created: 2021-11-04 Thu 17:06