codyze-evaluator

Basics of Program Analysis and its Application in the Query API.

To write meaningful queries, it helps to be familiar with some terminology from program analysis, since these terms are used to configure the analysis functions.

Several functions share the following configuration options:

AnalysisType

In general, it is possible to differentiate between a Must and May analysis:

Must analysis means that the requirement has to be fulfilled on all (possible) paths.
May analysis means that at least one path is required which fulfills the requirement.
Note: If you want to check that certain behavior can never happen, you can express this by negating a May analysis (not-May).

The functions dataFlow, alwaysFlowsTo and executionPath receive this configuration via the argument type, which accepts one of the two objects Must and May.
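The difference between Must, May and not-May can be sketched with a toy model. This is not the Query API: paths are plain lists of labels, and the names `Path`, `mustHold`, `mayHold`, `neverHolds` and `examplePaths` are made up for this illustration.

```kotlin
// Toy model, not the Query API: a "path" is just the list of node labels visited.
typealias Path = List<String>

// Must: the requirement has to hold on every path.
fun mustHold(paths: List<Path>, requirement: (Path) -> Boolean): Boolean =
    paths.all(requirement)

// May: at least one path has to fulfill the requirement.
fun mayHold(paths: List<Path>, requirement: (Path) -> Boolean): Boolean =
    paths.any(requirement)

// not-May: the behavior can never happen, i.e., no path fulfills the requirement.
fun neverHolds(paths: List<Path>, requirement: (Path) -> Boolean): Boolean =
    !mayHold(paths, requirement)

// Two hypothetical paths through a program that opens a file.
val examplePaths: List<Path> = listOf(
    listOf("open", "write", "close"),
    listOf("open", "write"),
)
```

For `examplePaths`, the requirement "the path contains `close`" holds as a May property but not as a Must property, because the second path never closes the file. The requirement "the path contains `delete`" is a not-May property, since no path fulfills it.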

AnalysisScope

We provide several options to configure the scope of the analysis. Most importantly, it is necessary to understand the difference between interprocedural and intraprocedural analysis.

Intraprocedural analysis only considers a single function: it neither follows calls to other functions nor leaves the function body via return statements. The maximal number of edges to follow can be limited with the parameter maxSteps. This can be useful if a certain operation should occur shortly after another operation. However, it also means that the analysis may miss actions that occur at a greater distance than the configured maxSteps.
Interprocedural analysis, in contrast, follows function calls, thus also analyzing the called functions, and also leaves the function’s scope on return statements. It can be limited with the maxSteps parameter as well. In addition, the analyst can limit the scope of the analysis to a maximal depth of function calls. This lets the analyst trade off analysis time against precision/soundness; in particular, following all possible paths until the end of the control flow is very time-consuming.

In addition, InterproceduralWithDfgTermination can be used by alwaysFlowsTo to stop following the evaluation order if the predicate can no longer be fulfilled on a path leaving a function. For example, if none of the targets of the start node’s dataflows lies within the function containing the call-site, continuing to iterate the EOG from this call-site is unlikely to yield a result.
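The effect of the scope and of maxSteps can be illustrated with a small breadth-first traversal. This is a toy sketch under simplifying assumptions, not how the Query API is implemented: `Node`, `reachable` and the example graph are hypothetical, and the call depth limit is left out.

```kotlin
// Toy model, not the Query API. Each node belongs to exactly one function.
data class Node(val name: String, val function: String)

// Collects all nodes reachable from `start` within `maxSteps` edges (null = no limit).
// With `interprocedural = false`, edges that leave the function of `start` are ignored.
fun reachable(
    start: Node,
    edges: Map<Node, List<Node>>,
    interprocedural: Boolean,
    maxSteps: Int? = null,
): Set<Node> {
    val seen = mutableSetOf(start)
    var frontier = listOf(start)
    var steps = 0
    while (frontier.isNotEmpty() && (maxSteps == null || steps < maxSteps)) {
        frontier = frontier
            .flatMap { edges[it].orEmpty() }
            .filter { interprocedural || it.function == start.function }
            .filter { seen.add(it) } // keep only nodes that were not visited before
        steps++
    }
    return seen
}

// Hypothetical example: `main` calls `helper` and continues afterwards.
val entry = Node("entry", "main")
val callSite = Node("callSite", "main")
val helperBody = Node("helperBody", "helper")
val afterCall = Node("afterCall", "main")

val exampleEdges: Map<Node, List<Node>> = mapOf(
    entry to listOf(callSite),
    callSite to listOf(helperBody, afterCall),
    helperBody to listOf(afterCall),
)
```

In this sketch, the intraprocedural traversal from `entry` never visits `helperBody`, whereas the interprocedural one does. With `maxSteps = 1`, both stop at `callSite` and therefore miss `afterCall`, which mirrors the caveat about actions that occur further away than the configured limit.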

AnalysisDirection

Depending on the use-case, it can be required to follow edges in the direction of the control flow or against it. We account for this difference by providing a configuration option for the direction of the analysis:

Forward analysis follows the order of the control flow (this also means that DFG edges are traversed from the source of information to the target).
Backward analysis follows the edges in the opposite direction.
Bidirectional analysis explores the graph in both directions from a start node.

Each of these options is configured together with the graph that should be followed. Currently, the options EOG and DFG are available.
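As a rough illustration of the three directions, the following toy sketch computes the direct neighbors of a node in an edge list. The enum `Direction`, the function `neighbors` and the graph `toyDfg` are assumptions made for this example and are not the types used by the Query API; a real analysis would repeat this step along whole paths.

```kotlin
// Hypothetical stand-in for the direction options described above.
enum class Direction { FORWARD, BACKWARD, BIDIRECTIONAL }

// Direct neighbors of `node` in a toy edge list of (source, target) pairs.
fun neighbors(
    node: String,
    edges: List<Pair<String, String>>,
    direction: Direction,
): Set<String> {
    val successors = edges.filter { it.first == node }.map { it.second }
    val predecessors = edges.filter { it.second == node }.map { it.first }
    return when (direction) {
        Direction.FORWARD -> successors.toSet()
        Direction.BACKWARD -> predecessors.toSet()
        Direction.BIDIRECTIONAL -> (successors + predecessors).toSet()
    }
}

// Toy dataflow: source -> x -> sink.
val toyDfg = listOf("source" to "x", "x" to "sink")
```

Starting at `x`, the forward direction leads to `sink`, the backward direction leads to `source`, and the bidirectional variant finds both.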

!!! note "Note: Implicit dataflows"

    If configured with the sensitivity `Implicit`, the `DFG` option will actually iterate through the program dependence graph (PDG), which also includes the control dependence graph (CDG).

Sensitivities

In program analysis, we can distinguish between different types of sensitivities. These represent different challenges when following the flow of program execution or data through the program. Some classes which are also considered in our tooling are:

Flow sensitivity: A flow-sensitive analysis considers the execution order of statements. Following the execution path is inherently flow-sensitive. The dataflow edges generated by the default pass ControlFlowSensitiveDFGPass are also flow-sensitive. Note that it is possible to disable this pass during translation, either in general or based on a threshold for the cyclomatic complexity of functions. In this case, the dataflow edges are created by the flow-insensitive DFGPass.
Context sensitivity: A context-sensitive analysis distinguishes between the calling contexts of a function/method. It can be configured by adding ContextSensitive. E.g., if a function is called from different call-sites, the analysis jumps back to the call-site which led into the function after having processed this function completely. To do so, the analysis maintains a call stack when entering/leaving a function. This may be disabled for performance reasons, in which case the analysis can explore paths which would never happen at runtime (i.e., returning to a different call-site than the one calling the function).
Field sensitivity: A field-sensitive analysis distinguishes between the fields of an object. It can be configured by adding FieldSensitive. E.g., a field-insensitive analysis may detect a dataflow from a to the field x.c even if a was only assigned to the field x.b. This results in detecting more dataflows than are actually feasible in the program. Field sensitivity may be disabled for performance reasons.
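The over-approximation caused by disabling context or field sensitivity can be sketched in a few lines. This is a deliberately simplified toy model, not the actual analysis: `returnTargets`, `FieldWrite`, `flowsToField` and `exampleWrites` are hypothetical names.

```kotlin
// Context sensitivity: where does the analysis continue after a callee finishes?
// The context-sensitive variant returns to the call-site on top of the call stack,
// the insensitive variant returns to every known call-site of the function.
fun returnTargets(
    callStack: List<String>,
    allCallSites: List<String>,
    contextSensitive: Boolean,
): List<String> =
    if (contextSensitive) listOfNotNull(callStack.lastOrNull()) else allCallSites

// Field sensitivity: an assignment `obj.field = value` recorded as a triple.
data class FieldWrite(val value: String, val obj: String, val field: String)

// Does `value` flow into `obj.field`? The field-insensitive check ignores which
// field was written and therefore reports more flows than are feasible.
fun flowsToField(
    writes: List<FieldWrite>,
    value: String,
    obj: String,
    field: String,
    fieldSensitive: Boolean,
): Boolean = writes.any {
    it.value == value && it.obj == obj && (!fieldSensitive || it.field == field)
}

// The toy program only contains the assignment `x.b = a`.
val exampleWrites = listOf(FieldWrite(value = "a", obj = "x", field = "b"))
```

With `exampleWrites`, the field-insensitive check reports a flow from `a` to `x.c`, while the field-sensitive check only reports the flow to `x.b`. Similarly, `returnTargets` without context sensitivity yields call-sites that never invoked the function on the current path.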

!!! note

    It is not possible to compute an efficient solution for a combination of all possible sensitivities in program analysis.

We use the term “sensitivity” to configure other aspects of the analysis as well. Specifically, the user can configure:

Following implicit dataflows via Implicit. In this case, the analysis traverses the program dependence graph (PDG) instead of the dataflow graph (DFG). This allows us to detect implicit dataflows, i.e., possible leakage of data by exploiting different behavior in different branches of control-flow-modifying statements (e.g. loops, if-statements).
The option FilterUnreachableEOG excludes paths from the result which are not reachable at runtime, e.g., because a condition always evaluates to false. This removes irrelevant results.
The option OnlyFullDFG only follows full DFG edges. In particular, it won’t follow reads from or writes to fields of an object. While this analysis may be faster as there are fewer paths being explored, it is likely to miss possible results.
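To make the first two options more concrete, here are two small example programs of the kind an analysis might be run on. They are plain Kotlin written for this illustration; the function names are made up, and no Query API calls are shown.

```kotlin
// Implicit dataflow: `secret` is never assigned to `leaked`, so there is no
// dataflow edge between the two. The value of `leaked` still depends on `secret`
// through the branch condition, i.e., through a control dependence.
fun leak(secret: Int): Boolean {
    var leaked = false
    if (secret > 0) {
        leaked = true
    }
    return leaked
}

// Unreachable path: the condition always evaluates to false, so the first
// return statement can never be executed at runtime.
fun unreachableBranch(): String {
    val debug = false
    if (debug) {
        return "never returned at runtime"
    }
    return "always returned"
}
```

An observer of the result of `leak` learns whether `secret` is positive, which is the kind of leakage that following the PDG instead of the DFG is meant to detect. In `unreachableBranch`, a path through the first `return` is the kind of result one would expect FilterUnreachableEOG to drop.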

The sensitivities can be configured by passing the argument sensitivities. It accepts a variable number of arguments (vararg), which is equivalent to an array. To simplify constructing the respective typed array, we provide utility functions that overload the + operator. Hence, to configure the functions dataFlow, alwaysFlowsTo and executionPath, you can simply call them with the (named) argument sensitivities = ContextSensitive + FieldSensitive, for example.
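The mechanism of combining options with + and passing them to a vararg parameter can be sketched as follows. This is only an illustration of the Kotlin language features involved: the types `ToySensitivity`, `ToyContextSensitive`, `ToyFieldSensitive`, `ToyImplicit` and the function `describe` are hypothetical and do not reflect the actual declarations in the Query API.

```kotlin
// Hypothetical stand-ins for the sensitivity objects described above.
interface ToySensitivity { val label: String }
object ToyContextSensitive : ToySensitivity { override val label = "context" }
object ToyFieldSensitive : ToySensitivity { override val label = "field" }
object ToyImplicit : ToySensitivity { override val label = "implicit" }

// `a + b` yields a typed array holding both sensitivities. Appending a third one
// with `array + c` already works through the standard library's Array.plus.
operator fun ToySensitivity.plus(other: ToySensitivity): Array<ToySensitivity> =
    arrayOf(this, other)

// A function with a vararg parameter; inside the function it behaves like an array.
fun describe(vararg sensitivities: ToySensitivity): List<String> =
    sensitivities.map { it.label }
```

In this sketch, `describe(*(ToyContextSensitive + ToyFieldSensitive))` passes the combined array positionally via the spread operator. Recent Kotlin versions should also accept the named form without the spread operator, `describe(sensitivities = ToyContextSensitive + ToyFieldSensitive)`, which matches the call style described above.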