To write meaningful queries, it helps to be familiar with some terminology from program analysis, since these terms are used to configure the functions that perform the analysis.
Several functions share the following configuration options:
In general, it is possible to differentiate between a Must and May analysis:
The functions `dataFlow`, `alwaysFlowsTo` and `executionPath` receive this configuration via the argument `type`.
The two objects `Must` and `May` allow this configuration.
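
To make the distinction concrete, here is a minimal, self-contained sketch on a toy graph. It does not use the library's API: `mayReach`, `mustReach` and the graph itself are hypothetical names chosen for illustration. It assumes the usual reading from program analysis, where a may-style check succeeds if at least one path reaches a matching node, while a must-style check requires every path to do so.

```kotlin
// Toy control-flow graph: each node maps to its successors (acyclic for simplicity).
val graph: Map<String, List<String>> = mapOf(
    "start" to listOf("check", "skip"),
    "check" to listOf("end"),
    "skip" to listOf("end"),
    "end" to emptyList(),
)

// May: at least one path starting at `node` reaches a node satisfying `predicate`.
fun mayReach(node: String, predicate: (String) -> Boolean): Boolean =
    predicate(node) || graph.getValue(node).any { mayReach(it, predicate) }

// Must: every path starting at `node` reaches a node satisfying `predicate`.
// A path that ends without a match makes the result false.
fun mustReach(node: String, predicate: (String) -> Boolean): Boolean {
    if (predicate(node)) return true
    val successors = graph.getValue(node)
    return successors.isNotEmpty() && successors.all { mustReach(it, predicate) }
}

val mayHitCheck = mayReach("start") { it == "check" }   // one branch passes "check"
val mustHitCheck = mustReach("start") { it == "check" } // the "skip" branch avoids it
val mustHitEnd = mustReach("start") { it == "end" }     // both branches end in "end"
```

In this sketch, `mayHitCheck` and `mustHitEnd` hold, while `mustHitCheck` does not, because the branch through `skip` never visits `check`.
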
We provide several options to configure the scope of the analysis. Most importantly, it is necessary to understand the difference between interprocedural and intraprocedural analysis.
The analysis can be limited to a maximum number of steps via the `maxSteps` parameter.
This can be useful if a certain operation should occur in a timely manner after another operation.
However, it also means that the result may miss actions that occur at a greater distance than the configured `maxSteps` parameter.
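
The effect of a step limit can be sketched on a toy chain of operations. This is an illustration under assumptions, not the library's implementation: `reachesWithin` and the chain are hypothetical, and only the parameter name `maxSteps` is taken from the text.

```kotlin
// Toy chain of operations: open -> read -> parse -> validate -> close.
val next: Map<String, String> = mapOf(
    "open" to "read",
    "read" to "parse",
    "parse" to "validate",
    "validate" to "close",
)

// Follows the chain from `start` and reports whether `target` is reached
// within at most `maxSteps` edges. `null` means "no limit".
fun reachesWithin(start: String, target: String, maxSteps: Int? = null): Boolean {
    var current: String? = start
    var steps = 0
    while (current != null) {
        if (current == target) return true
        if (maxSteps != null && steps >= maxSteps) return false
        current = next[current]
        steps++
    }
    return false
}

val unbounded = reachesWithin("open", "close")              // follows the whole chain
val tooShort = reachesWithin("open", "close", maxSteps = 2) // gives up at "parse"
val longEnough = reachesWithin("open", "close", maxSteps = 4)
```

With a limit of 2, the sketch stops before it sees `close`, which mirrors the missed results described above. A limit of 4 is just enough to find it.
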
In addition, the analyst can decide to limit the scope of the analysis to a maximal depth of function calls.
This allows the analyst to account for a trade-off between the analysis time and precision/soundness.
In particular, following all possible paths until the end of the control flow is very time-consuming.

In addition, `InterproceduralWithDfgTermination` can be used by `alwaysFlowsTo` to stop following the evaluation order if the predicate can no longer be fulfilled on a path leaving a function.
For example, if none of the targets of the start node's dataflows lies in the scope of the function containing the call site, it is not promising to keep iterating the EOG from this call site.
Depending on the use-case, it can be required to follow edges in the direction of the control flow or against it. We account for this difference by providing a configuration option for the direction of the analysis:
All of these options can be configured with the graph that should be followed.
Currently, the options EOG and DFG are available.

!!! note "Note: Implicit dataflows"
    If configured with the sensitivity `Implicit`, the `DFG` option will actually iterate through the program dependence graph (PDG), which includes the control dependence graph (CDG) as well.

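
Returning to the direction option, the difference between following edges with or against the flow can be sketched on a toy edge list. The enum `Direction`, the function `reachable` and the edges are hypothetical stand-ins for illustration and are not the library's types.

```kotlin
// Toy evaluation order as directed edges (from -> to).
val edges: List<Pair<String, String>> = listOf(
    "a" to "b",
    "b" to "c",
    "b" to "d",
)

// Hypothetical stand-in for a direction option; not the library's type.
enum class Direction { FORWARD, BACKWARD }

// Collects every node reachable from `start` when following edges in `direction`.
fun reachable(start: String, direction: Direction): Set<String> {
    val seen = mutableSetOf<String>()
    val worklist = ArrayDeque(listOf(start))
    while (worklist.isNotEmpty()) {
        val node = worklist.removeFirst()
        val neighbours = when (direction) {
            Direction.FORWARD -> edges.filter { it.first == node }.map { it.second }
            Direction.BACKWARD -> edges.filter { it.second == node }.map { it.first }
        }
        for (n in neighbours) if (seen.add(n)) worklist.addLast(n)
    }
    return seen
}

val forwardFromB = reachable("b", Direction.FORWARD)   // nodes after "b"
val backwardFromB = reachable("b", Direction.BACKWARD) // nodes before "b"
```

Starting from the same node, the forward traversal yields the nodes that come after it (`c` and `d`), and the backward traversal yields the nodes that lead to it (`a`).
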
In program analysis, we can distinguish between different types of sensitivities. These represent different challenges when following the flow of program execution or data through the program. Some classes which are also considered in our tooling are:
ControlFlowSensitiveDFGPass) are also flow sensitive.
Note that it is possible to disable running this pass (either based on a threshold of cyclomatic complexity of functions or in general) during translation.
In this case, the dataflow edges are created by the flow-insensitive `DFGPass`.

`ContextSensitive`.
E.g., if a function is called by different call-sites, the analysis will jump back to the call-site which led into the function after having processed this function completely.
The analysis generates a call stack when entering/leaving a function during the analysis.
This may be disabled for performance reasons, in which case the analysis can explore paths which would never happen at runtime (i.e., returning to a different call-site than the one calling the function).

`FieldSensitive`.
E.g., a field-insensitive analysis may detect a dataflow from `a` to the field `x.c` even if `a` was only assigned to the field `x.b`.
This will result in detecting more dataflows than what is actually feasible in the program.
Field sensitivity may be disabled for performance reasons.

!!! note
    It is not possible to compute an efficient solution for a combination of all possible sensitivities in program analysis.

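
The field-sensitivity example above (a value `a` assigned to `x.b` but reported at `x.c`) can be reproduced in a small, self-contained sketch. The types and the function `flowsTo` are hypothetical and only model the idea; they are not how the library represents dataflows.

```kotlin
// A toy "memory location": an object name plus an optional field name.
data class Location(val obj: String, val field: String? = null)

// Assignments in the toy program: `a` is written to `x.b` only.
val writes: List<Pair<String, Location>> = listOf(
    "a" to Location("x", "b"),
)

// Does the value `source` flow to `target`?
// Field-sensitive: object and field must both match.
// Field-insensitive: a write to any field of the object counts as a match.
fun flowsTo(source: String, target: Location, fieldSensitive: Boolean): Boolean =
    writes.any { (value, written) ->
        value == source &&
            written.obj == target.obj &&
            (!fieldSensitive || written.field == target.field)
    }

val sensitiveToC = flowsTo("a", Location("x", "c"), fieldSensitive = true)
val insensitiveToC = flowsTo("a", Location("x", "c"), fieldSensitive = false)
val sensitiveToB = flowsTo("a", Location("x", "b"), fieldSensitive = true)
```

Only the field-insensitive variant reports a flow from `a` to `x.c`. That flow is infeasible in the toy program, which is the kind of over-approximation described above.
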
We use the term “sensitivity” to configure other aspects of the analysis as well, i.e., the user can configure:

- `Implicit`: In this case, the analysis traverses the program dependence graph (PDG) instead of the dataflow graph (DFG). This allows us to detect implicit dataflows, i.e., possible leakage of data by exploiting different behavior in different branches of control-flow-modifying statements (e.g., loops, if-statements).
- `FilterUnreachableEOG` excludes paths from the result which are not reachable at runtime, e.g., because a condition always evaluates to false. This makes it possible to remove irrelevant results.
- `OnlyFullDFG` only follows full DFG edges. In particular, it won't follow reads from or writes to fields of an object. While this analysis may be faster as there are fewer paths being explored, it is likely to miss possible results.

The sensitivities can be configured by passing the argument `sensitivities`.
It accepts a variable number of arguments (`vararg`), which is equivalent to an array.
To simplify constructing the respective typed array, we provide utility functions that overload the `+` operator.
Hence, to configure the functions `dataFlow`, `alwaysFlowsTo` and `executionPath`, you can simply call them with the (named) argument `sensitivities = ContextSensitive + FieldSensitive`, for example.
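
The combination of a `vararg` parameter and an overloaded `+` operator can be sketched as follows. This is a simplified model under assumptions: the base class `Sensitivity` and the function `analyze` are hypothetical, and the objects merely reuse the names from the text, so the library's real classes and signatures may differ.

```kotlin
// Hypothetical stand-ins that reuse the names from the text; the library's
// real classes and signatures may differ.
sealed class Sensitivity {
    // `A + B` produces a typed array holding both options.
    operator fun plus(other: Sensitivity): Array<Sensitivity> = arrayOf(this, other)
}

object ContextSensitive : Sensitivity()
object FieldSensitive : Sensitivity()
object Implicit : Sensitivity()

// A function taking its options as vararg, similar in spirit to the
// `sensitivities` argument described above. It simply returns what it received.
fun analyze(vararg sensitivities: Sensitivity): List<Sensitivity> =
    sensitivities.toList()

// A named vararg argument expects an array, which the + operator provides.
val two = analyze(sensitivities = ContextSensitive + FieldSensitive)

// Longer chains work because the standard library already defines
// `Array<T>.plus(element: T)`, which appends one more element.
val three = analyze(sensitivities = ContextSensitive + FieldSensitive + Implicit)
```

The design choice here is that callers never have to build the array by hand: the first `+` creates it, and each further `+` appends to it.
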