codyze-evaluator

How do I write a query?

A query is a piece of code which retrieves information from the CPG (or a part of it) to assess if a statement holds or not.

The main sources for the implementation can be found in the files Query.kt, FlowQueries.kt and QueryTree.kt in the module cpg-analysis.

Where to add Queries

Queries can be added everywhere in the evaluation project. However, we recommend to add them to the queries folder to simplify finding them. They must be included in a kotlin file (file ending .kt) which is included in the sources of the project. Finally, you have to call the queries in the requirements block, or in a query called therein, of the evaluation project script, or they won’t be assessed.

Starting point: Choosing between allExtended and existsExtended

You would typically start writing a query by using one of the two functions allExtended or existsExtended. If a certain property should be fulfilled at least once in the whole codebase, you can use existsExtended, while allExtended serves to check if the property is fulfilled for all nodes (which you select). Both functions receive the following arguments:

Few notes on Kotlin:

Flow-based functions of the Query API

Currently, following four functions of the Query API can be used to reason about the flow of data or control in the program: dataFlow, dataFlowWithValidator, executionPath and alwaysFlowsTo. A detailed explanation of the parameters direction, type, sensitivities and scope which configure these functions is provided in program-analysis-basics.md.

dataFlow

=== "Signature" ```kotlin fun dataFlow( startNode: Node, direction: AnalysisDirection = Forward(GraphToFollow.DFG), type: AnalysisType = May, vararg sensitivities: AnalysisSensitivity = FieldSensitive + ContextSensitive, scope: AnalysisScope = Interprocedural(), earlyTermination: ((Node) -> Boolean)? = null, predicate: (Node) -> Boolean, ): QueryTree ``` === "Goal of the function" Follows the `Dataflow` edges from `startNode` in the given `direction` until reaching a node fulfilling `predicate`. The interpretation of the analysis result can be configured as must or may analysis by setting the `type` parameter. Note that this function only reasons about existing DFG paths, and it might not be sufficient if you actually want a guarantee that some action always happens with the data. In this case, you may need to check the `executionPath` or `alwaysFlowsTo`. === "Parameters" * `startNode`: The node from which the data flow should be followed. * `direction`: See the [explanation of class `AnalysisDirection`](/codyze-evaluator/documentation/docs/program-analysis-basics.html#analysisdirection) * `type`: See the [explanation of class `AnalysisType`](/codyze-evaluator/documentation/docs/program-analysis-basics.html#analysistype) * `sensitivities`: See the [explanation of Sensitivities](/codyze-evaluator/documentation/docs/program-analysis-basics.html#sensitivities) * `scope`: See the [explanation of class `AnalysisScope`](/codyze-evaluator/documentation/docs/program-analysis-basics.html#analysisscope) * `earlyTermination`: If applying this function to a `Node` returns `true` before a node fulfilling `predicate`, the analysis/traversal of this path will stop and return `false`. If `null` is provided, the analysis will not stop. * `predicate`: This function marks the desired end of a dataflow path. If this function returns `true`, the analysis/traversal of this path will stop. </div> ### `executionPath`
=== "Signature" ```kotlin fun executionPath( startNode: Node, direction: AnalysisDirection = Forward(GraphToFollow.EOG), type: AnalysisType = May, scope: AnalysisScope = Interprocedural(), earlyTermination: ((Node) -> Boolean)? = null, predicate: (Node) -> Boolean, ): QueryTree ``` === "Goal of the function" Follows the Evaluation Order Graph (EOG) edges from startNode in the given direction until reaching a node fulfilling predicate. The interpretation of the analysis result can be configured as must or may analysis by setting the type parameter. This function reasons about execution paths in the program and can be used to determine whether a specific action or condition is reachable during execution. === "Parameters" * `startNode`: The node from which the execution path should be followed. * `direction`: See the [explanation of class `AnalysisDirection`](/codyze-evaluator/documentation/docs/program-analysis-basics.html#analysisdirection) * `type`: See the [explanation of class `AnalysisType`](/codyze-evaluator/documentation/docs/program-analysis-basics.html#analysistype) * `sensitivities`: See the [explanation of Sensitivities](/codyze-evaluator/documentation/docs/program-analysis-basics.html#sensitivities) * `scope`: See the [explanation of class `AnalysisScope`](/codyze-evaluator/documentation/docs/program-analysis-basics.html#analysisscope) * `earlyTermination`: If applying this function to a `Node` returns `true` before a node fulfilling `predicate`, the analysis/traversal of this path will stop and return `false`. If `null` is provided, the analysis will not stop. * `predicate`: This function marks the desired end of an execution path. If this function returns `true`, the analysis/traversal of this path will stop. </div> ### `alwaysFlowsTo`
=== "Signature" ```kotlin fun Node.alwaysFlowsTo( allowOverwritingValue: Boolean = false, earlyTermination: ((Node) -> Boolean)? = null, identifyCopies: Boolean = true, stopIfImpossible: Boolean = true, scope: AnalysisScope, vararg sensitivities: AnalysisSensitivity = ContextSensitive + FieldSensitive + FilterUnreachableEOG, predicate: (Node) -> Boolean, ): QueryTree ``` === "Goal of the function" Checks that the data originating from `this` reach a sink (fulfilling `predicate`) on each execution path. === "Parameters" * `allowOverwritingValue`: If set to `true`, the value of a variable can be changed before reaching `predicate` but the function would still return `true`. If set to `false`, overwriting the value held in `this` before reaching a sink specified by `predicate` will lead to a `false` result. * `identifyCopies`: If set to `true`, the query will aim to figure out if the dataflow of one object is copied to another object which requires separate tracking of the instances (e.g. if they require separate deletion, or other clearing/validating actions). * `stopIfImpossible`: If set to `true`, the analysis will stop if it is impossible to reach a node fulfilling `predicate`. This is useful for performance reasons, as it avoids unnecessary iterations. In particular, it checks if any dataflow exists to any node which is in scope of a function containing the call-site of a function declaration, we stop iterating. * `earlyTermination`: If applying this function to a `Node` returns `true` before a node fulfilling `predicate`, the analysis/traversal of this path will stop and return `false`. If `null` is provided, the analysis will not stop. * `scope`: See the [explanation of class `AnalysisScope`](/codyze-evaluator/documentation/docs/program-analysis-basics.html#analysisscope) * `sensitivities`: See the [explanation of Sensitivities](/codyze-evaluator/documentation/docs/program-analysis-basics.html#sensitivities) * `predicate`: This function marks the desired end of a combined dataflow and execution path. If this function returns `true`, the analysis/traversal of this path will stop. </div> ### `dataFlowWithValidator`
=== "Signature" ```kotlin fun dataFlowWithValidator( source: Node, validatorPredicate: (Node) -> Boolean, sinkPredicate: (Node) -> Boolean, scope: AnalysisScope, vararg sensitivities: AnalysisSensitivity, ): QueryTree ``` === "Goal of the function" Checks that the data originating from `source` are validated by a validator (fulfilling `validatorPredicate`) on each execution path before reaching a sink marked by `sinkPredicate`. This function always runs a forward analysis and combines the DFG and EOG. === "Parameters" * `source`: The node from which the dataflow path should be followed. * `validatorPredicate`: If applying this function to a `Node` returns `true` before a node fulfilling `predicate`, the analysis/traversal of this path will stop and return `true`. * `sinkPredicate`: This function marks a sink, where it is not permitted to reach the sink without always passing through a validator characterized by `validatorPredicate`. If this function returns `true`, the analysis/traversal of this path will stop and return `false`. * `scope`: See the [explanation of class `AnalysisScope`](/codyze-evaluator/documentation/docs/program-analysis-basics.html#analysisscope) * `sensitivities`: See the [explanation of Sensitivities](/codyze-evaluator/documentation/docs/program-analysis-basics.html#sensitivities) </div> ## The class `QueryTree` The class `de.fraunhofer.aisec.cpg.query.QueryTree` serves as a wrapper around the result of a query and the sub-statements which were used to retrieve the result. It is parametrized with a type `T` which is the type of the field `value`. Fields: * `value`: The result of the query, which is of type `T`. Frequent values are boolean values which determine if the query was successful or not, lists of nodes which were found by the query (i.e., representing paths), or other values representing a possible value of a variable. * `children`: A list of sub-queries which were evaluated to retrieve the result in `value`. * `stringRepresentation`: A human-readable string representation of the query's result. It typically includes the operation/function which was performed, its inputs and the computed result which is now held in `value`. * `assumptions`: A list of assumptions which were collected on evaluation. Numerous methods allow to evaluate the queries while keeping track of all the steps. Currently, the following operations are supported: * **eq**: Equality of two values. * **ne**: Inequality of two values. * **IN**: Checks if a value is contained in a [Collection] * **IS**: Checks if a value implements a type ([Class]). Additionally, some functions are available only for certain types of values. For boolean values: * **and**: Logical and operation (&&) * **or**: Logical or operation (||) * **xor**: Logical exclusive or operation (xor) * **implies**: Logical implication For numeric values: * **gt**: Grater than (>) * **ge**: Grater than or equal (>=) * **lt**: Less than (<) * **le**: Less than or equal (<=) The top-level result of any query must be a `QueryTree`. However, it is sometimes necessary to aggregate the results of multiple sub-queries with methods of the kotlin standard library (such as `map` for collections), which does not return a `QueryTree` itself. In this case, you can simply create a `QueryTree` by calling the constructor. It may be handy to implement some helper functions for frequent purposes as part of the Codyze Evaluator and import these extensions in the query scripts.