codyze-evaluator

Categorization of Analysis Assumptions

During the analysis of a program, different limitations and problems can appear. Assumptions are necessary to provide any results, but they are often not reported as part of the analysis result. This document is an initial proposal for categorizing assumptions that can be added to the translation of a CPG (code property graph) and are finally collected and reported along with the results of a query.

The purpose of categorizing assumptions is to reduce the mental load on the user who has to work with the analysis results and the assumptions they contain. Instead of presenting unrelated assumption messages in a general assumption object, assumption categories let a user group assumptions when working with the results and decide more quickly how reliable the results are, based on the types of assumptions that are reported.
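The grouping described above can be sketched as follows. This is a minimal illustration, not the codyze-evaluator API: the `Category` enum, the `Assumption` class, its fields, and the messages are hypothetical. Only the category headings and the assumption type names are taken from this document.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    # Values follow the category headings used in this document.
    COMPLETENESS_AND_CODE_AVAILABILITY = "Analysis Completeness and Code Availability"
    DATA_AND_CONTROL_FLOW = "Data and Control Flow Approximations"
    INPUT_DATA = "Input Data"


@dataclass(frozen=True)
class Assumption:
    type_name: str       # e.g. "NoExceptionsAssumption"
    category: Category
    message: str         # free-text explanation (hypothetical field)


def group_by_category(assumptions):
    """Return a dict mapping each category to the assumptions reported under it."""
    groups = defaultdict(list)
    for assumption in assumptions:
        groups[assumption.category].append(assumption)
    return dict(groups)


# Illustrative data; the messages are placeholders.
reported = [
    Assumption("NoExceptionsAssumption", Category.DATA_AND_CONTROL_FLOW, "illustrative message"),
    Assumption("MissingCodeProblem", Category.COMPLETENESS_AND_CODE_AVAILABILITY, "illustrative message"),
    Assumption("CFIntegrityAssumption", Category.DATA_AND_CONTROL_FLOW, "illustrative message"),
]

for category, items in group_by_category(reported).items():
    print(f"{category.value}: {[a.type_name for a in items]}")
```

With such a grouping, a user can first look at which categories occur at all and only then read the individual messages.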

Assumptions are Added …

Assumption Node Placement

Assumptions are added as overlay nodes connected to a graph node.
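As a rough sketch of this placement, assuming a simplified graph model: the classes `GraphNode` and `AssumptionNode`, their fields, and the method `add_assumption` are hypothetical names chosen for illustration and do not reflect the actual CPG classes.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class AssumptionNode:
    """Overlay node carrying one assumption (hypothetical shape)."""
    type_name: str
    message: str
    # The graph node this overlay is attached to; hidden from repr to keep it short.
    underlying: "GraphNode | None" = field(default=None, repr=False)


@dataclass(eq=False)
class GraphNode:
    """Simplified stand-in for a node of the graph."""
    name: str
    overlays: list = field(default_factory=list)

    def add_assumption(self, type_name, message):
        """Create an overlay node and connect it to this graph node."""
        overlay = AssumptionNode(type_name, message, underlying=self)
        self.overlays.append(overlay)
        return overlay


call = GraphNode("call foo()")
call.add_assumption("InferenceAssumption", "illustrative message")
print([overlay.type_name for overlay in call.overlays])
```

Keeping the assumption in a separate overlay node, rather than in a field of the graph node itself, leaves the underlying graph unchanged while still allowing navigation in both directions.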

Developers Can Manually Add Assumptions

Assumption Collection

Assumptions placed at CPG nodes are collected during evaluations that return a QueryTree object and are stored in that QueryTree object. Assumptions are collected from nodes that are visited by the query tree evaluation, or that are an AST ancestor of a visited node. Global assumptions are always included and summarized in the final result.
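The collection rule can be sketched as follows. This is an illustration under assumptions, not the actual implementation: the `Node` class and the function names are hypothetical, assumptions are represented as plain strings, and the sketch assumes that all AST ancestors of a visited node are considered, not only the direct parent.

```python
class Node:
    """Simplified stand-in for a CPG node with an AST parent link."""

    def __init__(self, name, ast_parent=None):
        self.name = name
        self.ast_parent = ast_parent
        self.assumptions = []


def ast_ancestors(node):
    """Yield the AST parent of the node, then its parent, up to the root."""
    current = node.ast_parent
    while current is not None:
        yield current
        current = current.ast_parent


def collect_assumptions(visited, global_assumptions):
    """Collect global assumptions plus those on visited nodes and their AST ancestors."""
    collected = list(global_assumptions)  # global assumptions are always included
    seen = set()                          # avoid collecting from a shared ancestor twice
    for node in visited:
        for candidate in (node, *ast_ancestors(node)):
            if id(candidate) in seen:
                continue
            seen.add(id(candidate))
            collected.extend(candidate.assumptions)
    return collected


function = Node("function")
function.assumptions.append("assumption on function")
visited_stmt = Node("visited statement", ast_parent=function)
visited_stmt.assumptions.append("assumption on visited statement")
other_stmt = Node("other statement", ast_parent=function)
other_stmt.assumptions.append("assumption on unvisited statement")

print(collect_assumptions([visited_stmt], ["global assumption"]))
```

In this sketch the assumption on the unvisited sibling statement is not collected, because that node is neither visited nor an ancestor of a visited node.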

When a QueryTree is returned as a result, it is written to a SARIF output. The assumptions are placed in the same SARIF output, as attachment objects, so that they can later be shown to the end user.
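One possible shape of such an output is sketched below. How codyze-evaluator actually maps assumptions to SARIF attachments is not specified in this document, so the mapping, the function names, the tool name, the rule ID, and the file path are all assumptions made for illustration. The sketch relies only on the SARIF 2.1.0 structure, in which a result object may carry an `attachments` array and each attachment has a `description` and an `artifactLocation`.

```python
import json


def assumption_to_attachment(type_name, message, uri):
    """Map one assumption to a SARIF attachment object (hypothetical mapping)."""
    return {
        "description": {"text": f"{type_name}: {message}"},
        # SARIF 2.1.0 requires an artifactLocation on every attachment.
        "artifactLocation": {"uri": uri},
    }


def build_sarif(rule_id, result_text, assumptions):
    """Build a minimal SARIF log with one result that carries the assumptions."""
    attachments = [assumption_to_attachment(*a) for a in assumptions]
    return {
        "version": "2.1.0",
        "runs": [{
            "tool": {"driver": {"name": "example-tool"}},  # placeholder tool name
            "results": [{
                "ruleId": rule_id,
                "message": {"text": result_text},
                "attachments": attachments,
            }],
        }],
    }


sarif_log = build_sarif(
    "example-query",                # placeholder rule ID
    "illustrative query result",
    [("NoExceptionsAssumption", "illustrative message", "src/Example.java")],
)
print(json.dumps(sarif_log, indent=2))
```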

Assumption Categories

Assumptions on Analysis Completeness and Code Availability

InferenceAssumption, ClosedMacroAssumption, UnsupportedLanguageProblem, MissingCodeProblem, …

Examples:

Assumptions on Language Semantics and Syntactic Correctness

AmbiguityAssumption, …

Examples:

Assumptions on Program Semantics

ConceptPlacementAssumption, ExhaustiveEnumerationAssumption, …

An assumption that we have correctly captured a piece of program semantics, e.g., logging of data, crossing of system boundaries, or file operations.

Assumptions on Soundness and Completeness

CompletenessAssumption, SoundnessAssumption, …

Examples:

Assumptions on Data and Control Flow Approximations

CFIntegrityAssumption, NoExceptionsAssumption, CFAllOrNothingExecutesAssumption, TrustedConfigAssumption, ExternalDataAssumption, …

Examples:

Assumptions on Runtime Preconditions

NetworkAvailableAssumption, ResourceExistsAssumption, ServiceReachableAssumption, …

Examples:

Assumptions on Sequentiality under Parallel Execution

AtomicExecutionAssumption, …

Examples:

Assumptions on Input Data

TrustBoundaryAssumption, DataRangeAssumption, TrustedInputAssumption, …

Examples:

Problems and Limitations

Problems and limitations during analysis influence how trustworthy the results of an evaluation are. As such, they should be reported to the user when they influence the queries. This is the same motivation as for assumptions, and we therefore plan to make them part of the same feature. Strictly speaking, however, they are different from assumptions.

There are two possible ways to fit them into the categorization:

  1. Problems and limitations are on the same level as assumptions. Categories:

    • Problems
    • Limitations
    • Assumptions
      • MissingCodeAssumptions
      • SyntacticCorrectnessAssumption

  2. Alternatively, can we restate problems and limitations as assumptions? Categories:

    • Assumptions
      • CanNotTranslateProblem as assumption: We assume that the remainder of the query is not impacted by this problem.