OCaml type checking recursive modules

https://github.com/ocaml/ocaml/blob/206e408bf3f1dfd156e160d8d3b11f7f7c14af06/typing/typemod.ml#L2921-L3006

The Life of a Recursive Module: A Deep Dive into the OCaml Type-Checker

Introduction

Mutually recursive modules, defined with module rec ... and ..., are a powerful feature in OCaml, allowing for the clean separation of concepts even when their implementations are intertwined. For a type-checker, however, this construct presents a classic “chicken and egg” problem: to know the type of module A, you must check its body, but its body may depend on module B, whose type is also unknown until its body is checked.

OCaml solves this by enforcing a simple but powerful rule: all modules in a recursive group must have explicit signature annotations. This signature acts as a contract or a “forward declaration,” breaking the circular dependency.

This document provides a comprehensive analysis of the exact code block in typing/typemod.ml that handles this process, with the original code spelled out and annotated with inline comments at each step.

The Core Logic: Annotated Breakdown

When the OCaml type-checker processes a source file, it walks the Abstract Syntax Tree (AST) generated by the parser. When it encounters a recursive module definition (Pstr_recmodule), it executes the following logic.

Step 1: Validation - The Gatekeeper

The very first action the compiler takes is to protect itself and enforce the core contract of recursive modules. It iterates through every module in the group and ensures each one has an explicit signature. If not, compilation stops.

(* Match on the `Pstr_recmodule` node from the parser's AST. 
   `sbind` is the list of module bindings from the source code. *)
| Pstr_recmodule sbind ->
    (* Remap the list of module bindings (`sbind`) to a more convenient format.
       This also serves as a validation step. *)
    let sbind =
      List.map
        (function
          (* For each module, check if its expression is a `Pmod_constraint`.
             This corresponds to the `(module_body : module_type)` syntax. *)
          | {pmb_name = name;
             pmb_expr = {pmod_desc=Pmod_constraint(expr, typ)};
             pmb_attributes = attrs;
             pmb_loc = loc;
            } ->
              (* If it has a signature, extract the relevant parts into a tuple:
                 (name, signature, body, attributes, location) *)
              name, typ, expr, attrs, loc
 
          (* If the module binding does NOT have a signature constraint... *)
          | mb ->
              (* ...it's an error. Raise it immediately. *)
              raise (Error (mb.pmb_expr.pmod_loc, env,
                            Recursive_module_require_explicit_type))
        )
        sbind
    in

Step 2: The First Pass - Iterative Type Resolution

This is the crucial first pass. It calls the [[OCaml type checking recursive modules#transl_recmodule_modtypes-the-fixed-point-solver|transl_recmodule_modtypes (The Fixed-Point Solver)]]transl_recmodule_modtypes function, which is responsible for resolving the web of interdependent module types. It does this through an iterative refinement process (a fixed-point computation), returning the final, stable declarations and an environment where they are all defined.

    (* `sdecls` is the list of module declarations from the parser. *)
    let (decls, newenv) =
      (* This function is the heart of the first pass. It takes the current environment
         and the list of parsed module declarations. *)
      transl_recmodule_modtypes env
        (* We first re-map the `sbind` list from Step 1 into the format that
           `transl_recmodule_modtypes` expects: a list of records containing
           the module's name and its declared signature (`pmd_type`). *)
        (List.map (fun (name, smty, _smodl, attrs, loc) ->
             {pmd_name=name; pmd_type=smty;
              pmd_attributes=attrs; pmd_loc=loc}) sbind
        )
    in
 
    (* As a sanity check, iterate through the newly created declarations (`decls`). *)
    List.iter
      (fun (md, _, _) ->
         (* For each module, check if its name has already been used in a way that
            would conflict with this signature. `Signature_names` tracks this. *)
         Option.iter Signature_names.(check_module names md.md_loc) md.md_id
      ) decls;

Step 3: The Second Pass - The Implementation Auditor

Now that the signatures are fully resolved, the second pass checks the module bodies. It ensures that the implementation (the “reality”) for each module actually matches the signature it declared (the “promise”).

    (* This is the main second pass. `List.map2` combines the resolved declarations
       from the first pass (`decls`) with the original module implementations (`sbind`). *)
    let bindings1 =
      List.map2
        (fun ({md_id=id; md_type=mty}, uid, _prev_shape) 
            (* From `decls` -- the fixed-point resolved one from step 2 *)
             (name, _, smodl, attrs, loc) ->             (* From `sbind` *)
 
(* 1. TYPE-CHECK THE BODY *)
(* `type_module` is called to type-check the actual implementation (`smodl`).
 Crucially, it uses `newenv`, the environment where all sibling modules
 are already defined, thus resolving the recursion. *)
           let modl, shape =
             Builtin_attributes.warning_scope attrs
               (fun () ->
                  type_module ~strengthen:true ~funct_body
                    (anchor_recmodule id) newenv smodl
               )
           in
 
(* 2. ENRICH THE SIGNATURE *)
(* The inferred type of the body (`modl.mod_type`) might be more specific
  than the declared signature. `enrich_module_type` merges this new
  information into the declared type. *)
           let mty' =
             enrich_module_type anchor name.txt modl.mod_type newenv
           in
 
(* 3. VERIFY CONSISTENCY *)
(* This is the moment of truth. It checks if the actual, enriched type of
  the body (`mty'`) is a valid subtype of the promised signature (`mty.mty_type`).
  If the implementation doesn't fulfill the contract, this raises an error. *)
           Includemod.modtypes_consistency ~loc:modl.mod_loc newenv
            mty' mty.mty_type;
 
           (* Return a tuple containing all the computed information for this module. *)
           (id, name, mty, modl, mty', attrs, loc, shape, uid))
        decls sbind in

Step 4: Finalization - The Public Record

After successfully checking all modules, the compiler constructs the final data structures: the Typedtree node for the recursive block and the final environment to be used for checking the rest of the file.

    (* Create the final environment for the code that FOLLOWS the `module rec` block. *)
    let newenv = (* The comment notes this allows aliasing from outside. *)
      List.fold_left
        (fun env (id_opt, _, mty, _, _, attrs, loc, shape, uid) ->
           match id_opt with
           | None -> env
           | Some id ->
               (* Create the final module declaration record. *)
               let mdecl =
                 {
                   md_type = mty.mty_type;
                   md_attributes = attrs;
                   md_loc = loc;
                   md_uid = uid;
                 }
               in
               (* Add the final, validated module declaration to the environment. *)
               Env.add_module_declaration ~check:true ~shape
                 id Mp_present mdecl env
        )
        env bindings1
    in
 
    (* Handle the tricky edge case of `include` statements within the rec block. *)
    let bindings2 =
      check_recmodule_inclusion newenv bindings1 in
 
    (* Extract a list of module bindings with their IDs and shapes. *)
    let mbs =
      List.filter_map (fun (mb, shape, uid) ->
        Option.map (fun id -> id, mb, uid, shape)  mb.mb_id
      ) bindings2
    in
    (* Update the compiler's internal `shape_map` with the new modules. *)
    let shape_map =
      List.fold_left (fun map (id, _mb, _uid, shape) ->
        Shape.Map.add_module map id shape
      ) shape_map mbs
    in
 
    (* CONSTRUCT THE FINAL RESULT *)
    (* The function returns a tuple. *)
    (
      (* 1. The Typed AST node: `Tstr_recmodule` containing the typed bindings. *)
      Tstr_recmodule (List.map (fun (mb, _, _) -> mb) bindings2),
 
      (* 2. The signature for this structure item, for creating `.cmi` files. *)
      map_rec (fun rs (id, mb, uid, _shape) ->
          Sig_module(id, Mp_present, {
              md_type=mb.mb_expr.mod_type;
              md_attributes=mb.mb_attributes;
              md_loc=mb.mb_loc;
              md_uid = uid;
            }, rs, Exported))
         mbs [],
 
      (* 3. The updated shape map. *)
      shape_map,
 
      (* 4. The final environment to be used for checking the rest of the file. *)
      newenv
    )

A Closer Look at the Auxiliary Functions

transl_recmodule_modtypes (The Fixed-Point Solver)

This function is the heart of the first pass. Its goal is to take a list of interdependent signatures and compute a “fixed point”—the stable, fully-resolved set of module types where all dependencies have been accounted for. It does this through iterative refinement.

and transl_recmodule_modtypes env sdecls =
  (* Helper to add a set of declarations to an environment *)
  let make_env curr = ... in
 
  (* Helper to translate signatures using a given environment *)
  let transition env_c curr = ... transl_modtype env_c pmd.pmd_type ... in
 
  (* ... *)
 
  (* Phase 1: Initialization and Approximation *)
  let init =
    List.map2
      (fun id pmd ->
         let md =
           { md_type = approx_modtype ...; ... }
         in
         ...
      ) ids sdecls
  in
 
  (* Phase 2: Iteration 1 *)
  let env0 = make_env init in
  let dcl1 = Warnings.without_warnings (fun () -> transition env0 init) in
 
  (* Phase 3: Iteration 2 (Refinement) *)
  let env1 = make_env dcl1 in
  check_recmod_typedecls env1 (map_mtys dcl1);
  let dcl2 = transition env1 dcl1 in
 
  (* Phase 4: Finalization *)
  let env2 = make_env dcl2 in
  check_recmod_typedecls env2 (map_mtys dcl2);
  ...
  (dcl2, env2)
  1. Phase 1: Initialization and Approximation. The process can’t start with an empty environment. It first creates an init set of declarations. For each module, it computes an approx_modtype. This is a “best-effort” first guess at the module’s type, created in an environment where accessing any of the other recursive modules is explicitly forbidden. This bootstraps the process with a rough-but-safe starting point.

  2. Phase 2: Iteration 1.

    • An environment env0 is created containing the approximate types from init.
    • The transition function is called. It re-translates every signature, but this time using env0. Now, when translating A’s signature, it can see the approximate type of B. This allows it to resolve simple, non-nested dependencies. The result is a more refined set of declarations, dcl1.
  3. Phase 3: Iteration 2 (Refinement).

    • A new environment env1 is created from the more-refined dcl1.
    • transition is called again. This is the key to the fixed-point computation. Using env1, it can now resolve even more complex dependencies that required knowledge from the first iteration. For example, if A depends on a type T inside B, and B’s definition of T depended on C, the first iteration might resolve B.T and the second iteration can then use that to resolve A.
    • Before and after this iteration, check_recmod_typedecls is called to ensure that the type definitions within the signatures are not ill-formed (e.g., cyclic type abbreviations without rec).
  4. Phase 4: Finalization. The result of the second iteration, dcl2, is considered the stable fixed point. A final environment env2 is created, and the function returns the final declarations and this complete environment, ready for the second pass (the implementation check).

enrich_module_type (The Detail Enhancer)

When a module body is checked, the compiler gains more information than what was in the signature. For instance, an abstract type t might be revealed to be string. enrich_module_type takes the original signature and the inferred type of the body and merges them. It “strengthens” the original signature by filling in these newly discovered details, making the module type as precise as possible.

Includemod.modtypes_consistency (The Contract Verifier)

This function is the strict contract verifier. It answers the question: “Does the module’s implementation actually fulfill the contract laid out in its signature?” It performs a sophisticated subtyping check. The implementation is allowed to be more specific or provide more components than the signature requires, but it cannot provide less. This ensures that any code relying on the signature will be safe.

check_recmodule_inclusion (The Specialist for Inclusions)

This function handles the difficult case where one recursive module includes another (e.g., module rec A = struct ... end and B = struct include A ... end). A simple textual inclusion is not possible because the types are interdependent and still being computed. This function performs the necessary substitutions and checks to expand the include statement correctly in this recursive context, preventing inconsistencies or non-terminating type definitions.

Conclusion

The OCaml compiler’s handling of recursive modules is a masterclass in solving circular dependencies. It combines a strict prerequisite (explicit signatures) with a two-pass compilation strategy:

  1. Resolve Signatures: Use an iterative fixed-point computation (transl_recmodule_modtypes) to fully resolve the interdependent signatures.
  2. Check & Verify: Type-check each implementation using the resolved signatures, and then rigorously verify that the implementation matches its declared contract.

This process, orchestrated by the code in typemod.ml and its helpers, ensures that this powerful language feature remains completely type-safe.