Source-to-source Encoding Examples Audit

Context: 检查当前 OCaml example artifacts 是否贴近论文 4.4 source-to-source translation 的描述形式:

  • 先由 static pass 得到 internal definitions;
  • 再用 with type ... / substitution 创建 per-component internal view;
  • recursive module bodies M1, M2 本身不改,只是在新的 checking environment 下检查;
  • deep abstract components 不自动 refine;
  • cyclic top-level views 需要 recursive types / equi-recursive target。

Related files:

  • Paper section: /Users/ltzhou/GIT/modrec/paper/manual.tex, sec:source-to-source around lines 4723ff.
  • OCaml examples directory: /Users/ltzhou/GIT/modrec/paper/ocaml/.
  • Source TreeForest example: /Users/ltzhou/GIT/modrec/examples/cyclic/treeforest.ml.

Summary

当前 examples 混用了几种 encoding 风格:

  1. Direct with on concrete RHS:

    • Ext with type A.t = int
    • used by example41.ml, example42.ml, deep_abstract_encoding_only.ml.
  2. Hand-expanded partial internal signatures:

    • e.g. type forest; type tree = int * forest
    • closer to per-component partial view, but not paper-style with-based.
  3. View module + with:

    • e.g. module View = struct type tree = int * forest and forest = tree * forest end
    • then Ext with type tree = View.tree and type forest = View.forest.
    • used by current TreeForest encoding.

Recommended convention for paper-facing examples:

  1. module type Ext: anchored external recursive signature.
  2. module View / View_i: internal definitions discovered by static pass.
  3. module type Int_i = Ext with type P_i = View...: internal view for component i.
  4. module F_i (X : Int_i) = struct open X; M_i end.
  5. Explicit check/comment that M_i body is unchanged after the single environment line open X.
  6. Final external re-sealing.
  7. If OCaml surface syntax forces a workaround, mark it as an OCaml artifact, not part of the formal translation.

1. example42.ml — Double Vision example

Path:

/Users/ltzhou/GIT/modrec/paper/ocaml/example42.ml

Current status:

  • It contains the main two-module double-vision example.
  • Ordinary ocamlc -c example42.ml fails early at the intended bad/failing block unless the fail blocks are removed/commented for isolated compilation.

Mismatches / issues:

1.1 F / G bodies are not presented in the strongest “body unchanged” form

Current encoding uses:

module F (X : IntA) = struct
  module A = X.A
  module B = X.B
  ... original A body ...
end
 
module G (X : IntB) = struct
  module A = X.A
  module B = X.B
  ... original B body ...
end

This is close to the current manual pseudocode:

let module X1 = X.X_1 in let module X2 = X.X_2 in M_1

But it weakens the invariant we now want to emphasize:

M1 / M2 definitions are preserved unchanged; only the checking signature/environment changes.

Preferred paper-facing artifact:

module F (X : IntA) = struct
  open X
  ... original A body byte-identical ...
end
 
module G (X : IntB) = struct
  open X
  ... original B body byte-identical ...
end

Then a textual identity check can verify that the bodies are unchanged after the open X line.

1.2 No explicit View module / internal definitions

Current style:

module type IntA = Ext with type A.t = int
module type IntB = Ext with type A.u = bool
module type Int = Ext with type A.t = int and type A.u = bool

This is fine for a simple non-recursive example, but less close to the paper phrase “internal definitions + with substitution”.

More paper-style:

module View = struct
  type a_t = int
  type a_u = bool
end
 
module type IntA = Ext with type A.t = View.a_t
module type IntB = Ext with type A.u = View.a_u
module type Int = Ext
  with type A.t = View.a_t
   and type A.u = View.a_u

This is more verbose but aligns with the TreeForest View-based form.

1.3 Anchor path orientation may be non-obvious

Ext currently has:

module B : sig type t = A.t type u = A.u val g : t -> u * t end

So the view for B.u = bool is expressed as:

Ext with type A.u = bool

This is defensible: A.u is the chosen anchor path for the shared u type. But it can read strangely because the intuitive sentence is “B sees its own B.u = bool.”

Potential fix:

  • Add a comment that Ext is an anchored signature and the shared u is anchored at A.u; or
  • choose an orientation where the displayed with path better matches the narrative.

2. example41.ml — one-module degenerate case

Path:

/Users/ltzhou/GIT/modrec/paper/ocaml/example41.ml

Current source:

module OCaml = struct
  module rec A : sig type t type u = t end
  = struct type t = bool type u = A.t end
end

Mismatches / issues:

2.1 Degenerate one-component case

The current source-to-source section in manual.tex is phrased for two mutually recursive modules:

module rec X1 : S1 = M1 and X2 : S2 = M2

example41.ml is a useful sanity check, but it is not a literal instance of that two-component presentation. It should be marked as a one-module degenerate case.

2.2 Encoding body adds an extra value not present in source

Current F body includes:

let x : u = true

This is not in the original module body. It is a useful witness that the internal view is available, but it violates the strict invariant:

the module definition body is unchanged.

Potential fix:

  • Remove let x : u = true from the translation artifact; or
  • Move it to a separate witness/test module and label it as not part of the translation.

2.3 No explicit View module

Current style:

module type Int = Ext with type t = bool

This is acceptable for a tiny example, but not maximally close to the paper form. A View module would make it more uniform:

module View = struct type t = bool end
module type Int = Ext with type t = View.t

2.4 Parameter shape differs from the general translation

Current:

module F (A : Int) = struct ... end

General translation uses an environment parameter, roughly X : Int, containing components. This is okay for a degenerate one-module case, but should be noted.

3. Deep Abstraction examples

Paths:

  • /Users/ltzhou/GIT/modrec/paper/ocaml/deep_abstract_source_encoding_test.ml
  • /Users/ltzhou/GIT/modrec/paper/ocaml/deep_abstract_encoding_only.ml

Current strengths:

  • Already uses the better body-preservation shape:
module F (X : IntA) = struct
  open X
  ... original A body ...
end
 
module G (X : IntB) = struct
  open X
  ... original B body ...
end
  • Textual check showed:
A body unchanged after environment line: True
B body unchanged after environment line: True
  • It demonstrates the intended deep-abstraction point:
    • top-level internal views are added;
    • nested/deep abstract types such as Box.s and Pack.s are not refined;
    • they remain usable abstractly.

Mismatches / issues:

3.1 No explicit View module

Current:

module type IntA = Ext with type A.t = int
module type IntB = Ext with type A.u = bool
module type Int = Ext with type A.t = int and type A.u = bool

More paper-style:

module View = struct
  type a_t = int
  type a_u = bool
end
 
module type IntA = Ext with type A.t = View.a_t
module type IntB = Ext with type A.u = View.a_u
module type Int = Ext
  with type A.t = View.a_t
   and type A.u = View.a_u

3.2 Ext is an anchored version, not a literal copy of source signatures

Original direct source has:

A : sig
  type u = B.u
  type t
  ...
end
 
B : sig
  type t = A.t
  type u
  ...
end

Encoding Ext uses:

A : sig
  type t
  type u
  ...
end
 
B : sig
  type t = A.t
  type u = A.u
  ...
end

This chooses A.u as the anchor for the shared u type. This is likely fine under the paper’s “anchorable source signature” condition, but it should be explicitly documented.

Suggested comment:

Ext is an anchored version of the recursive external signatures; the shared abstract u is anchored at A.u.

3.3 The combined test file intentionally fails as a whole

deep_abstract_source_encoding_test.ml contains a failing direct block, so the whole file is not meant to compile with ordinary ocamlc -c.

The compile-only artifact is:

deep_abstract_encoding_only.ml

If cited as evidence, cite the encoding-only file or state that the source test file contains expected-failure blocks.

4. TreeForest cyclic type example

Canonical path:

/Users/ltzhou/GIT/modrec/paper/ocaml/treeforest_double_vision_encoding_only.ml

Duplicate / deprecated path currently also exists:

/Users/ltzhou/GIT/modrec/paper/ocaml/treeforest_double_vision_with_encoding_only.ml

Source path:

/Users/ltzhou/GIT/modrec/examples/cyclic/treeforest.ml

Current strengths:

  • It is now paper-style with-based:
module View = struct
  type tree = int * forest
  and forest = tree * forest
end
 
module type Int = Ext
  with type tree = View.tree
   and type forest = View.forest
  • It preserves module bodies via open X.
  • Textual identity check showed:
Tree body unchanged after environment line: True
Forest body unchanged after environment line: True
  • Compile check:
ocamlc -rectypes -c /Users/ltzhou/GIT/modrec/paper/ocaml/treeforest_double_vision_encoding_only.ml

succeeds.

  • Ordinary ocamlc rejects the cyclic abbreviation in module View, as expected.

Mismatches / issues:

4.1 Top-level aliases tree / forest are an OCaml anchoring workaround

Current Ext:

module type Ext = sig
  type tree
  type forest
  module Tree : sig
    type t = tree
    val children : t -> forest
    ...
  end
  module Forest : sig
    type u = forest
    val child : u -> tree
    ...
  end
end

Original source shape is closer to:

module rec Tree : sig
  type t
  val children : t -> Forest.u
end
and Forest : sig
  type u
  val child : u -> Tree.t
end

OCaml ordinary module type Ext = sig ... end cannot forward-reference Forest from inside the earlier Tree signature. Therefore the top-level aliases tree and forest are an OCaml surface-language encoding artifact.

Suggested note:

The top-level aliases tree and forest are an OCaml artifact for anchoring mutually recursive abstract components in a non-recursive module type; they are not an extra transformation of the formal source calculus.

4.2 Current with-based version appears to over-refine per-component views

Current:

module type IntTree = Ext
  with type tree = View.tree
   and type forest = View.forest
 
module type IntForest = Ext
  with type tree = View.tree
   and type forest = View.forest

This gives both F(Tree) and G(Forest) the full cyclic view.

But the prose around source-to-source translation says each module should see its own internal definitions while keeping the other module’s external view. A more literal partial-view version would be:

module type IntTree = sig
  type forest
  type tree = int * forest
  ...
end
 
module type IntForest = sig
  type tree
  type forest = tree * forest
  ...
end

That partial-view shape is closer to the intended checking discipline, but it is not expressed using Ext with ... in the current OCaml artifact.

This is the most important TreeForest mismatch.

Possible resolutions:

  1. Accept the hand-expanded partial signatures for the OCaml artifact and note that OCaml cannot express this partial anchored refinement directly with simple with syntax.
  2. Keep the View + with version as a cyclic-type stress test, but state that it over-refines the per-component views compared with the ideal paper translation.
  3. Refine the paper explanation if full internal view is intended for mutually dependent cyclic definitions.
  4. Design a better with-based partial-view encoding, if possible.

4.3 Direct with type Tree.t = int * Forest.u is not accepted by OCaml

Attempting:

module type IntTree = Ext with type Tree.t = int * Forest.u

fails in OCaml because the RHS cannot directly refer to Forest.u as a sibling path in that position:

Error: Unbound module Forest

This is another OCaml surface-language limitation. The View module is a workaround.

4.4 Duplicate file may confuse future checks

Both files currently exist:

  • canonical: treeforest_double_vision_encoding_only.ml
  • duplicate/deprecated: treeforest_double_vision_with_encoding_only.ml

The canonical file should be used. The duplicate should be deleted or clearly marked deprecated.

5. Overall recommendation

Most urgent cleanup tasks:

  1. example42.ml

    • Replace module A = X.A; module B = X.B with open X.
    • Optionally introduce module View.
    • Add a note about anchor path choice for shared u.
  2. example41.ml

    • Remove or isolate let x : u = true.
    • Mark as degenerate one-module case.
    • Optionally introduce module View.
  3. Deep abstraction examples

    • Optionally introduce module View for uniformity.
    • Add a comment that Ext is an anchored version of the original recursive signatures, with A.u chosen as the anchor for shared u.
  4. TreeForest

    • Decide whether the paper-facing artifact should prioritize:
      • with-based shape, or
      • precise per-component partial views.
    • If keeping current View + with version, explicitly mark the full-view issue.
    • Remove or deprecate the duplicate treeforest_double_vision_with_encoding_only.ml.

One-sentence takeaway

The examples mostly support the intended story, but they are not yet uniformly in the paper’s canonical source-to-source form: example41 adds a non-source witness, example42 still uses alias declarations rather than strict body preservation, Deep Abstraction needs an explicit anchoring note, and TreeForest’s current with-based OCaml artifact likely over-refines per-component views while using top-level aliases as an OCaml anchoring workaround.