A Categorical Approach to Automata Learning and Minimization

- Author: Daniela Petrişan
- Institution: Université Paris Cité, IRIF, France
- Event: EPIT’25, Aussois
- Date: 20 May 2025
- Source: Petrişan EPIT’25 Slides

References

- T. Colcombet and D. Petrişan, Automata minimization: a functorial approach. Log. Methods Comput. Sci., 16(1), 2020.
- T. Colcombet, D. Petrişan, and R. Stabile, Learning Automata and Transducers: A Categorical Approach. CSL 2021.
- J. E. Pin (Ed.), Handbook of Automata Theory. EMS Press, 2021.
- Further reading: Q. Aristote, S. van Gool, D. Petrişan, and M. Shirmohammadi, Learning Weighted Automata over Number Rings, Concretely and Categorically. LICS 2025 (arXiv:2504.16596).

Tutorial Overview

This tutorial focuses on the interplay between category theory and automata theory. The categorical approach aims to:

- Provide a unifying framework for modelling various forms of automata.
- Obtain generic algorithms for learning.
- Highlight the link between automata learning and minimization.

Automata with Effects: A Categorical View

We can redefine various automata types by specifying a set of states Q and functions/arrows in a suitable category. The general structure involves an initial map, transition maps for each alphabet symbol, and a final map.

Complete Deterministic Finite Automata (cDFA)
- Given a set Q, an initial state $q_{0} : 1 \to Q$ , transition functions $δ_{σ} : Q \to Q$ for $σ \in A$ , and a final state map $χ_{F} : Q \to 2$ (where 2 is a two-element set for accept/reject).
- Category: Set (sets and total functions).
- Diagram: $1 \xrightarrow{q_{0}} Q \xrightarrow{δ_{σ}} Q \xrightarrow{χ_{F}} 2$.
Deterministic Finite Automata (DFA) (possibly incomplete)
- Given a set Q, an initial state $q_{0} : 1 \to Q$ (a partial function, that is, a total function $1 \to 1 + Q$), transition maps $δ_{σ} : Q \to Q$ for $σ \in A$ (partial functions, that is, total functions $Q \to 1 + Q$), and a final map $χ_{F} : Q \to 1$ (a partial function, that is, a total function $Q \to 1 + 1$, defined exactly on the accepting states).
- Category: Set• (sets and partial functions).
- Diagram: $1 \xrightarrow{q_{0}} Q \xrightarrow{δ_{σ}} Q \xrightarrow{χ_{F}} 1$ (all arrows are partial functions; the final map goes to a singleton because only acceptance matters).
Nondeterministic Finite Automata (NFA)
- Given a set Q, initial states $I \subseteq Q$ (map $i : 1 \to P (Q)$ ), transition relations $δ_{σ} : Q \to P (Q)$ , and final states $F \subseteq Q$ (map $f : Q \to 2$ or $Q \to P (1)$ ).
- Category: Rel (sets and relations).
- Diagram: $1 \xrightarrow{i} Q \xrightarrow{δ_{σ}} Q \xrightarrow{f} 1$ (all arrows are relations, including the initial $1 \to Q$ and the final $Q \to 1$).
Weighted Automata (WA) over a Semiring R
- States Q form a basis of the free R-module $R^{Q}$.
- Initial vector $i : R \to R^{Q}$, transition matrices (linear maps) $δ_{σ} : R^{Q} \to R^{Q}$, and final vector $f : R^{Q} \to R$.
- Category: FreeMod_R (free R-modules and linear maps).
- Diagram: $R \xrightarrow{i} R^{Q} \xrightarrow{δ_{σ}} R^{Q} \xrightarrow{f} R$ (the initial map is sometimes drawn as $1 \to R^{Q}$, picking out a vector).
- The linear map $i$ gives the weight of entering the automaton at each state.
- Each transition $δ_{σ}$ is given by a matrix (a linear transformation): it sends a basis state to a linear combination $\sum_{i} δ_{i} q_{i}$ of basis states, with coefficients $δ_{i} \in R$.
- The linear map $f$ gives the weight of leaving the automaton from each state.
Sequential Transducers (input alphabet A, output alphabet B)
- States Q, initial state with initial output $i : 1 \to B^{*} \times Q + 1$.
- Transitions $δ_{σ} : Q \to B^{*} \times Q + 1$ for each $σ \in A$.
- Final outputs $f : Q \to B^{*} + 1$.
- An arrow $X \to Y$ in the category below is a function $X \to B^{*} \times Y + 1$. The summand $1$ stands for "undefined"; the $B^{*}$ component carries the output word.
- Category: T, the Kleisli category of the monad $T (X) = B^{*} \times X + 1$ (the standard notion of monad is discussed under "Output Categories and Monads").
- Diagram: $1 \xrightarrow{i} Q \xrightarrow{δ_{σ}} Q \xrightarrow{f} 1$ (arrows are Kleisli arrows, so the final map is a function $Q \to B^{*} \times 1 + 1 \cong B^{*} + 1$).

So far composition in these categories has not been used. It matters because composing the arrows along a word is what defines which words an automaton accepts.
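To make the role of composition concrete, here is a minimal Python sketch of Kleisli composition for $T (X) = B^{*} \times X + 1$ and of running a sequential transducer by composing arrows. The encoding (`None` for the summand $1$, Python strings for $B^{*}$, the empty tuple for the single element of the object $1$) and the two-state example transducer are assumptions made for illustration, not something taken from the slides.

```python
def kleisli(g, f):
    """Compose f : X -> T(Y) with g : Y -> T(Z), where T(X) = B* x X + 1."""
    def composite(x):
        r = f(x)
        if r is None:                # the summand 1: undefined
            return None
        out1, y = r
        s = g(y)
        if s is None:
            return None
        out2, z = s
        return (out1 + out2, z)      # outputs are concatenated in B*
    return composite


def run(init, delta, final, word):
    """Compute f . delta_an . ... . delta_a1 . i and read off the output word."""
    arrow = init
    for a in word:
        arrow = kleisli(delta[a], arrow)
    arrow = kleisli(final, arrow)
    result = arrow(())               # () is the unique element of the object 1
    return None if result is None else result[0]


# Hypothetical transducer with states 0 and 1; reading "b" in state 1 is undefined.
init = lambda _: (">", 0)
delta = {
    "a": lambda q: ("x", 0),
    "b": lambda q: ("y", 1) if q == 0 else None,
}
final = lambda q: ("$", ()) if q == 0 else (".", ())

print(run(init, delta, final, "ab"))   # initial, transition, and final outputs glued together
print(run(init, delta, final, "abb"))  # undefined: the second "b" has no transition
```

Undefinedness propagates through the composite, which is why a single missing transition makes the whole word undefined.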

| Automata | Category | Objects | Morphisms |
| --- | --- | --- | --- |
| complete DFAs | Set | sets | functions |
| DFAs | Set• | sets | partial functions |
| NFAs | Rel | sets | relations |
| WAs over R | FreeMod_R | free R-modules | linear maps |
| subsequential transducers | … | … | … (can be constructed as a Kleisli category) |
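For the weighted automata described above, the composite of the initial vector, the transition matrices, and the final vector is just a vector-matrix product. The sketch below assumes a row-vector convention and uses a small hypothetical automaton over the semiring of natural numbers; the names `vec_mat` and `weight` are illustrative.

```python
def vec_mat(v, m):
    """Multiply a row vector by a matrix over any semiring with + and *."""
    return [sum(v[k] * m[k][j] for k in range(len(v))) for j in range(len(m[0]))]


def weight(init, trans, final, word):
    """Weight of a word: init . M_a1 . ... . M_an . final."""
    v = init
    for a in word:
        v = vec_mat(v, trans[a])
    return sum(x * y for x, y in zip(v, final))


# Hypothetical two-state automaton over (N, +, *) that counts occurrences of "a".
init = [1, 0]                      # enter at state 0 with weight 1
final = [0, 1]                     # leave from state 1 with weight 1
trans = {
    "a": [[1, 1], [0, 1]],         # stay in place, or move from state 0 to state 1
    "b": [[1, 0], [0, 1]],         # identity matrix
}

print(weight(init, trans, final, "abaa"))
```

Each accepting run picks one "a" on which to move from state 0 to state 1, so the weight of a word is its number of occurrences of "a".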

General Form: (C, I, O)-automata

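These are automata whose initial ($I$), state ($Q$), and final ($O$) objects live in a category $C$.
Diagram: $I \xrightarrow{i} Q \xrightarrow{δ_{σ}} Q \xrightarrow{f} O$.
- $I$ plays the role of the initial (input) object; it need not be an initial object of $C$ in the categorical sense.
- $O$ plays the role of the final (output) object; it need not be a terminal object of $C$ (for complete DFAs, $O = 2$).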

Word Automata as Functors

Word automata on $A^{*}$ can be seen as functors $\mathcal{A} : \mathcal{I} \to C$.
The input category $\mathcal{I}$ is freely generated by the objects initial_object_placeholder, states, and final_object_placeholder, and by the arrows in : initial_object_placeholder $\to$ states, out : states $\to$ final_object_placeholder, and $a$ : states $\to$ states for each $a \in A$.
A functor $\mathcal{A}$ provides:
- An object $Q = \mathcal{A}(\text{states})$ in $C$.
- An initial arrow $i : I \to Q$ (where $I = \mathcal{A}(\text{initial_object_placeholder})$ ).
- A final arrow $f : Q \to F$ (where $F = \mathcal{A}(\text{final_object_placeholder})$ is the output object, written $O$ in the general form above).
- Transition arrows $δ_{a} : Q \to Q$ for each $a \in A$.
The language accepted by $\mathcal{A}$ is the map $L_{\mathcal{A}} : A^{*} \to C (I, F)$ given, for $w = a_{1} \dots a_{n}$, by $L_{\mathcal{A}} (w) = f \circ δ_{a_{n}} \circ \dots \circ δ_{a_{1}} \circ i$.

This gives the standard notion of acceptance in all the examples above. For complete DFAs, for instance, $I = 1$ and $O = 2$, and $\mathrm{Set}(1, 2) \cong 2$, so every word is mapped to a morphism from $I$ to $O$ that says either accept or reject.
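The definition of the accepted language only needs composition in $C$, so it can be sketched once and instantiated per category. Below is a minimal Python version with Set (total functions as dicts) and Rel (relations as sets of pairs); the function names, the encodings, and the two example automata are assumptions chosen for illustration.

```python
def language(compose, init, delta, final, word):
    """Return f . delta_an . ... . delta_a1 . i, an arrow from I to O."""
    arrow = init
    for a in word:
        arrow = compose(delta[a], arrow)
    return compose(final, arrow)


def compose_set(g, f):
    """g after f in Set; total functions are encoded as dicts."""
    return {x: g[y] for x, y in f.items()}


def compose_rel(s, r):
    """s after r in Rel; relations are encoded as sets of pairs."""
    return {(x, z) for (x, y) in r for (y2, z) in s if y == y2}


# Complete DFA for "even number of a's": I = 1 = {"*"}, O = 2 = {True, False}.
dfa = dict(
    init={"*": 0},
    delta={"a": {0: 1, 1: 0}, "b": {0: 0, 1: 1}},
    final={0: True, 1: False},
)

# NFA for "contains ab": I = O = 1 = {"*"}; Rel(1, 1) has exactly two elements.
nfa = dict(
    init={("*", 0)},
    delta={"a": {(0, 0), (0, 1), (2, 2)}, "b": {(0, 0), (1, 2), (2, 2)}},
    final={(2, "*")},
)

print(language(compose_set, word="aba", **dfa))   # a function 1 -> 2
print(language(compose_rel, word="aab", **nfa))   # a relation 1 -> 1
```

In Rel, the non-empty relation on the one-element set plays the role of "accept" and the empty relation the role of "reject".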

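An automaton $\mathcal{A}$ accepts a language $L$ (itself a functor $L : \mathcal{O} \to C$, where $\mathcal{O}$ is an observation subcategory of $\mathcal{I}$) if the restriction of $\mathcal{A}$ to $\mathcal{O}$ is $L$, that is, if the triangle formed by the inclusion $\mathcal{O} \hookrightarrow \mathcal{I}$, $\mathcal{A}$, and $L$ commutes.
$\mathrm{Auto}_{L}$ is the category of automata accepting $L$.

Morphisms of automata are natural transformations between the corresponding functors, and isomorphisms of automata are natural isomorphisms.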

Output Categories and Monads

The output categories seen so far (Set, Set•, Rel, Vec, T) are often Kleisli categories for monads $T$ : Set $\to$ Set, specifying some effect:

- Set: Identity monad.
- Set• (partial functions): Maybe monad (option monad $X \mapsto 1 + X$ ).
- Rel: Powerset monad ( $P X$ ) for non-determinism.
- T (sequential transducers): Monad of partial free actions of $B^{*}$ ( $T (X) = B^{*} \times X + 1$ ).

What do these categories have in common? They are categories of free algebras (that is, Kleisli categories) for monads that each specify an effect: no effect (identity), partiality (Maybe, also called option), non-determinism (powerset), and partial output (partial free actions of $B^{*}$).
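As a rough illustration of "Kleisli category for a monad", the sketch below derives composition of effectful arrows from a monad's bind operation, for the Maybe and powerset monads. Encoding the extra point of $1 + X$ as `None` is an assumption that only works when `None` is not itself a state, and all names here are illustrative.

```python
def kleisli(bind):
    """Build Kleisli composition (g after f) from a monad's bind."""
    return lambda g, f: (lambda x: bind(f(x), g))


def bind_maybe(m, g):
    """Maybe monad: propagate the undefined value None."""
    return None if m is None else g(m)


def bind_powerset(s, g):
    """Powerset monad: apply g to every element and take the union."""
    return frozenset(z for y in s for z in g(y))


compose_maybe = kleisli(bind_maybe)
compose_powerset = kleisli(bind_powerset)

# A partial transition (an arrow in Set•) and a non-deterministic one (an arrow in Rel).
step = lambda q: {0: 1, 1: 2}.get(q)
nstep = lambda q: frozenset({0: {0, 1}, 1: {2}}.get(q, ()))

two_steps = compose_maybe(step, step)
two_nsteps = compose_powerset(nstep, nstep)

print(two_steps(0), two_steps(1))
print(sorted(two_nsteps(0)))
```

Composing `step` with itself is undefined as soon as one of the two steps is, while composing `nstep` with itself collects every state reachable in two steps.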

This is the programming-languages perspective: each of these monads models a computational effect.

Changing Output Categories via Adjunctions

An adjunction relates two categories in a precise way even when they are not isomorphic.

Adjunction Recap: $F : C ⇆ D : U$ with $F ⊣ U$ means natural isomorphisms $C (X, U Y) ≅ D (FX, Y)$ .
- $f : FX \to Y$ yields $f^{\flat} : X \to U Y$.
- $g : X \to U Y$ yields $g^{\sharp} : FX \to Y$.
Example 1 (Set vs Set•): $F : Set ⇆ Set• : U$ .
- $FX = X$ , $U X = 1 + X$ .
- The induced monad $UF$ on Set is the option (Maybe) monad.
Example 2 (Set vs Rel): $F : Set ⇆ Rel : U$ .
- $FX = X$ , $U X = P X$ .
- The induced monad $UF$ on Set is the powerset monad.
These are adjunctions between Set and $\mathrm{Kl}(T)$, with $F$ the identity on objects and $U X = TX$ .
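For Example 1, the natural isomorphism $Set• (FX, Y) ≅ Set (X, U Y)$ can be spelled out on finite sets. In this sketch a partial function is a dict that may lack keys and the extra point of $1 + Y$ is `None`; both encodings and the names `flat` and `sharp` are assumptions for illustration.

```python
def flat(partial, X):
    """Set•(X, Y) -> Set(X, 1 + Y): send undefined inputs to the extra point None."""
    return {x: partial.get(x) for x in X}


def sharp(total):
    """Set(X, 1 + Y) -> Set•(X, Y): drop the inputs that are sent to the extra point."""
    return {x: y for x, y in total.items() if y is not None}


X = {0, 1, 2}
f = {0: "a", 2: "b"}                 # partial function, undefined on 1
g = flat(f, X)                       # total function into 1 + Y

print(g)
print(sharp(g) == f)
```

The two directions are mutually inverse, which is the bijection part of the adjunction.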

Lifting Adjunctions

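If languages $L_{C} : A^{*} \to C (X, U Y)$ and $L_{D} : A^{*} \to D (FX, Y)$ are “the same” under the adjunction (they correspond word by word under the natural isomorphism), then the adjunction lifts to one between $\mathrm{Auto}(L_{C})$ and $\mathrm{Auto}(L_{D})$.

Here a language is viewed as a map from words to morphisms, as in the definition of acceptance above.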

Completing DFAs: The completion of a DFA is a right adjoint to the inclusion of complete DFAs into DFAs (adjunction between Set• and Set).
Determinization of NFAs: Determinization (powerset construction) is a right adjoint to the inclusion of DFAs into NFAs (adjunction between Set and Rel).
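The two constructions named above can be sketched concretely in Python. Note the assumptions: `complete` adds a fresh sink state called `"sink"` (assumed not to clash with an existing state), and `determinize` builds only the reachable part of the powerset automaton rather than all of $P (Q)$; the function names and the example NFA are illustrative.

```python
def complete(states, alphabet, delta, sink="sink"):
    """Complete a partial DFA by routing undefined transitions to a sink state."""
    all_states = set(states) | {sink}
    total = {(q, a): delta.get((q, a), sink) for q in all_states for a in alphabet}
    return all_states, total


def determinize(alphabet, init, delta, finals):
    """Powerset construction, restricted to the subsets reachable from init."""
    start = frozenset(init)
    seen, todo, det = {start}, [start], {}
    while todo:
        subset = todo.pop()
        for a in alphabet:
            target = frozenset(p for q in subset for p in delta.get((q, a), ()))
            det[(subset, a)] = target
            if target not in seen:
                seen.add(target)
                todo.append(target)
    accepting = {s for s in seen if s & frozenset(finals)}
    return seen, start, det, accepting


def dfa_accepts(start, det, accepting, word):
    state = start
    for a in word:
        state = det[(state, a)]
    return state in accepting


# Hypothetical NFA for "contains ab".
nfa_delta = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "b"): {2}, (2, "a"): {2}, (2, "b"): {2}}
subsets, start, det, accepting = determinize("ab", {0}, nfa_delta, {2})
print(len(subsets), dfa_accepts(start, det, accepting, "aab"))

_, total = complete({0, 1}, "ab", {(0, "a"): 1})
print(total[(1, "b")])
```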

Corollary 2. Initial automata are “free” in Kleisli-valued automata.

Automata Minimization: Categorical Perspective

Classical DFA Minimization

Left Quotient: For $L \subseteq A^{*}$ , $u^{- 1} L = \{v \in A^{*} ∣ uv \in L\}$ .

Left quotients are also known as language derivatives.

Myhill-Nerode Equivalence: $u \equiv_{L} v ⟺ u^{- 1} L = v^{- 1} L$ .

Two words are equivalent if they have the same left quotient

Myhill-Nerode Theorem: $L$ is regular $⟺$ $L$ has finitely many left quotients $⟺ \equiv_{L}$ has finite index.
- Proof idea:
  - $\Rightarrow$ If $A = (Q, q_{0}, F, (δ_{a}))$ accepts $L$, then $(Q, δ_{u} (q_{0}), F, (δ_{a}))$ accepts $u^{- 1} L$ .
  - $\Leftarrow$ Nerode Automaton: States are $\{u^{- 1} L ∣ u \in A^{*}\}$ , initial state is $L$ , final states are $\{u^{- 1} L ∣ u \in L\}$ , $δ_{a} (u^{- 1} L) = (u a)^{- 1} L$ .
Minimization Process for a DFA $A$ :
1. Remove unreachable states $\to \mathrm{Reach}(A)$ .
2. Merge states accepting the same language (indistinguishable) $\to \mathrm{Obs}(\mathrm{Reach}(A))$ .
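The two steps can be sketched for a complete DFA as follows. The merging step here uses a simple Moore-style partition refinement, which is one standard way to compute indistinguishability and is not claimed to be the method used in the tutorial; the names `reach`, `obs`, and the example DFA are assumptions.

```python
def reach(alphabet, q0, delta):
    """States reachable from q0 in a complete DFA (delta: (state, letter) -> state)."""
    seen, todo = {q0}, [q0]
    while todo:
        q = todo.pop()
        for a in alphabet:
            p = delta[(q, a)]
            if p not in seen:
                seen.add(p)
                todo.append(p)
    return seen


def obs(states, alphabet, delta, finals):
    """Map each state to the id of its indistinguishability class."""
    block = {q: int(q in finals) for q in states}
    while True:
        # Signature: own block plus the blocks of all successors.
        sig = {q: (block[q],) + tuple(block[delta[(q, a)]] for a in alphabet)
               for q in states}
        ids = {s: n for n, s in enumerate(sorted(set(sig.values())))}
        refined = {q: ids[sig[q]] for q in states}
        if len(ids) == len(set(block.values())):   # no block was split: stable
            return refined
        block = refined


# Hypothetical DFA for "ends with a": state 3 is unreachable, states 0 and 2 are equivalent.
delta = {(0, "a"): 1, (0, "b"): 2, (1, "a"): 1, (1, "b"): 2,
         (2, "a"): 1, (2, "b"): 2, (3, "a"): 3, (3, "b"): 0}
reachable = reach("ab", 0, delta)
classes = obs(reachable, "ab", delta, finals={1, 3})
print(sorted(reachable), len(set(classes.values())))
```

The number of classes is the number of states of the minimal DFA for this language.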

Minimization of $C$ -Automata

What does minimal mean here?

A $C$ -automaton is minimal if it “divides” every other $C$ -automaton accepting the same language, where “divides” means “is a quotient of a sub-automaton of”. For DFAs this is the usual notion: in Set, a quotient is a surjection and a subobject is an injection.
This requires notions of “quotient” (surjection-like) and “subobject” (injection-like), i.e., a factorization system $(E, M)$ in $C$ .
- $E$ (epimorphisms/quotients), $M$ (monomorphisms/subobjects).
- Every morphism $f$ factors as $f = m \circ e$ with $e \in E, m \in M$ .
- Factorization is unique up to isomorphism (functorial).
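In Set, with $E$ the surjections and $M$ the injections, the factorization of a function goes through the quotient of its domain by "having the same image". A minimal sketch on finite sets, where the name `factor` and the example are assumptions:

```python
def factor(f, domain):
    """Factor f as m . e with e surjective and m injective (finite domain)."""
    fibres = {}
    for x in domain:
        fibres.setdefault(f(x), set()).add(x)
    # e : X ->> X/ker(f), sending x to its equivalence class
    e = {x: frozenset(fibres[f(x)]) for x in domain}
    # m : X/ker(f) >-> Y, sending a class to the common value of f on it
    m = {frozenset(xs): y for y, xs in fibres.items()}
    return e, m


domain = ["", "a", "b", "ab"]
e, m = factor(len, domain)

print(e["a"] == e["b"])                        # "a" and "b" are identified by e
print(all(m[e[x]] == len(x) for x in domain))  # m . e = f
```

The minimal automaton arises from this kind of factorization, applied to a morphism of automata rather than to a plain function.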

Three Ingredients for Categorical Minimization

For a language $L$, a minimal automaton $\mathrm{Min}(L)$ exists given the following three ingredients, where $\mathrm{Auto}_{L}$ is the category of automata accepting $L$:

1. An initial object $A_{init} (L)$ of $\mathrm{Auto}_{L}$.
  - Exists if $C$ has copowers (related to reachability).
  - For Set-DFAs: states $A^{*}$ , $i : 1 \to A^{*}$ (maps to $ϵ$ ), $δ_{a} (w) = w a$ , $f : A^{*} \to 2$ (is $w \in L$ ?).
2. A final object $A_{final} (L)$ of $\mathrm{Auto}_{L}$.
  - Exists if $C$ has powers (related to observability).
  - For Set-DFAs: states $2^{A^{*}}$ (all possible languages), $i : 1 \to 2^{A^{*}}$ (maps to $L$ ), $δ_{a} (K) = a^{- 1} K$ , $f : 2^{A^{*}} \to 2$ (is $ϵ \in K$ ?).
3. A factorization system $(E, M)$ in $C$ . Then $\mathrm{Min}(L)$ is obtained from the unique morphism $h : A_{init} (L) \to A_{final} (L)$ by factoring $h$ as $A_{init} (L) \xrightarrow{e} \mathrm{Min}(L) \xrightarrow{m} A_{final} (L)$ with $e \in E$ and $m \in M$.

For DFAs, $h : A^{*} \to 2^{A^{*}}$ is $w \mapsto w^{- 1} L$ . The factorization yields the Nerode automaton as $\mathrm{Min}(L)$ .
This framework applies to R-weighted automata and sequential transducers (yielding Choffrut’s minimal transducer).
$\mathrm{Min}(L)$ divides any automaton $A$ accepting $L$ : the unique morphism $h$ factors as $A_{init} (L) ↠ \mathrm{Reach}(A) ↠ \mathrm{Obs}(\mathrm{Reach}(A)) ↣ A_{final} (L)$ , where $\mathrm{Reach}(A)$ is a sub-automaton of $A$, and $\mathrm{Obs}(\mathrm{Reach}(A))$ is isomorphic to $\mathrm{Min}(L)$ because factorizations are unique up to isomorphism.

Automata Learning: The $L^{*}$ Algorithm Categorically

Goal: Learn a regular language L.
Interaction: Learner asks Teacher:
1. Membership queries: “$w \in L$?”
2. Equivalence queries: “Does hypothesis automaton $H$ accept $L$?” If no, Teacher gives a counter-example.
Algorithm maintains a pair of finite sets of words $(Q, T)$ , starting with $(\{ϵ\}, \{ϵ\})$ .
- $Q$ : potential states (prefixes).
- $T$ : test words (suffixes) used for equivalence.
T-equivalence ( $\sim_{T}$ ): $w \sim_{T} v ⟺ (\forall u \in T . w u \in L ⟺ vu \in L)$ .
Observation Table Conditions:
- Closedness: $\forall q \in Q, \forall a \in A, \exists p \in Q \text{ such that } p \sim_{T} q a$ . If not, add $q a$ to $Q$ .
- Consistency: $\forall q, q^{'} \in Q, \forall a \in A, (q \sim_{T} q^{'} ⟹ q a \sim_{T} q^{'} a)$ . If not, find $u \in T$ distinguishing $q a u$ and $q^{'} a u$ , and add $a u$ to $T$ .
When $(Q, T)$ is closed and consistent, a hypothesis automaton $H (Q, T)$ can be built.
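The two conditions can be checked directly with membership queries. In this sketch the Teacher's membership oracle is modelled by a Python predicate `member`, and each function returns the word that the algorithm would add (or `None` if the condition holds); the function names and the two example languages are assumptions.

```python
def row(member, w, T):
    """The T-equivalence class of w, represented by its row of answers."""
    return tuple(member(w + t) for t in sorted(T))


def closedness_defect(member, Q, T, alphabet):
    """Return some qa with no T-equivalent word in Q, or None if (Q, T) is closed."""
    rows = {row(member, q, T) for q in Q}
    for q in sorted(Q):
        for a in alphabet:
            if row(member, q + a, T) not in rows:
                return q + a
    return None


def consistency_defect(member, Q, T, alphabet):
    """Return some au separating two T-equivalent words of Q, or None if consistent."""
    for q in sorted(Q):
        for p in sorted(Q):
            if row(member, q, T) != row(member, p, T):
                continue
            for a in alphabet:
                for t in sorted(T):
                    if member(q + a + t) != member(p + a + t):
                        return a + t
    return None


even_a = lambda w: w.count("a") % 2 == 0   # hypothetical target language
has_aa = lambda w: "aa" in w               # another hypothetical target language

print(closedness_defect(even_a, {""}, {""}, "ab"))
print(consistency_defect(has_aa, {"", "a"}, {""}, "ab"))
```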

$L^{*}$ Revisited Categorically

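Learner has access to a fragment of the language, $L_{Q, T} : QAT \cup QT \to 2$ (boolean outcomes).
This can be represented by a (Q,T)-biautomaton.
- Diagram: $1 \xrightarrow{▹ q} Q_{1} \underset{ϵ}{\overset{a}{\rightrightarrows}} Q_{2} \xrightarrow{t ◃} 2$ (conceptual: $Q_{1}, Q_{2}$ are state objects derived from $Q, T$ , with an arrow $▹ q$ for each $q \in Q$, parallel arrows $a$ (for each $a \in A$) and $ϵ$ from $Q_{1}$ to $Q_{2}$, and an arrow $t ◃$ for each $t \in T$).
Closure and consistency are encoded via the minimal (Q,T)-biautomaton.
- The minimal (Q,T)-biautomaton is $1 \xrightarrow{▹ q_{min}} Q / {\sim_{T \cup A T}} \underset{ϵ_{min}}{\overset{a_{min}}{\rightrightarrows}} (Q \cup Q A) / {\sim_{T}} \xrightarrow{t ◃_{min}} 2$ .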
- $ϵ_{min}$ is surjective $⟺ (Q, T)$ is closed.
- $ϵ_{min}$ is injective $⟺ (Q, T)$ is consistent.
- If $ϵ_{min}$ is an isomorphism, the two state objects are merged to form $H (Q, T)$ .
This yields a generic FunL* algorithm applicable to various automata types (DFAs, weighted automata over fields, sequential transducers).
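For DFAs, the characterization through $ϵ_{min}$ can be checked by hand. The sketch below assumes the reading in which $ϵ_{min}$ goes from $Q / {\sim_{T \cup A T}}$ to $(Q \cup Q A) / {\sim_{T}}$ and sends the class of $q$ to the class of $q$; equivalence classes are represented by rows of membership answers, and the name `eps_min_properties` is hypothetical.

```python
def eps_min_properties(member, Q, T, alphabet):
    """Return (surjective, injective) for eps_min on the classes determined by (Q, T)."""
    coarse_tests = sorted(T)
    fine_tests = sorted(set(T) | {a + t for a in alphabet for t in T})
    fine = lambda w: tuple(member(w + t) for t in fine_tests)      # class under T u AT
    coarse = lambda w: tuple(member(w + t) for t in coarse_tests)  # class under T

    source = {fine(q) for q in Q}
    target = {coarse(w) for w in set(Q) | {q + a for q in Q for a in alphabet}}
    # Well defined because T is contained in T u AT.
    eps = {fine(q): coarse(q) for q in Q}

    surjective = set(eps.values()) == target
    injective = len(set(eps.values())) == len(source)
    return surjective, injective


even_a = lambda w: w.count("a") % 2 == 0   # hypothetical target language
has_aa = lambda w: "aa" in w               # another hypothetical target language

print(eps_min_properties(even_a, {""}, {""}, "ab"))        # not closed, consistent
print(eps_min_properties(even_a, {"", "a"}, {""}, "ab"))   # closed and consistent
print(eps_min_properties(has_aa, {"", "a"}, {""}, "ab"))   # neither
```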

FunL* Algorithm Sketch

Initialize $Q := \{ϵ\}$ , $T := \{ϵ\}$ .
Repeat:
1. While $ϵ_{min}$ of the minimal (Q,T)-biautomaton is not an isomorphism (i.e., not closed or not consistent):
  - If not closed ( $ϵ_{min} \notin E$ ), enlarge $Q$ (add $q a$ for some $q, a$ ).
  - If not consistent ( $ϵ_{min} \notin M$ ), enlarge $T$ (add $a u$ for some $a, u$ that breaks consistency).
2. Build hypothesis $H (Q, T)$ . Ask equivalence query.
3. If Teacher answers NO with counterexample $w$ : Add $w$ and its prefixes to $Q$ .
Until Teacher answers YES. Return $H (Q, T)$ .
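Specialised to DFAs, the loop above can be sketched in Python as follows. This is an illustration under assumptions, not the generic FunL* implementation: the membership oracle is a predicate `member`, and the equivalence oracle produced by the hypothetical helper `make_teacher` only compares the hypothesis with the target on words up to a fixed length.

```python
from itertools import product


def accepts(hyp, word):
    state, delta, accepting = hyp
    for a in word:
        state = delta[(state, a)]
    return state in accepting


def make_teacher(member, alphabet, max_len=6):
    """Bounded stand-in for an equivalence oracle: returns a counterexample or None."""
    def equivalent(hyp):
        for n in range(max_len + 1):
            for letters in product(alphabet, repeat=n):
                w = "".join(letters)
                if accepts(hyp, w) != member(w):
                    return w
        return None
    return equivalent


def learn(member, equivalent, alphabet):
    Q, T = {""}, {""}
    while True:
        while True:  # make (Q, T) closed and consistent
            row = lambda w: tuple(member(w + t) for t in sorted(T))
            known = {row(q) for q in Q}
            missing = next((q + a for q in sorted(Q) for a in alphabet
                            if row(q + a) not in known), None)
            if missing is not None:      # not closed: enlarge Q
                Q.add(missing)
                continue
            clash = next((a + t
                          for q in sorted(Q) for p in sorted(Q) if row(q) == row(p)
                          for a in alphabet for t in sorted(T)
                          if member(q + a + t) != member(p + a + t)), None)
            if clash is not None:        # not consistent: enlarge T
                T.add(clash)
                continue
            break
        # Hypothesis H(Q, T): states are the rows of the words in Q.
        delta = {(row(q), a): row(q + a) for q in Q for a in alphabet}
        hyp = (row(""), delta, {row(q) for q in Q if member(q)})
        counterexample = equivalent(hyp)
        if counterexample is None:
            return hyp
        Q.update(counterexample[:k] for k in range(len(counterexample) + 1))


target = lambda w: "aa" in w             # hypothetical target language
hypothesis = learn(target, make_teacher(target, "ab"), "ab")
print(len({state for (state, _) in hypothesis[1]}))   # number of states learned
```

On this target the loop first proposes a one-state hypothesis, receives a counterexample, and then stabilises once the table is closed and consistent again.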

Perspectives

- Explore conditions on a monad $T$ such that $\mathrm{Kl}(T)$ has the properties needed for minimization and learning.
- Kleisli categories (categories of free algebras) are not always ideal: they may lack good factorization systems or products. When $\mathrm{Kl}(T)$ is not suitable, move to a larger category, such as the category of Eilenberg-Moore algebras (e.g., join-semilattices for Rel-valued automata).
- Extend to tree automata.
- Weighted automata over number rings.
- Learning nominal automata (building on work on automata in toposes).
