SemanticModels.jl¶

SemanticModels Logo

Teaching computers to do science
Papers are useless, all the information is in code
Model Augmentation and Synthesis
Arbitrary models are complex, but transformations are simpler
Project Repo github.com/jpfairbanks/SemanticModels.jl

What is Modeling?¶

Make an initial model $ y \approx \beta x $
Make a better model $ y \approx \beta x + \gamma y $
Interpret $\beta, \gamma $ to understand the world

Science as nested optimization¶

Fitting the data is a regression problem:

$$h^* = \arg\min_{h\in {H}} \ell(h(x), y)$$

The institutional process of discovery is the outer problem:

$$\max_{{H}\in \mathcal{M}} \operatorname{expl}(h^*)$$ where $h^*$ is the best-fitting member of $H$ and $\operatorname{expl}$ is the explanatory power of the class of models $H$.

The explanatory power is some combination of
- generalization,
- parsimony,
- and consistency with the fundamental principles of the field.
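
A minimal sketch of the nested structure, under stated assumptions: the model classes are polynomials of degree 0 to 5, the inner problem is ordinary least squares, and the explanatory score is held-out accuracy minus a parsimony penalty `λ` per parameter. The names `design`, `fit`, `mse`, `select_class`, and the scoring rule are illustrative, not part of SemanticModels.jl, and consistency with domain principles is not modeled here.

```julia
using LinearAlgebra, Random

# Design matrix for a polynomial of degree d: columns 1, x, x^2, ..., x^d.
design(x, d) = [xi^k for xi in x, k in 0:d]

# Inner problem: least-squares fit within the class of degree-d polynomials.
fit(x, y, d) = design(x, d) \ y

# Mean squared error of coefficients c on data (x, y).
mse(c, x, y) = sum(abs2, design(x, length(c) - 1) * c .- y) / length(y)

# Outer problem: choose the class with the best (hypothetical) explanatory score,
# here negative held-out error minus a penalty of λ per parameter.
function select_class(xtrain, ytrain, xtest, ytest; degrees=0:5, λ=0.01)
    best_d, best_score, best_c = first(degrees), -Inf, Float64[]
    for d in degrees
        c = fit(xtrain, ytrain, d)
        score = -mse(c, xtest, ytest) - λ * (d + 1)
        if score > best_score
            best_d, best_score, best_c = d, score, c
        end
    end
    return best_d, best_c
end

# Synthetic data from a linear law with small noise.
Random.seed!(42)
x = collect(range(-1, 1; length=200))
y = 2.0 .* x .+ 1.0 .+ 0.05 .* randn(200)
train, test = 1:2:200, 2:2:200
d, c = select_class(x[train], y[train], x[test], y[test])
```

On this synthetic data the linear class should win: higher degrees cannot reduce the held-out error by more than the penalty they incur.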

Modeling Frameworks¶

Most frameworks are designed before the models are written

Domain
Algebra
Learning
Optimization
Modeling

SemanticModels is a post hoc modeling framework

SIR model of disease¶

ODE based simulation¶

A mathematical model of disease spread¶

\begin{align} \frac{dS}{dt}&=-\frac{\beta IS}{N}\\\\ \frac{dI}{dt}&=\frac{\beta IS}{N}-\gamma I\\\\ \frac{dR}{dt}&=\gamma I \end{align}
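
The equations above can be integrated numerically. The following is a minimal sketch using forward Euler steps in plain Julia; the function name `simulate_sir`, the step size, and the parameter values are assumptions chosen for illustration, and a production simulation would more likely use a dedicated ODE solver.

```julia
# Forward-Euler integration of the SIR equations.
# β is the transmission rate, γ the recovery rate; dt and nsteps are illustrative.
function simulate_sir(β, γ, S0, I0, R0; dt=0.1, nsteps=1000)
    N = S0 + I0 + R0
    S, I, R = float(S0), float(I0), float(R0)
    traj = [(S, I, R)]
    for _ in 1:nsteps
        infections = β * I * S / N * dt   # flow from S to I during one step
        recoveries = γ * I * dt           # flow from I to R during one step
        S -= infections
        I += infections - recoveries
        R += recoveries
        push!(traj, (S, I, R))
    end
    return traj
end

traj = simulate_sir(0.5, 0.1, 990, 10, 0)
S, I, R = traj[end]
```

Because every flow leaves one compartment and enters another, the total $S + I + R$ stays equal to $N$ up to floating-point error.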

Predictions¶

Ebola Outbreak

(a) Cumulative number of infected individuals as a function of time (day) for the three countries Guinea, Liberia and Sierra Leone.
A. Khaleque and P. Sen, "An empirical analysis of the Ebola outbreak in West Africa," 2017

Agent based simulation¶

In [11]:

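abstract type AgentModel end
mutable struct StateModel <: AgentModel
    states       # possible agent states, e.g. [:S, :I, :R]
    agents       # current state of each agent
    transitions  # Dict mapping each state to its transition function
    loads        # per-state load vector; main() passes this as the fourth argument
end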

In [13]:

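#using AgentModels <- hypothetical ABM library
# step!, describe, and stateload are assumed to come from that library.

function main(nsteps)
    n = 20
    a = fill(:S, n)                       # every agent starts susceptible
    ρ = clamp(0.5 + randn(Float64)/4, 0, 1) # chance an infected agent stays infected
    μ = 0.5                               # chance a recovered agent stays immune
    # Each transition receives the model as x[1] and returns the agent's next state.
    T = Dict(
        :S=>(x...)->rand(Float64) < stateload(x[1], :I) ? :I : :S,
        :I=>(x...)->rand(Float64) < ρ ? :I : :R,
        :R=>(x...)->rand(Float64) < μ ? :R : :S,
    )
    sam = StateModel([:S, :I, :R], a, T, zeros(Float64,3))
    newsam = step!(sam, nsteps)
    counts = describe(newsam)
    return newsam, counts
end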

Out[13]:

main (generic function with 1 method)
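
The agent-based cell relies on `stateload`, `step!`, and `describe` from a hypothetical library. As a rough sketch of what such helpers might do, the block below defines stand-in versions on a stand-in container `SketchModel`. The field names, the meaning of the loads (fraction of agents in each state), and the update order are all assumptions rather than the library's actual behavior.

```julia
using Random

# Hypothetical stand-in for the model container; field names are assumptions.
mutable struct SketchModel
    states
    agents
    transitions
    loads
end

# Fraction of agents in `state`, read from the cached loads.
function stateload(sm, state)
    i = findfirst(==(state), sm.states)
    return sm.loads[i]
end

# Count the agents in each state and cache the corresponding fractions.
function describe(sm)
    counts = [count(==(s), sm.agents) for s in sm.states]
    sm.loads = counts ./ length(sm.agents)
    return counts
end

# Advance all agents n times; each agent applies the transition of its current state.
function step!(sm, n)
    for _ in 1:n
        describe(sm)  # refresh loads before the agents move
        sm.agents = [sm.transitions[a](sm, a) for a in sm.agents]
    end
    return sm
end

Random.seed!(1)
T = Dict(
    :S => (x...) -> rand() < stateload(x[1], :I) ? :I : :S,
    :I => (x...) -> rand() < 0.5 ? :I : :R,
    :R => (x...) -> rand() < 0.5 ? :R : :S,
)
agents = vcat(fill(:S, 18), fill(:I, 2))  # seed two infections
sam = SketchModel([:S, :I, :R], agents, T, zeros(3))
step!(sam, 10)
counts = describe(sam)
```

The sketch seeds two infected agents because, under this reading of `stateload`, a population that starts entirely susceptible has an infection load of zero and never changes state.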

Statistical Models¶

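using LsqFit

# Linear model: β[1] is the slope, β[2] the intercept.
function f(x, β)
    return β[1] .* x .+ β[2]
end

function main()
    # load_matrix and load_vector are placeholder data-loading helpers, not defined here.
    X = load_matrix("file_X.csv")
    target = load_vector("file_y.csv")
    a₀ = [1.0, 1.0]  # one initial guess per parameter (slope, intercept)

    fit = curve_fit(f, X, target, a₀)
    return fit
end

main()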

Category Theory¶

Category theory (CT) is the mathematics of structure-preserving maps. Most fields of mathematics have a notion of homomorphism: a map between two objects of the same kind that preserves their shared structure.

Sets, Groups, Fields, Rings
Graphs
Databases

CT is the study of structure in its most general form.

Graphs as Categories¶

Each graph is a category¶

$ G = (V,E) $
$Ob(G) = V$
$Hom_G(v,u) = (v\leadsto u) \in E$

The category of graphs¶

A graph homomorphism is a map $f: G\to H$ such that $(v\leadsto u) \in G \implies (f(v) \leadsto f(u)) \in H$
$Ob(Graph)$ is the set of all graphs
$Hom_{Graph}(G,H)$ is the set of all graph homomorphisms between $G,H$
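
The homomorphism condition can be checked mechanically. The following sketch represents a directed graph as a vertex list plus a set of edges and tests whether a vertex map preserves every edge; the `Graph` struct and `is_homomorphism` are illustrative names, not part of SemanticModels.jl.

```julia
# A directed graph as a vertex list and a set of (source, target) edges.
struct Graph
    vertices::Vector{Symbol}
    edges::Set{Tuple{Symbol,Symbol}}
end

# f maps vertices of G to vertices of H. It is a homomorphism when every
# edge (v, u) of G is sent to an edge (f(v), f(u)) of H.
function is_homomorphism(f::Dict{Symbol,Symbol}, G::Graph, H::Graph)
    all(v -> haskey(f, v) && f[v] in H.vertices, G.vertices) || return false
    return all(((v, u),) -> (f[v], f[u]) in H.edges, G.edges)
end

# G is the path a -> b -> c; H is the two-cycle x -> y -> x.
G = Graph([:a, :b, :c], Set([(:a, :b), (:b, :c)]))
H = Graph([:x, :y], Set([(:x, :y), (:y, :x)]))

good = Dict(:a => :x, :b => :y, :c => :x)  # wraps the path around the cycle
bad  = Dict(:a => :x, :b => :x, :c => :y)  # sends (a, b) to (x, x), which is not an edge

is_homomorphism(good, G, H)
is_homomorphism(bad, G, H)
```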

Models as Categories¶

Each model is a Category¶

An SIR model structure

Category of Models¶

The family of compartmental models

Semantic Models applies Category Theory¶

We have built a novel modeling environment that builds and manipulates models using this category-theoretic approach.

Contributions:

We take general code as input
Highly general and extensible framework
Goal: Transformations obey the functor laws.
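
To make the functor laws concrete, here is a toy sketch in which a "model" is just a dictionary of named rate parameters and `lift` turns a function on parameters into a transformation on whole models. The representation and the names `lift`, `double`, and `shift` are assumptions for illustration; the framework's own transformations act on richer objects.

```julia
# lift(f) applies f to every parameter value, leaving the names untouched.
lift(f) = model -> Dict(k => f(v) for (k, v) in model)

sir = Dict(:β => 0.5, :γ => 0.1)
double(x) = 2x
shift(x) = x + 1

# Identity law: lifting the identity changes nothing.
identity_law = lift(identity)(sir) == sir

# Composition law: lifting a composite equals composing the lifted maps.
composition_law = lift(shift ∘ double)(sir) == (lift(shift) ∘ lift(double))(sir)
```

When both laws hold, a chain of small transformations can be reasoned about step by step, which is the property the goal above asks of model transformations.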

Example¶

Show the workflow demo

Type Graphs¶

Computers are good at type checking
Can we embed our semantics into the type system?

An ABM of SIR disease spread

Refining the model¶

Convert categorical values into singleton types:

A more refined ABM
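
A small sketch of the idea, assuming the three SIR states and illustrative transition probabilities: each categorical value becomes an empty struct, and the transition rules become methods selected by dispatch instead of entries in a dictionary keyed by symbols. The type and function names are hypothetical.

```julia
abstract type Health end
struct Susceptible <: Health end
struct Infected <: Health end
struct Recovered <: Health end

# Transition rules dispatch on the agent's type.
# load is the infected fraction; ρ and μ are the probabilities of staying
# infected and staying immune, respectively.
transition(::Susceptible, load, ρ, μ) = rand() < load ? Infected() : Susceptible()
transition(::Infected, load, ρ, μ) = rand() < ρ ? Infected() : Recovered()
transition(::Recovered, load, ρ, μ) = rand() < μ ? Recovered() : Susceptible()

agents = Health[Susceptible(), Infected(), Recovered()]
nextagents = [transition(a, 1/3, 0.5, 0.5) for a in agents]
```

With this encoding, calling `transition` on a value that is not a `Health` state raises a `MethodError`, so a misspelled or unknown state is rejected by dispatch rather than propagating silently.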

The type system "understands" the agents now¶

Convert categorical values into singleton types:

A more refined ABM

Conclusion¶

SemanticModels.jl (github.com/jpfairbanks/SemanticModels.jl) is a foundational technology for teaching machines to reason about scientific models
Thinking in terms of transformations on models is easier than thinking about the models themselves
A good type system can reason over modeling concepts