Inferring refinement types



At work I've been tasked with inferring type information for a dynamic language. I'm rewriting sequences of statements into nested let expressions, like so:

return x; Z            =>  x
var x; Z               =>  let x = undefined in Z
x = y; Z               =>  let x = y in Z
if x then T else F; Z  =>  if x then { T; Z } else { F; Z }
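
A minimal sketch of that rewriting over a toy AST (the constructor names and the fallback for an empty statement list, which I map to undefined, are just for the illustration):

-- Toy source statements and target expressions.
data Stmt
  = Return Expr            -- return x; Z
  | VarDecl String         -- var x; Z
  | Assign String Expr     -- x = y; Z
  | If Expr [Stmt] [Stmt]  -- if x then T else F; Z

data Expr
  = Var String
  | Lit Int
  | Undefined
  | Let String Expr Expr   -- let x = e1 in e2
  | IfE Expr Expr Expr     -- expression-level conditional
  deriving Show

-- Apply the four rules above to a statement sequence.
rewrite :: [Stmt] -> Expr
rewrite (Return e   : _) = e                            -- return x; Z => x
rewrite (VarDecl x  : z) = Let x Undefined (rewrite z)  -- var x; Z => let x = undefined in Z
rewrite (Assign x e : z) = Let x e (rewrite z)          -- x = y; Z => let x = y in Z
rewrite (If c t f   : z) = IfE c (rewrite (t ++ z)) (rewrite (f ++ z))
rewrite []               = Undefined                    -- assumed fallback for an empty tail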

Since I'm starting with general type information and trying to infer more specific types, refinement types are a natural choice. For example, a conditional returns the union of the types of its true and false branches. In simple cases this works very well.
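
As a toy illustration of that union rule (the type constructors and the unnormalized unions are simplifications for the sketch):

-- A tiny type language with unions.
data Ty = TInt | TStr | TUndefined | TUnion Ty Ty
  deriving (Eq, Show)

-- The type of `if c then e1 else e2` is the join (union) of the branch types;
-- a real implementation would flatten and deduplicate nested unions.
joinTy :: Ty -> Ty -> Ty
joinTy a b
  | a == b    = a
  | otherwise = TUnion a b

-- joinTy TInt TStr == TUnion TInt TStr
-- joinTy TInt TInt == TInt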

However, I hit a snag trying to type something like this:

function g(f) {
  var x;
  x = f(3);
  return f(x);
}

Which gets rewritten to:

\f.
  let x = undefined in
    let x = f 3 in
      f x

HM will infer f : Int → Int and hence g : (Int → Int) → Int. The actual type I want to be able to infer is:

g : ∀τ₁ τ₂. ((Int → τ₁) ∧ (τ₁ → τ₂)) → τ₂
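
To see the HM behaviour concretely, here is the desugared term checked in GHC (dropping the dead let x = undefined binding and pinning the literal to Int, just for the illustration):

-- The two call sites of f are unified by HM, so GHCi's :type reports
--   g :: (Int -> Int) -> Int
-- rather than the intersection-style type above.
g f = let x = f (3 :: Int) in f x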

I’m already using functional dependencies to resolve the type of an overloaded + operator, so I figured it was a natural choice to use them to resolve the type of f within g. That is, the types of f in all its applications together uniquely determine the type of g. However, as it turns out, fundeps don’t lend themselves terribly well to variable numbers of source types.
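
For concreteness, a minimal Haskell sketch of how a functional dependency can resolve an overloaded + (the Add class here is illustrative, not the actual encoding in my checker):

{-# LANGUAGE MultiParamTypeClasses, FunctionalDependencies, FlexibleInstances #-}

-- The fundep  a b -> c  says the argument types uniquely determine the
-- result type, so call sites need no annotation on the result.
class Add a b c | a b -> c where
  add :: a -> b -> c

instance Add Int Int Int where
  add = (+)

instance Add Double Double Double where
  add = (+)

-- add (1 :: Int) (2 :: Int) resolves to Int. The analogous scheme breaks
-- down for f inside g, where the number of "source" types is not fixed.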

Anyway, the interplay of polymorphism and refinement typing is problematic. So is there a better approach I’m missing? I’m currently digesting “Refinement Types for ML” and would appreciate more literature or other pointers.


Answers:



You've stumbled on the fact that inference of static invariants for higher-order languages is quite difficult in practice, in addition to being undecidable in theory. I'm not sure what the definitive answer to your question is, but note several things:

  • Polymorphism and refinement types behave poorly together, as you have noted; in particular, the notion of a most general type is lost. A consequence of this is that analyses based on refinement types in the presence of polymorphism may need to choose between whole-program analysis (as opposed to compositional analysis) and the use of heuristics to decide which type to assign to your program.

  • There is a strong relationship between inferring refinement types and:

    1. Computing the abstract interpretation of your program

    2. Computing loop invariants in an imperative language.

With this in mind, here are a few disorganized references on the inference of refinement types. Note that there are many differing flavors of refinement types: I tend to be more interested in refinements of inductive datatypes, so this list may be skewed in that direction.

  1. Start with the classics: Relational Abstract Interpretation of Higher-Order Functional Programs by Cousot & Cousot. This explains how to extend abstract interpretation to higher-order programs using relational semantics.

  2. Liquid Types by Rondon, Kawaguchi and Jhala. This is a very evolved line of work that combines HM with a form of predicate refinement to infer safety annotations (array bounds checks, for instance) for ML-style programs. The inference proceeds in two steps: the first is HM inference of type annotations, which then guides the choice of refinements to perform (a small illustration of the annotation style appears after this list).

  3. I should probably also mention the work by Fournet, Swamy, Chen, Strub... on F*, an extension of F# which seems similar to the liquid types approach, but with the goal of verifying cryptographic protocols and algorithms for distributed computing. I'm not sure how much published work there is on the inference of annotations in this case.

  4. There is a nice paper by Chin and Khoo on inference of a specific kind of refinement types: types with size annotations.

  5. The ATS programming language is a system which allows various refinements and provides facilities for writing programs with them. However the annotations may be arbitrarily complex (and thus undecidable) and therefore may require user interaction. I believe that there is a form of inference for these types, I'm not sure which article to recommend however.

  6. Last, but not least, The Cartesian Product Algorithm by Ole Agesen. While it does not mention refinement types explicitly, this seems to be the work closest to solving the problem you seem to have. Don't be fooled by the mention of parametric polymorphism in the abstract: it aims to infer concrete types, which are just tuples of possible atomic types. Emphasis is given to efficiency. I recommend reading this article first to see if it solves your problem.
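
As promised under item 2, here is a small illustration of the kind of annotation that line of work infers, written in Liquid Haskell notation (a descendant of the ML work; the function itself is just a toy of mine):

-- The HM skeleton is  abs' :: Int -> Int ; liquid-type inference then
-- strengthens the result with a predicate chosen from a fixed set of
-- logical qualifiers, here non-negativity.
{-@ abs' :: Int -> {v:Int | 0 <= v} @-}
abs' :: Int -> Int
abs' n = if n < 0 then negate n else n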

Side note: type inference in the presence of intersection types can be very undecidable: in the simplest form, the λ-terms typable with an intersection type are exactly the strongly normalizing terms. Tread softly around them :)

Licensed under cc by-sa 3.0 with attribution required.