IIRC this book unfortunately only proves correctness directly, not runtime. Its runtime proofs are based on an abstraction of the algorithm that is suitable for direct manipulation in proofs, rather than on the actual implementation in code.
Does anybody know of any languages that let you prove properties about the runtime of a function directly implemented in the language?
> Its runtime proofs are based on an abstraction of the algorithm that is suitable for direct manipulation in proofs, rather than on the actual implementation in code.
What is the difference? You are aware that the code is also only an abstraction, right?
The difference is that proving something about an abstraction doesn't prove that you made no mistakes when translating that abstraction into the code that actually runs, and therefore you have not proven anything of value about the actually running code.
If the abstraction maintains the properties you care about, PROVABLY, there is no problem. And that is the case here. Again, the code you see in Isabelle is already an abstraction; it is not "running".
> If the abstraction maintains the properties you care about, PROVABLY, there is no problem.
This approach doesn't do that. The translation from the actual executing code to the representation used for runtime analysis is done entirely informally and not checked at all by Isabelle.
The book explains the underlying runtime model it assumes and derives abstractions for the runtime t based on that; these are provably correct up to O(t) under those assumptions.
That does not help you much if you want to know how many seconds this will run. Instead, it tells you the asymptotic runtime complexity of the various algorithms, which is all you can really expect for general functional programs without a concrete machine model.
What I mean is that there is no relationship in Isabelle between their cost function and their actual algorithm.
The process the book goes through for a function `f` is the following (sketched in code after the list):
1. Define `f`
2. Create a correctness predicate (call it `Correct`) and prove `forall x, Correct(f(x))`
3. Use a code-generation script to produce a separate function `time_f`
4. Prove that `time_f` satisfies some asymptotic bound (e.g. `exists c n0, forall x, x > n0 -> time_f(x) < c * x^2`)
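As a purely illustrative sketch of steps 1 and 3 (in Lean rather than the book's Isabelle, and with a made-up `insort`/`timeInsort` pair rather than anything from the book), this is the shape of the situation: two independent definitions whose connection exists only in the reader's head.

```lean
-- Hypothetical example, not taken from the book: insertion into a sorted list.
def insort (x : Nat) : List Nat → List Nat
  | []      => [x]
  | y :: ys => if x ≤ y then x :: y :: ys else y :: insort x ys

-- A cost function written separately (by hand or by a script) to mirror the
-- recursion of `insort`. The prover sees no formal connection between the two.
def timeInsort (x : Nat) : List Nat → Nat
  | []      => 1
  | y :: ys => if x ≤ y then 1 else 1 + timeInsort x ys
```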
Nowhere is `time_f` ever formally related to `f`. From Isabelle's point of view, they are two completely separate functions with no relationship to one another. There is only an informal, English argument that `time_f` corresponds to `f`.
Ideally you'd be able to define some other predicate `AsymptoticallyModelsRuntime` such that `AsymptoticallyModelsRuntime(f, time_f)` holds, with the obvious semantic meaning of that predicate also holding true. But the book doesn't do that. And I don't know how they could. Hence my original question of whether there's any system that lets you write `AsymptoticallyModelsRuntime`.
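For concreteness, here is a rough Lean-style sketch of what the statement might look like, assuming you already had some formal cost semantics `cost` for evaluating the language's own functions, which is exactly the missing piece (the names besides `AsymptoticallyModelsRuntime` are mine, not the book's):

```lean
-- Hypothetical sketch only. `cost f x` is an assumed step count for
-- evaluating `f` at `x`; nothing provides it here, it is just a parameter.
def AsymptoticallyModelsRuntime
    (cost : (Nat → Nat) → Nat → Nat)   -- assumed cost semantics
    (f timeF : Nat → Nat) : Prop :=
  ∃ c n₀ : Nat, ∀ x, n₀ ≤ x → cost f x ≤ c * timeF x
```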
Yes, I know what you mean, but there is a relationship; it is just that some of that relationship is described outside of Isabelle, but nevertheless provably. Ultimately, math is like that, provably so.
You could do what you want by making that argument formally explicit and machine-checked, but then you have to do a lot more work, translating all of the components of the informal proof into formal ones. That will not give you any more insight than what the book already describes, but you could take it as an example of something that seems easy once you grasp the informal proof, yet is actually quite a lot of work to formalize.
Unfortunately, proving anything about a concrete imperative implementation is orders of magnitude more complex than working with an abstraction, because you have to deal with pesky 'reality' and truly take care of every possible edge case, so it only makes sense for the most critical applications. Sometimes there isn't even a framework for doing that, depending on your use case, and you'd have to sit a PhD student down for a while to build one. And even then you're still working with an abstraction of some kind, since you have to assume some CPU architecture and so on.
It really is more difficult to work with 'concrete implementations' to a degree that's fairly unintuitive if you haven't seen it first-hand.
I can't fathom how crazy the modelling gets once you try to account for compilers, architectures, timings, temperatures, bit flips and ECC, cache misses, pseudo- and "truly" random devices, threads, other processes, system load, I/O errors, and networking.
To me it seems mandatory to work with some underlying abstraction that factors the many different cases into a smaller set of possibilities that need to be analysed.
It's also how we manage to think at all, in a world where countless tiny details each give you a slightly (and probably insignificantly) different world-state to reason about.
https://markushimmel.de/blog/my-first-verified-imperative-pr...
Lean
Verification of "runtimes" in the sense of GP is not mentioned at all in the article you linked.
Some algorithms, such as binary search, give an incorrect view of the overall cost. The search has sorting as a prerequisite, so the assumption is that the data is sorted once and then searched many times, making the sorting cost insignificant.
What if the data is used for only a single lookup? In that case a sequential search would actually have a lower cost than sorting plus binary search. In fact, sequential search may beat sorting plus binary search for up to about 100 lookups. So I think it is important to consider the overall cost.
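As a rough sketch of the kind of total-cost comparison meant here (the constants `a` and `b` are placeholders, not measurements, and the crossover point depends entirely on them):

```lean
-- Toy comparison-count model; `a` and `b` are assumed per-step constants for
-- scanning vs. sorting. n = number of elements, k = number of lookups.
def seqTotal (a n k : Nat) : Nat :=
  a * k * n                              -- k linear scans over n elements
def sortBinTotal (b n k : Nat) : Nat :=
  b * n * Nat.log2 n + k * Nat.log2 n    -- one n*log n sort, then k binary searches
```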
It is not an "incorrect view of the overall cost".
The analysis of binary search tells you how long the search takes with this particular algorithm, assuming the data is sorted. It says nothing about scenarios where the data is not sorted. In particular, it does not provide any view of the cost without this assumption, let alone an incorrect one.
Yes, YOU need to consider when it is appropriate to use binary search. That is the case with all algorithms you will ever apply, and goes without saying.
But how the data got sorted is irrelevant to the speed of the algorithm: for example, you could use binary search as part of an algorithm to find the insertion point of a new element in an always sorted data structure, meaning that sorting the data is never necessary.
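A minimal sketch of that use (hypothetical code, not from the book): binary search over an already-sorted array to find where a new element belongs, so a separate sorting step never happens.

```lean
-- Find the index at which x should be inserted into the sorted array xs,
-- searching the index range [lo, hi). Marked `partial` to skip the
-- termination argument in this sketch.
partial def insertionPoint (xs : Array Nat) (x : Nat) (lo hi : Nat) : Nat :=
  if lo < hi then
    let mid := (lo + hi) / 2
    if xs[mid]! < x then insertionPoint xs x (mid + 1) hi
    else insertionPoint xs x lo mid
  else lo

#eval insertionPoint #[1, 3, 5, 7] 4 0 4   -- 2: 4 belongs at index 2
```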
The overall journey matters. For example, on some routes the flight time is only a fraction of the total journey time, which can make road or rail transport faster overall. Flight speed alone doesn't matter.
But that is an unanswerable question which depends on how the data structure is used. The reasonable thing is to calculate the cost for the operations separately and let whoever uses the algorithms figure out what that means for their use case.