Что представляют собой codata в контексте программирования?

Это базовый алгоритм, потому что с каждой итерацией он вызывает себя для данных, которые больше, чем были раньше:

iterate f x =  x : iterate f (f x)

Он похож на стиль накопителя хвостовой рекурсии, но его накопитель неявный, а не передается в качестве аргумента. И было бы бесконечно, если бы не лень. Так что codata - это просто результат конструктора значений в WHNF, вроде как(a, thunk)? Или кодата - это скорее математический термин из теории категорий, который не имеет полезного представления в области программирования?

Последующий вопрос: является ли рекурсия значений синонимом коркурсии?

1 ответ

Решение

Я думаю, что для ответа на ваши вопросы требуется много объяснений, поэтому вот большой длинный ответ с конкретными ответами на ваши вопросы в конце.

Данные и коды имеют формальные математические определения в терминах теории категорий, поэтому дело не только в том, как они используются в программе (т.е. не только в "контексте приложения", о котором вы упомянули в комментариях). В Haskell это может показаться таким, потому что особенности языка (в частности, отсутствие завершения и ленивость) в конечном итоге стирают различие, поэтому в Haskell все данные также являются кодовыми и наоборот, но это не обязательно должно быть так, и есть языки, которые делают различие более ясным.

Both data and codata do have useful representations in the programming domain, and those representations give rise to natural relationships to recursion and corecursion.

It's quite hard to explain these formal definitions and representations without quickly getting technical, but roughly speaking, a data type for, say, a list of integers, is a type L together with a constructor function:

makeL :: Either () (Int, L) -> L

that is somehow "universal" in that it can fully represent any such construction. (Here, you want to interpret the LHS type Either () (Int, L) to mean that a list L is either the empty list Left () or a pair Right (h, t) consisting of the head element h :: Int and a tail list t :: L.)

To start with a counterexample, L = Bool is not the data type we're looking for, because even though you could write:

foo :: Either () (Int, Bool) -> Bool
foo (Left ()) = False
foo (Right (h, t)) = True

to "construct" a Bool, this can't fully represent any such construction. For example, the two constructions:

foo (Right (1, foo (Left ()))) = True
foo (Right (2, foo (Left ()))) = True

give the same Bool value, even though they used different integers, so this Bool value is insufficient to fully represent the construction.

In contrast, the type [Int] is an appropriate data type because the (almost trivial) constructor function:

makeL :: Either () (Int, [Int]) -> [Int]
makeL (Left ()) = []
makeL (Right (h, t)) = h : t

fully represents any possible construction, creating a unique value for each one. So, it's somehow the "natural" construction for the type signature Either () (Int, L) -> L.

Similarly, a codata type for a list of integers would be a type L together with a destructor function:

eatL :: L -> Either () (Int, L)

that is somehow "universal" in the sense that it can represent any possible destruction.

Again, starting with a counterexample, a pair (Int, Int) is not the codata type we're looking for. For example, with the destructor:

eatL :: (Int, Int) -> Either () (Int, (Int, Int))
eatL (a, b) = Right (a, (b, a))

we can represent the destruction:

let p0 = (1, 2)
    Right (1, p1) = eatL p0
    Right (2, p2) = eatL p1
    Right (1, p3) = eatL p2
    Right (2, p4) = eatL p3
...continue indefinitely or stop whenever you want...

but we can't represent the destruction:

let p0 = (?, ?)
    Right (1, p1) = eatL p0
    Right (2, p2) = eatL p1
    Right (3, p3) = eatL p2
    Left () = eatL p3

On the other hand, in Haskell, the list type [Int] is an appropriate codata type for a list of integers, because the destructor:

eatL :: [Int] -> Either () (Int, [Int])
eatL (x:xs) = Right (x, xs)
eatL [] = Left ()

can represent any possible destruction (including both finite or infinite destructions, thanks to Haskell's lazy lists).

(As evidence that this isn't all hand-waving and in case you want to relate it back to the formal math, in technical category theory terms, the above is equivalent to saying that the list-like endofunctor:

F(A) = 1 + Int*A   -- RHS equivalent to "Either () (Int,A)"

gives rise to a category whose objects are constructor functions (AKA F-algebras) 1 + Int*A -> A. A data type associated with F is an initial F-algebra in this category. F also gives rise to another category whose objects are destructor functions (AKA F-coalgebras) A -> 1 + Int*A. A codata type associated with F is a final F-coalgebra in this category.)

In intuitive terms, as suggested by @DanielWagner, a data type is a way of representing any construction of a list-like object, while a codata type is a way of representing any destruction of a list-like object. In languages where data and codata are different, there's a fundamental asymmetry -- a terminating program can only construct a finite list, but it can destruct (the first part of) an infinite list, so data must be finite, but codata can be finite or infinite.

This leads to another complication. In Haskell, we can use makeL to construct an infinite list like so:

myInfiniteList = let t = makeL (Right (1, t)) in t

Note that this would not be possible if Haskell didn't allow lazy evaluation of non-terminating programs. Because we can do this, by the formal definition of "data", a Haskell list-of-integer data type must also include infinite lists! That is, Haskell "data" can be infinite.

This probably conflicts with what you might read elsewhere (and even with the intuition that @DanielWagner provided), where "data" is used to refer to finite data structures only. Well, because Haskell is a little weird and because infinite data isn't allowed in other languages where data and codata are distinct, when people talk about "data" and "codata" (even in Haskell) and are interested in drawing a distinction, they may use "data" to refer to finite structures only.

The way recursion and corecursion fit in to this is that the universality properties naturally give us "recursion" to consume data and "corecursion" to produce codata. If L is a list-of-integer data type with constructor function:

makeL :: Either () (Int, L) -> L

then one way of consuming a list L to produce a Result is to define a (non-recursive) function:

makeResult :: Either () (Int, Result) -> Result

Here, makeResult (Left ()) gives the intended result for an empty list, while makeResult (Right (h, t_result)) gives the intended result for a list whose head element is h :: Int and whose tail would give the result t_result :: Result.

By universality (i.e., the fact that makeL is an initial F-algebra), there exists a unique function process :: L -> Result that "implements" makeResult. In practice, it will be implemented recursively:

process :: [Int] -> Result
process [] = makeResult (Left ())
process (h:t) = makeResult (Right (h, process t))

Conversely, if L is a list-of-integer codata type with destructor function:

eatL :: L -> Either () (Int, L)

then one way of producing a list L from a Seed is to define a (non-recursive) function:

unfoldSeed :: Seed -> Either () (Int, Seed)

Here, unfoldSeed should produce a Right (x, nextSeed) for each desired integer, and produce Left () to terminate the list.

By universality (i.e., the fact that eatL is a final F-coalebra), there exists a unique function generate :: Seed -> L that "implements" unfoldSeed. In practice, it will be implemented corecursively:

generate :: Seed -> [Int]
generate s = case unfoldSeed s of
  Left () -> []
  Right (x, s') -> x : generate s'

So, with all that said, here are the answers to your original questions:

  • Technically, iterate f is corecursive because it's the unique codata-producing function Int -> [Int] that implements:

    unfoldSeed :: Seed -> Either () (Int, Seed)
    unfoldSeed x = Right (x, f x)
    

    by means of generate as defined above.

  • In Haskell, corecursion that produces codata of type [a] relies on laziness. However, strict codata representations are possible. For example, the following codata representation works fine in Strict Haskell and can be safely fully evaluated.

    data CoList = End | CoList Int (() -> CoList)
    

    The following corecursive function produces a CoList value (and I made it finite just for fun -- it's easy to produce infinite codata values, too):

    countDown :: Int -> CoList
    countDown n | n > 0 = CoList n (\() -> countDown (n-1))
                | otherwise = End
    
  • So, no, codata isn't just the result of values in WHNF with form (a, thunk) or similar and corecursion is not synonymous with value recursion. However, WHNF and thunks provide one possible implementation and are the implementation-level reason that a "standard" Haskell list data type is also a codata type.

Другие вопросы по тегам