Although still relatively new, iteratee/enumerator style I/O has attracted a fairly large user base in the Haskell community. As with many young technologies, the design space is fairly broad, with several different packages available from Hackage. Upon surveying them, I came to realize that there are so many alternatives in part because different developers have chosen to focus on different properties of iteratees. Doubtless this contributes to much of the confusion surrounding iteratees; when a user's mental model is closely aligned with the developer's, the package is likely to be much easier for the user to understand.
This post isn't meant to explain how to use iteratees, but rather to explore what I believe are the factors that influenced various design decisions in different packages. In some cases I have first- or second-hand knowledge of those decisions, but sometimes I'm extrapolating from the code. If you find one or two packages to be sensible and can't understand why anyone would use something different, you may find enlightenment here. Or not. For those who've never used an iteratee package, I hope that these posts might help point you to a good starting option. Wizards probably know this already. Note that just because I think the designers have a particular viewpoint for their code doesn't mean that other elements won't be present; these are all just different ways of looking at iteratees. All packages will have these properties to some extent.
Glossary
Some definitions, to be perfectly clear (the adventurous can skip this):
- iteratee: some sort of resource consumer. The defining feature of an iteratee is that it provides a mechanism to pump it with data until it returns a result. I may use "iteratee" generally, referring to code in an iteratee style.
- enumerator: some sort of resource provider. It stuffs data into an iteratee until it gets a result.
- iteratee style, iteratee/enumerator style: writing code that makes use of iteratees, enumerators, and other associated functions.
- iteratee package: a library to provide functions used to write in iteratee style.
- underlying monad: for iteratees that are monad transformers, the monad which is being transformed. Often either IO or a monad stack based on IO.
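To make these definitions concrete, here is a minimal sketch of an iteratee and an enumerator. All names (`Stream`, `Iteratee`, `Enumerator`, `enumList`, `run`, `sumI`) are illustrative assumptions and not the API of any package on Hackage. For brevity the sketch leaves out the underlying monad and discards leftover input, both of which real packages handle.

```haskell
-- A deliberately tiny model of iteratee style. The names are illustrative
-- assumptions, not taken from any particular package.

-- A chunk of input, or a signal that the stream has ended.
data Stream a = Chunk [a] | EOF

-- An iteratee is either finished with a result or waiting for more input.
data Iteratee a b
  = Done b
  | Continue (Stream a -> Iteratee a b)

-- An enumerator pumps data into an iteratee and hands back its new state.
type Enumerator a b = Iteratee a b -> Iteratee a b

-- Feed a list as a single chunk, unless the iteratee has already finished.
enumList :: [a] -> Enumerator a b
enumList xs (Continue k) = k (Chunk xs)
enumList _  done         = done

-- Signal end of input and extract the result, if there is one.
run :: Iteratee a b -> Maybe b
run (Done b)     = Just b
run (Continue k) = case k EOF of
  Done b     -> Just b
  Continue _ -> Nothing  -- an iteratee that ignores EOF yields no result

-- A consumer that sums everything it is given.
sumI :: Num a => Iteratee a a
sumI = go 0
  where
    go acc = Continue $ \s -> case s of
      Chunk xs -> go (acc + sum xs)
      EOF      -> Done acc
```

Under these definitions, `run (enumList [1, 2, 3] sumI)` pumps one chunk into the consumer and then signals end of input. Because an enumerator returns an iteratee, enumerators can be chained to feed one consumer from several sources in turn.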
1: Iteratees as Resource Managers
One of the design goals of Oleg Kiselyov's original iteratee work was improved resource management. Lazy I/O can make it difficult to reason about resource management (half-closed?), while strict I/O traditionally does not compose well (compare to network programming in C). With iteratees, these problems are much closer to being solved. Input streams are opened on demand, evaluated strictly, and closed immediately, with no possibility of the handle or other resource leaking.(1) And iteratees are quite composable (perhaps too much so!), solving the other issue as well.
These properties make iteratees extremely useful for network programming, as network sockets are a scarce resource. However, this usage highlights a particular drawback: iteratees solve the resource management problem for input, but don't address it for output.
I believe all developers currently agree that the best solution for output resource management is a monadic region, such as that provided by regions or ResourceT. This is currently the only approach guaranteed to run cleanup code with complicated monad stacks in the presence of exceptions.(2) In some cases Control.Exception.bracket can be used; however, it's often not possible to call it at a convenient time, leaving resources open for longer than necessary.
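As a small illustration of the bracket approach for output, the following sketch writes lines to a file and closes the handle whether or not an exception occurs. The function name `writeLines` is a hypothetical helper invented for this example; only `bracket`, `openFile`, `hPutStrLn`, and `hClose` come from base.

```haskell
import Control.Exception (bracket)
import System.IO (IOMode (WriteMode), hClose, hPutStrLn, openFile)

-- Hypothetical helper: write each line to the file at `path`.
-- bracket runs hClose when the inner action finishes, whether it
-- returns normally or throws.
writeLines :: FilePath -> [String] -> IO ()
writeLines path ls =
  bracket (openFile path WriteMode) hClose $ \h ->
    mapM_ (hPutStrLn h) ls
```

The limitation shows up in the shape of the code: the handle lives exactly as long as the lexical scope of the function passed to `bracket`. If the output resource is only needed partway through a long-running consumer, the bracket still has to wrap the whole thing, so the resource stays open longer than necessary.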
Having established that a monadic region is necessary for output resource management, using the same mechanism for input resource management as well is a reasonable decision. Developers taking this view may then write their package so that even input streams may cause resource leaks without the resource management layer, unlike the original design.(3)
2: Iteratees as Stream Processors
Iteratee packages with a stream processing bent focus on composability as a design goal. The libraries usually provide a multitude of stream processors. They are frequently designed with a UNIX-pipeline-style model in mind; data can be passed through one processor into the next and so on, until finally the last processor in the chain consumes the resulting data. There may be clear demarcations between producers, consumers, and processors, or perhaps not. There are frequently combinator functions to make these connections, sometimes with a syntax inspired by UNIX pipes. Many of the functions will be similar to traditional Haskell list-processing functions, such as maps, accumulators, and filters.
People with this viewpoint tend to view the composition of stream processors, which Oleg refers to as "Enumeratees", as the most important/useful means of composing iteratees.
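The pipeline model can be sketched with a simplified stream processor type. This is an assumption-laden toy: the names (`Enumeratee`, `processE`, `mapE`, `filterE`) are invented here, and the type is flattened compared to Oleg's formulation, in which an enumeratee returns the inner iteratee as its result. It is meant only to show how map- and filter-like processors compose in front of a consumer.

```haskell
-- Same toy model of iteratees; all names are illustrative assumptions.
data Stream a = Chunk [a] | EOF

data Iteratee a b
  = Done b
  | Continue (Stream a -> Iteratee a b)

-- A simplified stream processor: it turns a consumer of `b` into a
-- consumer of `a` by transforming each chunk on its way through.
type Enumeratee a b r = Iteratee b r -> Iteratee a r

-- Build a processor from a function on chunks.
processE :: ([a] -> [b]) -> Enumeratee a b r
processE _ (Done r)     = Done r
processE f (Continue k) = Continue $ \s -> processE f $ case s of
  Chunk xs -> k (Chunk (f xs))
  EOF      -> k EOF

-- Analogues of the familiar list functions.
mapE :: (a -> b) -> Enumeratee a b r
mapE f = processE (map f)

filterE :: (a -> Bool) -> Enumeratee a a r
filterE p = processE (filter p)

-- A producer, a consumer, and a way to finish the computation.
enumList :: [a] -> Iteratee a b -> Iteratee a b
enumList xs (Continue k) = k (Chunk xs)
enumList _  done         = done

sumI :: Num a => Iteratee a a
sumI = go 0
  where
    go acc = Continue $ \s -> case s of
      Chunk xs -> go (acc + sum xs)
      EOF      -> Done acc

run :: Iteratee a b -> Maybe b
run (Done b)     = Just b
run (Continue k) = case k EOF of
  Done b     -> Just b
  Continue _ -> Nothing

-- Data flows left to right: keep the even numbers, double them, sum them.
pipeline :: Iteratee Int Int
pipeline = (filterE even . mapE (* 2)) sumI
```

Here ordinary function composition plays the role of the pipe: `run (enumList [1 .. 10] pipeline)` sends the list through the filter, then the map, and finally into the summing consumer. Packages in this camp typically supply dedicated operators for the same connections.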
Although I've presented stream processing and resource management as two different viewpoints, I addressed them together because packages generally can't exclude either element. One approach is to divide responsibilities, with a dedicated resource manager and combinators that can use those resources. Other packages may use the iteratee's own handling of input resource management, leaving output resource management to the user's discretion.
Conclusion
For the curious, I would place conduit and iterIO squarely in the stream processor camp, with enumerator having strong leanings this way as well. To some extent all iteratee packages provide similar stream-processing capability; I make these judgements based upon what I think the designers think iteratees are for. If your favorite iteratee package hasn't shown up yet, maybe next time!
Notes
(1) There is at least one way to force a resource to be open longer than intended. Since iteratees are generally implemented as monad transformers, it is possible to craft an iteratee over ContT IO that generated a lazy stream from an input source, available within another iteratee. The resource would be closed when the outer iteratee was evaluated, making the outer iteratee a type of monadic region. I consider this an egregious abuse of Cont, however, and extremely unlikely to happen accidentally.
(2) I'm fairly outspoken in my dislike for exceptions. They mostly shouldn't be used. The main purpose of exceptions is to make it easy to reason about how to deal with various conditions, and they suck at this. There are basically three kinds of exceptions:
- Stuff you can fix, like a file not existing. This should be modeled by Either or Maybe or something similar. It's much easier to reason about a Maybe than having to figure out where in IO some result will be evaluated. This is by far the most common reason programmers use exceptions, and (in Haskell at least) it's the worst choice because Haskell provides a better mechanism for dealing with this problem.
- Stuff you can't fix, like out of memory conditions. The only sensible thing to do is terminate the program and let the OS clean it up. You shouldn't try to catch these, and if you do catch one you shouldn't try to do anything, because that might fail too. I suppose it's okay to model these with an exception, but maybe they shouldn't be exposed to the program logic at all.
- Asynchronous exceptions. I begrudgingly accept that these may need to be used in concurrent code.
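For the first kind, a missing file can be surfaced as a value rather than left as an exception for some distant handler. The sketch below is one way to do that with base alone; `readFileEither` and `describe` are hypothetical names made up for this example.

```haskell
import Control.Exception (IOException, try)

-- Hypothetical helper: report a failed read as a Left value.
-- Note that readFile is lazy, so this only captures failures that occur
-- while opening the file; problems during later lazy reads are not
-- reported through this Either.
readFileEither :: FilePath -> IO (Either IOException String)
readFileEither path = try (readFile path)

-- The type forces the caller to consider both outcomes.
describe :: Either IOException String -> String
describe (Left _)  = "could not read file"
describe (Right s) = show (length (lines s)) ++ " lines"
```

The caveat in the comment is the lazy I/O problem again: the Either only covers the part of the work that happens strictly.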
(3) The basic resource provided by an OS is generally a cursor, e.g. a handle. Oleg Kiselyov argues that cursors and enumerators are isomorphic, although enumerators have certain desirable properties that should make them preferred. The traditional design of an iteratee package is to provide enumerators that wrap access to a cursor in a closure, preventing the cursor from leaking. Packages like pipes or conduit that require a monadic region for safe access to input hold the cursor as data and don't provide enumerators-as-closures. In essence they evolved from cursor to enumerator back to cursor. Because of this I'm not entirely convinced they should be classed as "iteratee packages" at all, though they certainly occupy the same design space.
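The difference between the two designs can be sketched in a few lines. This is a toy model under my own assumptions, not the API of pipes, conduit, or any other package; `enumFileLines`, `pullLine`, `countI`, and `run` are names invented for the example.

```haskell
import System.IO (Handle, IOMode (ReadMode), hGetLine, hIsEOF, withFile)

-- Toy iteratee model; all names are illustrative assumptions.
data Stream a = Chunk [a] | EOF

data Iteratee a b
  = Done b
  | Continue (Stream a -> Iteratee a b)

-- Enumerator as closure: the Handle is opened, used, and closed inside
-- this function and never appears in its type, so callers cannot leak it.
enumFileLines :: FilePath -> Iteratee String b -> IO (Iteratee String b)
enumFileLines path it0 = withFile path ReadMode (loop it0)
  where
    loop it@(Done _) _ = return it
    loop it@(Continue k) h = do
      eof <- hIsEOF h
      if eof
        then return it
        else do
          line <- hGetLine h
          loop (k (Chunk [line])) h

-- Cursor style: the caller holds the Handle as data, pulls from it on
-- demand, and is responsible for closing it (for example via a region).
pullLine :: Handle -> IO (Maybe String)
pullLine h = do
  eof <- hIsEOF h
  if eof then return Nothing else Just <$> hGetLine h

-- A consumer that counts the elements it receives.
countI :: Iteratee a Int
countI = go 0
  where
    go n = Continue $ \s -> case s of
      Chunk xs -> go (n + length xs)
      EOF      -> Done n

-- Signal end of input and extract the result, if there is one.
run :: Iteratee a b -> Maybe b
run (Done b)     = Just b
run (Continue k) = case k EOF of
  Done b     -> Just b
  Continue _ -> Nothing
```

With `enumFileLines` the lifetime of the handle is fixed by the enumerator itself. With `pullLine` that lifetime is whatever the surrounding code makes it, which is why the cursor style needs a separate resource management layer to be safe.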