Thursday, March 12, 2009

future directions for iteratee

I have recently completed two major projects that were taking up nearly all of my time. Now that they are done, I'd like to work on the next version of the iteratee package. Here are a few ideas that I hope to include soon.

1. Remove the explicit Maybes
This was suggested by Oleg, and it seems to be a good idea. Every iteratee that produces a value other than () uses Maybe to represent the possibility of failure (always an option when doing IO). By incorporating the Maybe into the type of IterateeG, these will no longer need to be explicit. This will also make IterateeGM an instance of MonadZero to the extent that Maybe is an instance of that class.

Status: I've made a patch that does this, but it doesn't yet work properly with convStream. I haven't managed to track down the problem. Likely to be included, assuming I can solve this issue in a reasonable time frame.

2. Stream type class (and simplify StreamChunk)
I keep hoping for a ListLike type class to enter common use. Barring that, I have some ideas for breaking up the necessary functions of StreamChunk into separate type classes, such as the patch to use a monoid instance submitted by Bas van Dijk.

Status: Changes will be made; StreamChunk will likely be broken into multiple smaller classes. Any addition of a Stream type class will wait until after point 4 is resolved.

3. More utility iteratees (foldl, filter, others?)
Status: Likely to be included. Changes to StreamChunk will make these easier to support.

4. Type-safe seeking
If iteratees are parameterized by Stream, the type of the stream should indicate if seeking is supported. I have an outline for how to implement this, but haven't done any work yet.

Status: Needs research. This will probably wait for the next major version bump.
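
One possible way to let a type indicate whether seeking is supported is a phantom type tag on the data source. The sketch below is purely illustrative and is not the outline mentioned above. Every name in it (Seekable, NoSeek, Source, fromFile, fromPipe, readNext, seekTo) is hypothetical and not part of the iteratee package, and a list stands in for real file contents.

```haskell
-- Phantom tags: they have no values and exist only at the type level.
data Seekable
data NoSeek

-- A toy data source: the full contents plus a current position.
-- The 'cap' parameter records whether seeking is allowed.
data Source cap el = Source { contents :: [el], position :: Int }

-- Hypothetical constructors: a regular file can seek, a pipe cannot.
fromFile :: [el] -> Source Seekable el
fromFile xs = Source xs 0

fromPipe :: [el] -> Source NoSeek el
fromPipe xs = Source xs 0

-- Reading the next element works for any kind of source.
readNext :: Source cap el -> Maybe (el, Source cap el)
readNext (Source xs p) = case drop p xs of
  (y:_) -> Just (y, Source xs (p + 1))
  []    -> Nothing

-- Seeking is only defined for sources tagged Seekable.
seekTo :: Int -> Source Seekable el -> Source Seekable el
seekTo n s = s { position = max 0 n }
```

With these definitions, seekTo 2 (fromFile "abcdef") type checks, while seekTo 2 (fromPipe "abcdef") is rejected at compile time because NoSeek does not match Seekable.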

5. Improved error handling
Bas van Dijk submitted a patch to change the type of a stream error from String to Error a. Others have suggested other possible changes as well.

Status: Needs more research. This will likely wait for the next major version bump.

enumerator/iteratees and output

I have recently received a few questions about writing output while using enumerator-based I/O. In some cases users have attempted to make enumerators (like enumFd) that handle output as well, but have had difficulty making them work.

I think these problems stem from an incorrect application of the enumerator model of I/O. When using enumerators, a file (or other data source) is a resource that can be enumerated over to process data, exactly as a list can be enumerated over in order to access the data contained in the list. Compare the following:

foldl f init xs

enumFd "SomeFile" ==<< stream2list

In the first function, 'xs' is the data to be processed, 'foldl' tells how to access individual items in the data collection, and 'f' and 'init' do the actual processing. In the second, "SomeFile" is the data, 'enumFd' tells how to access the data, and 'stream2list' does the processing. So how does writing fit in? The output file obviously isn't the data source, and it doesn't make sense to enumerate over your output file as there's no data there to process. So it must go within the Iteratee. It turns out that making an iteratee to write data is relatively simple:

> import Data.Iteratee
> import System.IO
> import Control.Monad
> import Control.Monad.Trans (liftIO)
>
> writeOut :: FilePath -> IterateeGM [] Char IO ()
> writeOut file = do
>   h <- liftIO $ openFile file WriteMode
>   loop h
>   where
>     loop :: Handle -> IterateeGM [] Char IO ()
>     loop h = do
>       next <- Data.Iteratee.head
>       case next of
>         -- write one character, then continue with the same handle
>         Just c  -> liftIO (hPutChar h c) >> loop h
>         -- end of stream: close the output file
>         Nothing -> liftIO $ hClose h


Add some error handling and you've got a writer. This version could be polymorphic over different StreamChunk instances by generalizing the type (FlexibleContexts may be required as well). Other stream-specific versions could be written that would take advantage of the specific StreamChunk instance (e.g. using Data.ByteString.hPut instead of hPutChar).
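
To make the division of labor concrete without depending on any particular version of the package, here is a minimal, self-contained toy model. It is a sketch under stated assumptions, not the iteratee API: the types Stream and Iter, the functions enumList, run, foldIter, writeChunks and demo, and the file name "out.txt" are all made up for illustration. It shows the fold analogy from above and a writer that outputs a whole chunk per call rather than one character at a time.

```haskell
import Data.List (foldl')
import System.IO

-- A stream delivers either a chunk of elements or the end-of-stream marker.
data Stream el = Chunk [el] | EOF

-- Toy iteratee: finished with a result, or waiting for more input.
data Iter el m a
  = Done a
  | Cont (Stream el -> m (Iter el m a))

-- Toy enumerator: feed a list to an iteratee in chunks of (at least 1,
-- at most n) elements. It does not send EOF; 'run' does that.
enumList :: Monad m => Int -> [el] -> Iter el m a -> m (Iter el m a)
enumList n xs (Cont k) | not (null xs) = do
  let (chunk, rest) = splitAt (max 1 n) xs
  k (Chunk chunk) >>= enumList n rest
enumList _ _ it = return it

-- Signal end of stream and extract the result, if the iteratee finishes.
run :: Monad m => Iter el m a -> m (Maybe a)
run (Done a) = return (Just a)
run (Cont k) = do
  r <- k EOF
  return $ case r of
    Done a -> Just a
    Cont _ -> Nothing

-- The iteratee counterpart of 'foldl f init': it does the processing.
foldIter :: Monad m => (a -> el -> a) -> a -> Iter el m a
foldIter f acc = Cont step
  where
    step (Chunk cs) = return (foldIter f (foldl' f acc cs))
    step EOF        = return (Done acc)

-- The writer lives on the iteratee side: it writes each chunk with a
-- single hPutStr call and closes the handle at end of stream.
writeChunks :: Handle -> Iter Char IO ()
writeChunks h = Cont step
  where
    step (Chunk cs) = hPutStr h cs >> return (writeChunks h)
    step EOF        = hClose h >> return (Done ())

demo :: IO ()
demo = do
  total <- enumList 4 [1 .. 10 :: Int] (foldIter (+) 0) >>= run
  print total                       -- prints "Just 55"
  h <- openFile "out.txt" WriteMode -- hypothetical output file
  _ <- enumList 4 "hello, iteratee\n" (writeChunks h) >>= run
  readFile "out.txt" >>= putStr
```

In both uses the enumerator only decides how the input is traversed. Whether the data is summed or written to a file is decided entirely by the iteratee passed to it.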

I hope this will serve as a very basic introduction to output when using enumerators. In addition to a generic writer like this, it may frequently be beneficial to define special-purpose writers. In a future post I will show a writer that seeks within the output file using a threaded State monad.