Sound and Software: haskell

Showing posts with label haskell. Show all posts

Saturday, November 22, 2014

Exception handling and cleanup

So there's been a bit of a slog recently on whether or not cleanup actions (e.g. stuff you'd put in the second argument to bracket or finally) should be run in an implicit uninterruptibleMask. I have argued that this is a desirable change. I'm writing this post mostly to lay out the logic I've used to reach (and support) this position, because it's a bit tricky (much like the issue itself).

A bit of background: currently, if an interruptible function is used within an exception handler, asynchronous exceptions may arrive and cause the handler to behave in unexpected ways. In particular, if an async exception arrives during an interruptible period in a cleanup action run by bracket[1], you can get resource leaks. This is noted in the documentation, but it can be surprisingly difficult to get right. Notably, this has resulted in several bugs in GHC itself (details in the thread).

Two arguments have thus far been presented against the proposal, one that the change is undesired and secondly that more caution should be exercised before making such a silent semantic change, especially as it's possible that currently-working (although likely resource-leaking) programs would deadlock. I reject the first argument (for reasons explained below) and have some sympathy for the second, although I believe the benefits outweigh the costs. I'll refer to the argument that this change is undesired as Undesired below.

Basically there are two cases of "complex" cleanup actions that have been under discussion. Here are naive (incorrect) examples of each:

cleanup1 (Structure h1 h2) = hClose h1 >> hClose h2

cleanup2 var = withMVar var >>= hClose

The first, cleanup1, is a handler that has to perform multiple, unrelated actions. The second, cleanup2, is a handler that has to perform a blocking call to access a resource, then close that resource.

Undesired argues that cleanup1 is better written hClose h1 `finally` hClose h2. I agree that is a better solution (although I tend to use nested brackets myself). That structure is nearly fully exception-safe (except hClose itself isn't async-safe), and it handles synchronous exceptions properly. In particular, uninterruptibleMask is neither necessary nor sufficient to write cleanup1 properly[2].

There is no way to safely write cleanup2 without using uninterruptibleMask[3]. It is possible to write a much more sophisticated handler that could eventually time out and possibly take an alternative action, but I don't believe I've ever seen an example of such code. Even then, it still wouldn't close the handle.

Undesired further argues that adding an uninterruptibleMask to bracket will tend to make people think that the naive handler in the first case is safe, even though it isn't, and therefore we shouldn't add it. Users should instead know to use finally appropriately in that case, and uninterruptibleMask appropriately in the second.

I reject this conclusion. I believe the second case is quite common, likely more common than the first. Both Handle and Process use this structure internally (and hence are both vulnerable to async-exception issues that can only be fixed with uninterruptibleMask, which their cleanup functions do not currently employ). I also think it's quite common in client code when there's a shared structure that requires cleanup. Such cleanups can only be handled safely via uninterruptibleMask, and in general use of that function should be discouraged.

If we want to be leak-free (in the second case, and with hClose), we need to call uninterruptibleMask somewhere. Where should we call it? The obvious choices are in the cleanup action (e.g. within hClose), within bracket, or manually by the user. I reject the last as being undesired and error-prone; we certainly don't want the common idiom to be .. `finally` (uninterruptibleMask_ (hClose h)). I believe within bracket is better than within hClose. First of all, we really only expect exception-safety in the context of a function like bracket. Having a bare hClose is already not regarded as a safe idiom, so having it be vulnerable to async exceptions in another manner should not be a problem. Given that, I believe it's better to change the semantics of bracket so that all such handlers will work properly, rather than requiring that Handle, Process, and others be fixed (and countless 3rd-party libraries audited. I have at least two of my own I need to audit.).

Some opponents of the proposal have presented various control-flow constructs built upon bracket that would lead to deadlock under the proposed semantics. If hClose et al used uninterruptibleMask internally, those control flows would not be expressible at all, unless a new function "interruptibleHClose" were added. Similarly for Process and 3rd-party code. If instead bracket is changed, then we'd need to preserve the current semantics as "interruptibleBracket". Anyone relying on this specific control flow would just need to change from bracket to interruptibleBracket to preserve the current semantics, instead of needing to change hClose, process termination functions, dbClose, etc.[4]

Changing bracket means we'll probably fix a bunch of currently-buggy code (anything like cleanup2 that doesn't use uninterruptibleMask, including Handle and Process cleanups), and it will be less work to adapt for anyone relying upon the current semantics.

As I mentioned above, IMHO there is merit to the argument that we should be cautious when making a silent change like this. However, I think the pain is unavoidable. The current situation is not async-safe, and cannot be made async-safe without changing semantics (i.e. adding uninterruptibleMask somewhere). Here are all the alternative possibilities that have thus far been discussed:

Users need to be intimately familiar with exceptions and interruptible operations to write truly exception-safe code, including knowing when to wrap built-in functions in uninterruptibleMask (the status quo).
Add uninterruptibleMask internally to hClose, certain process-related functions, and anywhere else similar "cleanup2". Anyone relying upon the current interruptible behavior must audit their code, and (if available) switch to interruptible variants. 3rd-party code must all be audited. Concerns about unkillable threads/deadlocks apply.[5]
Add uninterruptibleMask internally to bracket, finally, etc. Anyone relying upon the current interruptible behavior must audit their code, and switch to interruptible variants. 3rd-party code will be async-safe, although if it's currently vulnerable to sync exceptions it will remain vulnerable. (the proposal). Concerns about unkillable threads/deadlocks apply.

Of these choices, I consider the first untenable. I simply do not believe we should expect this level of async exception sophistication from most users. I will gladly accept the possibility of unkillable threads in this context to achieve the greater exception safety of 2 or 3, of which I contend 3 is better. It places the least burden of knowledge on the greatest number of users, involves less code-change than the second option, and also leads to the greatest safety by default.

[1] From now on I'll just write "bracket", but will generally mean the whole cleanup-handler family of functions.

[2] Except uninterruptibleMask is the only way to make hClose fully async-exception-safe.

[3] Or its semantic equivalent. It's often possible to change from a blocking operation to looping over a non-blocking operation under mask, but this has the same effect of making the code unresponsive to async operations.

[4] Given the examples that have been presented thus far, I strongly believe that any user relying upon the interruptibility of cleanup operations understands the Haskell exception system intimately and therefore is in the best position to adapt to any semantic changes of exception-handling functions.

[5] I expect the extra risk from changing bracket to be uninterruptible to be minimal over changing hClose etc, as Handles are ubiquitous.

Thursday, October 4, 2012

runtime meta-programming in Haskell

In the spirit of Edward Yang, half-baked ideas lie ahead

There are many reasons programmers may want to use metaprogramming facilities, but one of the most common is for greater efficiency. The premise is simple: if runtime data becomes available at different times, a metaprogramming environment may allow the programmer to generate specialized versions of code based upon certain data which is known sooner, leading to more efficient computations.

A running example in a great deal of literature is the power function. Given pow a b = b ^ a, for certain values of a a much more specialized function can be created, e.g.


pow 1 b = b
pow 2 b = b*b
pow 4 b = let sq = b*b in sq*sq
pow n b = b ^ n

but writing out all possible cases like this is far from ideal. It's tedious and error-prone, and it still won't result in the hoped-for performance gains. Another problem is that it's generally not possible for the programmer to enumerate every possible case. A better solution is desired.

Template Haskell is generally disparaged as a possible solution because it's a compile-time facility. A meta-programming approach to the pow function would be this Template Haskell expression:


mkPow 0 = [| const 1 |]
mkPow n = [| \x -> x * $(unMeta $ mkPow (n-1)) x |]

while this works, Template Haskell splices may only be evaluated at compile-time. For a wide range of interesting metaprogramming problems, template haskell simply isn't available. However, by using the GHC API, we can make use of Template Haskell during runtime. By using a package like plugins or hint, the code is remarkably simple.

I've placed a proof-of-concept implementation on github. meta-expressions are created in the Meta newtype, which wraps Template Haskell's ExpQ type. The Meta monad has an extra phantom type parameter to assist with type safety and inference. Here's a simple example of Meta in use, with mkPow essentially as defined above:


{-# LANGUAGE TemplateHaskell #-}
{-# OPTIONS_GHC -Wall #-}

module Data.Meta.Example where

import Data.Meta.Meta
import Data.Meta.ExampleExpr

main :: IO ()
main = do
  n1 <- readLn
  let expr1 = mkPow n1 :: Meta (Int -> Int)
  n2 <- readLn
  let expr2 = mkPow n2 :: Meta (Int -> Int)
      expr3 = compose expr1 expr2
  result <- metaCompile expr3
  case result of
    Left err -> putStrLn "failed" >> putStrLn err
    Right fn -> do
      j <- readLn
      print $ fn j
      k <- readLn
      print $ fn k

Incidentally, this approach seems remarkably similar to the ideas in Geoffrey Mainland's presentation at ICFP 2012. The big differences are that he defines his own quasi-quoters instead of using Template Haskell, and he also uses a compiler other than GHC. Both of these are necessary if you want to metaprogram in a language other than Haskell, naturally.

Tuesday, June 12, 2012

Language-ObjC released

I have uploaded a new package to hackage, language-objc. This package provides an AST, parser, and pretty-printer for Objective-C. It's a fork of the excellent language-c package, so if you're familiar with that you should be able to pick up language-objc pretty quickly.

Source repository is on github.

If you "cabal unpack language-objc; cd language-objc-0.4.2.0/examples/", there are a number of examples that can get you started. For the truly impatient:


import Language.ObjC
import qualified Data.ByteString.Char8 as B

-- parsing bytestrings
objcTranslUnit = B.pack "int a=0;"

errOrAST = parseC objcTranslUnit nopos

-- parsing from a file
fileAST sourceFile = do
    result <- parseCFile (newGCC "gcc") Nothing [] sourceFile
    return $ either (error . show) id result

A few things to note:

Input files must be preprocessed. 'parseCFile' will do this for you. Non-preprocessed source can only be parsed in very limited situations (did you know that 'id' is a typedef? And 'id' is also commonly used as an identifier to methods/functions, even in system libraries, so it can't be a keyword?).
The objective-C extensions have only been lightly tested, but I think most of the core functionality works. I am unaware of a full specification for the language, which makes it difficult to say if it's complete.
Most of the code in Language.ObjC.Analysis is not yet compatible with Objective-C extensions.
One difference from language-c is that certain GNU attributes can no longer be parsed due to grammar ambiguities.
Attempting to compile at -O2 requires a lot of memory due to the giant Happy parser that's produced.

I am most grateful to Benedikt Huber, Manuel Chakravarty, Duncan Coutts, Bertram Felgenhauer, and all contributors to language-c, as they're responsible for most of the code. Bugs are mine. Comments and patches welcome.

Monday, June 11, 2012

Understandings of Iteratees

Although still relatively new, iteratee/enumerator style I/O has achieved a fairly large userbase in the Haskell community. Like many young technologies, the design space is fairly broad, with several different packages available from Hackage. Upon surveying the alternatives, I came to realize that there are so many alternatives in part because different developers have chosen to focus on different properties of iteratees. Doubtless this is a contributing factor to much of the confusion surrounding iteratees; when a user's mental model is closely aligned to the developer's, the package is likely to be much easier for the user to understand.

This post isn't meant to explain how to use iteratees, rather to explore what I believe are factors that influenced various design decisions in different packages. In some cases I have first- or second-hand knowledge of those decisions, but sometimes I'm extrapolating from the code. If you find one or two packages to be sensible and can't understand why anyone would use something different, you may find enlightenment here. Or not. For those who've never used an iteratee package, I hope that these posts might help point you to a good starting option. Wizards probably know this already. Note that just because I think the designers have a particular viewpoint for their code doesn't mean that other elements won't be present; these are all just different ways of looking at iteratees. All packages will have these properties to some extent.

Glossary

Some definitions, to be perfectly clear (the adventurous can skip this)

iteratee: some sort of resource consumer. The defining feature of an iteratee is that it provides a mechanism to pump it with data until it returns a result. I may use "iteratee" generally, referring to code in an iteratee style.
enumerator: some sort of resource provider. It stuffs data into an iteratee until it gets a result.
iteratee style, iteratee/enumerator style: writing code that makes use of iteratees, enumerators, and other associated functions
iteratee package: a library to provide functions used to write in iteratee style.
underlying monad: for iteratees that are monad transformers, the monad which is being transformed. Often either IO or a monad stack based on IO.

1: Iteratees as Resource Managers

One of the design goals of Oleg Kiselyov's original iteratee work was improved resource management. Lazy I/O can make it difficult to reason about resource management (half-closed?), while strict I/O traditionally does not compose well (compare to network programming in C). With iteratees, these problems are much closer to being solved. Input streams are opened on demand, evaluated strictly, and closed immediately, with no possibility of the handle or other resource leaking.(1) And iteratees are quite composable (perhaps too much so!), solving the other issue as well.

These properties make iteratees extremely useful for network programming, as network sockets are a scarce resource. However, this usage highlights a particular drawback: iteratees solve the resource management problem for input, but don't address it for output.

I believe that for this issue, all developers currently agree the best solution for output resource management is to use a monadic region, such as that provided by regions or ResourceT . This is currently the only approach guaranteed to run cleanup code with complicated monad stacks in the presence of exceptions.(2) In some cases Control.Exception.bracket can be used, however it's often not possible to call it at a convenient time, leaving resources open for longer than necessary.

Having established that a monadic region is necessary for output resource management, using the same mechanism for input resource management as well is a reasonable decision. Developers taking this view may then write their package so that even input streams may cause resource leaks without the resource management layer, unlike the original design.(3)

2: Iteratees as Stream Processors

Iteratee packages with a stream processing bent focus on composibility as a design goal. The libraries usually provide a multitude of stream processors. They are frequently designed with a UNIX-pipeline-style model in mind; data can be passed through one processor into the next and so on, until finally the last processor in the chain consumes the resulting data. There may be clear demarcations between produces, consumers, and processors, or perhaps not. There are frequently combinator functions to make these connections, sometimes with a syntax inspired by unix pipes. Many of the functions will be similar to traditional Haskell list-processing functions, such as maps, accumulators, and filters.

People with this viewpoint tend to view stream processor composition, which Oleg refers to as an "Enumeratee", as the most important/useful means of composing iteratees.

Although I've presented stream processing and resource management as two different viewpoints, I addressed them together because packages generally can't exclude either element. One approach is to divide responsibilities, with a dedicated resource manager and combinators that can use those resources. Other packages may use the iteratee's own handling of input resource management, leaving output resource management to the user's discretion.

Conclusion

For the curious, I would place conduit and iterIO squarely in the stream processor camp, with enumerator having strong leanings this way as well. To some extent all iteratee packages provide similar stream-processing capability; I make these judgements based upon what I think the designers think iteratees are for. If your favorite iteratee package hasn't shown up yet, maybe next time!

Notes

(1) There is at least one way to force a resource to be open longer than intended. Since iteratees are generally implemented as monad transformers, it is possible to craft an iteratee over ContT IO that generated a lazy stream from an input source, available within another iteratee. The resource would be closed when the outer iteratee was evaluated, making the outer iteratee a type of monadic region. I consider this an egregious abuse of Cont, however, and extremely unlikely to happen accidentally.

(2) I'm fairly outspoken in my dislike for exceptions. They mostly shouldn't be used. The main purpose of exceptions is to make it easy to reason about how to deal with various conditions, and they suck at this. There are basically three kinds of exceptions:

Stuff you can fix, like a file not existing. This should be modeled by Either or Maybe or something similar. It's much easier to reason about a Maybe than having to figure out where in IO some result will be evaluated. This is by far the most common reason programmers use exceptions, and (in Haskell at least) it's the worst choice because Haskell provides a better mechanism for dealing with this problem.
Stuff you can't fix, like out of memory conditions. The only sensible thing to do is terminate the program and let the OS clean it up. You shouldn't try to catch these, and if you do catch one you shouldn't try to do anything, because that might fail too. I suppose it's okay to model these with an exception, but maybe they shouldn't be exposed to the program logic at all.
Asynchronous exceptions. I begrudgingly accept that these may need to be used in concurrent code.

(3) The basic resource provided by an OS is generally a cursor, e.g. handle. Oleg Kiselyov argues that cursors and enumerators are isomorphic, although enumerators have certain desirable properties that should make them preferred. The traditional design of an iteratee package is to provide enumerators that wrap access to a cursor in a closure, preventing the cursor from leaking. Packages like pipes or conduit that require a monadic region for safe access to input hold the cursor as data and don't provide enumerators-as-closures. In essence they evolved from cursor to enumerator back to cursor. Because of this I'm not entirely convinced they should be classed as "iteratee packages" at all, though they certainly occupy the same design space.

Sunday, April 29, 2012

Adding state support to BNFC

The BNF converter is pretty cool tool. The idea is simple: provide a grammer written in Labelled BNF, and it spits out a lexer, parser, pretty printer, and abstract syntax tree. The very nice part is that the generated code can be in several formats: Alex/Happy/Haskell, flex/bison/C/C++, or JLex/CUP/Java. It's especially convenient when you're developing a language, as you can make changes as you go and automatically keep your AST, parser, and printers in sync.

BNFC takes as input a file written in Labelled BNF, which is essentially a BNF grammar with a label attached to each production rule. Brackets "[ ]" indicate lists rather than optional items, and there are few extensions from minimal BNF other than pragmas to define lexer tokens and positions.

Writing one grammar file instead of 4 separate modules is very attractive. Unfortunately BNFC has a serious problem - only context-free grammars are supported.

LBNF is sufficient to represent context-free grammars, and within this domain the system works very well. But despite the claims of the LBNF Report, many interesting languages are not context-free, and they still need to be lexed and parsed. The most notable example is probably C, in which a string can be lexed as an identifier or a type name, depending on previous typedefs and other things. I present here a small set of extensions to enable support for passing state between the lexing and parsing stages, enabling a wider variety of parsers to be constructed from LBNF.

A darcs repository of BNFC, with state support as described here, is available from http://www.tiresiaspress.us/haskell/bnfc/. These features are only supported in the Haskell/Alex backend (Alex-3 is required).

An initial examination

The approach I took is founded on two simple ideas. First, the lexer needs to access to a state table to determine the correct token, and secondly, the parser needs to update the state table in response to certain rules.

Lexing rules

In LBNF, tokenizing rules are specified like this:

token Ident (alpha [alpha|digit|'_']*) ;

where "Ident" is the type of the token produced, and the remainder is a regular expression to determine valid "Ident" strings.

A stateful token rule is similar. The implementation supports multiple sets of state, so it's necessary to specify which state set the tokenizer should check.

state token TypedefName typedefNameSet [alpha|digit|'_']+ ;

First the lexer matches the regular expression against the input string. It then takes the match and checks if it's present in "typedefNameSet". If so, a token of type "TypedefName" is produced. Otherwise an "Ident" is produced.(1).

Parsing Rules

Standard production rules are extended with pragmas that specify state update annotations:

-- after this rule, the final Ident will be lexed as a TypedefName
Etypedef.  Declaration ::= "typedef" TypeName Ident <% state typedefNameSet add 3 %> ;

The number "3" specifies that a token string derived from the 3rd production (Ident) is to be added to the set "typedefNameSet".(2)

Similarly, names can be removed from a state set:

-- after this rule, the second token will be lexed as an Ident
UnTydeclarator. Direct_declarator ::= TypeName TypedefName <% state remove 2 %> ;

When removing a token from a set, it's unnecessary to specify the set the token is being removed from.

Extended support

Initial Values

Initial members of the state sets can be specified in the grammar via the pragma TODO.

Scoping

The implementation maintains a stack of state values in order to support lexical scoping of changes. Two additional pragmas, "state push" and "state pop", provide access to stack manipulations. A common usage would be:

PushParen.  PushParen ::= "(" <% state push %>
PopParen.   PopParen  ::= ")" <% state pop %>

It is an error to pop the last frame off the state stack.

The implementation only supports one stack for all state sets. It is not currently possible to represent multiple state sets with differing scoping rules.

A drawback of the current implementation is that production rules such as PushParen and PopParen are no longer terminals. Therefore the implementation creates abstract representations for them, although no additional meaning is conveyed than if they were omitted completely. I hope to address this in the future.

Future Work

Deriving the state string

A major question is, how to determine which token is added to the state set for a given production? The above cases are simple. In a production rule like:

Etypedef.  Declaration ::= "typedef" TypeName Ident <% state typedefNameSet add 3 %> ;

"Ident" is a token, so it's simple to derive the string that should be added to the set. However, it's frequently desirable to modify state based upon a non-terminal rule, such as:

Rule. SomeProd ::= Declaration <% state stateset add 1 %> ;

How should the token be determined from the Declaration?

The current implementation only supports state adjustments based upon tokens and lists of tokens, e.g. [Ident]. Code generated from rules like the above "Rule" currently results in a compile-time error due to missing instances. The instances can be manually provided in a separate file, although creating this file can be tedious. I plan to add an additional pragma to the LBNF grammar that would allow for the necessary instances to be automatically generated.

Notes

(1) - The astute reader will notice that the TypedefName regex matches strictly more identifiers than the built-in "Ident" regex. Therefore this match provides a "backdoor" into Ident, allowing strings that don't match the "Ident" regex to actually be produced as "Ident"s. I find this behavior useful, and in any case it can be mitigated simply by using the Ident regex for state tokenizing rules. Another possibility would be to use a user-provided alternative in case the match fails.

(2) - An alternative notation would omit the numbers and place the pragma adjacent to the production.

Sunday, March 25, 2012

Pure delay lines

While working on my dsp-extras project, I needed a pure delay line. Based on previous work, I used Data.Sequence.Seq, and was somewhat surprised to see a very large performance hit when compared to an undelayed signal. Obviously there was more to creating pure buffers in Haskell than my previous microbenchmarks revealed. I resolved to get the performance back.

I created a test script to measure the performance of different delay line implementations in a real-world application. The application simply reads in a sound file, applies a fractional delay, and writes the output. I measured the performance with buffer sizes ranging from 1 to 10000 samples. Due to interpolation (I used cubic interpolation for all my tests), I couldn't actually measure a 1-sample buffer; the program actually creates a 4-sample buffer for the smallest possible delay. All other buffer sizes are accurate though.

I tested the following buffer types:

Data.Sequence.Seq - cons elements onto the front and drop from the tail as each element is pushed
Data.Vector.Unboxed.Vector - a naive copying implementation
T10 - a 10 element, strict data type (e.g. data T10 a = T10 !a !a ...)
T25 - a 25 element, strict data type
Evil - based upon Data.Vector.Unboxed.Vector, the Evil buffer thaws the vector, mutates a single element, and returns the updated vector. This buffer is closest to an imperative circular buffer implementation, although it violates referential transparency. Hence the Evil moniker.
ComboBuffer - the ComboBuffer consists of two unboxed vectors and an update list. After the update last grows large enough, it's frozen into a vector and the oldest data is dropped. Writing elements is performed in amortized O(1) time. The read behavior is more complex.

The results are summarized online here. A few notes about interpreting the results:

Given times are the result of a single run, although they are fairly stable over multiple executions.
The chart shows times in a log scale to help differentiate times for small buffers. Otherwise the O(n) behavior of Vector obliterates these differences.
Current is the behavior of an algorithm that adaptively selects a buffer type based upon size.
All the results (except one) are for reading off the end of the buffer. For most buffer types this shouldn't matter. It's optimal for Seq. For the ComboBuffer, reads from the last half of the buffer can be done in O(1), but reads from the first half are more complex, as a list lookup may be required. The Current - first quarter data is the result of accessing points 0.25 of the way into the buffer.

So for reads from the end of the buffer, my ComboBuffer is pretty good. It needs some tuning though, especially for reads from near the front of the buffer.

Faking it with higher rank (Existential Quantification pt. 2)

In my previous post I provided a short background on Existential Quantification. In this post, I'll provide an alternative for some uses and suggest why it may be preferred.

Suppose we want to create a data store that uses lists for small amounts of data, but otherwise uses vectors. We can easily use E.Q.

import Data.Vector (Vector)
import qualified Data.Vector as V
class Buffer c where
  type El c :: *
  init :: Int -> El c -> c
  (!)  :: c -> Int -> El c
  push :: c -> El c -> c
 
instance Buffer [a] where
  type El [a] = a
  init = replicate
  (!)  = (!!)
  push lst el = el : init lst
 
instance Buffer (Vector a) where
  type El (Vector a) = a
  init = V.replicate
  (!)  = (V.!) 
  push vec el = V.cons el $ V.unsafeInit vec
 
data EBuffer a = forall c. (Buffer c, El c ~ a) => EBuffer c
 
initBuffer :: Int -> EBuffer Double
initBuffer sz
  | sz < 10   = EBuffer (init sz 0 :: [Double])
  | otherwise = EBuffer (init sz 0 :: Vector Double)

and now we have our existentially-quantified buffer.

Unfortunately this code doesn't offer anywhere near the hoped-for performance (at least not for the problem that led to this post). The problem is that now that the actual type of buffer is hidden in the EBuffer data, the compiler is no longer able to inline function calls to (!) or push. Instead it needs to go through the dictionary methods, adding a lot of extra overhead. Even worse, a lot of further optimization opportunities are lost.

It turns out that there is another approach that doesn't suffer from this problem. The approach is based on higher-rank types. Normal Haskell code uses rank-1 type variables, which means that any variables in a type signature are instantiated by the calling code. However, with the RankNTypes extension, it's possible to write code of any rank desired. Consider the function

hr1 :: (forall a. a -> a) -> (Char, Int)
hr1 myId = (myId 'a', myId 0)

The first parameter to hr1 is a rank-2 function. It needs to be a function that can take an input of any type, and return a value with the same type. The only total function that can do this is id, and calling hr1 id returns exactly what we would expect.

(What happens if you change the type signature to hr1 :: (a -> a) -> (Char, Int) ? )

Armed with higher-rank types, we can replace the existentially-quantified type in two steps:

1. Refactor whichever code uses the data so that it's in the form someFunc :: Buffer buf => buf -> AResult. This part usually isn't too difficult.

2. Create another function like following:

runWithBuffer
  :: Int
  -> (forall buf. (Buffer buf, El buf ~ Double) => buf -> b)
  -> b
runWithBuffer sz fn
  | sz < 10   = let buf = init sz 0 :: [Double]
                in fn buf
  | otherwise = let buf = init sz 0 :: Vector Double
                in fn buf

With our new runWithBuffer function, we've successfully decoupled the buffer selection algorithm from the usage site without using existential quantification. Plus, in many cases GHC generates better code with more opportunities for further optimization.

A slightly more extensive example is available in my dsp-extras codebase. The class and various buffer types are in the Data.RingBuffer modules. The buffers are used in Language.AuSL.CompileArr, with the function optimizeBufType performing the duties of runWithBuffer.

Saturday, March 24, 2012

Existential Quantification pt. 1

If you have experience with OOP, you're already an expert on existential quantification. In brief, existential quantification is the reason why you can write code like this pseudocode:


class MyList: 
  private List myList;
  init(Int sz):
    if sz < 10: 
      self.myList = new LinkedList();
    else:
      self.myList = new ArrayList();

In this example the private member variable myList is existentially quantified. For any instance of MyList, after init is called the compiler knows that a data structure exists and that it implements the List interface, but it doesn't know the exact type of myList. Any time you declare a variable of an interface type, the data structure referenced by the variable is existentially quantified.

One of the most common uses of existentially quantified data is in implementing the Strategy design pattern, which allows for an algorithm to be selected at runtime. The MyList class above hints at this usage; for very small lists it uses a LinkedList but otherwise uses an ArrayList. The idea of selecting an optimal algorithm at runtime based upon some known quantity is very common in high-performance code (e.g. fftw, Atlas, introsort, many others), and existential quantification is a very common way to implement it.

It's very common for programmers with an OOP background to assume that existential quantification and interfaces always go together, which leads to surprise that the following Haskell code doesn't compile:


getNum :: Num a => Char -> a
getNum 'd' = 0 :: Double
getNum _  = 0 :: Int

In OOP languages, this works. Both Double and Int are instances of Num, and the only promise the type of getNum makes about its output is that it has a Num instance. However, type variables aren't existentially quantified in Haskell. This means that, instead of returning whichever Num instance it likes, getNum must be able to provide any Num that the caller demands.

However, thanks to the ExistentialQuantification language extension, e.q. is supported in Haskell (GADTs also provide this feature and more). It can only be used within new data types however, not bare functions. So we need to make a wrapper to hold existentially quantified data:

{-# LANGUAGE ExistentialQuantification #-}
data ENum = forall a. Num a => ENum a

getNum :: Char -> ENum
getNum 'd' = ENum (0::Double)
getNum _   = ENum (0::Int)

Now we have our existentially-quantified data, and all is right with the world. For now.

Friday, September 16, 2011

splaytree package released

I've just uploaded a new package to Hackage, splaytree-0.1. Splaytree provides pure functional splay trees, implemented using monoidal annotations. A splay tree is a self-balancing tree with the property that recently-accessed nodes remain near the root, and thus subsequent accesses can be quite cheap. As for the monoid annotations, it's the same technique as used in finger trees (see here, here, and here). The idea is that each node of a tree holds an annotation in addition to data. If the type of the annotation is chosen properly, it can be used to prune branches when searching. One clever aspect of this technique is that different data structures can be implemented simply by choosing different annotations.

Currently three different data structures are available, Data.SplayTree.Set, Data.SplayTree.Seq, and Data.SplayTree.RangeSet. 'Set' provides a set with standard operations. Not much time has been spent optimizing this library, but performance is competitive with unordered-containers for many operations (inserts can be quite slow though). Seq is a sequence type, similar to Data.Seq.

RangeSet is perhaps the most interesting of the available containers. Similar to an IntervalSet, a RangeSet is used for storing a list of Ranges. A RangeSet, however, guarantees that all ranges are non-overlapping. When a new range is inserted into the RangeSet, it is combined with any ranges currently in the set with which it overlaps.

Example (Note that a Range is a starting point and duration):


Prelude Data.SplayTree.RangeSet Data.Foldable> let rs = singleton (range 0 4)
Prelude Data.SplayTree.RangeSet Data.Foldable> toList rs
[Range {rMin = 0, rang = 4}]
Prelude Data.SplayTree.RangeSet Data.Foldable> toList $ insert rs (range 3 2)
[Range {rMin = 0, rang = 5}]
Prelude Data.SplayTree.RangeSet Data.Foldable>

Comments and patches welcome.

Friday, September 2, 2011

My mind is blown

Recently I've read two blog posts which have really started to change how I think about programming. A while ago I found a solution to the word numbers problem which completely stunned me with its simplicity and generality. Today I stumbled upon Apfelmus's "monoids and finger trees" blog post, which is very timely since I can use the same approach to both simplify and generalize a data structure I've been working on recently.

With this in mind, I would like to move from my current awareness (being able to follow along with these discussions) to being able to recognize and apply very general structures. Somewhat tongue-in-cheek, I'd like to know how to make good use of Edward Kmett's packages. Advice would be appreciated.

Monday, April 11, 2011

Thoughts on FRP

(N.B. Code presented in this post is theoretical. I haven't tried to compile it, although I do expect it would work.)

Although various implementations have been under development for some time, FRP has yet to reach a really satisfying breakthrough. The best semantic model has been that of Conal Elliot's Reactive, but unfortunately it lacks a Smart Enough Compiler and suffers from subtle space leaks. The most performant is probably Yampa, which is arrow-based. Arrows turn some people off; I personally find them too verbose and don't like the syntax for recursion.

Apfelmus has recently released a new FRP library, reactive-banana, which shows some early promise towards being a best-of-all-possible-worlds. In this post I'll examine some of the features of reactive-banana to see why I think this.

Introduction

Reactive-banana is very similar to the semantic model provided by Reactive, although much more minimal. One fundamental data type is the Event, which can be thought of as a list of values with their time of occurrence, i.e. Event a = [(Time, a)]. The other fundamental type is the Behavior, which can be thought of as a value that changes with time, i.e. Behavior a = \Time -> a. This is according to the reactive-banana documentation, which also notes that in this implementation Behaviors aren't continuous but are actually step functions.

So how does it look? Consider a simple system with a button and a toggle: when the button is pressed it generates a 0 if the toggle is off, and 1 when the toggle is on. We could model it like this:


togEvent :: Event Bool
buttonEvent :: Event ()

-- Upon receiving an Event (), mixEvents generates an Event Int
-- The value of the Int depends upon the most recent Event Bool status
mixEvents :: Event Bool -> Event () -> Event Int

The togEvent and buttonEvent sources are created by connecting to the window framework's event system (see reactive-banana-wx for an example), but mixEvents is more complex. We could model it with the pure function


mixEventsF :: Bool -> () -> Int
mixEventsF True _ = 1
mixEventsF _    _ = 0

so we just need to find a way to lift this into the FRP system. Behaviors have an Applicative instance, so let's try it:


mixEvents = apply (mixEventsF <$> behavior False togEvent) buttonEvent

and we've got it. If we prefer, we could define mixEventsF :: Bool -> Int, but then we'd need to fmap const onto the Behavior to create a Behavior (() -> Int) we could apply to an Event.

So how does this work? The Behavior functions as an Event Accumulator. It stores every Bool received from togEvent, and when a button press event is received creates an event from applying the Bool and () to the mixEventsF function. One clue that this is the proper way to think of Behaviors is by the different ways they can be constructed:

lifting pure values, and always (these are semantically equivalent, pure is defined as always)
the behavior smart constructor (only hold the most recent Event)
various accumulate functions on Events

So if a Behavior is an event accumulator, why do we need it at all? Why not just use events directly, explicitly storing values when we need them? In my view, the answer has to do with abstracting implementation from the user.

Push vs Pull

One tension in the FRP design space relates to a push-driven (data-driven) vs pull-driven (demand-driven) architecture. Push-driven offer theoretically better performance, as data sources typically change relatively slowly (e.g. mouse clicks relative to screen refresh), so this minimizes recomputation. However, pull-driven seems to fit the functional paradigm better. Also, if a Behavior models a continuous function, pushing updates at the sample rate could be very wasteful.

The Behavior type (at least in reactive and reactive-banana) moderates between these two models. We can see how this works by examining the semantics of our mixEvents function.


-- Upon receiving an Event (), mixEvents generates an Event Int
-- The value of the Int depends upon the most recent Event Bool status
mixEvents :: Event Bool -> Event () -> Event Int

Our implementation created a Behavior (() -> Int) by combining a pure function with the Event Bool stream. This behavior is then sampled whenever an Event () is received. By converting a discrete Event into something which can be later sampled, the Behavior localizes a change from push-driven to pull-driven. Thus Event Bools are stored and pulled from as necessary, while Event ()s are pushed through and become Event Ints, which are then pushed until they're also changed into Behaviors (or reactimate'd).

Continuous Time

Another Behavior feature is that semantically (traditionally, but not in reactive-banana) they are continuous functions over time. There are a lot of reasons to like this feature, which is notably missing from the reactive-banana semantic model. Originally I was somewhat skeptical, however I've come to the conclusion that the reactive-banana model of Behaviors as step functions is more correct. Since a Behavior is an accumulator of Events, it makes sense that the Behavior would change only when new Events are received. As an Event is a datum at a discrete time, the Behavior must therefore be a discrete-valued function. In particular, there is a conflict inherent in overloading the meaning of a Behavior to both respond to Events and be continuous in time.

Furthermore, I don't think the programmer loses anything by the reactive-banana model. It's still possible to write functions that are continuous in time and lift them to Behaviors. This is because the reactive-banana framework has no notion of time. If you want a time value, you need to provide one yourself, perhaps via a hardware clock.


-- a pure, continuous function
-- using UTCTime to conveniently tie in with getTime.
createShape :: UTCTime -> Shape

-- an event source tied to the window toolkit's redraw event
redrawScreen :: Event ()

-- function to draw a shape to the screen
drawShape :: Shape -> IO ()

getTime :: IO UTCTime
getTime = Data.Time.Clock.getCurrentTime

reactiveSystem :: Prepare ()
reactiveSystem =
 reactimate $ fmap drawShape
                   (apply (pure createShape) (mapIO getTime redrawScreen))

Let's examine how this works. Fundamentally, when the window system requests a screen redraw, we want to check the current time, generate the correct shape for the time, and finally update the display. Our system provides the redraw requests via an Event (), which is mapped to the current time, creating an Event UTCTime. The createShape function is first lifted to a Behavior then applied to the Event UTCTime, creating an Event Shape. Finally the drawShape function is mapped over this, creating an Event (IO ()) which can be reactimate'd, i.e. each IO () is performed as it becomes available. In my opinion, by removing time from the reactive-banana framework, both the described systems and the implementation are greatly simplified, with no loss of expressiveness.

Simultaneitys

One aspect that seems to cause a lot of difficulty is simultaneous events, multiple Events which occur at the same time. There are a few ways these can arise, such as long sampling periods, merging Event streams, and recursive definitions. reactive-banana specifies that the order of simultaneous events is undefined, providing the orderedDuplicate function to deal with certain cases where timing is important.

I don't like this solution. Semantically, it would be nice that an event stream were strictly increasing in time, disallowing simultaneous events entirely. I suspect that this could be made usable by providing the following two primitive functions:


mergeCombine :: (a -> a -> a) -> Event a -> Event a -> Event a

-- When two events arrive simultaneously, create two output events with the right event delayed slightly
mergeOrdered :: Event a -> Event a -> Event a

The mergeOrdered function introduces a slight delay to preserve discrete events, whereas mergeCombine has a function to combine two events to one. mergeCombine handles the common cases of dropping one event via const and combining with mappend, as well as more specialized functions. I think these functions, together with the guarantee that an Event stream only produces single events, are sufficient for a workable system.

Conclusion

So that's my initial reaction to reactive-banana. The library is quite minimal, but so far it's sufficiently expressive for every problem I've thought of. The new semantics of Behavior seem conducive to developing a correct model of how the implementation works, thereby making it simpler for users to reason about performance and operations. Requiring users to explicitly manage time may eventually become a burden, however it's a direction worth exploring. For many problems user code doesn't depend upon time anyway.

Thursday, February 3, 2011

a better audio language

One of my frequent frustrations with audio programming languages is that they're relatively unsafe, at least when compared to languages with sophisticated type systems. The problem is generally worsened with EDSLs. Consider this example csound:


  aNull delayr 2
  atap  deltap 1
        delayw asig

Most csound dsl's will provide 3 functions corresponding to delayr, deltap, and delayw, which are translated to the dsl in a very direct manner. If the delayr is left out, the dsl code is translated to invalid csound, detected only by the csound compiler. I find this frustrating because we can do better.

What if the delayr generates a token, which is then required by all functions that want to use it?


 -- Haskell-mode now
 myDelay asig = do
   (dtok, _) <- delayr 2     adelay <- deltap dtok 1     delayw dtok asig     return adelay

This is better, but it's still possible to accidentally omit a delayw from the Haskell code. Again, this is undetected except by the csound compiler at the final stage. What we really need is a way to ensure that a user can run delays freely within the context of a delay line, but only in that context. In Haskell, computations within contexts are generally represented by applicative functors. Let's try that approach:


 -- don't export the constructor
 newtype DelayCtxt a = DelayCtxt { unDelay :: CsoundInterp a }

 deltap' :: CsoundInterp KSig -> DelayCtxt ASig
 deltap' dtime = DelayCtxt $ dtime >>= deltap

 runDelay :: CsoundInterp ASig -> CsoundInterp Float -> DelayCtxt a -> CsoundInterp a
 runDelay mAsig mtime delcomp = do
   maxdel <- mtime     asig <- mAsig     delayr maxdel     rval <- unDelay delcomp     delayw asig     return rval

This is more like it. Since the DelayCtxt constructor isn't exported, the only way to create a DelayCtxt value is with deltap'. The only way to consume a DelayCtxt is with runDelay, which automatically adds the required delayr and delayw.

Unfortunately the user still needs to manually add the maximum delay time, but that's unavoidable if you're going to support varying delay times. Consider that the input signal could be coming from a controller in real-time; to statically guarantee it's under a given bound would require a bounded signal type, which would be equivalent to this static delay bound anyway.

This is the motivating example behind X-DSP, which was presented at the 2011 SEAMUS conference at the University of Miami.

Tuesday, December 14, 2010

installing gtkglext Haskell bindings with gtk-osx

I just managed to install the Haskell bindings to gtkglext with the quartz-based gtk-osx, and I want to record the necessary steps so I don't forget. This assumes you have a working ghc (ghc-7), gtk-osx, and XCode. As a quick overview, I needed to do the following:

install gtkglext (c library)
install gtk (Haskell package)
install gtkglext (Haskell package)

1. Install gtkglext (c)

I downloaded the "Quartz hack full patched source" and followed the instructions at http://answerpot.com/showthread.php?324502-GtkGLExt+OS+X+Quartz+hack+patch. There are a few problems:
The file "docs/reference/gtkglext/html/index.sgml" doesn't exist.
The headers aren't installed.
Several references are made to undefined constants. Unfortunately this error doesn't show up until much later, when GHC tries to link every symbol under the sun.

jhbuild shell
tar -xf gtkglext-1.2.0.osx-hack.tar
cd gtkglext-1.2.0
patch -p1 < gtkglext_c.patch
./configure && make
touch docs/reference/gtkglext/html/index.sgml
sudo make install

2. Install gtk (Haskell)

The Haskell package "gtk" currently (as of gtk-0.12.0) doesn't build with a quartz-based gtk+. I downloaded the source ("cabal unpack") and edited the file "Graphics/UI/Gtk/General/Structs.hsc" to disable the drawableGetID function.


jhbuild shell
cabal unpack gtk
cd gtk
patch -p1 < gtk.patch
cabal install

If you installed the gtkglext libraries to a standard folder this should be all that's necessary. If cabal can't find gdkglext-quartz or gtkglext-quartz, you'll need to add the library paths at this stage.

3. Install gtkglext (Haskell package)

This was painful.

There are a few different problems to deal with:
1. The gtkglext install didn't install the headers, so we need to specify them manually.
2. We're not using pkg-config, so we edit the gtkglext.cabal file, remove the pkg-config check, and specify the libraries to link to.
3. Several symbols are undefined in gtkglext, so we need to remove references to them from the Haskell source. The undefined symbols are:

gdk_gl_config_get_screen
gdk_gl_config_get_depth
gdk_gl_context_get_gl_config
gdk_gl_context_get_share_list
gdk_gl_context_is_direct
gdk_gl_context_get_render_type

AFAICT these were never defined in gtkglext, although they're listed in the headers.

There's also another argument to glWindowNew, which we need to add.

Finally, the demo program needs to be patched, otherwise it just sits there and doesn't update.

So execute the following, replacing PATH_TO_GTKGLEXT with wherever you unpacked/built gtkglext:


jhbuild shell
cabal unpack gtkglext
cd gtkglext-0.12.0
patch -p1 < gtkglext.patch
cabal install --extra-include-dir="PATH_TO_GTKGLEXT" --extra-include-dir="PATH_TO_GTKGLEXT/gdk" --extra-include-dir="$PREFIX/include/cairo/" --extra-include-dir="$PREFIX/include/glib-2.0/" --extra-include-dir="$PREFIX/lib/glib-2.0/include/" --extra-include-dir="$PREFIX/include/gtk-2.0/" --extra-include-dir="$PREFIX/include/pango-1.0/" --extra-include-dir="$PREFIX/include/atk-1.0/" --extra-include-dir="$PREFIX/lib/gtk-2.0/include/"
cd demo
make
./RotatingCube

And you should get a new window with a colorful cube, spinning in full OpenGL glory.

I don't know why it's necessary to specify all the gtk+ header locations manually here, when they're automatically found for building gtk.

The patch files can be downloaded from here.

Thursday, March 12, 2009

future directions for iteratee

I have recently completed two major projects that were taking up nearly all of my time. Now that they are done, I'd like to work on the next version of the iteratee package. Here are a few ideas that I hope to include soon.

1. Remove the explicit Maybes
This was suggested by Oleg, and it seems to be a good idea. Every iteratee that produces a value other than () uses Maybe to represent the possibility of failure (always an option when doing IO). By incorparating the Maybe into the type of IterateeG, these will no longer need to be explicit. This will also make IterateeGM an instance of MonadZero to the extent that Maybe is an instance of that class.

Status: I've made a patch that does this, but it doesn't yet work properly with convStream. I haven't managed to track down the problem. Likely to be included, assuming I can solve this issue in a reasonable time frame.

2. Stream type class (and simplify StreamChunk)
I keep hoping for a ListLike type class to enter common use. Barring that, I have some ideas for breaking up the necessary functions of StreamChunk into separate type classes, such as the patch to use a monoid instance submitted by Bas van Dijk.

Status: Changes will be made, it is likely that StreamChunk will be broken into multiple smaller classes. Any addition of a Stream type class will wait until after point 4 is resolved.

3. More utility iteratees (foldl, filter, others?)
Status: Likely to be included. Changes to StreamChunk will make these easier to support.

4. Type-safe seeking
If iteratees are parameterized by Stream, the type of the stream should indicate if seeking is supported. I have an outline for how to implement this, but haven't done any work yet.

Status: Needs research, this will probably wait for the next major version bump.

5. Improved error handling
Bas van Dijk submitted a patch to change the type of a stream error from String to Error a. Others have suggested other possible changes as well.

Status: Needs more research, this will likely wait for the next major version bump.

enumerator/iteratees and output

I have recently received a few questions about writing output while using enumerator-based I/O. In some cases users have attempted to make enumerators (like enumFd) that will handle output as well, but have difficulty actually making it work.

I think these problems stem from an incorrect application of the enumerator model of I/O. When using enumerators, a file (or other data source) is a resource that can be enumerated over to process data, exactly as a list can be enumerated over in order to access the data contained in the list. Compare the following:

foldl f init xs

enumFd "SomeFile" ==<< stream2list

In the first function, 'xs' is the data to be processed, 'foldl' tells how to access individual items in the data collection, and 'f' and 'init' do the actual processing. In the second, "SomeFile" is the data, 'enumFd' tells how to access the data, and 'stream2list' does the processing. So how does writing fit in? The output file obviously isn't the data source, and it doesn't make sense to enumerate over your output file as there's no data there to process. So it must go within the Iteratee. It turns out that making an iteratee to write data is relatively simple:


> import Data.Iteratee
> import System.IO
> import Control.Monad
>
> writeOut :: FilePath -> IterateeGM [] Char IO ()
> writeOut file = do
>  h <- liftIO $ openFile file WriteMode
>  loop h
>  where
>  loop :: Handle -> IterateeGM [] Char IO ()
>  loop h = do
>   next <- Data.Iteratee.head
>   case next of
>    Just c -> liftIO $ hPutChar h c >> loop
>    Nothing -> liftIO $ hClose h

Add some error handling and you've got a writer. This version could be polymorphic over different StreamChunk instances by generalizing the type (FlexibleContexts may be required as well). Other stream-specific versions could be written that would take advantage of the specific StreamChunk instance (e.g. using Data.ByteString.hPut instead of hPutChar).

I hope this will serve as a very basic introduction to output when using enumerators. In addition to a generic writer like this, it may frequently be beneficial to define special-purpose writers. In a future post I will show a writer that seeks within the output file using a threaded State monad.

Tuesday, February 3, 2009

Build a better WAVE reader, pt 2

In the last post, I looked at the beginnings of using an Iteratee to create a WAVE file reader. I achieved a runtime of 25 seconds, compared to 1.9 seconds for my benchmark code.

Oleg's Iteratee code uses a Stream type which internally represents data as Chunk [a]. Haskell lists have many great properties, but they are not particularly efficient for this type of numerical processing. There are many better options, including Arrays, UVectors, and StorableVectors.

Unfortunately all of these options I would want to use introduce a type class restriction on types of elements in the array. This isn't a problem for sound files in particular, although it does greatly limit the utility of the Iteratees in general. One of the Iteratee examples shows how Iteratees can be layered for text processing, using IterateeGM String m a iteratees layered on IterateeGM Char m a iteratees to provide operations on words in a text stream. This is no longer possible, as String is not an instance of the type classes necessary to use one of these more compact data structures. In the end, the performance benefits should be worth it.

I chose StorableVector to replace the basic list, because it seemed handy while I was experimenting. StorableVector turns out to have another good property which is key to achieving the goal of performance close to C.

Adapting the Iteratees to use StorableVector is straightforward. Unfortunately when we've finished, performance is actually worse! In fact, it's as bad as the original RBIO-based version. It turns out that one key function is the culprit: conv_stream.

conv_stream converts between two streams. It is capable of converting multiple elements from one stream to one (or no) elements of the other, using a user-supplied conversion function. The wave file reader uses conv_stream to convert from a stream of Word8's to a stream of Doubles so the user can operate on Doubles, with the library handling all conversions transparently. The original type of conv_stream is:

conv_stream :: Monad m =>
IterateeGM el m (Maybe [el']) -> EnumeratorN el el' m a

Unfortunately, this means our data needs to unpacked from the StorableVector Word8 into separate Word8's, those Word8's assembled into Just [Double], and finally packed into a StorableVector Double for the next Iteratee. This is very inefficient, but easily fixed. Just change the type of conv_stream to

conv_stream :: (Storable el, Storable el', Monad m) =>
IterateeGM el m (Maybe (Vec.Vector el')) -> EnumeratorN el el' m a

and we're done. In fact, the implementation doesn't change at all from the original, just the type. Now no packing or unpacking need take place.

With the new conv_stream, it's necessary to change the conversion function. This is the IterateeGM which converts Word8's into Doubles. I have actually split this code into two parts, a function (called unroller because the original list-based version was a manually-unrolled endian_read) which converts Word8s into Word16 or Word32 as necessary, and a normalization function to normalize the result and convert from Words to Double. The normalization function should be fine as is, all that's necessary to change is the unroller function. Ideally, it would operate on StorableVectors directly. Fortunately this is possible. All that's necessary is to get a ForeignPtr Word8 from a StorableVector, and cast it to ForeignPtr Wordn. A somewhat hackish 16-bit converter can be implemented as follows:

import qualified Foreign.Ptr as FP
import qualified Foreign.ForeignPtr as FFP
import qualified Data.StorableVector as Vec

unroll_16 :: (Monad m) => IterateeGM Word8 m (Maybe (Vec.Vector Word16))
unroll_16 = liftI $ IE_cont step
where
step (Chunk vec)
| Vec.null vec = unroll_16
| Vec.length vec == 1 = liftI $ IE_cont $ step' vec
| Vec.length vec `rem` 2 == 0 = liftI $ IE_done (convert_vec vec) (Chunk $ Vec.empty)
| True = let (h, t) = Vec.splitAt (Vec.length vec - 1) vec
in
liftI $ IE_done (convert_vec h) (Chunk t)
step stream = liftI $ IE_done Nothing stream
step' i (Chunk vec)
| Vec.null vec = liftI $ IE_cont $ step' i
| Vec.length vec `rem` 2 == 1 = let vec' = Vec.append i vec
in
liftI $ IE_done (convert_vec vec') (Chunk $ Vec.empty)
| True = let (h, t) = Vec.splitAt (Vec.length vec - 1) vec
in
liftI $ IE_done (convert_vec $ Vec.append i h) (Chunk t)
step' _i stream = liftI $ IE_done Nothing stream
convert_vec vec = let (fp, off, len) = VB.toForeignPtr vec
f = FP.plusPtr (FFP.unsafeForeignPtrToPtr fp) off
fp' = (FFP.castForeignPtr $ unsafePerformIO $ FFP.newForeignPtr_ f) :: FFP.ForeignPtr Word16
in
Just $ VB.fromForeignPtr fp' (len `div` 2)

and that's it! The big hack is that this only works on little-endian platforms. To be correct, the bytes would need to be swapped on big-endian systems only. Implementing a suitable swap function is currently an excercise for the reader. Note that unsafeForeignPtrToPtr should be safe as there is no finalizer associated with the ForeignPtr, and in any case the memory pointed to won't be GC'd until after the Iteratee is complete.

At this point, the data should be handled entirely as a StorableVector, with no conversions to or from any intermediate lists at any point. Here are results from running the test:

$ time ./test_iter_sb
Just (AudioFormat {numberOfChannels = 2, sampleRate = 44100, bitDepth = 16})
Just 0.977568895535142

real 0m1.661s
user 0m1.335s
sys 0m0.140s

And the proof is in the pudding. This is the fastest Haskell implementation yet, even faster than hsndfile in a lazy stream.

A bit of work remains to be done. It should be cleaned up, with the hack fixed properly. Also I've currently only implemented unroll_16. For 24-bit audio, I expect the creation of a Word24 type and Storable instance will be necessary in order to employ the same technique. However, I think I've proven that a pure-Haskell, high performance audio I/O library is possible.

Monday, February 2, 2009

Build a better WAVE reader, pt 1

I have recently been working on creating an optimal audio I/O library in Haskell. By optimal, I mean the library should do all of the following:

Have a clean, functional, usable interface. Operations should be composable, and the user should not have to deal with recursion, folds or the like. Procedural-style interfaces (anything exposing filehandle-type operations to the user) are out too.
Be space efficient. Some Haskell implementations suffer from space leaks, or can have space leaks if the user isn't careful with data.
Have performance comparable to hsndfile, a Haskell binding to libsndfile.
Be implemented purely in Haskell. This makes it interesting.

The standard for audio I/O performance in Haskell has been hsndfile, as was discussed on haskell-cafe not long ago. Testing shows that using hsndfile to read data in a lazy stream fashion (similar to lazy ByteStrings) is about 10 times faster than the next-fastest implementation. Not good for Haskell's claim of having speed comparable to C! Still, I'm convinced it's possible to do better.

My test case consists of finding the peak normalized amplitude of a 16-bit stereo WAVE file. The audio duration is about 6 minutes, with a total file size of about 66 MB. This task involves reading and processing the entire file. The processing is quite minimal, so the speed of an implementation should depend primarily on the efficiency of reading and normalizing audio data.

Using hsndfile in a lazy stream does provide an interface which is familiar to many Haskell users, but I'm not particularly fond of it. If the user is not careful, the entire file can be retained in memory after processing. For a large audio file, this leads to an unacceptable situation. It should be impossible for data to be retained without explicit action from the user. Also, dependencies on foreign libraries can be difficult for some (i.e. Windows) users to resolve.

So having decided to create a native Haskell implementation, the first place to start is with the semantics and interface. I recently read Oleg K.'s presentations on Iteratee processing in Haskell, and I'm convinced this is the best model for functional I/O. It just feels right, what else can I say? Read the paper, see the code.

A preliminary version of Iteratee-based processing was benchmarked as "Enumerator" in the Haskell-art discussion. It was roughly comparable with other Haskell solutions, but has some problems. Notably it doesn't support seeking within the file, which would be nice. After I developed that, Oleg released a new version of Iteratee which does support seek operations via the RBIO monad. So I dropped Enumerator in favor of the new Iteratee + RBIO.

Oleg helpfully provides a TIFF reader with his Iteratee code. I based my wave reader on that, using the same technique of storing Enumerators for each wave sub-chunk in an IntMap. Since they aren't tagged, a list probably would have served just as well, but since most wave files only have 2 or 3 chunks anyway it doesn't seem like it matters much. Changes here aren't going to have a large performance impact.

The first version to use Iteratee + RBIO executes in about 45 seconds. Ouch.

RBIO uses IORef's to communicate seek requests and answers between Iteratees and Enumerators. Unfortunately IORefs are slow, so I'll start by getting rid of them. The data type

data IterateeG el m a = IE_done a (StreamG el)
| IE_cont (StreamG el -> IterateeGM el m a)

can be changed to

data IterateeG el m a = IE_done a (StreamG el)
| IE_cont (StreamG el -> IterateeGM el m a)
| IE_seek FileOffset (StreamG el -> IterateeGM el m a)

This also necessitates adding one more case to (>>=) on Iteratees:

iter_bind m f = m >>== docase
where
docase (IE_done a (Chunk [])) = f a
docase (IE_done a stream) = f a >>== (\r -> case r of
IE_done x _ -> liftI $ IE_done x stream
IE_cont k -> k stream
iter -> liftI iter)
docase (IE_cont k) = liftI $ IE_cont ((>>= f) . k)
docase (IE_seek off k) = liftI $ IE_seek off ((>>= f) . k)

After these changes, the provided Iteratees can be adapted to use the new structure relatively easily. The enumerator that reads from a file, enum_fd_random, must also be updated to understand seek requests. At this point we can remove RBIO entirely, and our execution time is about 33 sec. Adding a few carefully-chosen INLINE's and SPECIALIZATION's gets runtime down to 25 sec. Better, but we're very far from the goal of being comparable with C.

The next step is to change the Iteratee's internal representation of data.