Tuesday, October 11, 2011

J. Roger Hindley is coming to Maynooth

I have recently learned that J. Roger Hindley (of Hindley-Milner) will be giving a talk next month on the history of Curry-Howard in the Mathematics Dept. at NUI Maynooth.  I will definitely be there!

Friday, September 16, 2011

splaytree package released

I've just uploaded a new package to Hackage, splaytree-0.1.  Splaytree provides pure functional splay trees, implemented using monoidal annotations.  A splay tree is a self-balancing tree with the property that recently-accessed nodes remain near the root, and thus subsequent accesses can be quite cheap.  As for the monoid annotations, it's the same technique as used in finger trees (see here, here, and here).  The idea is that each node of a tree holds an annotation in addition to data.  If the type of the annotation is chosen properly, it can be used to prune branches when searching.  One clever aspect of this technique is that different data structures can be implemented simply by choosing different annotations.

Currently three different data structures are available, Data.SplayTree.Set, Data.SplayTree.Seq, and Data.SplayTree.RangeSet.  'Set' provides a set with standard operations.  Not much time has been spent optimizing this library, but performance is competitive with unordered-containers for many operations (inserts can be quite slow though).  Seq is a sequence type, similar to Data.Seq.

RangeSet is perhaps the most interesting of the available containers.  Similar to an IntervalSet, a RangeSet is used for storing a list of Ranges.  A RangeSet, however, guarantees that all ranges are non-overlapping.  When a new range is inserted into the RangeSet, it is combined with any ranges currently in the set with which it overlaps.

Example (Note that a Range is a starting point and duration):

Prelude Data.SplayTree.RangeSet Data.Foldable> let rs = singleton (range 0 4)
Prelude Data.SplayTree.RangeSet Data.Foldable> toList rs
[Range {rMin = 0, rang = 4}]
Prelude Data.SplayTree.RangeSet Data.Foldable> toList $ insert rs (range 3 2)
[Range {rMin = 0, rang = 5}]
Prelude Data.SplayTree.RangeSet Data.Foldable> 

Comments and patches welcome.

Friday, September 2, 2011

My mind is blown


Recently I've read two blog posts which have really started to change how I think about programming.  A while ago I found a solution to the word numbers problem which completely stunned me with its simplicity and generality. Today I stumbled upon Apfelmus's "monoids and finger trees" blog post, which is very timely since I can use the same approach to both simplify and generalize a data structure I've been working on recently.

With this in mind, I would like to move from my current awareness (being able to follow along with these discussions) to being able to recognize and apply very general structures.  Somewhat tongue-in-cheek, I'd like to know how to make good use of Edward Kmett's packages.  Advice would be appreciated.

Wednesday, July 6, 2011

Circular Buffers

(A followup with updated results is available)

An extremely useful tool in DSP programming is the circular buffer.  Basically, a circular buffer is useful for keeping an array of n elements of a data stream.  Implementing this in Haskell is not straightforward, because the usual implementation relies upon mutating arrays.  As Haskell is a pure language, mutable data can only be used in restricted contexts, which can make a mutable circular buffer somewhat awkward.  It offends my sensibilities if nothing else.

(Code presented in this article is basically the first thing I thought of.  Improvements are welcome, provided they don't detract too much from readability.  All code is placed in the public domain.)

The ideal Haskell implementation would be pure, and for certain cases a high-performance solution is available.  If you only need access to the end of the buffer (i.e. a FIFO queue),  real-time pure functional implementations are well-known (Okasaki).  But what if you want to read from the middle of the buffer?  In an imperative implementation, this is usually handled by either keeping multiple read pointers, or by using an absolute offset to the start of the buffer.  It's not clear how to apply either of these techniques to Okasaki's queue implementation though, and the obvious approach of dequeueing as many times as required has O(n) complexity.

An obvious approach is to write an implementation using mutable data.  The implementation would no longer be pure unfortunately.  Here's a simple implementation which runs in IO, based upon mutable vectors:

module Data.RingBuffer.Mutable (
  RingBuffer
 ,newInit
 ,push
 ,length
 ,(!)
)

where

import qualified Data.Vector.Unboxed.Mutable as V

import Foreign.ForeignPtr
import Foreign.Storable
import Control.Applicative
import GHC.Prim

data RingBuffer a = RB (V.MVector RealWorld a) (ForeignPtr Int)

newFp :: Storable a => a -> IO (ForeignPtr a)
newFp a = do
  fp <- mallocForeignPtr
  withForeignPtr fp $ \p -> poke p a
  return fp

readFp :: Storable a => ForeignPtr a -> IO a
readFp fp = withForeignPtr fp $ \p -> peek p

writeFp fp a = withForeignPtr fp $ \p -> poke p a

newInit :: (V.Unbox a, Num a) => Int -> IO (RingBuffer a)
newInit sz = RB <$> (V.replicate sz 0) <*> newFp 0
{-# INLINE newInit #-}

(!) :: (V.Unbox a) => RingBuffer a -> Int -> IO a
(!) (RB vec ref) ix = do
  off <- readFp ref
  let len = V.length vec
  V.unsafeRead vec ((ix+off) `rem` len)
{-# INLINE (!) #-}

push :: V.Unbox a => RingBuffer a -> a -> IO ()
push (RB vec ref) el = do
  off <- readFp ref
  let len = V.length vec
  let off' = (off+1) `rem` len
  writeFp ref off'
  V.unsafeWrite vec off el
{-# INLINE push #-}

So what would an efficient pure implementation look like?

Probably the fanciest pure data structure is the finger tree, most commonly known by the implementation in Data.Sequence.Seq.  Finger trees have many interesting attributes, including O(1) access to both ends.  They do not have O(1) access to interior elements, but better than O(n).  It's pretty simple to implement a similar interface with a Seq:

module Data.RingBuffer (
  RingBuffer
 ,new
 ,newInit
 ,push
 ,length
 ,(!)
)

where

import           Prelude hiding (length)

import qualified Data.Sequence as S

newtype RingBuffer a = RB (S.Seq a) deriving (Eq, Ord, Show)

-- | Create a new RingBuffer, initialized to all 0's, of the given size
new :: (Num a) => Int -> RingBuffer a
new = newInit 0
{-# INLINE new #-}

-- | Create a new RingBuffer from a given initial value
newInit :: a -> Int -> RingBuffer a
newInit i sz | sz <= 0 = error "can't make empty ringbuffer"
newInit i sz           = RB (S.replicate sz i)
{-# INLINE newInit #-}

-- | Get the total size of a RingBuffer.
length :: RingBuffer a -> Int
length (RB vec) = S.length vec
{-# INLINE length #-}

-- | Look up a value in a RingBuffer.
(!) :: RingBuffer a -> Int -> a
(!) (RB vec) = S.index vec
{-# INLINE (!) #-}

-- | Push a new value into a RingBuffer.  The following will hold:
--     NewRingBuffer ! 0 === added element
--     NewRingBuffer ! 1 === OldRingBuffer ! 0
push :: RingBuffer a -> a -> RingBuffer a
push (RB vec) el   = case S.viewr vec of
  v' S.:> _ -> RB $ el S.<| v'
  _         -> error "internal error"
{-# INLINE push #-}
Ok, so it's not quite the same, but close enough.

So how does this compare, performance-wise?  I whipped up some criterion code to check it out.


module Main where

import Criterion.Main
import System.Random
import Control.Monad

import qualified Data.RingBuffer.Mutable as RM
import qualified Data.RingBuffer as R
import Data.Vector.Unboxed (Unbox)
import System.IO.Unsafe

rndSeed = 49872

-- test performance as a queue
queueMut :: (Unbox a, Num a) => Int -> Int -> a -> IO ()
queueMut reps sz a = do
  buf <- RM.newInit sz
  replicateM_ reps (buf RM.! (sz-1) >> RM.push buf a)

-- | test performance of multiple reads
multireadMut :: (Unbox a, Num a) => Int -> Int -> a -> IO ()
multireadMut nReads sz a = do
  let rs = take nReads $ randomRs (0,sz-1) $ mkStdGen rndSeed
  buf <- RM.newInit sz
  replicateM_ iters $ do
    mapM_ (\ix -> buf RM.! ix) rs
    RM.push buf a

queueP :: Int -> Int -> a -> a
queueP reps sz a = fst $ (iterate go (a, R.newInit a sz)) !! reps
 where
  go (a', buf) = a' `seq` (buf R.! (sz-1), R.push buf a)

multireadP :: Int -> Int -> a -> [a]
multireadP nReads sz a = fst $ (iterate go (replicate nReads a, R.newInit a sz)) !! iters
 where
  go (last, buf) = length last `seq` (map (buf R.!) rs, R.push buf a)
  rs = take nReads $ randomRs (0,sz-1) $ mkStdGen rndSeed
  

numReps = 100
numReads = 50

iters = 10000

main = defaultMain
  [
   bgroup "Mutable/Int" [
     bgroup "Queue" [
       bench "8"      $ queueMut numReps 8     (10::Int)
      ,bench "128"    $ queueMut numReps 128    (10::Int)
      ,bench "1024"   $ queueMut numReps 1024   (10::Int)
      ,bench "10240"  $ queueMut numReps 10240  (10::Int)
      ,bench "102400" $ queueMut numReps 102400 (10::Int)
     ]
     ,bgroup "Multiread" [
       bench "8"     $ multireadMut numReads 8     (10::Int)
      ,bench "128"    $ multireadMut numReads 128    (10::Int)
      ,bench "1024"   $ multireadMut numReads 1024   (10::Int)
      ,bench "10240"  $ multireadMut numReads 10240  (10::Int)
      ,bench "102400" $ multireadMut numReads 102400 (10::Int)
      ]
   ]
  ,bgroup "Mutable/Double" [
     bgroup "Queue" [
       bench "8"     $ queueMut numReps 8     (10::Double)
      ,bench "128"    $ queueMut numReps 128    (10::Double)
      ,bench "1024"   $ queueMut numReps 1024   (10::Double)
      ,bench "10240"  $ queueMut numReps 10240  (10::Double)
      ,bench "102400" $ queueMut numReps 102400 (10::Double)
     ]
     ,bgroup "Multiread" [
       bench "8"     $ multireadMut numReads 8     (10::Double)
      ,bench "128"    $ multireadMut numReads 128    (10::Double)
      ,bench "1024"   $ multireadMut numReads 1024   (10::Double)
      ,bench "10240"  $ multireadMut numReads 10240  (10::Double)
      ,bench "102400" $ multireadMut numReads 102400 (10::Double)
     ]
   ]
  ,bgroup "Pure/Int" [
     bgroup "Queue" [
       bench "8"     $ whnf (queueP numReps 8)     (10::Int)
      ,bench "128"    $ whnf (queueP numReps 128)    (10::Int)
      ,bench "1024"   $ whnf (queueP numReps 1024)   (10::Int)
      ,bench "10240"  $ whnf (queueP numReps 10240)  (10::Int)
      ,bench "102400" $ whnf (queueP numReps 102400) (10::Int)
     ]
     ,bgroup "Multiread" [
       bench "8"     $ nf (multireadP numReads 8)     (10::Int)
      ,bench "128"    $ nf (multireadP numReads 128)    (10::Int)
      ,bench "1024"   $ nf (multireadP numReads 1024)   (10::Int)
      ,bench "10240"  $ nf (multireadP numReads 10240)  (10::Int)
      ,bench "102400" $ nf (multireadP numReads 102400) (10::Int)
     ]
   ]
  ,bgroup "Pure/Double" [
     bgroup "Queue" [
       bench "8"     $ whnf (queueP numReps 8)     (10::Double)
      ,bench "128"    $ whnf (queueP numReps 128)    (10::Double)
      ,bench "1024"   $ whnf (queueP numReps 1024)   (10::Double)
      ,bench "10240"  $ whnf (queueP numReps 10240)  (10::Double)
      ,bench "102400" $ whnf (queueP numReps 102400) (10::Double)
     ]
     ,bgroup "Multiread" [
       bench "8"     $ nf (multireadP numReads 8)     (10::Double)
      ,bench "128"    $ nf (multireadP numReads 128)    (10::Double)
      ,bench "1024"   $ nf (multireadP numReads 1024)   (10::Double)
      ,bench "10240"  $ nf (multireadP numReads 10240)  (10::Double)
      ,bench "102400" $ nf (multireadP numReads 102400) (10::Double)
     ]
   ]
  ]

A few points about the benchmarks.  First, both queue and multiread perform multiple cycles in each benchmark.  Secondly, the pure benchmarks both seq outputs at each cycle to ensure that work is actually performed.  So how do these implementations compare?
It looks like not only is the Seq-based implementation pure, it outperforms my mutable-vector implementation and scales better too.  Completely unexpected.

Unless somebody points out a major flaw in my testing methodology or implementations, I'll be using Data.Sequence.Seq for my circular buffer needs from now on.

Benchmark data

Edit: thanks to Henning Thielemann for suggested improvements.  I've updated the benchmark code with his suggestions regarding mkStdGen.  I also have updated the mutable buffer to use a ForeignPtr to hold the offset rather than an IORef.  The performance is better, but not drastically so. I haven't put up new images yet, although I have uploaded the new benchmark data.

Friday, May 6, 2011

Linux Audio Conference 2011

Streams are live now.  Check it out if you couldn't make it in person.

http://streamer.stackingdwarves.net/

Thursday, May 5, 2011

GTK on OSX

I finally got tired of the default gtk theme on my macbook, so I decided to do something about it.  I've been using gtk2 no_x11 variant provided by macports, but unfortunately all the theme engines on macports require X11.  I tried the GtkQuartzEngine, and it seems to work perfectly with a little tweaking.

1.  Check out git://github.com/jralls/gtk-quartz-engine.git
2.  Cd into the gtk-quartz-engine repo and run aclocal && autoreconf -vi && autoconf (I used the macports versions)
3. ./configure --prefix=/opt/local && make && make install
4.  Copy the sample gtkrc file to ~/.gtkrc-2.0.

And it all just worked.  It looks decent, albeit slightly buggy.  It's necessary to set the prefix to /opt/local so the engine is installed into the same location as the macports gtk libs.

This seems like a relatively simple candidate for adding to macports.

Monday, April 11, 2011

Thoughts on FRP

(N.B. Code presented in this post is theoretical.  I haven't tried to compile it, although I do expect it would work.)

Although various implementations have been under development for some time, FRP has yet to reach a really satisfying breakthrough.  The best semantic model has been that of Conal Elliot's Reactive, but unfortunately it lacks a Smart Enough Compiler and suffers from subtle space leaks.  The most performant is probably Yampa, which is arrow-based.  Arrows turn some people off; I personally find them too verbose and don't like the syntax for recursion.

Apfelmus has recently released a new FRP library, reactive-banana, which shows some early promise towards being a best-of-all-possible-worlds.  In this post I'll examine some of the features of reactive-banana to see why I think this.

Introduction

Reactive-banana is very similar to the semantic model provided by Reactive, although much more minimal.  One fundamental data type is the Event, which can be thought of as a list of values with their time of occurrence, i.e. Event a = [(Time, a)].  The other fundamental type is the Behavior, which can be thought of as a value that changes with time, i.e. Behavior a = \Time -> a.  This is according to the reactive-banana documentation, which also notes that in this implementation Behaviors aren't continuous but are actually step functions.

So how does it look?  Consider a simple system with a button and a toggle: when the button is pressed it generates a 0 if the toggle is off, and 1 when the toggle is on.  We could model it like this:

togEvent :: Event Bool
buttonEvent :: Event ()

-- Upon receiving an Event (), mixEvents generates an Event Int
-- The value of the Int depends upon the most recent Event Bool status
mixEvents :: Event Bool -> Event () -> Event Int

The togEvent and buttonEvent sources are created by connecting to the window framework's event system (see reactive-banana-wx for an example), but mixEvents is more complex.  We could model it with the pure function

mixEventsF :: Bool -> () -> Int
mixEventsF True _ = 1
mixEventsF _    _ = 0

so we just need to find a way to lift this into the FRP system.  Behaviors have an Applicative instance, so let's try it:

mixEvents = apply (mixEventsF <$> behavior False togEvent) buttonEvent

and we've got it.  If we prefer, we could define mixEventsF :: Bool -> Int, but then we'd need to fmap const onto the Behavior to create a Behavior (() -> Int) we could apply to an Event.

So how does this work?  The Behavior functions as an Event Accumulator.  It stores every Bool received from togEvent, and when a button press event is received creates an event from applying the Bool and () to the mixEventsF function.  One clue that this is the proper way to think of Behaviors is by the different ways they can be constructed:
  • lifting pure values, and always (these are semantically equivalent, pure is defined as always)
  • the behavior smart constructor (only hold the most recent Event)
  • various accumulate functions on Events
So if a Behavior is an event accumulator, why do we need it at all?  Why not just use events directly, explicitly storing values when we need them?  In my view, the answer has to do with abstracting implementation from the user.

Push vs Pull

One tension in the FRP design space relates to a push-driven (data-driven) vs pull-driven (demand-driven) architecture.  Push-driven offer theoretically better performance, as data sources typically change relatively slowly (e.g. mouse clicks relative to screen refresh), so this minimizes recomputation.  However, pull-driven seems to fit the functional paradigm better.  Also, if a Behavior models a continuous function, pushing updates at the sample rate could be very wasteful.

The Behavior type (at least in reactive and reactive-banana) moderates between these two models.  We can see how this works by examining the semantics of our mixEvents function.

-- Upon receiving an Event (), mixEvents generates an Event Int
-- The value of the Int depends upon the most recent Event Bool status
mixEvents :: Event Bool -> Event () -> Event Int

Our implementation created a Behavior (() -> Int) by combining a pure function with the Event Bool stream.  This behavior is then sampled whenever an Event () is received.  By converting a discrete Event into something which can be later sampled, the Behavior localizes a change from push-driven to pull-driven.  Thus Event Bools are stored and pulled from as necessary, while Event ()s are pushed through and become Event Ints, which are then pushed until they're also changed into Behaviors (or reactimate'd).

Continuous Time

Another Behavior feature is that semantically (traditionally, but not in reactive-banana) they are continuous functions over time.  There are a lot of reasons to like this feature, which is notably missing from the reactive-banana semantic model.  Originally I was somewhat skeptical, however I've come to the conclusion that the reactive-banana model of Behaviors as step functions is more correct.  Since a Behavior is an accumulator of Events, it makes sense that the Behavior would change only when new Events are received.  As an Event is a datum at a discrete time, the Behavior must therefore be a discrete-valued function.  In particular, there is a conflict inherent in overloading the meaning of a Behavior to both respond to Events and be continuous in time.

Furthermore, I don't think the programmer loses anything by the reactive-banana model.  It's still possible to write functions that are continuous in time and lift them to Behaviors.  This is because the reactive-banana framework has no notion of time.  If you want a time value, you need to provide one yourself, perhaps via a hardware clock.

-- a pure, continuous function
-- using UTCTime to conveniently tie in with getTime.
createShape :: UTCTime -> Shape

-- an event source tied to the window toolkit's redraw event
redrawScreen :: Event ()

-- function to draw a shape to the screen
drawShape :: Shape -> IO ()

getTime :: IO UTCTime
getTime = Data.Time.Clock.getCurrentTime

reactiveSystem :: Prepare ()
reactiveSystem =
 reactimate $ fmap drawShape
                   (apply (pure createShape) (mapIO getTime redrawScreen))

Let's examine how this works.  Fundamentally, when the window system requests a screen redraw, we want to check the current time, generate the correct shape for the time, and finally update the display.  Our system provides the redraw requests via an Event (), which is mapped to the current time, creating an Event UTCTime.  The createShape function is first lifted to a Behavior then applied to the Event UTCTime, creating an Event Shape.  Finally the drawShape function is mapped over this, creating an Event (IO ()) which can be reactimate'd, i.e. each IO () is performed as it becomes available.  In my opinion, by removing time from the reactive-banana framework, both the described systems and the implementation are greatly simplified, with no loss of expressiveness.

Simultaneitys

One aspect that seems to cause a lot of difficulty is simultaneous events, multiple Events which occur at the same time.  There are a few ways these can arise, such as long sampling periods, merging Event streams, and recursive definitions.  reactive-banana specifies that the order of simultaneous events is undefined, providing the orderedDuplicate function to deal with certain cases where timing is important.

I don't like this solution.  Semantically, it would be nice that an event stream were strictly increasing in time, disallowing simultaneous events entirely.  I suspect that this could be made usable by providing the following two primitive functions:

mergeCombine :: (a -> a -> a) -> Event a -> Event a -> Event a

-- When two events arrive simultaneously, create two output events with the right event delayed slightly
mergeOrdered :: Event a -> Event a -> Event a

The mergeOrdered function introduces a slight delay to preserve discrete events, whereas mergeCombine has a function to combine two events to one.  mergeCombine handles the common cases of dropping one event via const and combining with mappend, as well as more specialized functions.  I think these functions, together with the guarantee that an Event stream only produces single events, are sufficient for a workable system.

Conclusion

So that's my initial reaction to reactive-banana.  The library is quite minimal, but so far it's sufficiently expressive for every problem I've thought of.  The new semantics of Behavior seem conducive to developing a correct model of how the implementation works, thereby making it simpler for users to reason about performance and operations.  Requiring users to explicitly manage time may eventually become a burden, however it's a direction worth exploring.  For many problems user code doesn't depend upon time anyway.

Thursday, February 3, 2011

a better audio language

One of my frequent frustrations with audio programming languages is that they're relatively unsafe, at least when compared to languages with sophisticated type systems. The problem is generally worsened with EDSLs. Consider this example csound:

aNull delayr 2
atap deltap 1
delayw asig
Most csound dsl's will provide 3 functions corresponding to delayr, deltap, and delayw, which are translated to the dsl in a very direct manner. If the delayr is left out, the dsl code is translated to invalid csound, detected only by the csound compiler. I find this frustrating because we can do better.

What if the delayr generates a token, which is then required by all functions that want to use it?

-- Haskell-mode now
myDelay asig = do
(dtok, _) <- delayr 2 adelay <- deltap dtok 1 delayw dtok asig return adelay
This is better, but it's still possible to accidentally omit a delayw from the Haskell code. Again, this is undetected except by the csound compiler at the final stage. What we really need is a way to ensure that a user can run delays freely within the context of a delay line, but only in that context. In Haskell, computations within contexts are generally represented by applicative functors. Let's try that approach:

-- don't export the constructor
newtype DelayCtxt a = DelayCtxt { unDelay :: CsoundInterp a }

deltap' :: CsoundInterp KSig -> DelayCtxt ASig
deltap' dtime = DelayCtxt $ dtime >>= deltap

runDelay :: CsoundInterp ASig -> CsoundInterp Float -> DelayCtxt a -> CsoundInterp a
runDelay mAsig mtime delcomp = do
maxdel <- mtime asig <- mAsig delayr maxdel rval <- unDelay delcomp delayw asig return rval
This is more like it. Since the DelayCtxt constructor isn't exported, the only way to create a DelayCtxt value is with deltap'. The only way to consume a DelayCtxt is with runDelay, which automatically adds the required delayr and delayw.

Unfortunately the user still needs to manually add the maximum delay time, but that's unavoidable if you're going to support varying delay times. Consider that the input signal could be coming from a controller in real-time; to statically guarantee it's under a given bound would require a bounded signal type, which would be equivalent to this static delay bound anyway.

This is the motivating example behind X-DSP, which was presented at the 2011 SEAMUS conference at the University of Miami.