Sunday, March 25, 2012

Pure delay lines

While working on my dsp-extras project, I needed a pure delay line.  Based on previous work, I used Data.Sequence.Seq, and was somewhat surprised to see a very large performance hit when compared to an undelayed signal.  Obviously there was more to creating pure buffers in Haskell than my previous microbenchmarks revealed.  I resolved to get the performance back.

I created a test script to measure the performance of different delay line implementations in a real-world application.  The application simply reads in a sound file, applies a fractional delay, and writes the output.  I measured the performance with buffer sizes ranging from 1 to 10000 samples.  Due to interpolation (I used cubic interpolation for all my tests), I couldn't actually measure a 1-sample buffer; the program actually creates a 4-sample buffer for the smallest possible delay.  All other buffer sizes are accurate though.

I tested the following buffer types:
  • Data.Sequence.Seq - cons elements onto the front and drop from the tail as each element is pushed
  • Data.Vector.Unboxed.Vector - a naive copying implementation
  • T10 - a 10 element, strict data type (e.g. data T10 a = T10 !a !a ...)
  • T25 - a 25 element, strict data type
  • Evil - based upon Data.Vector.Unboxed.Vector, the Evil buffer thaws the vector, mutates a single element, and returns the updated vector.  This buffer is closest to an imperative circular buffer implementation, although it violates referential transparency.  Hence the Evil moniker.
  • ComboBuffer - the ComboBuffer consists of two unboxed vectors and an update list.  After the update last grows large enough, it's frozen into a vector and the oldest data is dropped.  Writing elements is performed in amortized O(1) time.  The read behavior is more complex.
The results are summarized online here.  A few notes about interpreting the results:
  • Given times are the result of a single run, although they are fairly stable over multiple executions.
  • The chart shows times in a log scale to help differentiate times for small buffers.  Otherwise the O(n) behavior of Vector obliterates these differences.
  • Current is the behavior of an algorithm that adaptively selects a buffer type based upon size.
  • All the results (except one) are for reading off the end of the buffer.  For most buffer types this shouldn't matter. It's optimal for Seq.  For the ComboBuffer, reads from the last half of the buffer can be done in O(1), but reads from the first half are more complex, as a list lookup may be required.  The Current - first quarter data is the result of accessing points 0.25 of the way into the buffer.
So for reads from the end of the buffer, my ComboBuffer is pretty good.  It needs some tuning though, especially for reads from near the front of the buffer.


  1. Hi John. Great posts! I need something similar to this for my project (circular lists - for delay, filtering, and saving slices of recent history in response to certain features in the signal) - is something about this size up on hackage? I found x-dsp but it looks a little heavy.

    I wrote my own (really novice) implementation at . But I don't want to do something redundant if the niche is filled by something better already.


    1. Hi Greg,

      Yeah, x-dsp isn't really for this. Plus it's sort of an experiment that didn't entirely work out.

      All of these buffers do live in publicly-available code, however it's part of a larger package that isn't really useful yet. The buffering portion is at . The ComboBuffer works pretty well, except for reads in the first quarter of the buffered output. I think it would be great for the use cases you list on your implementation. Seq has good asymptotic behavior, but the constant factors appear to be *really* big compared to other implementations.

      I can split out the buffer stuff and put it in a standalone package. I'll try to do so this weekend.

    2. Thanks for the pointers. The combo buffer code looks interesting. I'll keep my eye out for your split, and in the meantime copy some of your ideas into my code. If you find time to make a library out of the buffering part of dsp-extras, that's much appreciated! Alternatively I could copy your ideas over to my thing.

    3. I've just put combobuffer on github and hackage.

      Patches welcome.

  2. Just saw your message. This is great, thanks! I'll learn a lot reading through your library. Wondering (probably naively) how the benchmark story for vectors changes with this new 'generalized stream fusion' stuff I was pointed to.

    1. Generalized stream fusion ( doesn't do much for this problem, I suspect. With a simple vector-based implementation, writing to the buffer results in a chain of read/writes as the vector is copied. GSF would replace this with a single write and a memcpy. This would certainly be an improvement, but it's still O(n). What we really want is to not have to copy the entire vector each time. The problem is sharing memory without breaking referential transparency.

      Although thinking about this has given me an idea for a new implementation that might be better all around. I'll try to code it up and check the performance. If it works, I'll add it to the combobuffer package.

  3. Just saw your message. This is great, thanks! I'll learn a lot reading through your library. Wondering (probably naively) how the benchmark story for vectors changes with this new 'generalized stream fusion' stuff I was pointed to.