I’m working on a project to expose audio spectrum data from Firefox’s audio element. Today I give an update on yesterday’s progress, since there are pretty pictures, and then ask some questions.
Yesterday I wrote about our first steps to find and extract audio spectrum data from the <audio> element. At the end of my post I wondered aloud whether or not the numbers I’d produced were meaningful.
After that post I spent some time working with Al MacDonald and Thomas Saunders, my partners in these experiments, and a number of interesting things happened. First, what were simply numbers to me proved to be meaningful to them, and after I helped them build Firefox with my changes, they started playing with the data. Al made a simple audio test case, and Thomas worked the data into a JavaScript-friendly form. Next, Al took the audio and analyzed the waveform in Audacity, before creating a real-time canvas visualization of the data we’d extracted from the browser. The results speak for themselves.
Having an early success is encouraging, but as one of my students is fond of saying, success brings you to your next problem. Knowing that this data is meaningful means I now need to figure out the right way to expose it in the DOM. When I get this data I’m deep in C++, nowhere near JS running in the browser. What I need is a proper API for making this data available.
My choice of words is important: “right way,” “proper API.” I could (and probably will for my next test) just drop-kick the data across the content boundary. But what’s the right way to do this? I did some investigating into our implementation of DOM events last night. I was thinking that maybe I should pass the data out within a custom DOM event. However, I don’t see any existing events that use this model; none of them seem to push much data along with the event. Another option is to dispatch an event and then fill a buffer that can be read via a getAudioData() call. But even when I get this working (if this is the right thing to do), I’ll next have to worry about how to make that data meaningful in terms of sync with the actual audio that the user is hearing. A little bit out of sync is almost as bad as being totally random, especially if you’re trying to time visualizations or other UI updates to sound.
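To make the second option concrete, here is a minimal sketch of the "notify, then pull" shape in plain JavaScript. Nothing in it is a real Firefox API: SpectrumSource, onAudioData() and pushFrame() are hypothetical names standing in for the audio element and the C++ side, and only getAudioData() comes from the idea above. It assumes the notification carries no payload and that the listener reads the buffer itself.

```javascript
// Hypothetical stand-in for an <audio> element; not a real browser API.
class SpectrumSource {
  constructor() {
    this.listeners = [];
    this.latest = new Float32Array(0);
  }

  // Register a callback for the (hypothetical) "new data is ready" notification.
  onAudioData(callback) {
    this.listeners.push(callback);
  }

  // Plays the role of the engine side: store the new frame first,
  // then notify listeners without handing them any data.
  pushFrame(samples) {
    this.latest = Float32Array.from(samples);
    for (const listener of this.listeners) {
      listener();
    }
  }

  // Pull model: return a copy so callers cannot mutate the internal buffer.
  getAudioData() {
    return this.latest.slice();
  }
}

const source = new SpectrumSource();
const seen = [];
source.onAudioData(() => {
  seen.push(source.getAudioData());
});
source.pushFrame([0.1, 0.5, 0.9]);
source.pushFrame([0.2, 0.4]);
```

The event stays small and the data only crosses over when a listener asks for it, but the sketch says nothing about the harder question of keeping that buffer in sync with what the user hears.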
So I’m convinced we’re on track, and also feeling a bit lost. I know that “the perfect is the enemy of the good,” so I won’t halt my work until I can settle all these questions. It’s clear to me that I’m going to have to get this wrong before I get it right. But I’d value some input from those closer to our DOM implementation and the JS community on which paths to explore. Thankfully, Chris Blizzard has started that ball rolling by introducing us to some more people. The most enjoyable part of experiments like these for me is the chance to work in community.
4 Comments
I wouldn’t mind helping to bridge the gap between our current C++ stdout console statements and an actual JavaScript API, as I would like to become more familiar with the Mozilla framework.
I think it’s very true that, in this case, the perfect is the enemy of the good. Whether data is dispatched as an event through the DOM or exposed as a JavaScript getAudioData() method doesn’t seem all that important.
What developers today want is the raw data, and I’d say give it to them as simply as possible. They’ll interpret it as they wish, and it will be their responsibility to manage browser performance and synchronization issues.
One consideration, however, may be to let the developer choose how much of the sound data they are accessing. Perhaps a call to getAudioData() has one parameter that defaults to 1, but, if resources are limited, one could call getAudioData(.5) or getAudioData(.1) in order to get a trimmed-down version of the data. This would give developers the option to provide access to sound data based on the capabilities of the user’s browser, and head off sync and timing issues.
Just some morning thoughts, but thanks for the distraction from the work I already have at hand.
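One possible reading of that fraction parameter, sketched in plain JavaScript. The function name trimAudioData() is hypothetical, and evenly strided sampling is only an assumption about what "trimmed down" might mean; averaging neighbouring values would be another reasonable choice.

```javascript
// Keep roughly `fraction` of the values by taking evenly spaced samples.
// `fraction` defaults to 1 (everything), as suggested in the comment above.
function trimAudioData(data, fraction = 1) {
  if (!(fraction > 0 && fraction <= 1)) {
    throw new RangeError("fraction must be greater than 0 and at most 1");
  }
  if (data.length === 0) {
    return [];
  }
  // Always keep at least one value so the caller has something to draw.
  const count = Math.max(1, Math.round(data.length * fraction));
  const step = data.length / count;
  const out = [];
  for (let i = 0; i < count; i++) {
    out.push(data[Math.floor(i * step)]);
  }
  return out;
}

const samples = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9];
const half = trimAudioData(samples, 0.5); // [0, 2, 4, 6, 8]
const tenth = trimAudioData(samples, 0.1); // [0]
```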
Can you get (arbitrary parts of) this data before the audio/video is played (or parts of it, at least?). If so, that might be the easiest way to solve the sync issue. Assuming the rate (number of elements in the data vs. time) is known (or passed by the caller?), and the audio element exposes current audio position, that and normal time tracking (new Date(), setTimeout) should be plenty to ensure sync. Just have an API that allows access to the raw data between position X and Y (in seconds or raw data offset, doesn’t make much difference if the rate is known).
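A rough sketch of that position-to-data mapping, assuming the whole data array is available up front and the rate (data elements per second) is known. The helper names indexForTime() and sliceByTime() are hypothetical, and the numbers are made up; in a page, the position would come from the audio element's current playback time.

```javascript
// Map a playback position in seconds to an index into the data array,
// clamped so it always lands inside the array.
function indexForTime(seconds, rate, length) {
  const index = Math.floor(seconds * rate);
  return Math.min(Math.max(index, 0), length - 1);
}

// Return the data elements covering positions startSec..endSec.
function sliceByTime(data, rate, startSec, endSec) {
  const start = Math.max(0, Math.floor(startSec * rate));
  const end = Math.min(data.length, Math.ceil(endSec * rate));
  return data.slice(start, end);
}

const rate = 4; // assumption: 4 data elements per second
const spectrum = [10, 11, 12, 13, 14, 15, 16, 17]; // 2 seconds of made-up data

const current = spectrum[indexForTime(1.3, rate, spectrum.length)]; // 15
const segment = sliceByTime(spectrum, rate, 0.5, 1.0); // [12, 13]
```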
Additionally, it would be nice (at some stage) to be able to delay the audio by x milliseconds. This would allow one to grab 512 bits from the buffer, process the FFT (although that would ideally be done in C), then process the graphics, and push everything back out at the same time. Otherwise graphics based on data from a Fast Fourier Transform will be out of sync. I guess that’s something that will get covered by write access to the audio buffer at some point anyway. Also, would it be beneficial to work on byte arrays at this time and transfer some of that methodology to the Canvas pixel array, which is super-mega-breath-takingly-slow?
I’ll admit I have only quickly scanned your post here, but I think I may have done something like what you are looking for. If it’s in the same vein, let me know and I’ll send along the code. I should mention that I do use Flash for these ends:
javascript spectral analyzer