I’m working with an ever-growing group of web, audio, and Mozilla developers on a project to expose audio data to JavaScript from Firefox’s audio and video elements. Today we show you how much JavaScript can really do.
Since my last post, quite a few new people have joined our group, a lot has changed in our implementation, and we’ve achieved a few things worth writing about. I also can’t keep these demos under wraps any longer, so it’s time for another post.
One of the first pieces of advice I got in the bug, when I started writing this patch to expose audio data in Firefox, was to use Vlad’s new typed arrays (aka WebGL Arrays). My first implementation used an array-like object to expose the audio data, and JS arrays for writing samples. Both worked well, but neither was as fast as we’d like, and it meant various hacks to work around performance issues. Vlad was kind enough to give me a crash course on how to implement them via quickstubs, and over the past few weeks, Yury Delendik and I have worked long hours to rewrite our entire implementation to use them.
Along with the suggestion to use typed arrays came a less welcome one: remove the FFT calculation from C++ and do it all in JavaScript. When I raised this in our #audio IRC channel, a lot of people were upset, saying that it was a bad idea, that we’d never be fast enough, etc. However, I pulled it out anyway in order to force them to try. Corban responded by rewriting his dsp.js library to use Float32Array. It can now do 1000 FFTs on a 2-channel × 2048-sample buffer in 490ms, or 0.49ms per FFT (JS arrays take 2.577ms per FFT, so typed arrays are roughly five times faster). And one of the biggest critics of my decision to pull the native FFT, Charles Cliffe, went off to prove me wrong, but ended up with two stunning WebGL-based audio visualizations (demos here and here, videos here and here).
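To give a feel for what an FFT over typed arrays looks like, here is a minimal sketch of an in-place radix-2 transform on two Float32Array buffers. It is not dsp.js's code or API; the function names `fft` and `magnitudes` are made up for this illustration, and a real library would precompute its sine and cosine tables rather than calling Math.cos and Math.sin inside the loop.

```javascript
// Illustrative in-place radix-2 FFT over typed arrays (not dsp.js's API).
// `real` and `imag` are Float32Arrays of equal, power-of-two length.
function fft(real, imag) {
  var n = real.length;
  if (n !== imag.length || n < 1 || (n & (n - 1)) !== 0) {
    throw new Error("buffers must have the same power-of-two length");
  }
  // Reorder the samples into bit-reversed index order.
  for (var i = 1, j = 0; i < n; i++) {
    var bit = n >> 1;
    for (; j & bit; bit >>= 1) j ^= bit;
    j ^= bit;
    if (i < j) {
      var tr = real[i]; real[i] = real[j]; real[j] = tr;
      var ti = imag[i]; imag[i] = imag[j]; imag[j] = ti;
    }
  }
  // Combine sub-transforms of size 2, 4, 8, ... up to n.
  for (var size = 2; size <= n; size <<= 1) {
    var half = size >> 1;
    var step = -2 * Math.PI / size;
    for (var start = 0; start < n; start += size) {
      for (var k = 0; k < half; k++) {
        var wr = Math.cos(step * k), wi = Math.sin(step * k);
        var a = start + k, b = a + half;
        var xr = real[b] * wr - imag[b] * wi;
        var xi = real[b] * wi + imag[b] * wr;
        real[b] = real[a] - xr; imag[b] = imag[a] - xi;
        real[a] += xr; imag[a] += xi;
      }
    }
  }
}

// Magnitude of each bin in the lower half of the spectrum.
function magnitudes(real, imag) {
  var out = new Float32Array(real.length / 2);
  for (var i = 0; i < out.length; i++) {
    out[i] = Math.sqrt(real[i] * real[i] + imag[i] * imag[i]);
  }
  return out;
}

// Usage: transform a 2048-sample sine wave and inspect its spectrum.
var sampleCount = 2048;
var signalRe = new Float32Array(sampleCount);
var signalIm = new Float32Array(sampleCount);
for (var s = 0; s < sampleCount; s++) {
  signalRe[s] = Math.sin(2 * Math.PI * 32 * s / sampleCount);
}
fft(signalRe, signalIm);
var spectrum = magnitudes(signalRe, signalIm);
```

Because every buffer here is a Float32Array, the inner loop only touches unboxed 32-bit floats, which is the property that makes this kind of code practical in JavaScript.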
What I like most about these (other than the fact that he’s written the music, JS libs, and demo) is that they combine a whole bunch of JavaScript libraries: dsp.js, cubicvr.js, beatdetection.js, and processing.js. Some people will tell you that doing anything complex in a browser is going to be slow, but Charles is masterfully proving that you can do many, many things at once and the browser can keep pace.
Corban and Ricard Marxer have been busy exploring how far we can push audio write, and have also produced some amazing demos. The first is by Ricard, and is a graphic equalizer (video is here):
The second is by Corban, and shows a JavaScript-based audio sampler. His code can loop forward or backward, change playback speed, etc. (video is here):
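As a rough illustration of the playback logic a sampler like this needs, the sketch below reads a looped buffer at an arbitrary rate with linear interpolation. This is not Corban's code; the function name `readLooped` and its parameters are assumptions made for the example.

```javascript
// Illustrative sampler read loop (hypothetical helper, not from the demo).
// Produces `count` output samples from `source`, starting at fractional
// position `start` and advancing `rate` source samples per output sample:
// rate 1 plays normally, 2 plays at double speed, -1 plays in reverse.
function readLooped(source, start, rate, count) {
  var n = source.length;
  var out = new Float32Array(count);
  var pos = start;
  for (var i = 0; i < count; i++) {
    // Wrap the read position into [0, n) so playback loops both ways.
    pos = ((pos % n) + n) % n;
    var i0 = Math.floor(pos);
    var i1 = (i0 + 1) % n;
    var frac = pos - i0;
    // Linear interpolation between the two neighbouring samples.
    out[i] = source[i0] * (1 - frac) + source[i1] * frac;
    pos += rate;
  }
  return out;
}

// Usage: read the same four-sample loop forward, backward, and at half speed.
var loop = new Float32Array([0, 1, 2, 3]);
var forward = readLooped(loop, 0, 1, 6);
var backward = readLooped(loop, 3, -1, 4);
var halfSpeed = readLooped(loop, 0, 0.5, 4);
```

Linear interpolation is the simplest choice here; it keeps the example short at the cost of some high-frequency accuracy when the rate is far from 1.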
Chris McCormick has been working on porting Pure Data to JavaScript, and already has some basic components built. Here’s one that combines processing.js and webpd (video is here):
I think that my favourite demo by far this time around is one that I’ve been waiting to see since we first began these experiments. I’ve written in the past that our work could be used to solve many web accessibility problems. A few weeks ago I mentioned on IRC that someone should take a shot at building a text-to-speech engine in JavaScript, now that we have typed arrays. Yury quietly went off and built one based on the flite engine. When you run this, remember that you’re watching a browser speak with no plugins of any kind. This is all done in JavaScript (demo is here, video is here):
In order to do this he had to overcome some interesting problems, for example, how to load large binary voice databases into the page. The straightforward approach of using a JS array was brittle, with JS sometimes running out of stack space trying to initialize the array. After trying the obvious approaches, Yury decided to use the web to his advantage: he pushed the binary data into a PNG, then loaded it into a canvas, where getImageData gives him very fast access to the bytes through another typed array. The browser takes care of downloading and re-inflating the data automatically. Here’s what the database looks like:
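The read side of this trick can be sketched roughly as follows. This is an illustration, not Yury's code: it assumes the data was packed three bytes per pixel (R, G, B) with alpha fixed at 255, and the names `unpackRgb`, `loadBinaryFromPng`, `url`, and `byteLength` are hypothetical. The image must come from the same origin as the page, otherwise the browser will refuse to hand back the pixels.

```javascript
// Recover the packed bytes from canvas RGBA pixel data by skipping every
// fourth (alpha) byte. `byteLength` is the size of the original payload,
// needed because the last row of the image is usually padding.
function unpackRgb(rgba, byteLength) {
  var out = new Uint8Array(byteLength);
  for (var i = 0, j = 0; i < byteLength; i++) {
    out[i] = rgba[j++];
    if ((j % 4) === 3) j++; // skip the alpha channel
  }
  return out;
}

// Browser-only helper (assumed names): download the PNG, draw it to an
// off-screen canvas, read the pixels back, and hand the bytes to `callback`.
function loadBinaryFromPng(url, byteLength, callback) {
  var img = new Image();
  img.onload = function () {
    var canvas = document.createElement("canvas");
    canvas.width = img.width;
    canvas.height = img.height;
    var ctx = canvas.getContext("2d");
    ctx.drawImage(img, 0, 0);
    var pixels = ctx.getImageData(0, 0, img.width, img.height).data;
    callback(unpackRgb(pixels, byteLength));
  };
  img.src = url;
}
```

Keeping alpha fully opaque matters in this scheme, since canvases may store colour premultiplied by alpha and a non-opaque pixel would not be guaranteed to round-trip its RGB bytes exactly.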
What began as a series of experiments by a small group of strangers has now turned into something much larger. Our community continues to grow, and the scope and scale of the projects being done on our API is increasing. At the same time, through the work of Doug Schepers and Chris Blizzard, we’ve managed to get the attention of the W3C, which has now started an Audio Incubator Working Group to look at how to standardize this stuff. One of my colleagues in these experiments, Al MacDonald, has been asked to chair the group, which already has members from Mozilla, Google, and the BBC. You can get involved and follow @AudioXG for updates.
If you’d like to stay connected to this work, you can join this bug, where I’ll be posting a patch for review in the next week or so (current patch is here). You can see our Audio Data API documentation, with tutorials and examples (this was recently completely rewritten, if you’ve looked at it before). You can also grab builds there, which I’m making right now and will be done in the next few hours.

24 Comments
Fantastic! HTML5 audio is going to pick up much faster than video, because Vorbis is totally free and totally the best lossy codec in existence! Awesome demos, awesome tracks too :=)
Not to nag you, but will there be Linux x64 builds available this time? Reason I’m asking is because the 32bit builds won’t start due to “missing” gtk libraries and I would really like to try out the demos.
Anyhow, props to you all for the awesome work and quick progress. I can’t wait to see this going stable!
@Te I can probably spin-up a 64-bit Linux build on Fedora, sure. I’ll post it on the API wiki page where the rest of the builds go.
this is pretty sick. should be extremely useful for section 508 compliance. wow. great work.
Extraordinary work guys. Simply stunning. Well done.
@David
That’s awesome, thanks.
There has been an addon that does this for a while now. The flite component currently only works on linux and it’s big because it includes 4 voices but …
Jim, it’s time for dinner. Quit speaking about things you don’t understand and wash up for your favorite, meatloaf.
Aww, meatloaf again mom? That’s not fair! I wanna play with the big kids!
Thanks so much for blogging about all this. Big props to all involved. As a musician who is more fluent in JavaScript than music notation, this is huge news.
Wow David, et al this is so cool. Keep up the great work! It’s so encouraging to see such great innovation coming out of the Mozilla community.
Great work! Makes me look forward to browser-based, “dj-style” mixing solutions appearing in the form of add-ons. The future looks promising.
The TTS example uses typed arrays almost nowhere … lots of missed opportunities by using traditional JS arrays in many places! Also, there is no text analysis / normalization, not to speak of the poor quality audio. It seems only a fraction of flite was ported. Working in the field, I am confident in saying this approach will NOT scale for a long time.
Now, it would be quite another story to port a decent TTS engine using Google’s NativeClient plugin…
I’m wondering if this could help the development of open source musical notation via the web. Nothing really compares to Sibelius right now, and there’s no real way for composers to notate music without some paid-for program.
Well I’ll be. I check out a cool link on Slashdot about audio processing in Mozilla, and it turns out it’s written by one of my old profs. Cool stuff, David!
How was that data-png generated?
I’d like to try out a couple of things using that technique for my own data!
Thanks for the post!
@Matt, here you go:
<script src="big_array_literal.js"></script>
<script>
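// big_array_literal.js is expected to define a global array of byte values (0-255).
var len = big_array_literal.length;
// Each pixel carries three payload bytes (R, G, B); alpha is kept opaque.
var width = 1024, height = Math.ceil(len / 3 / width);
var c = document.createElement("canvas");
c.width = width; c.height = height;
var ctx = c.getContext("2d");
var d = ctx.getImageData(0, 0, width, height);
for (var i = 0, j = 0; i < len; i++) {
  d.data[j++] = big_array_literal[i];
  // After R, G, and B are written, set the alpha byte to fully opaque.
  if ((j % 4) == 3) d.data[j++] = 255;
}
// If the data ends mid-pixel, make that last pixel opaque too, so its
// bytes are not lost to alpha premultiplication.
if ((j % 4) != 0) d.data[j | 3] = 255;
ctx.putImageData(d, 0, 0);
// Write a link whose target is the PNG as a data: URL; save it to get the file.
document.write("<a href='" + c.toDataURL() + "'>image</a>");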
</script>
I was foolishly trying to do it on the command line! Thanks much!
Wow, that’s really some cool effects!
This is soooo going to kill Flash! Kill it! Kill it!!!!
Aha!!! Nice effect dear.. I like this..
This looks very promising, great work!
I would love to experiment with this API but the link to the builds provided in the Mozilla Wiki article (https://wiki.mozilla.org/Audio_Data_API#Obtaining_Code_and_Builds) appears to be dead. Is there a place where I can download a patched build to experiment with the Audio Data API? (win32)
@Joram thanks for letting me know about that, I’ll get some new builds done and posted…
@David
Thanks for the builds, will give it a try soon now..!