Introducing DXR – Bringing Static Analysis to the Web

For a while now, whenever I’ve had spare time, I’ve been “building a boat in my basement.”  Today I’m pleased to bring it out and launch it:  I’d like to announce the first public release of DXR, version 0.1.

DXR is two things.  First, it’s a method for collecting type, member, statement, macro, etc. information about C++, IDL, and soon JavaScript using Mozilla’s static analysis tools (e.g., Dehydra), a hacked build system, and a bunch of scripts.  Second, it’s a web-based tool for mapping this information back onto source code, and allowing users to query and look up this information.

There are already some really well-known tools for creating source code cross-references, from LXR to MXR to OpenGrok.  While these are all based on “search” and various ways of indexing text, DXR is about reducing the need to do searches, and instead being able to look up data.
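
To make the search versus look-up distinction concrete, here is a small Python sketch.  Everything in it is made up for illustration: the two tiny "source files", the `members` table and its column names are assumptions, not DXR's actual schema or code.  The text search returns every line that mentions a name, while the look-up asks a precise question (where is this member of this type defined?) and gets one answer.

```python
import sqlite3

# Hypothetical two-file "code base" used only for this illustration.
SOURCES = {
    "a.cpp": ["void Init();", "void Widget::Init() {}", "// call Init() later"],
    "b.cpp": ["int main() { Init(); }"],
}


def text_search(term):
    """Search-style: return every (file, line number, text) mentioning term."""
    return [
        (path, number, text)
        for path, lines in sorted(SOURCES.items())
        for number, text in enumerate(lines, start=1)
        if term in text
    ]


def build_index():
    """Build a tiny in-memory table of member definitions.

    In a real setup these rows would come from a static analysis pass;
    here the single row is written by hand (an assumption).
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE members "
        "(type_name TEXT, member_name TEXT, file TEXT, line INTEGER)"
    )
    conn.execute("INSERT INTO members VALUES ('Widget', 'Init', 'a.cpp', 2)")
    return conn


def lookup_member(conn, type_name, member_name):
    """Look-up-style: return (file, line) of the definition, or None."""
    return conn.execute(
        "SELECT file, line FROM members "
        "WHERE type_name = ? AND member_name = ?",
        (type_name, member_name),
    ).fetchone()


if __name__ == "__main__":
    print(text_search("Init"))
    print(lookup_member(build_index(), "Widget", "Init"))
```

The search hands back a comment, a call site and two declarations that the reader then has to sort through; the look-up only works because the analysis already recorded what each token means.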

The best way for me to describe it is to show you, so I’ve put together a short demo (watch on YouTube or if you’re using a snazzy Firefox 3.5 beta, you can view this .ogv file directly).

I’ve put together a live demo of DXR at http://dxr.proximity.on.ca/dxr/ which has both mozilla-central and comm-central available.  A couple points to be aware of with this site.  First, this data is not being updated daily, and I’m using revisions 7a8502b70fdf and 7b153b079c94 respectively.  Second, it’s not running on a proper server, so I don’t guarantee up-time or performance.  Once the code is improved a bit we’ll hopefully move it to a proper box, use something other than SQLite for the back-end, and start doing regular indexing.

Having said all that, it’s quite usable.  If you’re a Mozilla developer, you’ll be able to navigate your way around and quickly see how this is different from what we have today.  If you’re not familiar with the Mozilla code, here are some examples you can look at:

You can get the DXR source code here (using Mercurial: hg clone http://hg.mozilla.org/webtools/dxr/), and patches and contributions are indeed welcome!  The current release and code represent a lot of experiments and poking around in the dark to see if this was even possible.  While the code is still very much demo quality, I think it’s safe to say that we’ve moved from “I wonder if this is doable” to “there are still a bunch of bugs.”

A few other things I’ll point out about this release:

  • The data represents a Linux-only C++/IDL analysis (i.e., no C, no Mac- or Windows-only code).  To date this has not been done on Mac or Windows.  As such, you’ll probably hit files where there seems to be incomplete data, and this is probably why.  Dehydra works on Mac, and we’ll likely use cross-compiling to get a Windows analysis.
  • This code is heavily tied to eccentricities in the Mozilla code base (I can tell you first hand that our code has many strange and scary corners!).  Could you use this to index other large C++ programs?  Yes, but not without modification.  If you’re interested in trying, get in touch and I’ll help guide you through the process.
  • There are lots of places where we have bugs in the data.  Some of them are due to GCC bugs, some to Dehydra bugs, and lots are due to my bugs using these tools.  For example, I don’t currently deal with globals, C++ fragments in IDL, etc.  If you wonder why something isn’t clickable, or why you click and get an empty pop-up, “it’s a bug.”
  • I’m still struggling for a way to show all the data I have.  The current UI is based on my own ideas, and the feedback of a small group of Mozilla developers.  There’s more we could collect and show if I could figure out how, and I’d love some UX/UI people to give a hand with that.  If you have recommendations, also let me know.

I said earlier that DXR is two things, and the web app is just one way I can imagine using this data.  Could this data be integrated into Eclipse or Komodo or emacs?  Sure it could, and it’s only waiting for you to do it.  I’d be happy to provide pointers or work with you to get the data in a form that makes this possible.  Speaking of which, you can download my SQLite databases, mozilla-central.sqlite.zip and comm-central.sqlite.zip, on their own.  The database schema is here.
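
If you want to poke at one of those databases before reading the schema, a minimal Python sketch like the following lists every table and its columns.  It only relies on SQLite's own catalogue (`sqlite_master` and `PRAGMA table_info`), so it makes no assumptions about which tables DXR creates.  The function name and the file name in the usage line are my own assumptions; use whatever the unzipped file is actually called.

```python
import sqlite3


def describe_database(path):
    """Return {table name: [column names]} for every table in a SQLite file."""
    conn = sqlite3.connect(path)
    try:
        tables = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master "
                "WHERE type = 'table' ORDER BY name"
            )
        ]
        # PRAGMA table_info yields one row per column; index 1 is the name.
        return {
            table: [
                column[1]
                for column in conn.execute(f'PRAGMA table_info("{table}")')
            ]
            for table in tables
        }
    finally:
        conn.close()


if __name__ == "__main__":
    # Assumed file name after unzipping; adjust to match your download.
    for table, columns in describe_database("mozilla-central.sqlite").items():
        print(table, columns)
```

One caveat: `sqlite3.connect` quietly creates an empty database if the path does not exist, so an empty result usually means the file name is wrong.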

I’m really happy that this is finally in a state I can share with others.  My next steps will be to improve the code and quality of the data.  I’d also like to start adding other data: anything that can be mapped to source code lines and tokens is a potential candidate (performance data, bug info, documentation links, hooks for rewriting tools, etc.).  Maybe you’ve got ideas and would like to get involved.  I look forward to hearing from you.

The best way to get in touch is via email, the Mozilla Static-Analysis mailing list, or IRC (I’m humph on moznet and can be found in #static, among other places).

This entry was posted in CDOT, Mozilla, Mozilla Education, Seneca, Teaching Open Source.

18 Comments

  1. Ben Hearsum
    Posted June 28, 2009 at 11:25 pm | Permalink

    It’s awesome to see this out the door finally, you’ve been working on this for so long now.

  2. Andrew Sutherland
    Posted June 28, 2009 at 11:53 pm | Permalink

    Congrats / Hooray!

  3. Posted June 29, 2009 at 12:26 am | Permalink

    Humph, you sir are awesome.

    I love how you took on this huge project and pushed it over the line. I see the blood, sweat and tears that went into this and it’s no small achievement. Big congratulations, leading by example, as always.

  4. Posted June 29, 2009 at 1:26 am | Permalink

    Very well done. This is a massive project and you’ve managed to handle it exceptionally well. Cheers!

  5. Posted June 29, 2009 at 1:43 am | Permalink

    This is more important to Western Civilization than the Battle of Thermopylae.

  6. Posted June 29, 2009 at 6:19 am | Permalink

    neat-o

    A tiny observation – the file view page could say what file is open (useful when you jump between defn & declr)

    Steve

  7. Posted June 29, 2009 at 3:00 pm | Permalink

    Beautiful! I’ve been looking forward to this for quite a while now.

    I’ve been thinking about how CouchDB might be a great fit for this type of data. Hopefully I’ll have some time poking at how it could look. My vacation is in 5 weeks, this makes me look forward to it even more :) .

    Thanks for this. I hope to skunk work this in at work somehow, we have a couple million lines of C that needs to be seen like this.

  8. Posted June 29, 2009 at 4:14 pm | Permalink

    This is serious hotness! Nice going!

  9. Posted June 29, 2009 at 4:40 pm | Permalink

    That looks freakin’ awesome. I’ve spent so many hours of my life in lxr, mxr, bonsai, xulplanet and MDC I don’t even want to think about it. This looks like one of the most amazing source code navigation tools ever to exist!

  10. AndersH
    Posted June 29, 2009 at 7:39 pm | Permalink

    When you have used C# with ReSharper (or similar), something like this is something you just take for granted. It’s all built into the IDE and always up-to-date.
    But when tracemonkey (or its successors) becomes fast enough (if it is not already), hopefully most of the c++ will go away.

  11. skierpage
    Posted June 30, 2009 at 5:11 am | Permalink

    Wowzah, greatness.

    It would be nice if macros also had a Declaration/Implementation link. As your NS_ENSURE_SUCCESS macro example shows, viewing the macro raises more questions, e.g. What is NS_FAILED?

    Related: The definitions aren’t recursive, I can’t click in the box. I think that’s probably good, it avoids leaving my location and having to use the back button.

    Make the pop-up’s close target easier to hit by moving it closer to the edge and/or making it bigger.

    It’s still hard to get the value of status codes, e.g. NS_OK_EMPTY_NAME
    Maybe magically show the constant something ends up as, like the parenthetical Value() you show for some members such as nsIAccessibleRole::ROLE_GRAPHIC.

    Great stuff!

  12. skierpage
    Posted June 30, 2009 at 5:20 am | Permalink

    Some typedefs like PRuint32, PRbool declarations are 404s, e.g.
    /home/dave/dxr/objdir-opt/dist/include/prtypes.h:305

    Often the declaration of a local variable or function parameter is visible nearby, maybe you could highlight its line number as you roll over the Declaration in the pop-up.

  13. Asrail
    Posted July 1, 2009 at 6:08 pm | Permalink

    Wonderful!

  14. Posted July 2, 2009 at 10:18 pm | Permalink

    Awesome. This tool is amazingly useful.

    I see http://zenit.senecac.on.ca/wiki/dxr/ now redirects to this DXR. Any plans on bringing back that type of interface? It worked beautifully in an SSB.

  15. Andrew Smith
    Posted July 4, 2009 at 2:23 pm | Permalink

    Nice demo, you should have made it a long time ago. Or you should have made a student make it :) I know you’re very busy.

  16. Brian
    Posted August 15, 2009 at 5:31 pm | Permalink

    Why on earth did you tie this to Mozilla’s source code so tightly?

  17. Posted August 16, 2009 at 5:32 pm | Permalink

    @Brian: I did this in order to aid my work on Mozilla, so the tight integration with Mozilla is a natural consequence of my own development focus. Also, the tools I’m using are tools being developed by Mozilla. Having said that, I think it would be great to move outward and generalize this technique. I’d love to have all of Fedora, for example, available this way. That’s certainly on my radar. Maybe you want to help?

  18. Posted December 29, 2009 at 9:53 am | Permalink

    best best best!!!

5 Trackbacks

  1. [...] Humphrey just released his DXR bombshell. The “basic” idea behind DXR is extraction of the rich semantic [...]

  2. [...] bodies.  Its namesake, Dehydra, is a static analysis tool for C++ and has already given us great things (dxr, also in its [...]

  3. By Bread and Circuits » Extra Curricular on September 10, 2009 at 3:44 pm

    [...] along side your students, and I also plan to do just that with mine.  Before I went on holidays I released a piece of open source software that I’m particularly proud of called DXR.  When I say “proud” I don’t [...]

  4. By Break it real good on September 30, 2009 at 11:17 am

    [...] process he uses the highly useful __FILE__ and __LINE__ macros.  I love this.  All of my work on DXR to extract every last drop of semantic info from the Mozilla source and build relies on shenanigans [...]

  5. By FSOSS 2009 is next Friday Oct 30 on October 21, 2009 at 1:30 pm

    [...] a speaker.  Taras and I will be wowing the crowd with talk of static analysis tools, dehydra, and dxr.  In addition to us, there are other great Mozilla-related talks going on, [...]