Project: Build a Simple Multi-Touch Surface Platform
February 15th, 2009
Boy, am I going to be ashamed of this post a year from now. Gratuitous multi-touch, comrades. Shoehorn one into your next gallery or get left behind.
The multi-touch exhibit bandwagon has left the station. We’ve all by now seen Microsoft Surface. (We’ve all by now had Microsoft endlessly dangle the prospect of loaning us one.) Ideum has just launched a really nice museum-targeted version they are calling mt2 and unveiling with a Yahoo! Maps-based interactive at the Don Harrington Discovery Center.
Why fight it? Your phone is already ringing. And museumssuck.com is here to abet the insanity with a quickie getting-started-on-the-cheap glimpse of surface multi-touch.
I’m not going to talk about, like, adaptive thresholding adjacency algorithms. Because I’m not that smart. Welcome to museumssuck.com. But supergeeks who are into that have made production-ready-ish software that does it, and you and I can download it for free to make multi-touch interactives with a wow-factor-shelf-life of about twelve more weeks. So hurry up!
Here’s the least you’ll need:
The magic pixie dust is the vision framework software. There are lots now, it seems. Microsoft has whatever it has, and I think there’s an SDK, but you’re on your own with your Microsoft. I’m pretty sure Ideum used NUI’s Snowflake 1.0 on the mt2. The three big free and open source projects, though, are reacTIVision, Touchlib, and tBeta (which isn’t to diss BBTouch, Bespoke, or Touché, because I’m not qualified to). I’m a reacTIVision fan, but by no means an authoritative one. Read up and choose what suits you.
What, you may reasonably ask, is a vision framework? Well, lots of frameworks means lots of answers to that question. If you’ll permit me to be broad-brush and reacTIVision-oriented, it’s ninja optical-tracking, blob-detection-type software. Your input device is a camera and your output is a bunch of data about who’s touching what, where, and when. In the case of reacTIVision (and most others) this data goes out in the TUIO protocol, which is a set of Open Sound Control (OSC) messages sent over UDP. If I just blew your mind there, take a Wikipedia detour. It’s pretty easy stuff, really. Which may clear up as you read about the next pieces you’ll need…
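If you want a feel for what “blob detection” means without the ninja part, here’s a toy sketch in plain Python (standard library only). It is my illustration, not how reacTIVision or any other framework actually does it. It assumes a grayscale camera frame stored as a list of rows of 0-255 ints, subtracts a snapshot of the empty table, thresholds, flood-fills connected bright pixels, and reports each blob’s centroid normalized to 0..1. The function name and the `threshold` and `min_area` defaults are made up for illustration. Real frameworks do a lot more (adaptive thresholding, tracking blobs across frames so each finger keeps an ID, and so on).

```python
from collections import deque


def find_blobs(frame, background, threshold=40, min_area=4):
    """Toy blob detector: returns [(x, y, area), ...] with x, y normalized to 0..1.

    frame and background are equal-sized lists of rows of 0-255 grayscale ints;
    background is a snapshot of the empty, IR-lit table.
    """
    height, width = len(frame), len(frame[0])
    # 1. Background subtraction + fixed threshold: a pixel counts as "touch"
    #    if it is brighter than the empty table by more than `threshold`.
    mask = [[frame[y][x] - background[y][x] > threshold for x in range(width)]
            for y in range(height)]
    seen = [[False] * width for _ in range(height)]
    blobs = []
    for y in range(height):
        for x in range(width):
            if not mask[y][x] or seen[y][x]:
                continue
            # 2. Breadth-first flood fill gathers one 4-connected blob.
            queue = deque([(x, y)])
            seen[y][x] = True
            pixels = []
            while queue:
                px, py = queue.popleft()
                pixels.append((px, py))
                for nx, ny in ((px + 1, py), (px - 1, py), (px, py + 1), (px, py - 1)):
                    if (0 <= nx < width and 0 <= ny < height
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            # 3. Ignore specks of sensor noise; report each real blob's centroid,
            #    measured from pixel centers and scaled to the 0..1 range.
            if len(pixels) >= min_area:
                cx = sum(p[0] for p in pixels) / len(pixels)
                cy = sum(p[1] for p in pixels) / len(pixels)
                blobs.append(((cx + 0.5) / width, (cy + 0.5) / height, len(pixels)))
    return blobs
```

Call it once per captured frame. The (x, y) pairs that come out are the same kind of “where” data a framework then packages up and sends along as TUIO.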
You’ll need an OSC client. That is, you’ll need a way to hear OSC messages via UDP (or, in some cases, MIDI). And then you’ll need a TUIO library. This leads us off in more platform-specific directions than I mean to go. If you’re a C++ person, your app will be the OSC client, and there’s tons of sample client code out there. You Flash kids are going to need something called Flosc. Flosc is a Java-based server that listens for OSC over UDP and relays it to Flash as XML. And if it sounds to you like that may cause speed issues, well, you’d be right. But it works. YouTube “multitouch flash” and see. And there is an OSC lid for every pot, and a TUIO library, for that matter: C#, C++, Java, Python, Flash, Max/MSP, Director, Pure Data, Quartz Composer, Processing.
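So what does a TUIO library actually do for you? For finger touches, TUIO 1.x uses the `/tuio/2Dcur` address with three kinds of message: `alive` (the session IDs of every finger currently down), `set` (one finger’s session ID, x, y, x and y velocity, and acceleration), and `fseq` (a frame number that closes out the frame). Below is a sketch of a hypothetical `CursorTracker` class, not any real library’s API, that folds those into add/update/remove events. It assumes your OSC client has already decoded each message into a Python list of arguments.

```python
class CursorTracker:
    """Hypothetical helper (not a real library class): folds /tuio/2Dcur
    alive/set/fseq messages into add/update/remove events."""

    def __init__(self):
        self.cursors = {}     # session id -> (x, y), normalized 0..1
        self._pending = {}    # positions from "set" messages in this frame
        self._alive = None    # session ids from this frame's "alive" message

    def handle(self, args):
        """Feed the decoded argument list of one /tuio/2Dcur message.

        Returns a list of events when the frame closes (on "fseq"), else [].
        """
        command = args[0]
        if command == "alive":
            self._alive = set(args[1:])
        elif command == "set":
            # set: session id, x, y, x-velocity, y-velocity, acceleration
            session_id, x, y = args[1], args[2], args[3]
            self._pending[session_id] = (x, y)
        elif command == "fseq":
            return self._end_frame()
        return []

    def _end_frame(self):
        events = []
        # If no "alive" arrived this frame, assume nobody lifted a finger.
        alive = self._alive
        if alive is None:
            alive = set(self.cursors) | set(self._pending)
        # A finger that lifted simply disappears from the alive list.
        for sid in list(self.cursors):
            if sid not in alive:
                del self.cursors[sid]
                events.append(("remove", sid))
        # New ids are additions; known ids with a fresh "set" are updates.
        for sid, pos in self._pending.items():
            if sid in alive:
                kind = "update" if sid in self.cursors else "add"
                self.cursors[sid] = pos
                events.append((kind, sid, pos))
        self._pending = {}
        self._alive = None
        return events
```

Note there’s no “finger up” message of its own: a lifted finger just stops showing up in `alive`, which is why removals come from comparing lists.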
And that’s it. Software-wise. Next you’ll need to make the actual multi-touch surface thinger. Mind you, there are simulators for development purposes, for you software-only or small-cubicle participants. The reacTIVision folks make a TUIO simulator that you can use with any platform. Download that and the stuff above and you’re pretend-multi-touching. Thank you, drive through.
But you’re going to need a table eventually, right? So what is the least you’ll need then? Well, obviously you’ll need a…
Camera. We could talk for days about camera variables. Or, so long as the “we” didn’t include me, we could. There’s lots to know about cameras and multi-touch and that’s why there are so absurdly many blogs in this world. At museumssuck.com, tech posts come big picture/small budget. You’re working-ish today and you’re on your own from there. So let’s talk hundred dollar webcams, shall we?
Let’s step back, though, for the sake of a core concept. Your table has one surface. The input is a camera. The output is (probably) a projector. You see the problem already, don’t you? The surface has to be cleanly illuminated if the camera is going to see correctly what is happening on it, and the table has to be dark if the visitor is going to see the projected images. The solution is to do these tasks in different light spectra. The projector uses (for obvious reasons) the visible(-to-humans) spectrum. Your camera, then, will use IR. This means two things for your purchase of a cheapo multi-touch camera. The first is that your camera must be able to see IR. Most can, but some are deliberately built to block it. Usually that’s done with a filter you can just remove, but sometimes the IR filtering is built into the lens. That won’t work. The second is that your camera must NOT see visible light. And if you’re dealing in webcams, this means you are going to need a special, separate IR-pass filter. Handily, your visitors come from the factory already unable to see IR. Now back to general camera junk.
Get FireWire, not USB. Higher frame rate, higher resolution, lower compression (which means lower driver overhead). If you’re planning to jam this all into reasonably sized casework, you’ll also need to choose a camera that will take a separate wide-angle lens. Oh, and get CCD, not CMOS. $111 will get you a Unibrain Fire-i that is all of these things.
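Some back-of-the-envelope arithmetic on why uncompressed video is worth caring about. The frame size, 8-bit grayscale pixels and 30 frames per second below are assumed example settings, not the Fire-i’s spec sheet, and color pixel formats use more bits per pixel:

```python
def raw_video_mbit_per_s(width, height, bits_per_pixel, fps):
    """Uncompressed video data rate in megabits per second (1 Mbit = 10**6 bits)."""
    bits_per_frame = width * height * bits_per_pixel
    return bits_per_frame * fps / 1_000_000


# Assumed example settings: 640x480, 8-bit grayscale, 30 frames per second.
rate = raw_video_mbit_per_s(640, 480, 8, 30)
print(f"{rate:.1f} Mbit/s")  # prints "73.7 Mbit/s"
# Compare against FireWire 400's nominal 400 Mbit/s bus rate.
print(f"uses {rate / 400:.0%} of a 400 Mbit/s bus")  # prints "uses 18% of a 400 Mbit/s bus"
```

Roughly 74 Mbit/s of raw frames fits comfortably within FireWire’s nominal bandwidth, so there’s no need to compress on the camera and nothing to decompress before tracking starts.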
IR LEDs. Because you’re doing this museumssuck.com-style, you will need some wide-angle IR illuminators. What for? To flood your surface with IR from below so your IR camera can see it better: fingers touching the surface bounce that light back down to the camera as bright spots. This is called Diffused Illumination (DI) and is the quickest, dirtiest method for doing optical multi-touch. Other common and affordable methods are Frustrated Total Internal Reflection, Diffused Surface Illumination, and Laser Light Plane. You’ll probably end up learning about these, because they have significant advantages over DI. But they’re trickier to build and (more importantly) trickier to blog about, and the basic operating principle is the same. So DI it is. Next…
A surface. Where would you be without this tutorial? You need a surface. A sheet of plexiglass’ll do. Glass. Acrylic. Clear, flat, won’t flex. And it will need a diffuser layer for the projector image. But it needs to let some light through. Tracing paper works. Vellum. Mylar. Lee Filter. You get the idea.
If you’ll allow that the software implied a computer, that’s it really. Connect a FireWire cam to a PC, aim it at an IR-illuminated sheet of vellum-covered plex, download and run reacTIVision and -poof- you’re a multi-touch whatzit. Mind you, that doesn’t include an image to touch, which is going to be limiting, so let’s add…
A projector. Actually that somewhat betrays the spirit of museumssuck.com, because you can use an LCD for prototyping or if your surface is small enough. Whatever. I don’t have much to say about either anyway. Don’t forget about wide-angle considerations for casework and such.
Now you’re really done. Or museumssuck.com done anyway. Here’s a diagram from the reacTIVision site:
Blah, blah, blah, download and run reacTIVision, and -poof- is where we left off. You’re done. If spitting OSC messages counts as done. You’re spitting OSC messages. By default you are spitting them on localhost (127.0.0.1) port 3333. You can change that by editing your reactivision.xml config file. Which brings us back to the OSC client and TUIO library you chose above. Together those’ll turn spat OSC messages into usable classes for your preferred programming language.
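Curious what’s actually arriving on port 3333? Here’s a bare-bones sketch, standard-library Python only, of the OSC-over-UDP decoding your OSC client normally handles for you. It assumes only the `i` (int32), `f` (float32) and `s` (string) argument types, which covers the usual `/tuio/2Dcur` traffic, and it unpacks OSC bundles, since TUIO trackers typically wrap each frame’s messages in one. The function names are mine, not any library’s.

```python
import socket
import struct


def _read_string(data, offset):
    """Read a null-terminated OSC string, which is padded to a 4-byte boundary."""
    end = data.index(b"\x00", offset)
    text = data[offset:end].decode("ascii")
    # Skip the terminator plus any padding up to the next multiple of 4.
    return text, (end + 4) & ~3


def parse_message(data):
    """Decode one OSC message into (address, [args]); only i, f and s types."""
    address, offset = _read_string(data, 0)
    type_tags, offset = _read_string(data, offset)
    args = []
    for tag in type_tags[1:]:  # skip the leading ','
        if tag == "i":
            args.append(struct.unpack_from(">i", data, offset)[0])  # big-endian int32
            offset += 4
        elif tag == "f":
            args.append(struct.unpack_from(">f", data, offset)[0])  # big-endian float32
            offset += 4
        elif tag == "s":
            value, offset = _read_string(data, offset)
            args.append(value)
        else:
            raise ValueError(f"unsupported OSC type tag {tag!r}")
    return address, args


def parse_packet(data):
    """Flatten an OSC packet (a message or a possibly nested bundle) into messages."""
    if data.startswith(b"#bundle\x00"):
        messages = []
        offset = 16  # 8-byte "#bundle\0" marker + 8-byte time tag
        while offset < len(data):
            (size,) = struct.unpack_from(">i", data, offset)
            offset += 4
            messages.extend(parse_packet(data[offset:offset + size]))
            offset += size
        return messages
    return [parse_message(data)]


def listen(host="127.0.0.1", port=3333):
    """Print every OSC message arriving on reacTIVision's default port. Runs forever."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    while True:
        data, _sender = sock.recvfrom(65535)
        for address, args in parse_packet(data):
            print(address, args)
```

Run `listen()` with reacTIVision (or the TUIO simulator) going and you should see `/tuio/2Dcur` lines scroll by as you touch. Only one program can usually hold the port, so quit your real TUIO app first. It’s a debugging peephole; for an actual exhibit, use a real TUIO library.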
Last step: Stretch and shrink pictures. Drag pictures. Spin pictures. Two of you at once. You drag a stretched picture while she spins a shrunk picture. Drag and spin a Google map! Enjoy it while it lasts. In two years every Elo will have plug-and-play capacitive multi-touch and your Google map spinning projection table will be deleted from your résumé and posted as a joke on museumssuck.com.
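Stretch, shrink and spin all come out of the same bit of geometry: compare the line between two fingers now with where it was a frame ago. The distance ratio is the scale, the change in angle is the rotation, and the midpoint’s movement is the drag. A minimal sketch, assuming you already have each finger’s previous and current (x, y) from your TUIO library; the function name is hypothetical.

```python
import math


def two_finger_transform(a0, b0, a1, b1):
    """Return (dx, dy, scale, rotation_radians) from two touches' old and new positions.

    a0, b0: previous (x, y) of touches A and B; a1, b1: current positions.
    Assumes a0 and b0 are distinct points (otherwise scale divides by zero).
    """
    # Drag: how far the midpoint between the two fingers moved.
    mid0 = ((a0[0] + b0[0]) / 2, (a0[1] + b0[1]) / 2)
    mid1 = ((a1[0] + b1[0]) / 2, (a1[1] + b1[1]) / 2)
    dx, dy = mid1[0] - mid0[0], mid1[1] - mid0[1]
    # Stretch/shrink: finger separation now vs. before (>1 stretch, <1 shrink).
    v0 = (b0[0] - a0[0], b0[1] - a0[1])
    v1 = (b1[0] - a1[0], b1[1] - a1[1])
    scale = math.hypot(*v1) / math.hypot(*v0)
    # Spin: change in the angle of the line between the fingers,
    # wrapped into [-pi, pi) so crossing the 180-degree line doesn't jump.
    rotation = math.atan2(v1[1], v1[0]) - math.atan2(v0[1], v0[0])
    rotation = (rotation + math.pi) % (2 * math.pi) - math.pi
    return dx, dy, scale, rotation
```

Convert TUIO’s normalized 0..1 positions to pixels first (x times screen width, y times screen height); on a non-square screen, angles measured in normalized units come out skewed. And if your y axis points down, as screen coordinates usually do, a positive rotation is clockwise.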
Oh, and I’ll see you on the brilliant and friendly nuigroup.com forums. Because you’ll be there a lot soon.
