I did a small experiment to see how difficult it would be to track joints from video material. I had been meaning to do this since the DOS days, after seeing the cool rotoscoping in Another World (Out of This World), but back then it was too difficult to get video into a form that I could easily read from code.
What I wanted was dancing material with an unchanging background and a static camera. Now there is YouTube, so this is really easy to find. I downloaded the FLV movie using a 3rd party tool (there are many), then used "FLV Extract" to get the video stream from the FLV file as an AVI file. Then I used VirtualDub to crop it to a short segment, converted that segment into a GIF animation with GIF Movie Gear, and finally imported the animation to a Flash timeline.
Next I attempted to track the left wrist of the dancer to see how much work it is. As a first test it took about 5 minutes of concentrated effort to mark 50 frames, or about 6 seconds per frame. Mostly I was that slow because I had to step forward, create a keyframe in the timeline, and drag the joint marker with my mouse for every frame. I think if I made a small tool that let me just tap on a joint with my drawing pad, I could manage in under 5 seconds per joint per frame.
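The core of such a tap tool could be quite small. Here is a rough sketch in Python of just the bookkeeping, with no actual window or drawing pad input. Everything in it (the `TapRecorder` name, the joint names, the choice to follow one joint through all frames before moving on to the next) is my own assumption for illustration, not something I have built.

```python
class TapRecorder:
    """Hypothetical bookkeeping for a tap-to-mark tool (no GUI here).

    Assumption: one joint is followed through every frame before the
    next joint starts, the same way the wrist test was done.
    """

    def __init__(self, joint_names, frame_count):
        self.joint_names = list(joint_names)
        self.frame_count = frame_count
        self.joint_index = 0
        self.frame = 0
        self.positions = {}  # (frame, joint name) -> (x, y)

    def done(self):
        # Finished once every joint has been marked in every frame.
        return self.joint_index >= len(self.joint_names)

    def current(self):
        # The (frame, joint name) pair the next tap will be stored under.
        return self.frame, self.joint_names[self.joint_index]

    def tap(self, x, y):
        if self.done():
            raise ValueError("all joints are already marked")
        self.positions[self.current()] = (x, y)
        # Advance automatically so the user never touches a timeline.
        self.frame += 1
        if self.frame == self.frame_count:
            self.frame = 0
            self.joint_index += 1


recorder = TapRecorder(["left_wrist", "left_elbow"], frame_count=2)
for x, y in [(10, 20), (12, 22), (30, 40), (31, 41)]:
    recorder.tap(x, y)
```

A real tool would show frame `recorder.current()[0]` on screen and feed each pen tap into `tap()`. The point is that advancing automatically removes the keyframe and drag steps that made the Flash timeline slow.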
At least 20 joints would be necessary to make a stick figure dance like the dancer. Three minutes of YouTube video at 12 fps is 2160 frames; times 20 joints that is 43,200 taps; at 5 seconds per tap that is 60 hours! And of course this data is 2D, so it isn't even clear what could be accomplished with it. One thing did occur to me though: you could use this as an affordable motion capture solution for games. Suppose you have a front camera and a side camera filming the same performance, so 120 hours of work to track both views. Send the task to China, suppose it costs $5/hour, and the total is $600 for probably all of the motions needed for a small game.
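The estimate above is easy to play with in a few lines of Python. This is only back-of-the-envelope arithmetic using the numbers from this post; the function name and parameters are made up for the example.

```python
def tracking_hours(video_seconds, fps, joints, seconds_per_tap, cameras=1):
    """Hours of manual tapping needed to mark every joint in every frame."""
    frames = video_seconds * fps
    taps = frames * joints * cameras
    return taps * seconds_per_tap / 3600


# Three minutes of video at 12 fps, 20 joints, 5 seconds per tap.
single_view_hours = tracking_hours(180, 12, 20, 5)
# Front and side cameras double the work.
two_view_hours = tracking_hours(180, 12, 20, 5, cameras=2)
# Assumed outsourcing rate of $5/hour.
total_cost = two_view_hours * 5

print(single_view_hours, two_view_hours, total_cost)  # 60.0 120.0 600.0
```

Dropping the time per tap or the frame rate changes the total proportionally, so halving either one would bring the two-camera job down to 60 hours.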