The figure below illustrates the overall scale and structure of a carlovision photoset. The upper area shows thumbnails of corrected stereo views, with one "Time Location" (one revolution of the camera rig) per row. Each row contains 16 "gaze directions". This particular photoset contains 102 rows (not all shown), representing about an hour of shooting. Folders and files have been named logically to simplify use by, and exchange between, the different software components.
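The scale of this photoset can be checked with simple arithmetic: 102 Time Locations × 16 gaze directions gives 1632 stereo views, or 3264 individual photos, and an hour of shooting works out to roughly 35 seconds per revolution. The sketch below also shows one way of encoding the Time Location and gaze direction in a file name. The naming scheme (`T###/T###_G##_L.jpg`) and the function `photo_path` are hypothetical illustrations; the actual carlovision names are not given here.

```python
# Photoset layout arithmetic for the example photoset described above.
# NOTE: the naming scheme is HYPOTHETICAL; it only illustrates encoding the
# Time Location (row) and gaze direction in each file name.

TIME_LOCATIONS = 102     # rows: one revolution of the rig each
GAZE_DIRECTIONS = 16     # stops per revolution
CAMERAS = ("L", "R")     # the twin cameras of the stereo rig
SHOOT_SECONDS = 60 * 60  # "about an hour" of shooting


def photo_path(time_loc: int, gaze: int, cam: str) -> str:
    """Return a hypothetical relative path for one camera's photo."""
    if not (0 <= time_loc < TIME_LOCATIONS and 0 <= gaze < GAZE_DIRECTIONS):
        raise ValueError("index out of range")
    if cam not in CAMERAS:
        raise ValueError("camera must be 'L' or 'R'")
    return f"T{time_loc:03d}/T{time_loc:03d}_G{gaze:02d}_{cam}.jpg"


stereo_views = TIME_LOCATIONS * GAZE_DIRECTIONS   # 1632 stereo views
photos = stereo_views * len(CAMERAS)              # 3264 individual photos
seconds_per_rev = SHOOT_SECONDS / TIME_LOCATIONS  # roughly 35 s per revolution

print(stereo_views, photos, round(seconds_per_rev, 1))
print(photo_path(7, 3, "R"))
```

Zero-padded indices keep the files in shooting order when sorted by name, which is convenient when several programs exchange the same photoset.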
A suite of software programs was written to convert the thousands of photos from the cameras into optically corrected and logically structured photosets, and to then display these in response to user input. The figure below illustrates the overall flow of information in the carlovision system.
Programs were written in C# using Microsoft's free Visual C# Express, and run on Windows 7 PCs. C# and the resulting programs have proven capable, reliable and stable.
The two-frame animation at right shows, in a greatly exaggerated manner, the issues involved in stereo pairing. The animation flips between a corrected pair and a pair with the full gamut of errors: X, Y, rotation and scale.
Even with significant effort to "square up the cameras", a twin-camera rig will always present alignment errors: lenses have some play, which affects where the image lands on the sensor; mechanical elements expand and contract; and so on.
Although humans can see 3D when small errors exist in a stereo pair, it is important to minimize these errors for comfortable viewing. MasterPair Generator is an interactive application created for this purpose.
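The four error types named above (X, Y, rotation and scale) together make up a similarity transform, so a correction for one image of a pair can be expressed as a single 2x3 affine matrix. The sketch below is a minimal illustration, assuming the correction is applied to one image about its center, with parameters chosen interactively; the function and parameter names are hypothetical, and this is not MasterPair Generator's actual code.

```python
import math


def correction_matrix(dx, dy, rot_deg, scale, cx, cy):
    """Build a 2x3 affine matrix (hypothetical helper) that rotates by rot_deg
    and scales by `scale` about the image center (cx, cy), then shifts by
    (dx, dy). Positive rot_deg turns the +x axis toward +y (clockwise on
    screen, where y points down)."""
    a = math.radians(rot_deg)
    c, s = scale * math.cos(a), scale * math.sin(a)
    # x' = c*(x - cx) - s*(y - cy) + cx + dx
    # y' = s*(x - cx) + c*(y - cy) + cy + dy
    return [
        [c, -s, cx - c * cx + s * cy + dx],
        [s, c, cy - s * cx - c * cy + dy],
    ]


def apply(m, x, y):
    """Map one pixel coordinate through the 2x3 affine matrix."""
    return (m[0][0] * x + m[0][1] * y + m[0][2],
            m[1][0] * x + m[1][1] * y + m[1][2])


# Illustrative values: a small shift, a 0.3 degree roll and a 0.2% scale error.
m = correction_matrix(dx=12, dy=-5, rot_deg=0.3, scale=1.002, cx=2000, cy=1500)
print(apply(m, 2000, 1500))  # the center moves by (dx, dy), up to float rounding
```

Because rotation and scale are taken about the image center, the X and Y sliders of an interactive tool stay independent of the other two adjustments, which makes manual tuning easier.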
Fundamental to this system is the controlled rotation of the camera rail, stopping at particular compass headings each revolution (aka azimuth indexing). Although it may seem that such a stepper-motor-based pan head could achieve "perfect" azimuth repeatability, it cannot. The stepper motor's gearhead has a tiny bit of elasticity, which, especially in windy conditions, causes the stop positions to be slightly off. These azimuth errors are only on the order of 0.1 degree, but that translates to about 10 pixels, which is noticeable. A mechanical fix would add lots of weight and cost, so the issue is corrected in software by the Dejitter application.
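The link between a tiny angle and a visible pixel offset follows from the lens geometry. For an ideal rectilinear lens, the focal length in pixels is half the image width divided by the tangent of half the horizontal field of view, and a small azimuth error shifts the image center by that focal length times the tangent of the error. The camera numbers below (a 4000 px wide image and a 40 degree horizontal field of view) are assumptions for illustration only; the source does not state them.

```python
import math


def azimuth_error_px(error_deg, image_width_px, hfov_deg):
    """Horizontal shift (in pixels) at the image center caused by a small
    azimuth error, assuming an ideal rectilinear lens."""
    focal_px = (image_width_px / 2) / math.tan(math.radians(hfov_deg / 2))
    return focal_px * math.tan(math.radians(error_deg))


# ASSUMED camera: 4000 px wide image, 40 degree horizontal field of view.
print(round(azimuth_error_px(0.1, 4000, 40), 1))  # ≈ 9.6, i.e. about 10 pixels
```

With these assumed numbers the image spans about 100 pixels per degree near its center, so a 0.1 degree stop error lands at roughly 10 pixels.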
The animation at upper right shows extreme jitter. This is a closeup of material shot in high, gusting wind (about 40 mph), which, in addition to azimuth error, also caused tripod movement (rotation and translation).
The animation at lower right shows the effect of the Dejitter software.
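The source does not describe how Dejitter measures the error. One common approach, shown here as a swapped-in technique rather than the application's actual method, is to compare the same gaze direction across consecutive revolutions and search for the horizontal offset that best aligns them. The minimal sketch below reduces each grayscale image to a column-sum profile and picks the integer shift with the smallest mean absolute difference; it handles only the horizontal (azimuth) component, not the tripod rotation and translation mentioned above. All names are hypothetical.

```python
def column_profile(image):
    """Sum each column of a grayscale image given as a list of equal-length rows."""
    return [sum(col) for col in zip(*image)]


def estimate_shift(ref, cur, max_shift):
    """Return the integer shift s (pixels) that minimizes the mean absolute
    difference between ref[i] and cur[i + s] over their overlap.
    A positive s means the content in `cur` sits s pixels to the right."""
    best_shift, best_err = 0, float("inf")
    n = len(ref)
    for s in range(-max_shift, max_shift + 1):
        lo, hi = max(0, -s), min(n, n - s)  # keep both i and i + s in range
        if hi <= lo:
            continue
        err = sum(abs(ref[i] - cur[i + s]) for i in range(lo, hi)) / (hi - lo)
        if err < best_err:
            best_shift, best_err = s, err
    return best_shift


# Synthetic example: a feature that moved 3 pixels to the right.
ref = [0] * 20 + [5, 9, 5] + [0] * 20
cur = [0] * 3 + ref[:-3]
print(estimate_shift(ref, cur, max_shift=5))
```

Once the shift is known, the current frame would be moved back by that many pixels (and cropped slightly) so that each gaze direction stays put from one revolution to the next.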
CREATING PANORAMIC 3D CONTENT
It is an interesting challenge to transform the original photos below into some sort of merged 3D view.
Pairing the left and right photos of a single gaze direction is a well-understood problem, but when multiple gaze directions are involved, especially when they are distemporal (shot at slightly different times) and contain moving elements, things get a little weird.
There is no perfect solution, but the chosen "mosaic" approach works pretty well.
Original photos, above; transformed into a mosaic stereo panorama, below.
THE TROUBLE WITH STITCHING
Stitching software is amazing: it can take several individual photos of overlapping gaze directions and magically blend them into a smooth single photo, within limits. Below we see what Photoshop's stitcher comes up with given our 5 left camera photos. It looks good, much more unified than the choppy mosaic version. So why not go with stitching?
Fig. A) Mosaic: clear vertical boundary
Fig. B) Stitcher: where's the boundary?
Upon closer inspection, we can see that the stitcher did some strange things.
Besides the fact that it refused to incorporate the outer 2 pictures, we see in figure B above that it handled a difficult boundary zone in a confusing and seemingly arbitrary way. There is even a little chunk from the far right photo mixed in there! In fact, the stitcher is so smart that its actions seem arbitrary: it tries to pick a boundary that does not break up objects. In figure C we see the actual boundary it came up with for GazeLoc 3, and how it kept the man's face intact.
The unpredictable meandering boundaries are a disaster for stereo pairing. For stereo pairs to work, the left and right component images must have correspondence, and existing stitching software breaks this correspondence.
The creation of mosaics is simple and deterministic, guaranteeing correspondence and good 3D viewing. Figure A shows the mosaic version of the boundary. Rather than being disguised by blending, the sharp discontinuity is emphasized with a black line. Although some people find the lines annoying at first, they are soon forgotten. They are also more truthful about the photography: these are 2 different photos taken at 2 different moments.
The mosaic panorama approach usually works much better than the examples shown here, which are a worst-case scenario: camera rail pitched downward 20 degrees, close quarters and lots of movement. Even so, there is no avoiding the occasional chopped person.
Fig. C) Actual stitcher boundary: content, not position, based
Result of feeding 5 left camera photos into Photoshop's "Photomerge" stitching software
How the 3DPrerender software creates a mosaic (figure at right)
Since the cameras were pitched downward for this scene, trapezoidal transforms must be performed first. Then, carefully figured rectangular central portions of each trapezoid are grabbed and positioned on the output canvas.
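A standard way to undo the keystone (trapezoid) distortion caused by a pitched camera is a perspective transform, or homography, fitted to four corner correspondences. The sketch below solves for the 3x3 homography with plain Gaussian elimination and maps points through it. The corner coordinates are illustrative assumptions, and the source does not say how 3DPrerender performs its trapezoidal transform; the helper names are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a·x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def homography(src, dst):
    """Return the 3x3 matrix H (with H[2][2] = 1) mapping four src points
    onto four dst points (no three of them collinear)."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h0 x + h1 y + h2) / (h6 x + h7 y + 1), rearranged to be linear in h
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve(a, b)
    return [h[0:3], h[3:6], h[6:8] + [1.0]]


def warp_point(H, x, y):
    """Apply H to (x, y) using homogeneous coordinates."""
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)


# Illustrative trapezoid (narrower at the bottom) mapped back to a 4000 x 3000 rectangle.
src = [(0, 0), (4000, 0), (400, 3000), (3600, 3000)]
dst = [(0, 0), (4000, 0), (0, 3000), (4000, 3000)]
H = homography(src, dst)
print(warp_point(H, 2000, 3000))  # approximately (2000.0, 3000.0), by symmetry
```

To warp a whole image, the usual practice is inverse mapping: fit H with the roles of `src` and `dst` swapped, then for every output pixel look up (and interpolate) the corresponding source pixel, which avoids holes in the result.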
Reasonably good match-up can be achieved with this simple method, especially if there are no close-up objects in the shot.
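The crop-and-place step can be sketched as follows, assuming the photos have already been keystone-corrected and share the same size. Each gaze direction contributes a central strip of fixed width, placed left to right with an optional separator column (the black line of figure A). The strip width, separator and helper names are hypothetical; in practice the strip would presumably be sized so it spans about one angular step between gaze directions (22.5 degrees for 16 stops per revolution). Because the geometry is fixed, running the same code on the left and right photos yields mosaics whose boundaries fall in corresponding places, which is the property stitching loses.

```python
def central_strip(image, strip_w):
    """Crop strip_w columns centered horizontally in `image`
    (a list of equal-length rows of pixel values)."""
    width = len(image[0])
    if strip_w > width:
        raise ValueError("strip wider than image")
    x0 = (width - strip_w) // 2
    return [row[x0:x0 + strip_w] for row in image]


def build_mosaic(images, strip_w, border=0, border_value=0):
    """Place the central strip of each gaze direction side by side, left to
    right, separated by `border` columns of `border_value` (e.g. a black line)."""
    strips = [central_strip(img, strip_w) for img in images]
    canvas = []
    for y in range(len(strips[0])):
        row = []
        for i, strip in enumerate(strips):
            if i > 0:
                row.extend([border_value] * border)
            row.extend(strip[y])
        canvas.append(row)
    return canvas


# Three tiny 2 x 6 "photos" filled with the values 1, 2 and 3.
photos = [[[k] * 6 for _ in range(2)] for k in (1, 2, 3)]
for row in build_mosaic(photos, strip_w=2, border=1):
    print(row)
```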
Aside: The Fisheye Suggestion
A question, or suggestion, that has come up several times is: why bother with motorized panning to take a series of pictures when a fisheye-lens stereo rig could grab a 180 degree field of view in one shot, thus eliminating all panorama merging issues?
To understand why this wouldn't work so well, imagine a pair of fisheye equipped cams with 6" separation pointed straight ahead. This separation, also known as stereo base, is key to the 3D effect.
While it's true that both cams can see far over to the left, relative to that direction they are no longer separated by 6". In fact, the effective separation drops to zero at 90 degrees left or right.
So stereopsis would vary across the field: strong in the middle, dropping to zero towards the sides. Furthermore, detail would plummet, since the relevant part of the scene (everything other than sky and ground) would be covered by orders of magnitude fewer pixels.
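The drop in separation is a projection: seen from a direction at angle θ off the rig's forward axis, the effective stereo base is the component of the baseline perpendicular to that direction, b·cos θ. With the 6" base used above, this falls to 3" at 60 degrees and to zero at 90 degrees. A short sketch of the calculation (the function name is illustrative):

```python
import math


def effective_baseline(base_in, angle_deg):
    """Effective stereo base (inches) for a viewing direction angle_deg away
    from the rig's forward axis: the baseline's component perpendicular to
    that direction, base * cos(angle)."""
    return base_in * math.cos(math.radians(angle_deg))


for angle in (0, 30, 60, 90):
    print(angle, round(effective_baseline(6.0, angle), 2))
```

A panning rig avoids this because it always looks along its own forward axis (θ = 0), so every gaze direction gets the full base.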
A panning (azimuth indexed) cam rig has consistent stereopsis throughout 360 degrees, and far more pixels where they are needed.
Prerendering for Playback Responsiveness
Prerendering is the creation of various versions of a view in advance. The sole purpose of this is to allow the player software to do less work and thus be more responsive. Image scaling, cropping, formatting and trapezoidal transforms are processor intensive, and best done once, in advance. Prerendered filesets are made for specific player hardware, for example 1920 x 1080 side by side jpegs for the HD mirror viewer, and 1920 x 1080 over/under jpegs for 3DTVs.
Prerendering display-sized jpegs uses lots of disk space, but allows the player to show any desired view at a respectable 20-30 frames per second.
The Playback System consists of 3 elements: the tactile user interface, the computer and the display. The computer acts as a storehouse and server of images: based on user requests made via the tactile user interface, it delivers the relevant jpeg to the full area of the display.
The tactile user interface contains a USB-based Phidgets 8-8-8 module, which the player software reads to obtain the user's inputs.
There are currently 2 display output modes: Reflective Stereoscope, which produces correct-aspect side-by-side stereo pairs, and 3DTV, which produces vertically squished over/under pairs.
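How the player turns user requests into "the relevant jpeg" is not spelled out above. The sketch below is one hypothetical model: the player keeps a current Time Location and gaze direction, panning wraps around the 16 gaze directions (the rig covers a full revolution), moving through time is clamped to the first and last revolution, and the pair selects a prerendered file. The class, the step semantics and the file naming are all assumptions, and the sketch deliberately does not touch the Phidgets API; real input handling would translate knob or button readings into `pan` and `travel` calls.

```python
from dataclasses import dataclass

GAZE_DIRECTIONS = 16
TIME_LOCATIONS = 102


@dataclass
class ViewState:
    """Which prerendered view the player is showing (hypothetical model)."""
    time_loc: int = 0
    gaze: int = 0

    def pan(self, steps: int) -> None:
        # Gaze wraps around: the rig covers a full 360 degree revolution.
        self.gaze = (self.gaze + steps) % GAZE_DIRECTIONS

    def travel(self, steps: int) -> None:
        # Time does not wrap: clamp to the first and last revolution.
        self.time_loc = max(0, min(TIME_LOCATIONS - 1, self.time_loc + steps))

    def frame_name(self, mode: str) -> str:
        # Hypothetical prerendered file name, e.g. "sbs/T000_G15.jpg".
        return f"{mode}/T{self.time_loc:03d}_G{self.gaze:02d}.jpg"


state = ViewState()
state.pan(-1)      # panning left from gaze 0 wraps to gaze 15
state.travel(5)
state.travel(-10)  # cannot go before the first revolution
print(state.frame_name("sbs"))
```

Since every view is prerendered, each request reduces to this index arithmetic plus one file read, which is what lets the player keep up its frame rate.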
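The difference between the two output modes comes down to how each eye's view is scaled into a 1920 x 1080 frame. The sketch below assumes each view is fitted with its aspect ratio preserved (letterboxed if needed): side by side, each eye gets half the width at the correct aspect; over/under, each eye is fitted to the full frame and then squashed to half height, and the 3DTV stretches it back on playback. The function name, mode strings and fitting rule are assumptions for illustration, not the prerenderer's actual code.

```python
def eye_layout(mode, src_w, src_h, out_w=1920, out_h=1080):
    """Return (scale_x, scale_y, eye_w, eye_h) for one eye's view inside an
    out_w x out_h stereo frame, assuming aspect-preserving (letterboxed) fitting."""
    if mode == "sbs":
        # Reflective stereoscope: each eye owns half the width at full height
        # and keeps its correct aspect ratio.
        s = min((out_w // 2) / src_w, out_h / src_h)
        return s, s, round(src_w * s), round(src_h * s)
    if mode == "ou":
        # 3DTV over/under: fit to the full frame, then halve the height;
        # the TV stretches each half back to full height.
        s = min(out_w / src_w, out_h / src_h)
        return s, s / 2, round(src_w * s), round(src_h * s / 2)
    raise ValueError(f"unknown mode: {mode}")


# A full-frame 1920 x 1080 view becomes 960 x 540 side by side, 1920 x 540 over/under.
print(eye_layout("sbs", 1920, 1080)[2:])
print(eye_layout("ou", 1920, 1080)[2:])
```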