We’re used to hearing about how applications such as Flickr and Google Earth
are providing revolutionary new ways of looking at digital images.
But a technology announced by Microsoft at the ACM Siggraph conference (the
annual conference of the Association for Computing Machinery's Special Interest
Group on Computer Graphics and Interactive Techniques) in August looks set to
take the prize for the most innovative recent development in digital image
software.
Photosynth takes a collection of geographically related images and arranges
them in a 3D-modelled space so you can navigate through them.
It is, appropriately enough, a synthesis of three software technologies that
provides a new kind of environment for browsing photos. Those technologies are
image-based modelling, image-based rendering and image browsing.
To put it another, perhaps slightly glib, way, Microsoft has rolled together
technologies from computer gaming, panorama stitching and photo organising to
create an entirely new and original way of looking at digital photos.
Photosynth doesn’t seek to produce a seamless, technically perfect 360°
panoramic vista of a scene in the way that panorama stitching software such as
Realviz Stitcher or Pano Tools does. Instead, it positions individual images
within a 3D model
that allows you to navigate between them and take a closer look at whatever
interests you.
The software engineers behind Photosynth are Noah Snavely and Steven M Seitz
of the University of Washington’s Graphics and Imaging Laboratory (Grail) and
Microsoft’s Richard Szeliski.
In their paper ‘Photo Tourism: Exploring Photo Collections in 3D’, they explain
that the object is
“not to synthesise a photo-realistic view of the world from all viewpoints per
se, but to browse a specific collection of photographs in a 3D spatial context.”
How does it work?
Like panorama stitching software, Photosynth computes the location and
orientation of each image, and the geometry of the scene, by comparing matching
features in pairs of overlapping images. It even uses the same Scale-Invariant
Feature Transform (SIFT) algorithm that is used in some panorama stitching
software.
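To give a feel for the matching step, here is a minimal sketch in Python. It assumes each image has already been reduced to a list of numeric feature descriptors (SIFT produces 128-number descriptors; the short tuples here are made up for illustration). The function name `match_features` and the 0.8 threshold are assumptions, and the 'ratio test' it applies is a filter commonly paired with SIFT, not something the Photosynth team has confirmed using in this form.

```python
import math


def match_features(desc_a, desc_b, ratio=0.8):
    """Match descriptors from image A to image B by nearest neighbour.

    A match is kept only if the best candidate is clearly closer than the
    second-best one (the 'ratio test'), which discards ambiguous features.
    `ratio` is an illustrative threshold, not a value from the article.
    """
    matches = []
    for i, d in enumerate(desc_a):
        # Distance from this descriptor to every descriptor in image B.
        distances = sorted((math.dist(d, e), j) for j, e in enumerate(desc_b))
        if len(distances) < 2:
            continue  # need two candidates to judge ambiguity
        (best, j), (second, _) = distances[0], distances[1]
        if best < ratio * second:
            matches.append((i, j))
    return matches
```

A feature that sits equally close to two candidates in the other image is dropped rather than guessed at, which matters when the photos come from many different cameras.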
Next comes an optimisation process that maps the position of each image
relative to its neighbour, starting with a pair of images and incrementally
adding more images and re-running the optimisation algorithm.
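The incremental idea can be illustrated with a deliberately tiny toy. The sketch below assumes cameras sit on a single line and that matching has produced rough measurements of the offset between pairs of cameras; the real system optimises full 3D camera poses and scene points, which this does not attempt. The function names, the data layout and the simple averaging refinement are all assumptions made for illustration.

```python
def refine(positions, measurements, anchor, sweeps=50):
    """Re-optimise every registered camera except the fixed anchor.

    Each measurement (i, j, offset) says: position[j] - position[i] ~ offset.
    Every sweep moves each camera to the average position implied by its
    measurements to other registered cameras (a simple relaxation scheme).
    """
    for _ in range(sweeps):
        for cam in positions:
            if cam == anchor:
                continue
            estimates = []
            for i, j, offset in measurements:
                if i == cam and j in positions:
                    estimates.append(positions[j] - offset)
                elif j == cam and i in positions:
                    estimates.append(positions[i] + offset)
            if estimates:
                positions[cam] = sum(estimates) / len(estimates)
    return positions


def reconstruct(order, measurements):
    """Register cameras one at a time, re-running the refinement each time."""
    anchor = order[0]
    positions = {anchor: 0.0}      # the first camera defines the origin
    for cam in order[1:]:          # the first pass registers the initial pair
        positions[cam] = 0.0       # crude initial guess for the new camera
        refine(positions, measurements, anchor)
    return positions
```

Even in this toy, every added camera triggers another pass over everything registered so far, so the work grows as the collection grows.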
The final step is to align the model with a geo-referenced image, a satellite
map for example, or a digital elevation map such as those used by Google Earth.
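One way to picture the alignment step is as fitting a scale, rotation and shift that carries points in the model onto their known positions on the map. The sketch below assumes a flat 2D alignment and a handful of point pairs whose correspondence is already known; both are simplifying assumptions, and the function names are hypothetical. It uses complex numbers so that one multiplication handles scale and rotation together.

```python
def fit_similarity(model_pts, map_pts):
    """Least-squares fit of map ~ a * model + b over paired 2D points.

    Points are treated as complex numbers, so the complex factor `a`
    encodes scale and rotation, and `b` encodes the shift.
    """
    z = [complex(x, y) for x, y in model_pts]
    w = [complex(x, y) for x, y in map_pts]
    z_mean = sum(z) / len(z)
    w_mean = sum(w) / len(w)
    # Centre both point sets, then solve for the best scale-and-rotation.
    num = sum((zi - z_mean).conjugate() * (wi - w_mean) for zi, wi in zip(z, w))
    den = sum(abs(zi - z_mean) ** 2 for zi in z)
    a = num / den
    b = w_mean - a * z_mean
    return a, b


def apply_similarity(a, b, point):
    """Map one model point into map coordinates."""
    p = a * complex(*point) + b
    return (p.real, p.imag)
```

With `a` and `b` in hand, `apply_similarity` places any other model point, such as a camera position, on the map.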
In one sense, at least, Photosynth’s job is easier than a panorama
stitcher’s, because it doesn’t have to produce an exact seamless match. In
panorama stitching, however, a lot of the variables are eliminated by using a
known camera and lens combination and by precisely controlling the movement of
the camera between shots.
The material Photosynth has to work with will have been shot handheld on
anything from a digital SLR with a telephoto lens to a cameraphone. Matching and
accurately positioning these images is a vast computational undertaking.
As you’d expect, this is not a process that happens in real time, or anything
like it. The optimisation takes the bulk of the time, as it involves multiple
iterations that slow down with the addition of each new image and as more
images share matching points.
In tests using images of a section of the Great Wall of China, shot with the
same camera and lens over a short period of time, the render time for a set of
120 photos, of which 82 were registered (that is, the software was able to
process them), was several hours. A set of 2,635 ‘uncontrolled’ images obtained
from Flickr (of which 597 were registered) took several days.
Now watch the demo
Although there is currently no Photosynth application available, you can view a
live Java applet demo of Photo Tourism, the Washington Grail research group’s
system on which Photosynth is based.
The demo displays the 3D space as a ‘point cloud’ with the image frusta
overlaid. If you’re wondering what a frustum is, in this case it is a 3D
pyramid shape that indicates a camera’s position within the 3D space, the
direction it is pointed in and its angle of view.
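For the curious, the pyramid can be worked out from exactly those three ingredients. The sketch below computes the four far corners of a camera's pyramid; joined to the camera position, they form the shape drawn in the demo. The function name and parameters are assumptions for illustration, and the code assumes `forward` and `up` are unit-length and at right angles to each other.

```python
import math


def cross(a, b):
    """Cross product of two 3D vectors."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def frustum_corners(position, forward, up, horizontal_fov_deg, aspect, depth):
    """Return the four far corners of a camera's viewing pyramid.

    position: camera location; forward/up: unit vectors, perpendicular.
    horizontal_fov_deg: angle of view; aspect: image width / height.
    depth: how far the pyramid extends in front of the camera.
    """
    right = cross(up, forward)
    half_w = depth * math.tan(math.radians(horizontal_fov_deg) / 2)
    half_h = half_w / aspect
    # Centre of the far face, straight ahead of the camera.
    centre = tuple(p + depth * f for p, f in zip(position, forward))
    corners = []
    for sx in (-1, 1):
        for sy in (-1, 1):
            corners.append(tuple(
                c + sx * half_w * r + sy * half_h * u
                for c, r, u in zip(centre, right, up)
            ))
    return corners
```

A telephoto shot gives a long, narrow pyramid and a cameraphone's wide lens a squat one, which is why the shapes in the demo vary.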
Clicking on any one of the cameras displays the image at that location. Just
as interesting as the final view, if not more so, is the journey, which flies
you smoothly through the 3D model, passing other cameras on the way. Transitions
from one camera to another are very slick, incorporating smooth movement through
the 3D space, as well as a dissolve.
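A transition of this kind can be sketched as two things happening in step: the viewpoint slides from one camera to the next while the old image fades out and the new one fades in. The sketch below is a guess at the general shape of such an effect, not the demo's actual code; the function name and the choice of a 'smoothstep' easing curve are assumptions.

```python
def transition_frame(start_pos, end_pos, t):
    """Return (viewpoint, old_image_weight, new_image_weight) at time t.

    t runs from 0.0 (at the first camera) to 1.0 (at the second).
    An easing curve makes the move start and finish gently.
    """
    t = min(max(t, 0.0), 1.0)
    s = t * t * (3 - 2 * t)  # smoothstep easing, an illustrative choice
    viewpoint = tuple(a + s * (b - a) for a, b in zip(start_pos, end_pos))
    # The two weights always add up to 1, giving a cross-dissolve.
    return viewpoint, 1.0 - s, s
```

Calling it once per frame with `t` stepping from 0 to 1 yields both the flight path and the dissolve.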
A step-back button does just that, depicting a wider field of view from which
you can select alternative cameras. The Photosynth application will feature
‘geometric browsing tools’ which will allow you to move left and right and to
view parts of the scene at different scales.
There will also be a ‘similar’ button that will display alternative images of
the same scene, for example at different times of the day or year, or even over
longer time periods, enabling historical comparisons.
Photosynth’s zooming, and its ability to display high-resolution detail, will
surpass conventional pixel-based viewers. The multi-image composition of scenes
makes it possible to drill down to fine detail using new images, rather than
enlarging existing ones until the pixels look like breeze blocks.
This process can happen in real time, even on narrowband connections, thanks to
a technology called Seadragon, acquired by Microsoft in February this year.