Structured Light 3D Scanning
by kylemcdonald
Structured Light 3D Scanning
This is the same technique used to capture Thom Yorke's face in Radiohead's "House of Cards" video. I'll walk you through setting up your projector and camera, and capturing images that can be decoded into a 3D point cloud using a Processing application.
House of Cards Google Code project
I've included a lot of discussion about how to make this technique work, and the general theory behind it. The basic idea is:
1. Download ThreePhase
2. Rotate a projector clockwise to "portrait"
3. Project the included images in /patterns, taking a picture of each
4. Resize the photos and replace the example images in /img with them
Point Clouds with Depth of Field from Kyle McDonald on Vimeo.
Theory: Triangulation
If you just want to make a scan and don't care how it works, skip to Step 3! These first two steps are just some discussion of the technique.
Triangulation from Inherent Features
Most 3D scanning is based on triangulation (the exception being time-of-flight systems like Microsoft's "Natal"). Triangulation works on the basic trigonometric principle of taking three measurements of a triangle and using those to recover the remaining measurements.
If we take a picture of a small white ball from two perspectives, we will get two angle measurements (based on the position of the ball in the camera's images). If we also know the distance between the two cameras, we have two angles and a side. This allows us to calculate the distance to the white ball. This is how motion capture works (lots of reflective balls, lots of cameras). It is related to how humans see depth, and is used in disparity-based 3D scanning (for example, Point Grey's Bumblebee).
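To make the math concrete, here is a minimal sketch in Processing-style Java (my own illustration, with made-up numbers) that recovers the distance to a point from the two angle measurements and the known baseline, using the law of sines:

// Two cameras share a baseline and both see the same point.
// alpha and beta are the angles between the baseline and each camera's
// ray toward the point; baseline is the distance between the cameras.
float triangulateDepth(float alpha, float beta, float baseline) {
  float gamma = PI - alpha - beta;              // angle at the point itself
  float dA = baseline * sin(beta) / sin(gamma); // distance from camera A to the point (law of sines)
  return dA * sin(alpha);                       // perpendicular distance from the baseline
}

// Example: cameras 1 m apart, rays at 60 and 70 degrees to the baseline.
void setup() {
  println(triangulateDepth(radians(60), radians(70), 1.0)); // about 1.06 m
}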
Triangulation from Projected Features
Instead of using multiple image sensors, we can replace one with a laser pointer. If we know the angle of the laser pointer, that's one angle. The other comes from the camera again, except we're looking for a laser dot instead of a white ball. The distance between the laser and camera gives us the side, and from this we can calculate the distance to the laser dot.
But cameras aren't limited to one point at a time; we could scan a whole line. This is the foundation of systems like the DAVID 3D Scanner, which sweep a line laser across the scene.
Or, better yet, we could project a bunch of lines and track them all simultaneously. This is called structured light scanning.
Theory: Three Phase Scanning
Imagine projecting a single gradient from white to black on a scene. The brightness of a given point then corresponds to the angle from the projector. But there are three problems here: this only yields 256 possible angles due to the bit depth of most cameras and projectors, the inherent color and reflectivity of the scene isn't accounted for, and the way light intensity drops off with distance isn't modeled.
If we project multiple gradients that are slightly offset from each other, we can overcome two of these problems. The light intensity drop-off acts as a scaling factor on all light at a given point, and the inherent color of the scene acts as an offset. Sampling a point multiple times can help with the bit depth issue, but the better solution is to repeat the gradient as a triangle wave ("stripes"); this way adjacent lines don't have the same brightness.
With repeated gradients, we no longer have a unique angle identification for every line. To deal with this, we have to propagate the stripe number from one stripe to another. This is called phase unwrapping. There are a bunch of phase unwrapping algorithms, and they generally trade off accuracy against speed. One of the simplest and fastest phase unwrapping algorithms uses a flood fill technique.
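For the curious, here is a rough sketch of the flood fill idea (my own illustration, not the actual ThreePhase code): starting from a seed pixel, visit neighbors outward and shift each neighbor's wrapped phase by whole multiples of 2π so it stays within half a turn of the pixel it was reached from.

import java.util.ArrayDeque;

// wrapped: per-pixel phase values in the range -PI..PI
// returns an unwrapped copy, grown outward from the center pixel
float[][] unwrapFloodFill(float[][] wrapped, int w, int h) {
  float[][] unwrapped = new float[h][w];
  boolean[][] visited = new boolean[h][w];
  ArrayDeque<int[]> queue = new ArrayDeque<int[]>();
  int sx = w / 2, sy = h / 2;                 // seed in the middle of the image
  unwrapped[sy][sx] = wrapped[sy][sx];
  visited[sy][sx] = true;
  queue.add(new int[] {sx, sy});
  int[] dx = {1, -1, 0, 0};
  int[] dy = {0, 0, 1, -1};
  while (!queue.isEmpty()) {
    int[] p = queue.poll();
    for (int i = 0; i < 4; i++) {
      int nx = p[0] + dx[i], ny = p[1] + dy[i];
      if (nx < 0 || ny < 0 || nx >= w || ny >= h || visited[ny][nx]) continue;
      // shift the neighbor's phase by whole turns so it lands
      // within half a turn of the pixel we came from
      float diff = wrapped[ny][nx] - unwrapped[p[1]][p[0]];
      float turns = round(diff / TWO_PI);
      unwrapped[ny][nx] = wrapped[ny][nx] - turns * TWO_PI;
      visited[ny][nx] = true;
      queue.add(new int[] {nx, ny});
    }
  }
  return unwrapped;
}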
One remaining problem is that the pattern will blur as the scene moves away from the projector's focal plane. We can deal with this by using a sort of "pre-blurred" pattern: three cosine waves. This technique is called three phase scanning, and it consists of projecting the patterns cos(θ - 2π/3), cos(θ), and cos(θ + 2π/3).
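Here is a minimal Processing-style sketch, just to make the idea concrete (it is not the code that generated the included /patterns images, and the stripe orientation and file names are my own assumptions). It generates the three phase-shifted cosine patterns, and shows the standard three phase formula for recovering the wrapped phase at a pixel from the three captured brightnesses; the scene's offset and scaling cancel out in the ratio.

// Generate the three phase-shifted cosine patterns.
// stripeWidth is the stripe period in pixels; 64 is just an example.
void generatePatterns(int w, int h, float stripeWidth) {
  for (int i = 0; i < 3; i++) {
    PImage pattern = createImage(w, h, RGB);
    pattern.loadPixels();
    float shift = (i - 1) * TWO_PI / 3;              // -2π/3, 0, +2π/3
    for (int y = 0; y < h; y++) {
      float theta = TWO_PI * y / stripeWidth;
      int brightness = int(128 + 127 * cos(theta + shift));
      for (int x = 0; x < w; x++) {
        pattern.pixels[y * w + x] = color(brightness);
      }
    }
    pattern.updatePixels();
    pattern.save("pattern" + (i + 1) + ".png");
  }
}

// Recover the wrapped phase at one pixel from the three captured
// brightnesses i1, i2, i3 (standard three phase formula).
float wrappedPhase(float i1, float i2, float i3) {
  return atan2(sqrt(3) * (i1 - i3), 2 * i2 - i1 - i3);
}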
Tools and Materials
Camera & mount
We're just taking three pictures and transferring them to the computer. So you could use anything from a webcam to a Polaroid and scanner, but you're best off with a standard digital camera.
Projector & mount
We're just projecting three frames, so you could use transparencies and an overhead projector, or even a slide projector, but you're best off with a digital projector you can feed with your computer. When projecting structured light patterns for capturing motion, the projector type is important (LCD vs DLP, color wheel frequency, etc.), but if we're just making a single scan it doesn't really matter.
ThreePhase decoder application
This can be downloaded from the structured-light Google Code project, and includes the patterns to be projected.
Processing
If you want to modify the decoder application, you'll need Processing. Processing is an "open source programming language and environment for people who want to program images, animation, and interactions." The decoder application also uses two external libraries that you will need to download and install: PeasyCam and ControlP5.
Projector-Camera Configuration
We'll use a "portrait" setup for this scan. This means that the projector and camera will be mounted on their sides. Projectors tend to have some internal lens shift: when you place a projector on a table, it projects the image "up" to a screen rather than shooting straight. To compensate for this, make sure the camera's optical axis is tangential to the surface the projector is hitting. In other words, if the projector has been rotated clockwise, keep the camera to the right of the projector.
At this point, the image planes of the camera and projector are approximately rectified: a horizontal line from the projector is a horizontal line on the camera. A white wall is helpful as a backdrop for testing that the camera and projector are aligned (in general, more reflective colors like white are easier to scan than darker colors).
Once aligned, the camera's perspective needs to have a vertical offset from the projector's perspective. Move the camera up: enough so you can see the curve in the cosine pattern being projected. Too much, and you'll be capturing a lot of shadows (areas where the projector isn't projecting) or missing areas where the projector is projecting.
For mounting, you can use pretty much anything that is suitably stable. I've used everything from spider projector mounts, to desks with tape and cardboard, to music stands. The main goal is making sure that the geometry doesn't change while you're capturing each pattern.
Lighting Calibration
Once your projector and camera are arranged, the lighting needs to be considered.
Turning out the lights
Structured light is fairly robust to ambient illumination, but it will help to turn off as many lights as possible, so that only the projector is illuminating the scene.
Once the lights are out, make sure your projector isn't clipping the blacks or whites. Use a calibration pattern like the one from this Monitor Gamma Test, or the pattern attached below, while adjusting the "contrast" and "brightness" settings on your projector. Then adjust the white balance and exposure of your camera to make sure the whites aren't clipping. If your subject is close to the projector, the brightest white will be brighter than when your subject is further away, due to light intensity drop-off.
Exposure Time Issues
You might find that the images from your camera look colored, even though you're projecting a black and white image. This is because your projector has a color wheel and you're capturing less than a full frame of the projection. To solve this, lower the ISO (decrease the sensitivity) and increase the exposure time of your camera to at least 1/60th of a second. If the whites are clipping, you have to either put a neutral density filter on the camera or move the scene away from the projector.
Eye Safety
Before you step in front of the projector, remember: it's bright, and retinal damage is irreversible. If you want to do bust scans, keep your eyes closed.
Capture and Decoding
Capture
With everything in place and calibrated, set your projector resolution to 1024x768, then open the three cosine images and display them full screen, one at a time. Most operating systems have a built-in image viewer that will let you use "slide show" mode and step forward manually. Zoom in your camera if necessary. Focus the projector on the subject, not behind them. Try to match the example images as much as possible.
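If you'd rather not fight with your image viewer, a tiny Processing sketch along these lines can show the patterns full screen and step through them on a key press. The file names here are an assumption; use whatever names the images in /patterns actually have.

PImage[] patterns = new PImage[3];
int current = 0;

void setup() {
  fullScreen();                       // in older Processing versions, use size(1024, 768) with Present mode
  for (int i = 0; i < 3; i++) {
    patterns[i] = loadImage("pattern" + (i + 1) + ".png");
  }
  noCursor();
}

void draw() {
  image(patterns[current], 0, 0, width, height);
}

void keyPressed() {
  current = (current + 1) % 3;        // advance to the next pattern on any key
}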
Decode Setup
Once captured, transfer your images to the computer and resize them, reorienting them to portrait if necessary. 480x640 is a good size to start with. That might sound small, but remember: this is 300k points, or 600k triangles! Rename the images to "phase1.jpg", "phase2.jpg", and "phase3.jpg" and replace the example images. Make sure the images are in order. It's not important which one comes first, just that the pattern moves gradually in 2π/3 steps from image to image.
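Any image editor can do the resizing and renaming, but if you'd like to script it, a throwaway Processing sketch like this works. It assumes the photos have already been rotated to portrait and are named photo1.jpg through photo3.jpg; adjust the names to match your camera's files.

void setup() {
  for (int i = 1; i <= 3; i++) {
    PImage img = loadImage("photo" + i + ".jpg");  // your camera's file names will differ
    img.resize(480, 640);                          // width x height, portrait orientation
    img.save("phase" + i + ".jpg");                // drop these into the ThreePhase img folder
  }
  exit();
}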
Decode Parameters
There are three decoding parameters: noise threshold, zskew, and zscale. Both zskew and zscale are dependent on the projector-camera configuration, and the noise threshold depends on the brightness of the subject as well as the ambient light. In other words, once your camera and projector are in place, you only need to tweak zskew and zscale once. zskew and zscale are related to the two measurements described in Step 1 (distance between camera and projector, and angle between camera and projector).
First tweak the noise threshold so that anything that's too dark to decode doesn't affect the decoding. Anywhere from 5-20 is a good starting point. Then modify the zscale so the points have some depth. Make sure the depth is in the right direction, and that your subject isn't flipped left-right (or really inside-out, like a mirror). Finally, modify zskew so the points are slanted correctly. If your camera is looking down slightly towards a vertical wall, the wall should lean forward slightly. Changing the decoding parameters is an iterative process: sometimes you have to get one right before you can see how to change the others.
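To give a feel for what the two z parameters do, here is one plausible way an unwrapped phase could map to depth. This is only my own illustration of the idea, not necessarily how ThreePhase computes it: zskew subtracts a reference plane that tilts with the row index (compensating for the camera looking up or down at the scene), and zscale converts the remaining phase into depth units.

// Illustrative mapping from unwrapped phase to depth (assumed, not the actual decoder code).
float phaseToDepth(float unwrappedPhase, int y, float zskew, float zscale) {
  float referencePlane = y * zskew;        // the "flat wall" phase expected at this row
  return (unwrappedPhase - referencePlane) * zscale;
}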
Variations
Congratulations, you made a 3D scan from only three images! So where do we go from here? To keep track of where I take this technique next, watch my Vimeo. Here are some things I've been looking into. Feel free to leave any ideas or questions in the comments section!
3D Motion
This only gives us one scan. To capture motion with this technique requires a few more tricks. The first one is synchronizing a camera and projector so we can capture at a high rate. Commercial structured light systems use hardware and special electronics to do this. A DIY approach might use the vertical sync signal from a camera to drive a microcontroller that generates the three phase patterns for a projector. Without real sync, we can get pretty far with a 60 fps camera like the PS3Eye and a 60 Hz projector.
Exporting the Data
Exporting the 3D data for use with other applications is obviously important if you want to do something with the data: maybe fabricating a miniature, or using the mesh for a character in a video game. A more complex application, available as decode.zip from the structured-light project, can export to various 3D formats: .png depth maps, and .ply and .obj triangle meshes and point clouds. It will also allow you to capture motion as described above. As this application develops, I'll write another Instructable describing how to capture 3D video.
More Accurate Unwrapping
Phase unwrapping, mentioned in Step 2, is a big part of phase-shift scanning. There isn't a single "right" answer to an unwrapping problem given the wrapped phase. However, the flood fill technique is clearly not ideal as it can create regions with large phase discontinuities along straight lines. Better phase unwrapping algorithms could avoid these obvious errors.
Automatic Calibration
We should be able to automatically approximate zskew, zscale and the noise threshold parameters by taking some test scans of reference subjects.
Absolute Position
Three phase scanning can only recover relative position by propagating phases during the unwrapping stage. In order to take absolute measurements of every position in a scene, we can use cosine patterns with many different frequencies, or use a technique called Gray code scanning. Gray codes assign a unique code to each stripe by using 10 patterns instead of 3.
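For reference, here is a quick sketch of how a binary-reflected Gray code labels each projector column (this is my own illustration, not code from the structured-light project). Pattern k is simply bit k of each column's code, so 10 patterns label 2^10 = 1024 columns, one per pixel column of a 1024x768 projector.

// Binary-reflected Gray code: consecutive columns differ by exactly one bit,
// so a small decoding error only shifts you by one stripe.
int grayCode(int column) {
  return column ^ (column >> 1);
}

// Pattern k is black or white depending on bit k of the column's Gray code.
boolean isWhite(int column, int k) {
  return ((grayCode(column) >> k) & 1) == 1;
}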
Invisible Capture for Performance
Unless you like the aesthetic of flashing scrolling lines, structured light isn't the best way to capture 3D information on a stage in front of an audience. One solution to this involves modifying a projector to remove the infrared-cut filter and replace it with a visible light-cut filter. Then, with an IR camera, you can see the projected patterns without disturbing the scene in the visible spectrum.
Other Resources
Interactive Examples
Gray Code Decoder
This is the first 3D scanner I built, based on gray codes and intersecting projector/camera rays.
Three Phase Decoder
This is a stripped down version of a three phase decoder, implementing only the bare essentials.
Papers
Build Your Own 3D Scanner
Great overview of how 3D scanning works, with a description of the math involved.
Pattern Codification Strategies in Structured Light Systems
Amazing compilation of different approaches to structured light scanning. If you think you have better patterns than three phase scanning, compare them to the ones in this paper.
Multiview Geometry for Camera Networks
Slightly deeper introduction to the math involved in surveying a scene from multiple perspectives.
Temporal Dithering of Illumination for Fast Active Vision
Crazy approach to ultra-fast structured light that uses the dithering patterns of DLP projectors.
Researchers
Song Zhang
The researcher behind Radiohead's "House of Cards" video.
Paul Debevec
Capturing higher order material information, like subsurface scattering and polarization.
Michael Wand
Lots of work processing 3D data from structured light scanners.