For our Computer Vision class we had to do 3D reconstruction using OpenCV. I worked on this assignment together with a classmate. We had to geometrically calibrate our three cameras, apply background subtraction, generate a set of voxels visible in all cameras, and visualize these voxels in 3D space. Finally, we applied Marching Cubes to create a smoother model. Our scene consisted of a person wearing a horse mask, sitting in a chair.
The end results can be viewed in the video below. Some artefacts are visible in the video, especially when the voxel visualization is used. Please note that these artefacts were caused by a faulty GPU and are not part of our implementation; we did not see any artefacts when testing on other machines.
We had to calculate the camera intrinsics ourselves. For this we used a video in which a chess board was moved around in front of the camera. We applied an algorithm that automatically searched for a set of frames that minimized the average reprojection error.
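The exact search we used is not spelled out here, so the following is only a minimal sketch of one way such a frame selection could work: greedy backward elimination. The names `select_frames`, `calibrate` and `min_frames` are hypothetical. `calibrate` stands for any function that calibrates on a list of frames and returns the average reprojection error; with OpenCV this could wrap `cv2.calibrateCamera`, whose first return value is the RMS reprojection error.

```python
def select_frames(frames, calibrate, min_frames=3):
    """Greedily drop frames as long as dropping one lowers the error.

    frames     -- candidate calibration frames (any objects)
    calibrate  -- hypothetical callback: list of frames -> reprojection error
    min_frames -- never shrink the selection below this size
    """
    selected = list(frames)
    best_error = calibrate(selected)
    while len(selected) > min_frames:
        # Try leaving out each remaining frame in turn.
        trials = [
            (calibrate(selected[:i] + selected[i + 1:]), i)
            for i in range(len(selected))
        ]
        error, index = min(trials)
        if error >= best_error:
            break  # no single removal improves the calibration any more
        best_error = error
        del selected[index]
    return selected, best_error


def toy_calibrate(frame_errors):
    """Toy stand-in for a real calibration: each 'frame' is just its own
    error contribution, and the result is the mean of those values."""
    return sum(frame_errors) / len(frame_errors)


selected, error = select_frames([0.2, 0.9, 0.3, 1.5, 0.25], toy_calibrate)
```

Each round runs one calibration per remaining frame, so this is quadratic in the number of candidate frames. That is acceptable for a few dozen frames, but a long video would need subsampling first.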
To calibrate the camera extrinsics, the program initially required the user to manually specify the corners of a chess board in a frame. This was necessary because the resolution of the cameras was not high enough to detect these corners automatically with sufficient precision. Clicking every corner by hand was excruciatingly labour-intensive, so we created a better method: the FlexGrid, which allows you to just click and drag a lattice until it matches the chess board.
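To illustrate the idea behind a draggable lattice, here is a minimal sketch that derives every lattice point from just the four outer corners the user drags. It is not the FlexGrid code itself; the function names are hypothetical, and it assumes plain bilinear interpolation, which ignores perspective foreshortening (a homography would handle that better).

```python
def lerp(a, b, t):
    """Linear interpolation between 2D points a and b, with t in [0, 1]."""
    return (a[0] + (b[0] - a[0]) * t, a[1] + (b[1] - a[1]) * t)


def lattice_points(top_left, top_right, bottom_right, bottom_left, cols, rows):
    """Return rows * cols lattice points in row-major order.

    The four corners are image coordinates (x, y); cols and rows are the
    number of lattice points per row and per column (both at least 2).
    """
    points = []
    for r in range(rows):
        v = r / (rows - 1)
        for c in range(cols):
            u = c / (cols - 1)
            # Interpolate along the top and bottom edges, then between them.
            top = lerp(top_left, top_right, u)
            bottom = lerp(bottom_left, bottom_right, u)
            points.append(lerp(top, bottom, v))
    return points


# Example: a 3 x 2 lattice stretched over a 4 x 2 pixel rectangle.
grid = lattice_points((0, 0), (4, 0), (4, 2), (0, 2), cols=3, rows=2)
```

Whenever a corner is dragged, the lattice can simply be recomputed from the four corner positions, which is what makes this so much faster than clicking every chess board corner individually.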
To acquire a clear, noise-free binary image representing the foreground objects, we used a Gaussian Mixture Model to represent the background. In doing so we deviated from the assignment (which suggested using an HSV model), but we showed that our method resulted in higher background subtraction quality. For each frame, we subtracted this background model from the frame, resulting in a binary foreground image. Since this image was still quite noisy, we applied a morphological closing operator to close small gaps in the image. To remove the noise that consists of small objects, we first detected all contours in the image and then removed the external contours that had an area smaller than a user-specified value. For every contour that was large enough, we filled all holes (internal contours) smaller than a second user-specified value. This resulted in a near-perfect, noise-free background/foreground segmentation.
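The two area thresholds can be illustrated with a small, dependency-free sketch. Note that it swaps in a different technique: instead of OpenCV contours (where one would typically combine `cv2.findContours` with `cv2.contourArea`), it labels 4-connected regions with a flood fill and counts pixels. The function names and the list-of-lists mask format are assumptions made for the example.

```python
from collections import deque


def flip_small_regions(mask, value, min_area):
    """Return a copy of mask in which every 4-connected region of `value`
    with fewer than min_area pixels is flipped to the other value."""
    height, width = len(mask), len(mask[0])
    result = [row[:] for row in mask]
    seen = [[False] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if seen[y][x] or mask[y][x] != value:
                continue
            # Breadth-first flood fill to collect one connected region.
            region = []
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                region.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and not seen[ny][nx] and mask[ny][nx] == value):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(region) < min_area:
                for cy, cx in region:
                    result[cy][cx] = 1 - value
    return result


def clean_mask(mask, min_blob_area, min_hole_area):
    """Remove small foreground blobs (1s), then fill small holes (0s)."""
    without_specks = flip_small_regions(mask, 1, min_blob_area)
    return flip_small_regions(without_specks, 0, min_hole_area)


noisy = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 0],
    [0, 1, 0, 1, 0, 1, 0],  # one-pixel hole at column 2, speck at column 5
    [0, 1, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
cleaned = clean_mask(noisy, min_blob_area=2, min_hole_area=2)
```

In this toy mask the isolated speck is removed and the one-pixel hole inside the larger blob is filled, leaving a solid 3 x 3 block of foreground.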
Using the foreground images from multiple cameras, we could apply triangulation to create a 3D reconstruction of our scene. Initially this resulted in a point cloud, but we added support for (colored) voxels and even a relatively smooth mesh created using Marching Cubes.
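One common way to obtain the voxels that are visible as foreground in all cameras is to project every candidate voxel into each view and keep it only if it lands on a foreground pixel everywhere. The sketch below shows that test under stated assumptions: each camera is given as a 3x4 projection matrix `P` (nested lists) together with a binary mask, and the names `project` and `carve` are hypothetical rather than taken from our code.

```python
def project(P, point):
    """Project a 3D point with a 3x4 matrix P; returns (u, v, w), where
    the pixel position is (u / w, v / w)."""
    x, y, z = point
    u, v, w = (row[0] * x + row[1] * y + row[2] * z + row[3] for row in P)
    return u, v, w


def carve(voxels, cameras):
    """Keep the voxels that project onto foreground in every camera.

    cameras is a list of (P, mask) pairs, with mask[row][col] equal to 1
    for foreground pixels and 0 for background pixels.
    """
    kept = []
    for voxel in voxels:
        visible_everywhere = True
        for P, mask in cameras:
            u, v, w = project(P, voxel)
            if w <= 0:  # behind the camera
                visible_everywhere = False
                break
            col, row = round(u / w), round(v / w)
            inside = 0 <= row < len(mask) and 0 <= col < len(mask[0])
            if not inside or mask[row][col] != 1:
                visible_everywhere = False
                break
        if visible_everywhere:
            kept.append(voxel)
    return kept


# Two toy cameras: one looks along the z axis, the other along the x axis.
front = ([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1]],
         [[0, 0, 0], [0, 1, 0], [0, 0, 0]])
side = ([[0, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 1]],
        [[0, 0, 0], [0, 1, 1], [0, 0, 0]])
grid = [(x, y, z) for x in range(3) for y in range(3) for z in range(3)]
hull = carve(grid, [front, side])
```

In a real pipeline the projection of every voxel would be precomputed once per camera (for example in a lookup table), since only the masks change from frame to frame.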
We also implemented custom 3D versions of morphological operations such as opening and closing. We noted that these operations were useful for improving the reconstruction when the quality of the foreground/background segmentation was very low. However, since the quality of our segmentation was already very high, the addition of these operations did not improve our model much.
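As an illustration of what 3D opening and closing do, here is a minimal sketch that operates on a set of integer voxel coordinates. It assumes a 3 x 3 x 3 cube as structuring element and an unbounded grid; both are choices made for the example, not a description of our implementation.

```python
from itertools import product

# Offsets of the 3 x 3 x 3 structuring element (including the centre).
OFFSETS = list(product((-1, 0, 1), repeat=3))


def dilate(voxels):
    """Add every voxel that touches an occupied voxel."""
    return {(x + dx, y + dy, z + dz)
            for (x, y, z) in voxels
            for (dx, dy, dz) in OFFSETS}


def erode(voxels):
    """Keep only voxels whose whole 3 x 3 x 3 neighbourhood is occupied."""
    return {(x, y, z)
            for (x, y, z) in voxels
            if all((x + dx, y + dy, z + dz) in voxels for (dx, dy, dz) in OFFSETS)}


def opening(voxels):
    """Erosion followed by dilation: removes small isolated clusters."""
    return dilate(erode(voxels))


def closing(voxels):
    """Dilation followed by erosion: fills small gaps and cavities."""
    return erode(dilate(voxels))


cube = set(product(range(3), repeat=3))   # solid 3 x 3 x 3 block
with_noise = cube | {(10, 10, 10)}        # plus one stray voxel
hollow = cube - {(1, 1, 1)}               # block with a missing centre
```

With these toy inputs, `opening(with_noise)` returns the solid block without the stray voxel, and `closing(hollow)` fills the missing centre again.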