Using Multiple Cues for Hand Tracking and Model Refinement
Shan Lu, Dimitris Metaxas
CS Department, Rutgers University, Piscataway, NJ 08854
{shanlu, dnm}@cs.rutgers.edu

Dimitris Samaras
CS Department, S.U.N.Y. Stony Brook, Stony Brook, NY 11794
samaras@cs.sunysb.edu

John Oliensis
CS Department, Stevens Institute of Technology, Hoboken, NJ 07030
oliensis@cs.stevens-tech.edu
Abstract
We present a model-based approach to the integration of multiple cues for tracking high degree-of-freedom articulated motions and for model refinement. We then apply it to the problem of hand tracking using a single camera sequence. Hand tracking is particularly challenging because of occlusions, shading variations, and the high dimensionality of the motion. The novelty of our approach is in the combination of multiple sources of information, which come from edges, optical flow, and shading information, in order to refine the model during tracking. We first use a previously formulated generalized version of the gradient-based optical flow constraint that includes shading flow, i.e., the variation of the shading of the object as it rotates with respect to the light source. Using this model, we track its complex articulated motion in the presence of shading changes. We use a forward recursive dynamic model to track the motion in response to data-derived 3D forces applied to the model. However, due to inaccurate initial shape, the generalized optical flow constraint is violated. In this paper we use the error in the generalized optical flow equation to compute generalized forces that correct the model shape at each step. The effectiveness of our approach is demonstrated with experiments on a number of different hand motions with shading changes, rotations, and occlusions of significant parts of the hand.
1 Introduction
In this paper we present a model-based approach to high degree-of-freedom articulated motion tracking and model refinement based on the integration of visual cues, and apply it to the problem of hand tracking using a single camera sequence. Hand tracking has received significant attention in the last few years because of its crucial role in the design of new human-computer interaction methods, gesture analysis, and sign language understanding. Glove-based devices capture human hand motion directly but are expensive and hard to use. Vision-based hand tracking is a cost-effective, non-invasive alternative. Serious challenges lie in the high number of degrees of freedom and the problem of occlusions.

Two general approaches have been suggested for this problem. Model-based approaches try to estimate the position of a hand by projecting a 3-D hand model to image space and comparing it with image features: fingertips [25, 24, 28], line segments [25]. Spline and quadrics-based hand shape models were used in [23, 27] to minimize differences between the silhouette of the projected model and the data. Others [31, 25] have used stereo to avoid occlusions. Appearance-based approaches estimate hand postures directly from the images after learning the mapping from image feature space to hand configuration space [30, 29]. Such systems are more useful for recognizing discrete hand states than for general-purpose hand tracking. The study of motion and shading together has been formalized recently [20, 22] and extended to multiple views [21]. Our approach is model based and hence can work with a single view.

Our first contribution is in the combination of cue forces from edges, optical flow, and shading. In particular, we introduce in deformable model theory a generalized version of the gradient-based optical flow constraint that includes shading flow, i.e., the variation of the shading of the object as it rotates with respect to the light source. This constraint unifies the shading and the optical flow constraints, and degenerates to each one of them when the other is not present. Although optical flow and edges in deformable models have been used in the past [18], as well as shading [17], these two methods were applied to different problem domains (moving and static objects, respectively). In this paper we combine them to correct for the errors due to the brightness constancy assumption. We use cue information from the entirety of the hand, and we are able to track its complex articulated motion in the presence of shading changes. Given the model-based formulation, we augment the optical flow constraint with shading information.

The hand can have as many as 26 degrees of freedom when we model it as a multiple open-chain structure. The dynamic kinematic problem of such a large system, which contains not only open chains but also closed chains, can be modeled as a sub-problem of robotic mechanisms. There are many forward and inverse dynamics simulation techniques for human and robotic motion [14, 15, 16, 10, 13]. Using such a formulation, we limit the allowable motion of the fingers with the use of recursive dynamics constraints. The model's driving forces are computed from image cues such as edges, optical flow, and shading.

In our formulation we compute 2D data-based forces from edge, optical flow, and shading cue constraints. The perspective camera model is used to convert these 2D forces into 3D forces that drive the hand model. These 3D forces are then used to calculate the acceleration of our dynamic hand, its new velocity, and its new position. Since this is a second-order dynamic hand model, we use it to predict finger motion from one frame to the next, so that we are closer to the data in the next frame. To avoid unnecessary calculations of the shading constraint, we monitor the intensity changes in several hand areas during tracking and use the constraint only if these changes are significant.

Since we are using a deformable model framework, we can take advantage of the error in the combined generalized flow and edge constraints to improve the shape of the hand model we use. Our second contribution is that we further generalize our method by introducing, at each integration step, a model shape refinement process based on the error from the cue constraints. Since the shape of the hand and fingers is made of articulated piecewise rigid parts, we employ this shape correction step in the first few frames to improve the hand shape. Cue error has already been used for model shape correction [19, 7, 26]. However, the previous methods used a limited number of cues.

Our paper is organized as follows. The dynamic hand model is described in Sec. 2. Sec. 3 presents model initialization and generation of image forces. Sec. 4 introduces illumination information on the optical flow constraint. Sec. 5 presents the recursive dynamics formulation of the hand model and the constraints on the allowable motion. Sec. 6 formulates the use of residual error during tracking for model shape correction. Tracking experiments are shown in Sec. 7, including complex palm-finger tracking with significant rotation and model shape correction.
2 Hand Model

In our forward dynamics formulation, the hand model (Fig. 1(a)) consists of a base link (palm) and five link chains (fingers) connected to the base link through five two-degree-of-freedom revolute joints. Each finger is three links connected by two one-degree-of-freedom revolute joints. The finger parts are modeled as cylinders and the palm is modeled as a six-sided rectangular solid. A two-degree-of-freedom revolute joint can be simplified as two one-degree-of-freedom revolute joints connected by a zero-length and zero-mass link (dummy link) [4]. In the hand model there are 21 links, including 5 dummy links, and 20 one-degree-of-freedom revolute joints. We number the palm (base link) as link 0. For each finger there are 4 links, including one dummy link, and 4 joints. The joint connecting the finger to the palm is joint 1, and link 1 connects joint 1 and joint 2. Link 1 is the dummy link. Joint i connects link i-1 and link i; link i links joint i and joint i+1. Each link has a local coordinate frame fixed to its starting end.

The above geometric model is based on the measurements of an average male. The user specifies approximately the joint locations in the image to initialize the model. Using our proposed method, we then correct this basic shape during tracking to fit the data.

3 Image-Based Cues

3.1 Fitting the 3-D Model to 2-D Images

This approach needs a geometric 3-D model to transform 2-D forces into 3-D ones, which will be applied on the dynamic model. Initially the model is fitted to a known pose of the hand, as can be seen in Figure 1(b). At this stage of the work we assume knowledge of the camera parameters. At each frame, visibility checking is performed in order to match image and model points correctly. The computation of the motion of occluded fingers relative to the palm is based on the rigid motion of the hand. When the relative motion is not too large, we pick up the finger edges when they reappear. This method will fail when the fingers undergo significant relative motions while occluded. In order to track them successfully in that case, other methods, such as appearance-based methods, should be integrated in the framework; this is outside the scope of this paper.

3.2 Force Calculation for Dynamic Model

The 3-D finger motion is recovered by fitting the model to image-derived data. The external forces are applied on the dynamic model, then the rotation and translation of finger joints are calculated. Figure 1(c) shows two kinds of typical finger motion. We obtain the forces by calculating displacements using the following procedure. Extract the finger edges using the Canny edge operator. A curvature-finding operator [6] is used to find the base points, such as $B_i$, $B_j$ shown in Figure 1(d), for the fingers between the end fingers (little finger and thumb).
Figure 1: (a) Dynamic model of the hand. (b) Initial posture of the hand model. (c) Finger motion and force from edge displacement. (d) Finger segmentation and base points. (e) Representing the projection of the model's articulated segments by their medial axis (thick white line).
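The link and joint bookkeeping of the hand model in Fig. 1(a) and Sec. 2 can be checked with a small sketch. This is only an illustration of the counting: the names `Link`, `build_hand`, and `GLOBAL_DOF` are hypothetical, and the split of the 26 degrees of freedom into 20 joint angles plus 6 global palm parameters is an assumption inferred from the text, not something the paper states explicitly.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Link:
    name: str
    finger: Optional[int]  # None for the palm (base link 0)
    index: int             # position along the chain; 0 is the palm
    is_dummy: bool         # zero-length, zero-mass link splitting a 2-DOF joint


def build_hand(n_fingers: int = 5) -> Tuple[List[Link], List[Tuple[int, int]]]:
    """Return (links, joints); joint (f, i) connects link i-1 and link i of finger f."""
    links = [Link("palm", None, 0, False)]
    joints = []
    for f in range(n_fingers):
        for i in range(1, 5):
            # Link 1 of every finger is the dummy link described in Sec. 2.
            links.append(Link(f"finger{f}_link{i}", f, i, is_dummy=(i == 1)))
            joints.append((f, i))
    return links, joints


links, joints = build_hand()
# Assumption: the palm contributes 3 rotation + 3 translation parameters.
GLOBAL_DOF = 6
```

With five fingers this yields 21 links (5 of them dummy links) and 20 one-degree-of-freedom joints, matching the counts given in Sec. 2.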
For the end fingers, we use symmetry to find the beginning of the finger on the outer side, $B_0$, where curvature cannot be used. The beginning is the same as on the inner side of the finger, where the base point $B_1$ is defined based on curvature. The edges between $B_i$ and $B_j$ correspond to the finger segment. The edge points of sub-segments can be derived from the corresponding 3-D points in the 3-D model during tracking. Because hand motion changes the base point positions between the current frame and the next frame, a normalization process is necessary to match the base points in the two frames according to the distance between the two base points and the length of the finger segment. Let $p_i^k$ and $p_i^{k+1}$ be corresponding edge points in the k-th frame and the (k+1)-th frame. The 2-D force $f_{edge}$ from edge displacement can be calculated by the equation

$$f_{edge}(i) = p_i^{k+1} - p_i^k \tag{1}$$

Another force, $f_{opt}$, can be directly derived from the optical flow of the image. In the optical flow equation

$$I_x u + I_y v + I_t = 0 \tag{2}$$

the temporal differential $(u, v)$ at position $(x, y)$ will be considered as the external force. The optical flow of hand motion is computed by the Lucas-Kanade method [9]. Optical flow near finger edges is not as reliable, due to possible mismatches of edge points, so we only consider the optical flow of the inside area of the finger segment obtained from the projection of the 3-D model in the image plane. For optical flow computation we select only points with significant gradient magnitude. In Fig. 2 we see the edge forces and the optical flow forces applied to different regions of the image.

3.3 Force Transformation from 2-D to 3-D

We assume a perspective projection model. Therefore the point $x = (x, y, z)^T$ in the world coordinate system and the point $x_c = (x_c, y_c, z_c)^T$ in the camera coordinate system satisfy the following equation:

$$x = R_c x_c + T_c \tag{3}$$

where $T_c$ and $R_c$ are the translation and rotation matrices. In the deformable model formulation presented in [8], taking the time derivative of the perspective projection equation for an image point $x_p$ gives $\dot{x}_p = H \dot{x}_c = H R_c^{-1} \dot{x}$, with

$$H = \begin{pmatrix} f/z_c & 0 & -\dfrac{x_c}{z_c^2} f \\ 0 & f/z_c & -\dfrac{y_c}{z_c^2} f \end{pmatrix} \tag{4}$$

The focal length $f$ is obtained by pre-calibration of the camera. According to deformable model theory, 3D forces are converted to generalized forces $f_q = J^T f_{3d}$ on the model parameters $q$, with $J = \partial x / \partial q$ the Jacobian of the model points with respect to $q$. Consequently, the generalized forces calculated from 2-D images will be $f_q = (J_p J)^T f_{2d}$, with $J_p = H R_c^{-1}$ the Jacobian of the model points under perspective projection.

To apply the external forces on the dynamic model, we transform the individual forces obtained from edges and the optical flow within every hand segment into one total force and torque to be used in the recursive dynamic framework. The total force and torque for each hand segment are

$$F = \sum_{i=1}^{n} f_i, \qquad \tau = \sum_{i=1}^{n} r_i \times f_i$$

respectively, where $f_i$ and $r_i$ are the individual force vectors and force position vectors.
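The force conversion of Sec. 3.3 can be illustrated with a short NumPy sketch: build the projection Jacobian H of Eq. (4), lift a 2-D image force to a 3-D force through the transpose of J_p = H R_c^{-1}, and accumulate per-point forces into one force and torque per segment. The function names and the sample values are hypothetical; the sketch assumes forces and positions are already expressed in a common frame, which the paper does not spell out.

```python
import numpy as np


def projection_jacobian(xc, f):
    """H of Eq. (4): derivative of the perspective projection at camera-frame point xc."""
    x, y, z = xc
    return np.array([[f / z, 0.0, -f * x / z**2],
                     [0.0, f / z, -f * y / z**2]])


def lift_force(f2d, xc, f, Rc):
    """Map a 2-D image force to a 3-D force via the transpose of J_p = H Rc^{-1}."""
    Jp = projection_jacobian(xc, f) @ np.linalg.inv(Rc)
    return Jp.T @ np.asarray(f2d, dtype=float)


def total_force_and_torque(forces, positions):
    """Sum per-point 3-D forces f_i and torques r_i x f_i over one hand segment."""
    forces = np.asarray(forces, dtype=float)
    positions = np.asarray(positions, dtype=float)
    return forces.sum(axis=0), np.cross(positions, forces).sum(axis=0)


# Illustrative values only: a point 2 units in front of a unit-focal-length camera.
f3d = lift_force([1.0, 0.0], xc=(0.0, 0.0, 2.0), f=1.0, Rc=np.eye(3))
F, tau = total_force_and_torque([[0, 1, 0], [0, -1, 0]],
                                [[1, 0, 0], [-1, 0, 0]])
```

In the second call the two opposite forces cancel in F but add up in the torque, which is the situation of a finger segment rotating about its axis without translating.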
4 Extending the Optical Flow Constraint
In [17] a methodology was developed for the incorporation of illumination constraints (of any type that is differentiable w.r.t. the model parameters) in a deformable model formulation. In that work the fitting of the model was done based on a static image, i.e., the data did not change during the fitting process. Hence any partial derivatives with respect to time in the illumination constraint $C$ were zero. In this paper we generalize this constraint formulation to include image motion. Instead of one image, the fitting process will be guided by a sequence of moving images.

Figure 2: Forces applied to the hand model and the effects of shading. (a) Edge forces. (b) Optical flow forces in the interior of the model. (d) The change in average intensity (average gray level of a 30x50 area, plotted against frame number) in a small smooth area of the hand, depicted in (c), when the illumination comes from the top (blue line) and from the side (green dashed line), respectively.

We start from the reflectance equation. Let us assume that we have a reflectance function of the general form $I_L = L(l_p, q)$, where $I_L$ is the observed image intensity, $l_p$ are the lighting model parameters (both light source parameters and shape reflectance properties, such as the surface albedo of a Lambertian model), and $q$ are the hand model parameters. $L(l_p, q)$ can be differentiated with respect to the model parameters $q$. This means that the reflectance of the surface is locally computable and that there are no global illumination effects. We also assume that the illumination parameters do not change with time. The constraint equation is $C = I_L - L(l_p, q)$, and we differentiate it w.r.t. time and apply Baumgarte stabilization [3] in order to obtain

$$\dot{C}(q, t) + \alpha C = C_q \dot{q} + C_t + \alpha C = 0 \tag{5}$$

In this case we cannot ignore the partial derivatives $C_t$ w.r.t. time. Therefore, using the above formulas, we expand Equation 5 to

$$\frac{\partial I_L}{\partial q}\dot{q} - \frac{\partial L(l_p, q)}{\partial q}\dot{q} + \frac{\partial I_L}{\partial t} - \frac{\partial L(l_p, q)}{\partial t} + \alpha \left( I_L - L(l_p, q) \right) = 0 \tag{6}$$

We notice that if $J$ is the Jacobian of the model points and $J_p$ is the Jacobian of the model points under perspective projection, as described in Sec. 3, then

$$\frac{\partial I_L}{\partial q}\dot{q} + \frac{\partial I_L}{\partial t} = \nabla I_L \, J_p J \dot{q} + \frac{\partial I_L}{\partial t} \tag{7}$$

is the left-hand side of the model-based optical flow constraint equation that was presented in [18]. As explained in [18], in model-based optical flow the motion field vectors are vectors of velocities of model points, and hence $\dot{x} = J \dot{q}$ applies. Typically in the literature [11] this optical flow term is set to 0. This is correct in the case of ambient-only illumination. For the case of light sources at infinity it is also correct for pure translational motion. For the simplest case of a Lambertian surface with a light source at infinity, it can be shown [12] that if $\omega$ is the angular velocity of the rotational motion, $n$ is the normal of the surface, $l$ is the light source direction, and $\rho$ is the albedo of the surface, the magnitude of the error $\Delta v$ between the true motion field and the apparent and computable optical flow is

$$|\Delta v| = \rho \, \frac{|l \cdot (\omega \times n)|}{\|\nabla E\|} \tag{8}$$

This error is small when the gradient $\nabla E$ is big, but in the case of smooth surfaces this effect becomes much more pronounced. Similarly, $\partial L(l_p, q) / \partial t = 0$, since normals change based only on the model parameters $q$. This means that when there is no motion the constraint equation simplifies to the shading constraint. Therefore

$$\frac{\partial I_L}{\partial q}\dot{q} - \frac{\partial L(l_p, q)}{\partial q}\dot{q} + \frac{\partial I_L}{\partial t} + \alpha \left( I_L - L(l_p, q) \right) = 0 \tag{9}$$

encompasses both constraints. In the case of a smooth moving object, (9) allows us to deal with errors due to directed illumination and offers the possibility of recovering the motion of relatively smoothly shaded surfaces. Fig. 2(c, d) shows the change in average intensity in a small smooth area of the hand when the illumination comes from the top and from the side, respectively. In the second case, changes in the intensity of the points are dramatic.

5 Dynamic Tracking of Hand Motion

In our methodology we estimate the hand motion in response to the applied 3D forces on the hand as a Forward Dynamics problem where, given the external forces, we want to compute the velocity and position of the palm. Since we use a recursive dynamic formulation, we use Featherstone's [2] spatial notation to model our kinematic and dynamic variables. We integrate the constraint of Eq. 9 in the above formulation to determine the vector $q$ of the model's degrees of freedom, which includes the joint variables, global rotation, and translation, based on the method of Lagrange multipliers. In addition, we also use generalized edge forces computed from image-based edge forces, as outlined in Sections 3.2 and 3.3. In summary, we solve for

$$\ddot{q} = f_q \tag{10}$$

subject to

$$\text{Equation 9 and } f_{edges} \tag{11}$$

where $f_q$ are generalized forces computed from the two constraints above.

Furthermore, human fingers are not ideal dynamic links: their joints have upper and lower bounds. Therefore we need to solve the above dynamic equations under joint limit constraints. These joint limits, which constrain the relative motion of fingers, together with our dynamic formulation, which does not allow the inter-penetration of fingers, make hand tracking significantly more robust. Our method has the following steps:

1. At time t, mark the joints that reach their joint limits.
2. Solve the dynamic equations of the hand, i.e., solve for q at time t + dt recursively using (10).
3. For each finger, starting at joint 1 (the joint that connects the palm and the finger), mark the first joint that stays at its joint limit during the time period from t to t + dt. If there is no such joint, go to step 6.
4. Fix the joints marked at step 3 and merge two links connected by a fixed joint into one link. Update the dynamic hand model.
5. Go back to step 2.
6. Output the status of the dynamic model of the hand at time t + dt. Increase time, t = t + dt, and go to step 1.
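The control flow of the six steps above can be sketched for a single finger chain. This is a deliberately simplified toy, not the recursive dynamics solver of the paper: it assumes unit inertia and no coupling between joints, uses semi-implicit Euler integration in place of the Featherstone-style solve, and models "fixing" a joint by clamping it at its limit with zero velocity (so merging links has no further effect here). The function name and all numeric values are hypothetical.

```python
import numpy as np


def step_with_joint_limits(q, qd, forces, lower, upper, dt):
    """Advance joint angles q and velocities qd from t to t + dt under joint limits."""
    q = np.asarray(q, dtype=float).copy()
    qd = np.asarray(qd, dtype=float)
    forces = np.asarray(forces, dtype=float)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    fixed = np.zeros(len(q), dtype=bool)
    while True:
        # Step 2: integrate the free joints (toy stand-in for the dynamics solve).
        qd_new = np.where(fixed, 0.0, qd + dt * forces)
        q_new = np.where(fixed, q, q + dt * qd_new)
        # Step 3: first joint, counted from the palm, that would leave its limits.
        violating = np.flatnonzero(~fixed & ((q_new < lower) | (q_new > upper)))
        if violating.size == 0:
            # Step 6: no joint violates its limit, so output the new state.
            return q_new, qd_new
        # Steps 4-5: fix that joint at its limit, then solve again.
        j = violating[0]
        q[j] = np.clip(q_new[j], lower[j], upper[j])
        fixed[j] = True


# Joint 1 moves freely; joint 2 is pushed hard against its upper limit of 1.6 rad.
q_next, qd_next = step_with_joint_limits(
    q=[0.0, 1.5], qd=[0.0, 0.0], forces=[1.0, 50.0],
    lower=[0.0, 0.0], upper=[1.6, 1.6], dt=0.1)
```

The loop terminates because every pass either returns or fixes one additional joint, so it runs at most once per joint plus one final pass.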
6 Model Shape Correction from Cue Residual Error

Based on the above methodology, we estimate the model's rotation and translation parameters q at each time step. In our presentation of the approach we have so far assumed that the model shape is known. We now relax this assumption and extend our method to allow the estimation of the model shape. This extension allows us to start with a rough approximation of the model shape, which we refine over time. The approach is as follows. Assuming that the initial hand model is not accurate, we compute the change of its shape based on the residual error from the cues, i.e., the generalized optical flow and the edges. Our assumption here is that the residual error is primarily caused by errors in the initial hand shape. This assumption is justified since we do not allow any abrupt changes in illumination during tracking. Based on the above, our algorithm at a time step i is as follows:

1. Compute the first component of the residual error at time step i from the generalized optical flow constraint equation (9). In this equation the error is computed based on the estimated motion of the model at time step i and the difference between the projected estimated model's intensities and the image data at time step i.
2. Compute the second component of the residual error at time step i from the edge difference between the projected estimated model location at step i and the image data at time step i. This computation is described in Section 3.2.
3. Add the two components of the residual error and use them as a constraint to modify the shape of the model, in a fashion similar to the one described in the previous section. The difference is that instead of modifying the motion of the hand model, we modify its shape. For the purposes of this paper we allowed non-isotropic scaling deformations, and we found them sufficient, as is shown in our results section.

The shape correction step is done at every step i, right after the estimation of the model's motion parameters. However, in our experiments we stop correcting the model shape approximately 10-20 frames after the initial step. Since the hand model is comprised of articulated rigid parts, 10-20 frames are usually sufficient to correct for the shape of the hand.

7 Experiments

We have performed experiments to test our method with a variety of hand motions. All our experiments run on a P4 1.0GHz processor at approximately 4 frames per second. The experimental results are shown in Fig. 3 and Fig. 4, respectively. The first experiment (Fig. 3) demonstrates the process of fitting the model to the image by simultaneous motion tracking and model shape change. From the rotational motion of the whole hand, the width and the length of the palm and the fingers are modified to fit the image (the first and second rows in Fig. 3). In the third and fourth rows in Fig. 3, individual fingers bend towards the direction of the thumb. The thickness of the fingers also changes to fit the images. In the last frames of the sequence (the fifth row in Fig. 3), the shape of the hand model does not change anymore and the whole hand rotates back to the original position.

The second experiment, depicted in Fig. 4, used the fitted model shown in Fig. 3. The sequence has been taken with the same camera position as in Fig. 3. From the first to the third row, the ring finger and little finger bend largely away from the camera. Finally the little finger is occluded completely. Then the whole hand rotates by about 90 degrees. To show the accuracy of the tracking, we project the wire frames of the hand model back onto the image. The full sequences and the tracking results for the above two experiments are available as movie files and can be found at http://www.dcis.rutgers.edu/shanlu/hand.

Figure 3: Projected wire-frame images from a sequence tracking hand rotation and finger movements. The shape of the palm and the fingers is dynamically modified from the initial shape during tracking. The wire frames of the model projected on the original images show that the modified shape fits the image data very well.
The above experiments demonstrate the successful tracking of complex hand and finger motions involving large rotations and large relative finger motions. The dynamic estimation of the hand shape model significantly improved the tracking accuracy and robustness.
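The shape-correction idea of Sec. 6, whose effect the experiments illustrate, can be sketched as a residual-driven update of non-isotropic scale factors that is switched off after the first frames. The sketch is an assumption-laden stand-in: it works on 2-D point correspondences and uses a damped least-squares update rather than the generalized-force formulation of the paper, and the names `refine_scale`, `track_with_refinement`, `step`, and `n_refine` are hypothetical.

```python
import numpy as np


def refine_scale(model_pts, observed_pts, scale, step=0.5):
    """One damped update of per-axis scale factors from the current residual."""
    residual = observed_pts - model_pts * scale
    # Least-squares correction per axis, given the residual at the current scale.
    delta = (model_pts * residual).sum(axis=0) / (model_pts ** 2).sum(axis=0)
    return scale + step * delta


def track_with_refinement(model_pts, frames, n_refine=15):
    """Refine the scale only during the first n_refine frames, then keep it fixed."""
    scale = np.ones(model_pts.shape[1])
    for k, observed in enumerate(frames):
        if k < n_refine:
            scale = refine_scale(model_pts, observed, scale)
    return scale


# Synthetic data: the true shape is 20% wider and 20% shorter than the initial model.
model = np.array([[1.0, 2.0], [-1.0, 1.0], [2.0, -1.0], [0.5, 0.5]])
frames = [model * np.array([1.2, 0.8])] * 25
scale = track_with_refinement(model, frames)
```

With `step=0.5` the scale error halves at every refined frame, so stopping after 10 to 20 frames, as the paper reports doing, leaves a negligible error in this noise-free toy setting.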
8 Conclusions
In this paper we have augmented traditional optical flow and replaced it with a more general equation that includes shading information. We then further extended this formulation to use the cue residual error to correct for the model's shape. We have used this formulation within a deformable model framework, and we were able to track difficult hand motions under a variety of illumination conditions. Our dynamic hand model formulation allows the integration of multiple cues, and for robustness we also use edges in our tracking. We have shown tracking results for simple and complex palm and finger motions. Future work includes better occlusion recovery handling using Kalman filtering, and the incorporation of other sources of visual information, such as color, in order to work on cluttered backgrounds.
References
[1] G. Engeln-Mullges, F. Uhlig. Numerical Algorithms with C. Springer, 1996.
[2] R. Featherstone. Robot Dynamics Algorithm. Kluwer Academic, Boston, 1987.
[3] J. Baumgarte. Stabilization of constraints and integrals of motion in dynamical systems. Computer Methods in Applied Mechanics and Engineering, 1:1-16, 1972.
[4] G. Huang, D. Metaxas, and J. Lo. Human Motion Planning Based on Recursive Dynamics and Optimal Control Techniques. Computer Graphics International 2000, pp. 19-28.
[5] J. Lo and D. Metaxas. Recursive Dynamics and Optimal Control Techniques for Human Motion Planning. CA 99, Geneva, Switzerland, May 26-29, 1999.
[6] S. B. Kang and K. Ikeuchi. Toward Automatic Instruction from Perception: Recognizing a Grasp from Observation. IEEE Trans. of Robotics and Automation, pp. 432-443, Aug. 1993.
[7] R. Koch. Dynamic 3-D Analysis through Synthesis Feedback Control. IEEE PAMI, 15(6), pp. 556-568, June 1993.
[8] D. N. Metaxas. Physics-Based Deformable Models: Applications to Computer Vision, Graphics and Medical Imaging. Kluwer Academic Publishers, 1997.
[9] B. Lucas and T. Kanade. An Iterative Technique of Image Registration and Its Application to Stereo. Proc. 7th IJCAI, pp. 674-679, August 1981.
[10] J. Angeles and O. Ma. Dynamic Simulation of n-Axis Serial Robotic Manipulators Using a Natural Orthogonal Complement. The International Journal of Robotics Research, 7(5):32-47, October 1988.
[11] B. K. P. Horn. Robot Vision. 1986.
[12] A. Verri and T. A. Poggio. Motion field and optical flow: Qualitative properties. PAMI, 11(5):490-498, May 1989.
[13] H. Brandl, R. Johanni, and M. Otter. An Algorithm for the Simulation of Multibody Systems with Kinematic Loops. IFToMM Seventh World Congress on the Theory of Machines and Mechanisms, Sep. 1987.
[14] J. K. Hodgins, W. L. Wooten, D. C. Brogan, and J. F. O'Brien. Animation of Human Athletics. SIGGRAPH 95.
[15] A. J. Stewart and J. F. Cremer. Beyond keyframing: An algorithmic approach to animation. Graphics Interface, 1992.
[16] J. Wilhelms and B. Barsky. Using Dynamic Analysis to Animate Articulated Bodies such as Humans and Robots. In Graphics Interface, 1985.
[17] D. Samaras and D. Metaxas. Incorporating Illumination Constraints in Deformable Models. CVPR 1998, pp. 322-329.
[18] D. DeCarlo and D. Metaxas. Optical Flow Constraints on Deformable Models with Applications to Face Tracking. IJCV, 38(2), pp. 99-127, July 2000.
[19] D. DeCarlo and D. Metaxas. Adjusting Shape Parameters Using Model-Based Optical Flow Residuals. IEEE PAMI, 24(6), pp. 814-823, June 2002.
[20] S. Negahdaripour. Revised Definition of Optical Flow: Integration of Radiometric and Geometric Cues for Dynamic Scene Analysis. PAMI, 20(9), pp. 961-979, Sep. 1998.
[21] R. L. Carceroni and K. N. Kutulakos. Multi-View Scene Capture by Surfel Sampling: From Video Streams to Non-Rigid 3D Motion, Shape and Reflectance. ICCV01, II: 60-67.
[22] H. Haussecker and D. J. Fleet. Computing optical flow with physical models of brightness variation. PAMI, 23(6), pp. 661-673, 2001.
[23] J. J. Kuch and T. S. Huang. Vision-based hand modeling and tracking for virtual teleconferencing and telecollaboration. In ICCV95, pp. 666-671, 1995.
[24] J. Lee and T. Kunii. Model-based analysis of hand posture. IEEE CGA, 15:77-86, Sept. 1995.
[25] J. Rehg and T. Kanade. Model-based tracking of self-occluding articulated objects. In ICCV 95, pp. 612-617.
[26] N. Shimada, Y. Shirai, Y. Kuno, and J. Miura. Hand Gesture Estimation and Model Refinement Using Monocular Camera: Ambiguity Limitation by Inequality Constraints. Procs. Automatic Face and Gesture Recognition Workshop, pp. 268-273, 1998.
[27] B. Stenger, P. R. Mendonca, and R. Cipolla. Model-based Hand Tracking Using an Unscented Kalman Filter. Procs. British Machine Vision Conference, Vol. 1, pp. 63-72, Sept. 2001.
[28] Ying Wu and T. S. Huang. Capturing articulated human hand motion: A divide-and-conquer approach. In ICCV 99, pp. 606-611, Corfu, Greece, Sept. 1999.
[29] Ying Wu, J. Y. Lin, and T. S. Huang. Capturing natural hand articulation. ICCV 01, II: 426-432.
[30] R. Rosales, V. Athitsos, L. Sigal, and S. Sclaroff. 3D Hand Pose Reconstruction Using Specialized Mappings. ICCV01.
[31] Q. Delamarre and O. Faugeras. Finding pose of hand in video images: a stereo-based approach. AFGR 98.
Figure 4: 25 frames from a sequence tracking flexing of fingers and hand rotation. After the initial shape estimation, as shown in Figure 3, the movements of the hand and the fingers have been tracked accurately. The tracking results are shown by projecting the wire frame of the hand model on the original images.












