Install the libraries
Before we get into the implementation, the first step is to install the libraries. In this case, we will install OpenCV and MediaPipe using pip. In your terminal, run these commands:
pip install opencv-python
pip install mediapipe
Load the libraries
After we install the libraries, the next step is to load them into our code. We will import the NumPy, OpenCV, and MediaPipe libraries. Please add these lines of code:
After we load the libraries, the next step is to initialize two objects:
- The FaceMesh object from the MediaPipe library. This object detects faces and the keypoints of one or more faces.
- The VideoCapture object from the OpenCV library. This object retrieves images from the webcam. We pass 0 as its argument to select the default webcam.
Please add these lines of code:
Capture the image
Now that we have initialized the objects, the next step is to capture an image from the webcam. Please add this line of code:
Process the image
After we capture the image, the next step is to process it. Note that OpenCV and MediaPipe represent images differently: OpenCV uses the BGR color space, while MediaPipe expects RGB.
Therefore, we need to convert the image to RGB first, apply the face landmark detection, and then convert it back to BGR.
Please add these lines of code (Be careful with the indentation):
Retrieve the 2D and the 3D coordinates
After we process the image, the next step is to retrieve the keypoint coordinates. MediaPipe's face landmark detection produces 468 keypoints per face, each with 3D coordinates.
For head pose estimation, we don't have to use all of them. Instead, we choose six points that are enough to represent a face: the outer corners of the eyes, the tip of the nose, the chin, and the corners of the mouth.
To access those points, we refer to the indices used by the BlazeFace model. I have already marked them in the picture below:
Now let’s extract those keypoints. For the 2D coordinates, we take only the x and y values. And for the 3D coordinates, we retrieve all of…
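The extraction could be sketched as below. The landmark indices and the helper name `extract_points` are assumptions for illustration: the indices listed are ones commonly cited for this six-point selection, and the landmark access pattern follows MediaPipe's `NormalizedLandmark` fields (`x`, `y`, `z`, normalized to the image size):

```python
import numpy as np

# Assumed landmark indices for eye corners, nose tip, mouth corners,
# and chin -- verify against the BlazeFace index picture above.
POSE_LANDMARKS = [33, 263, 1, 61, 291, 199]

def extract_points(face_landmarks, img_w, img_h):
    """Collect 2D pixel coordinates and 3D coordinates of the chosen landmarks."""
    face_2d, face_3d = [], []
    for idx in POSE_LANDMARKS:
        lm = face_landmarks.landmark[idx]
        # Landmarks are normalized to [0, 1]; scale to pixel coordinates.
        x, y = int(lm.x * img_w), int(lm.y * img_h)
        face_2d.append([x, y])        # only x and y for the 2D points
        face_3d.append([x, y, lm.z])  # keep z as well for the 3D points
    return (np.array(face_2d, dtype=np.float64),
            np.array(face_3d, dtype=np.float64))
```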