Learning the Fundamentals of Practical AI Development and Facial Recognition with 'InsightFace' Library (2)
Recap of the Previous Session
The previous session (session 1) is here. Last time, we quickly implemented face authentication using the sample code for the FaceAnalysis class, and it was astonishing how easily it worked. We also opened the Python implementation from the InsightFace library's GitHub page and looked through the code of the FaceAnalysis class.
InsightFace Official Repository
The relevant page of the InsightFace official repository is here. The same Python implementation is copied into your Python environment when you install the insightface library via pip. Following the code on GitHub can be cumbersome, so open the corresponding directory in a code editor such as VSCode on your PC to trace the code more comfortably.
We also confirmed that files with the onnx extension are automatically downloaded when using the sample code. The downloaded files were the following five.
Downloaded onnx files
- 1k3d68.onnx
- 2d106det.onnx
- det_10g.onnx
- genderage.onnx
- w600k_r50.onnx
These onnx files are the 'AI models' used internally by InsightFace, and in this session we will analyze them. ONNX (Open Neural Network Exchange) is an open format for exchanging machine learning models between different frameworks (such as TensorFlow, PyTorch, and MXNet). An ONNX file stores both the model's architecture and its weights.
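As a quick check, here is a minimal sketch (the file path is just an example; point it at any of the downloaded files) showing that an ONNX file really does carry both the network graph and the trained weights:

```python
import onnx

# an .onnx file carries the architecture (a graph of operators) and the
# trained weights (stored as "initializer" tensors)
model = onnx.load("models/buffalo_l/w600k_r50.onnx")    # example path
print(onnx.helper.printable_graph(model.graph)[:500])   # first part of the graph definition
print("number of weight tensors:", len(model.graph.initializer))
```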
Analyzing the FaceAnalysis Class
Let's look at the code of the FaceAnalysis class. This class is used for face authentication, but what does it contain? The following code is the constructor of the FaceAnalysis class, which is called when an instance of the class is created. The AI models are downloaded inside the ensure_available() call near the top of the constructor. You can check the code of ensure_available() by Ctrl+clicking it in VSCode.
```python
class FaceAnalysis:
    def __init__(self, name=DEFAULT_MP_NAME, root='~/.insightface', allowed_modules=None, **kwargs):
        onnxruntime.set_default_logger_severity(3)
        self.models = {}
        self.model_dir = ensure_available('models', name, root=root)
        onnx_files = glob.glob(osp.join(self.model_dir, '*.onnx'))
        onnx_files = sorted(onnx_files)
        for onnx_file in onnx_files:
            model = model_zoo.get_model(onnx_file, **kwargs)
            if model is None:
                print('model not recognized:', onnx_file)
            elif allowed_modules is not None and model.taskname not in allowed_modules:
                print('model ignore:', onnx_file, model.taskname)
                del model
            elif model.taskname not in self.models and (allowed_modules is None or model.taskname in allowed_modules):
                print('find model:', onnx_file, model.taskname, model.input_shape, model.input_mean, model.input_std)
                self.models[model.taskname] = model
            else:
                print('duplicated model task type, ignore:', onnx_file, model.taskname)
                del model
        assert 'detection' in self.models
        self.det_model = self.models['detection']

    def prepare(self, ctx_id, det_thresh=0.5, det_size=(640, 640)):
        self.det_thresh = det_thresh
        assert det_size is not None
        print('set det-size:', det_size)
        self.det_size = det_size
        for taskname, model in self.models.items():
            if taskname == 'detection':
                model.prepare(ctx_id, input_size=det_size, det_thresh=det_thresh)
            else:
                model.prepare(ctx_id)

    # omitted below
```
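For reference, here is a minimal usage sketch of this constructor (the model pack name and the task name strings are taken from the code above; treat them as assumptions for your own environment). Passing allowed_modules lets you load only some of the models, for example detection and recognition:

```python
import insightface

# load only the detection and recognition models ('detection' and 'recognition'
# are taskname values that appear in the library code)
app = insightface.app.FaceAnalysis(name="buffalo_l",
                                   allowed_modules=["detection", "recognition"])
app.prepare(ctx_id=-1, det_size=(640, 640))  # ctx_id=-1 -> run on CPU
print(app.models.keys())                     # should list only the two allowed tasks
```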
Part Where ONNX Files Are Loaded
If you read the code above in detail, you will notice that the downloaded ONNX files are loaded in the for loop over the onnx files. It also looks like you can get an overview of each model by looking at model.taskname. The paths of the onnx files are collected into the onnx_files variable and passed one by one to model_zoo.get_model(). Looking at model_zoo.get_model() will reveal more detail, so let's take a look at it right away. You can open model_zoo.py by Ctrl+clicking the model_zoo.get_model() call in VSCode.
Relevant Part of model_zoo.py
```python
def get_model(name, **kwargs):
    root = kwargs.get('root', '~/.insightface')
    root = os.path.expanduser(root)
    model_root = osp.join(root, 'models')
    allow_download = kwargs.get('download', False)
    download_zip = kwargs.get('download_zip', False)
    if not name.endswith('.onnx'):
        model_dir = os.path.join(model_root, name)
        model_file = find_onnx_file(model_dir)
        if model_file is None:
            return None
    else:
        model_file = name
    if not osp.exists(model_file) and allow_download:
        model_file = download_onnx('models', model_file, root=root, download_zip=download_zip)
    assert osp.exists(model_file), 'model_file %s should exist'%model_file
    assert osp.isfile(model_file), 'model_file %s should be a file'%model_file
    router = ModelRouter(model_file)
    providers = kwargs.get('providers', get_default_providers())
    provider_options = kwargs.get('provider_options', get_default_provider_options())
    model = router.get_model(providers=providers, provider_options=provider_options)
    return model
```
Let's look at the relevant part of model_zoo.py. Inside get_model(), the path to the onnx file is resolved (and the file is downloaded if downloading is allowed). The path is then wrapped in a class called ModelRouter, whose get_model() loads the onnx file and returns the appropriate model object.
Relevant Part of the ModelRouter Class
```python
class ModelRouter:
    def __init__(self, onnx_file):
        self.onnx_file = onnx_file

    def get_model(self, **kwargs):
        session = PickableInferenceSession(self.onnx_file, **kwargs)
        print(f'Applied providers: {session._providers}, with options: {session._provider_options}')
        inputs = session.get_inputs()
        input_cfg = inputs[0]
        input_shape = input_cfg.shape
        outputs = session.get_outputs()

        if len(outputs)>=5:
            return RetinaFace(model_file=self.onnx_file, session=session)
        elif input_shape[2]==192 and input_shape[3]==192:
            return Landmark(model_file=self.onnx_file, session=session)
        elif input_shape[2]==96 and input_shape[3]==96:
            return Attribute(model_file=self.onnx_file, session=session)
        elif len(inputs)==2 and input_shape[2]==128 and input_shape[3]==128:
            return INSwapper(model_file=self.onnx_file, session=session)
        elif input_shape[2]==input_shape[3] and input_shape[2]>=112 and input_shape[2]%16==0:
            return ArcFaceONNX(model_file=self.onnx_file, session=session)
        else:
            #raise RuntimeError('error on model routing')
            return None
```
Finally, we have arrived at this code. Look at the if/elif chain in get_model(). You can see that InsightFace routes AI models to five classes: RetinaFace, Landmark, Attribute, INSwapper, and ArcFaceONNX. If you trace the conditions carefully, you can work out which of the downloaded ONNX files ends up in which class (the INSwapper face-swapping model is not among the files downloaded by the sample code), and the matching model object is returned for each type.
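If you want to see the routing criteria for yourself, a small sketch like the following (the path is an example) prints exactly the properties ModelRouter looks at, namely the number of outputs and the input shape:

```python
import onnxruntime

# inspect one of the downloaded files with onnxruntime, the same way ModelRouter
# does, and see which branch of the if/elif chain it would hit
session = onnxruntime.InferenceSession("models/buffalo_l/genderage.onnx",
                                       providers=["CPUExecutionProvider"])
print("number of outputs:", len(session.get_outputs()))     # >= 5 -> RetinaFace (detection)
print("input shape      :", session.get_inputs()[0].shape)  # e.g. [1, 3, 96, 96] -> Attribute
```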
Let's summarize the roles of each model class. (A small sketch to confirm this mapping follows the list.)
Roles of Each Model Class
- 1k3d68.onnx ➡ 3D Facial Landmark Detection AI (Landmark class)
- 2d106det.onnx ➡ 2D Facial Landmark Detection AI (Landmark class)
- det_10g.onnx ➡ Face Detection AI (RetinaFace class)
- genderage.onnx ➡ Facial Attribute (gender/age) Detection AI (Attribute class)
- w600k_r50.onnx ➡ Face Authentication AI (ArcFaceONNX class)
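A quick way to confirm this mapping on your own machine is to build a FaceAnalysis instance and print which file ended up behind which task name. This is only a sketch; the exact task name strings come from the library and may differ slightly between versions:

```python
import insightface

app = insightface.app.FaceAnalysis(name="buffalo_l")
app.prepare(ctx_id=-1, det_size=(640, 640))
for taskname, model in app.models.items():
    # each model wrapper keeps the path of the onnx file it was built from
    print(taskname, "->", model.model_file)
```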
Dependencies Between These Models
These models are used for inference in the get() method of the FaceAnalysis class in face_analysis.py, and you can see the dependencies between them there. Reading the code, only the face detection model is independent; all of the other models depend on its output. That is, the face detection AI first detects the faces in the given image, and then the facial landmark detection AI, facial attribute detection AI, face authentication AI, and so on are each run independently on every detected face. The relevant code is the following part.
```python
    def get(self, img, max_num=0):
        bboxes, kpss = self.det_model.detect(img,
                                             max_num=max_num,
                                             metric='default')
        if bboxes.shape[0] == 0:
            return []
        ret = []
        for i in range(bboxes.shape[0]):
            bbox = bboxes[i, 0:4]
            det_score = bboxes[i, 4]
            kps = None
            if kpss is not None:
                kps = kpss[i]
            face = Face(bbox=bbox, kps=kps, det_score=det_score)
            for taskname, model in self.models.items():
                if taskname == 'detection':
                    continue
                model.get(img, face)
            ret.append(face)
        return ret
```
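To see what get() actually hands back, here is a small sketch (assuming an app prepared as in the earlier sketches; the image path is an example). Each returned Face object carries the detection results plus whatever the other models attached to it:

```python
import cv2

img = cv2.imread("data/images/path_to_image/some_face_1.jpg")  # example path
faces = app.get(img)
for face in faces:
    print("bbox      :", face.bbox.astype(int))   # face rectangle from the detector
    print("det_score :", float(face.det_score))   # detection confidence
    print("kps       :", face.kps.shape)          # 5 keypoints (eyes, nose, mouth corners)
    print("embedding :", face.embedding.shape)    # recognition vector from ArcFaceONNX
```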
Now, I would like to take a quick look at each model. The code for each model can be followed from the get_model method in the ModelRouter class in model_zoo.py that we have already reviewed above.
Analyzing RetinaFace (with face detection code)
Let's start by looking at face detection, the first step in face recognition and other face analysis pipelines. Face detection is implemented as a class called RetinaFace, which loads det_10g.onnx. The RetinaFace code is shown below.
```python
class RetinaFace:
    def __init__(self, model_file=None, session=None):
        import onnxruntime
        self.model_file = model_file
        self.session = session
        self.taskname = 'detection'
        if self.session is None:
            assert self.model_file is not None
            assert osp.exists(self.model_file)
            self.session = onnxruntime.InferenceSession(self.model_file, None)
        self.center_cache = {}
        self.nms_thresh = 0.4
        self.det_thresh = 0.5
        self._init_vars()

    def _init_vars(self):
        input_cfg = self.session.get_inputs()[0]
        input_shape = input_cfg.shape
        #print(input_shape)
        if isinstance(input_shape[2], str):
            self.input_size = None
        else:
            self.input_size = tuple(input_shape[2:4][::-1])
        #print('image_size:', self.image_size)
        input_name = input_cfg.name
        self.input_shape = input_shape
        outputs = self.session.get_outputs()
        output_names = []
        for o in outputs:
            output_names.append(o.name)
        self.input_name = input_name
        self.output_names = output_names
        self.input_mean = 127.5
        self.input_std = 128.0
        #print(self.output_names)
        #assert len(outputs)==10 or len(outputs)==15
        self.use_kps = False
        self._anchor_ratio = 1.0
        self._num_anchors = 1
        if len(outputs) == 6:
            self.fmc = 3
            self._feat_stride_fpn = [8, 16, 32]
            self._num_anchors = 2
        elif len(outputs) == 9:
            self.fmc = 3
            self._feat_stride_fpn = [8, 16, 32]
            self._num_anchors = 2
            self.use_kps = True
        elif len(outputs) == 10:
            self.fmc = 5
            self._feat_stride_fpn = [8, 16, 32, 64, 128]
            self._num_anchors = 1
        elif len(outputs) == 15:
            self.fmc = 5
            self._feat_stride_fpn = [8, 16, 32, 64, 128]
            self._num_anchors = 1
            self.use_kps = True

    def prepare(self, ctx_id, **kwargs):
        if ctx_id < 0:
            self.session.set_providers(['CPUExecutionProvider'])
        nms_thresh = kwargs.get('nms_thresh', None)
        if nms_thresh is not None:
            self.nms_thresh = nms_thresh
        det_thresh = kwargs.get('det_thresh', None)
        if det_thresh is not None:
            self.det_thresh = det_thresh
        input_size = kwargs.get('input_size', None)
        if input_size is not None:
            if self.input_size is not None:
                print('warning: det_size is already set in detection model, ignore')
            else:
                self.input_size = input_size

    def forward(self, img, threshold):
        scores_list = []
        bboxes_list = []
        kpss_list = []
        input_size = tuple(img.shape[0:2][::-1])
        blob = cv2.dnn.blobFromImage(img, 1.0/self.input_std, input_size, (self.input_mean, self.input_mean, self.input_mean), swapRB=True)
        net_outs = self.session.run(self.output_names, {self.input_name: blob})

        input_height = blob.shape[2]
        input_width = blob.shape[3]
        fmc = self.fmc
        for idx, stride in enumerate(self._feat_stride_fpn):
            scores = net_outs[idx]
            bbox_preds = net_outs[idx+fmc]
            bbox_preds = bbox_preds * stride
            if self.use_kps:
                kps_preds = net_outs[idx+fmc*2] * stride
            height = input_height // stride
            width = input_width // stride
            K = height * width
            key = (height, width, stride)
            if key in self.center_cache:
                anchor_centers = self.center_cache[key]
            else:
                # solution-1, c style:
                #anchor_centers = np.zeros( (height, width, 2), dtype=np.float32 )
                #for i in range(height):
                #    anchor_centers[i, :, 1] = i
                #for i in range(width):
                #    anchor_centers[:, i, 0] = i

                # solution-2:
                #ax = np.arange(width, dtype=np.float32)
                #ay = np.arange(height, dtype=np.float32)
                #xv, yv = np.meshgrid(np.arange(width), np.arange(height))
                #anchor_centers = np.stack([xv, yv], axis=-1).astype(np.float32)

                # solution-3:
                anchor_centers = np.stack(np.mgrid[:height, :width][::-1], axis=-1).astype(np.float32)
                #print(anchor_centers.shape)

                anchor_centers = (anchor_centers * stride).reshape((-1, 2))
                if self._num_anchors > 1:
                    anchor_centers = np.stack([anchor_centers]*self._num_anchors, axis=1).reshape((-1, 2))
                if len(self.center_cache) < 100:
                    self.center_cache[key] = anchor_centers

            pos_inds = np.where(scores >= threshold)[0]
            bboxes = distance2bbox(anchor_centers, bbox_preds)
            pos_scores = scores[pos_inds]
            pos_bboxes = bboxes[pos_inds]
            scores_list.append(pos_scores)
            bboxes_list.append(pos_bboxes)
            if self.use_kps:
                kpss = distance2kps(anchor_centers, kps_preds)
                #kpss = kps_preds
                kpss = kpss.reshape((kpss.shape[0], -1, 2))
                pos_kpss = kpss[pos_inds]
                kpss_list.append(pos_kpss)
        return scores_list, bboxes_list, kpss_list

    def detect(self, img, input_size=None, max_num=0, metric='default'):
        assert input_size is not None or self.input_size is not None
        input_size = self.input_size if input_size is None else input_size

        im_ratio = float(img.shape[0]) / img.shape[1]
        model_ratio = float(input_size[1]) / input_size[0]
        if im_ratio > model_ratio:
            new_height = input_size[1]
            new_width = int(new_height / im_ratio)
        else:
            new_width = input_size[0]
            new_height = int(new_width * im_ratio)
        det_scale = float(new_height) / img.shape[0]
        resized_img = cv2.resize(img, (new_width, new_height))
        det_img = np.zeros((input_size[1], input_size[0], 3), dtype=np.uint8)
        det_img[:new_height, :new_width, :] = resized_img

        scores_list, bboxes_list, kpss_list = self.forward(det_img, self.det_thresh)

        scores = np.vstack(scores_list)
        scores_ravel = scores.ravel()
        order = scores_ravel.argsort()[::-1]
        bboxes = np.vstack(bboxes_list) / det_scale
        if self.use_kps:
            kpss = np.vstack(kpss_list) / det_scale
        pre_det = np.hstack((bboxes, scores)).astype(np.float32, copy=False)
        pre_det = pre_det[order, :]
        keep = self.nms(pre_det)
        det = pre_det[keep, :]
        if self.use_kps:
            kpss = kpss[order, :, :]
            kpss = kpss[keep, :, :]
        else:
            kpss = None
        if max_num > 0 and det.shape[0] > max_num:
            area = (det[:, 2] - det[:, 0]) * (det[:, 3] - det[:, 1])
            img_center = img.shape[0] // 2, img.shape[1] // 2
            offsets = np.vstack([
                (det[:, 0] + det[:, 2]) / 2 - img_center[1],
                (det[:, 1] + det[:, 3]) / 2 - img_center[0]
            ])
            offset_dist_squared = np.sum(np.power(offsets, 2.0), 0)
            if metric == 'max':
                values = area
            else:
                values = area - offset_dist_squared * 2.0  # some extra weight on the centering
            bindex = np.argsort(values)[::-1]  # some extra weight on the centering
            bindex = bindex[0:max_num]
            det = det[bindex, :]
            if kpss is not None:
                kpss = kpss[bindex, :]
        return det, kpss

    def nms(self, dets):
        thresh = self.nms_thresh
        x1 = dets[:, 0]
        y1 = dets[:, 1]
        x2 = dets[:, 2]
        y2 = dets[:, 3]
        scores = dets[:, 4]

        areas = (x2 - x1 + 1) * (y2 - y1 + 1)
        order = scores.argsort()[::-1]

        keep = []
        while order.size > 0:
            i = order[0]
            keep.append(i)
            xx1 = np.maximum(x1[i], x1[order[1:]])
            yy1 = np.maximum(y1[i], y1[order[1:]])
            xx2 = np.minimum(x2[i], x2[order[1:]])
            yy2 = np.minimum(y2[i], y2[order[1:]])

            w = np.maximum(0.0, xx2 - xx1 + 1)
            h = np.maximum(0.0, yy2 - yy1 + 1)
            inter = w * h
            ovr = inter / (areas[i] + areas[order[1:]] - inter)

            inds = np.where(ovr <= thresh)[0]
            order = order[inds + 1]

        return keep
```
This is fairly involved code. RetinaFace detects the face bounding box and the facial keypoints at the same time; it uses a ResNet-like backbone with an SSD-style detection head. Building something like this from scratch is quite difficult, so in this series we will use the face detector as it is. If you are interested in how object detection works, I suggest studying how SSD works first.
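One detail worth understanding even without building the model yourself is how the raw network outputs are decoded. For each anchor center, the model predicts the distances to the four edges of the box, and distance2bbox turns those distances back into (x1, y1, x2, y2) coordinates. Roughly, it works like the following simplified sketch (without the optional clipping the library applies):

```python
import numpy as np

def distance2bbox(points, distance):
    """Decode per-anchor (left, top, right, bottom) distances into boxes."""
    x1 = points[:, 0] - distance[:, 0]  # left edge
    y1 = points[:, 1] - distance[:, 1]  # top edge
    x2 = points[:, 0] + distance[:, 2]  # right edge
    y2 = points[:, 1] + distance[:, 3]  # bottom edge
    return np.stack([x1, y1, x2, y2], axis=-1)

# one anchor at (100, 100) predicting a face extending 40 px in every direction
print(distance2bbox(np.array([[100., 100.]]), np.array([[40., 40., 40., 40.]])))
# -> [[ 60.  60. 140. 140.]]
```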
Let's try to detect faces in an image using RetinaFace.
While we are here, let's write a small piece of code that uses only RetinaFace for face detection. If you feel up to it, try writing it yourself before looking at the answer below.
```python
import cv2
import numpy as np
import insightface

# load detection model
detector = insightface.model_zoo.get_model("models/buffalo_l/det_10g.onnx")  # change to your own path
detector.prepare(ctx_id=-1, input_size=(640, 640))

# prepare the input image
rgb_img = cv2.cvtColor(cv2.imread("data/images/path_to_image/some_face_1.jpg"), cv2.COLOR_BGR2RGB)  # change to your own path

# detect
bboxes, kpss = detector.detect(rgb_img)
```
What a convenient library! Face detection was done with hardly any effort. RetinaFace detects not only the bounding box of the face but also the keypoints of the eyes, nose, and mouth at the same time, which is very handy. After running the code above, print bboxes and kpss to see what they contain.
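For example, a small visualization sketch like the following (continuing from the detection code above) draws what is inside bboxes and kpss:

```python
import cv2

# detect() was run on an RGB image above, so convert back to BGR for OpenCV drawing
vis = cv2.cvtColor(rgb_img, cv2.COLOR_RGB2BGR)
for bbox, kps in zip(bboxes, kpss):
    x1, y1, x2, y2, score = bbox  # each bbox row is (x1, y1, x2, y2, score)
    cv2.rectangle(vis, (int(x1), int(y1)), (int(x2), int(y2)), (0, 255, 0), 2)
    for (x, y) in kps:            # 5 keypoints per face
        cv2.circle(vis, (int(x), int(y)), 2, (0, 0, 255), -1)
cv2.imwrite("detected_faces.jpg", vis)
```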
Checking the Landmark Model
The Landmark model detects not just basic keypoints such as the eyes, nose, and mouth, but more detailed facial landmarks. Facial landmarks are useful, for example, when processing a face in an image or adding effects to it. We will not use this Landmark model for face recognition, so we will not go into its internal code. It can be followed from the ModelRouter class in model_zoo.py into a file called landmark.py, so take a look if you are interested.
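If you do want to try it, a hedged sketch along these lines should work (the path and the result attribute name are assumptions based on the library code; the Landmark wrapper stores its result on the Face object under its taskname). It reuses rgb_img, bboxes, and kpss from the detection code earlier:

```python
import insightface
from insightface.app.common import Face

landmarker = insightface.model_zoo.get_model("models/buffalo_l/2d106det.onnx")  # example path
landmarker.prepare(ctx_id=-1)
# build a Face object from the first detection result
face = Face(bbox=bboxes[0][:4], kps=kpss[0], det_score=bboxes[0][4])
landmarker.get(rgb_img, face)
print(landmarker.taskname, getattr(face, landmarker.taskname).shape)  # e.g. (106, 2)
```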
Checking the Attribute Model
The Attribute model infers attributes such as the gender and age of a detected face. It is also not used for face recognition, so we will not go into its internal code either. It can be followed from the ModelRouter class in model_zoo.py into a file called attribute.py, which may be handy if, for example, you want to analyze visitors.
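A similar hedged sketch for the attribute model (again, the path is an example, and the attribute names are taken from the library code; it reuses rgb_img and the Face object from the Landmark sketch above):

```python
import insightface

attr_model = insightface.model_zoo.get_model("models/buffalo_l/genderage.onnx")  # example path
attr_model.prepare(ctx_id=-1)
attr_model.get(rgb_img, face)
# gender is an index (reportedly 1 = male, 0 = female), age is an estimate in years
print("gender:", face.gender, "age:", face.age)
```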
Analyzing the ArcFaceONNX Model
Finally, let's take a look at ArcFaceONNX, the face recognition model. Here is the code for ArcFaceONNX.
```python
class ArcFaceONNX:
    def __init__(self, model_file=None, session=None):
        assert model_file is not None
        self.model_file = model_file
        self.session = session
        self.taskname = 'recognition'
        find_sub = False
        find_mul = False
        model = onnx.load(self.model_file)
        graph = model.graph
        for nid, node in enumerate(graph.node[:8]):
            #print(nid, node.name)
            if node.name.startswith('Sub') or node.name.startswith('_minus'):
                find_sub = True
            if node.name.startswith('Mul') or node.name.startswith('_mul'):
                find_mul = True
        if find_sub and find_mul:
            #mxnet arcface model
            input_mean = 0.0
            input_std = 1.0
        else:
            input_mean = 127.5
            input_std = 127.5
        self.input_mean = input_mean
        self.input_std = input_std
        #print('input mean and std:', self.input_mean, self.input_std)
        if self.session is None:
            self.session = onnxruntime.InferenceSession(self.model_file, None)
        input_cfg = self.session.get_inputs()[0]
        input_shape = input_cfg.shape
        input_name = input_cfg.name
        self.input_size = tuple(input_shape[2:4][::-1])
        self.input_shape = input_shape
        outputs = self.session.get_outputs()
        output_names = []
        for out in outputs:
            output_names.append(out.name)
        self.input_name = input_name
        self.output_names = output_names
        assert len(self.output_names)==1
        self.output_shape = outputs[0].shape

    def prepare(self, ctx_id, **kwargs):
        if ctx_id<0:
            self.session.set_providers(['CPUExecutionProvider'])

    def get(self, img, face):
        aimg = face_align.norm_crop(img, landmark=face.kps, image_size=self.input_size[0])
        face.embedding = self.get_feat(aimg).flatten()
        return face.embedding

    def compute_sim(self, feat1, feat2):
        from numpy.linalg import norm
        feat1 = feat1.ravel()
        feat2 = feat2.ravel()
        sim = np.dot(feat1, feat2) / (norm(feat1) * norm(feat2))
        return sim

    def get_feat(self, imgs):
        if not isinstance(imgs, list):
            imgs = [imgs]
        input_size = self.input_size

        blob = cv2.dnn.blobFromImages(imgs, 1.0 / self.input_std, input_size,
                                      (self.input_mean, self.input_mean, self.input_mean), swapRB=True)
        net_out = self.session.run(self.output_names, {self.input_name: blob})[0]
        return net_out

    def forward(self, batch_data):
        blob = (batch_data - self.input_mean) / self.input_std
        net_out = self.session.run(self.output_names, {self.input_name: blob})[0]
        return net_out
```
Earlier, we used RetinaFace only to detect faces. Now we feed the detected face information into the ArcFaceONNX model to perform face recognition. Reading through the code, inference runs in the get() or forward() method of the ArcFaceONNX class. The get() method stores its output in the embedding attribute of the InsightFace Face object, so this is indeed the embedding used for face recognition. In the next session, we will combine this ArcFaceONNX class with RetinaFace to build face recognition code. We will also implement the model in PyTorch to deepen our understanding of ArcFaceONNX. The result will match the quick face recognition we did with the insightface sample in Session 1, but the code will be far easier to understand and reuse at the module level than the Session 1 sample code, where it was hard to tell what was going on.
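As a small preview of the next session, the two models can already be wired together by hand. The following is a minimal sketch under the same assumptions as the earlier detection code (example paths, first detected face only); it extracts one embedding per image and compares them with compute_sim:

```python
import cv2
import insightface
from insightface.app.common import Face

detector = insightface.model_zoo.get_model("models/buffalo_l/det_10g.onnx")      # example path
detector.prepare(ctx_id=-1, input_size=(640, 640))
recognizer = insightface.model_zoo.get_model("models/buffalo_l/w600k_r50.onnx")  # example path
recognizer.prepare(ctx_id=-1)

def embed(path):
    # detect the first face in the image and return its ArcFace embedding
    img = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2RGB)
    bboxes, kpss = detector.detect(img)
    face = Face(bbox=bboxes[0][:4], kps=kpss[0], det_score=bboxes[0][4])
    return recognizer.get(img, face)

sim = recognizer.compute_sim(embed("data/images/person_a.jpg"),   # example paths
                             embed("data/images/person_b.jpg"))
print("cosine similarity:", sim)  # higher means more likely the same person
```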