Learning the Fundamentals of Practical AI Development and Facial Recognition with 'InsightFace' Library (2)

Recap of the Previous Session

The previous session (Session 1) is here

In the last session, we quickly implemented face authentication using the FaceAnalysis class sample code, and it was astonishing how easily it worked. We also opened the Python implementation from the InsightFace library's GitHub page and looked through the code of the FaceAnalysis class.

InsightFace Official Repository

The relevant page of the InsightFace official repository is here
Tips

The same Python implementation is copied into your Python environment (under site-packages) when you install the insightface library via pip. Following the code on GitHub can be cumbersome, so open the corresponding directory in a code editor such as VSCode and trace the code locally instead.
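If you are not sure where pip placed the package, one quick way to print the installed location (just a convenience snippet, not part of InsightFace itself) is:

import os
import insightface

# prints the directory of the installed insightface package,
# e.g. .../site-packages/insightface
print(os.path.dirname(insightface.__file__))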

We also confirmed that files with the onnx extension are automatically downloaded when using the sample code. The downloaded files were the following five.

Downloaded onnx files

  1. 1k3d68.onnx
  2. 2d106det.onnx
  3. det_10g.onnx
  4. genderage.onnx
  5. w600k_r50.onnx

These onnx files are the 'AI models' used internally by InsightFace. In this session, we will analyze these AI models. ONNX (Open Neural Network Exchange) files are an open format for transferring machine learning models between different frameworks (such as TensorFlow, PyTorch, and MXNet). An ONNX file stores the model's architecture and weights.
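If you want to peek inside one of these files yourself, onnxruntime can list a model's inputs and outputs. A minimal sketch, assuming the files were downloaded to the default ~/.insightface/models/buffalo_l directory:

import os.path as osp
import onnxruntime

# adjust the path if your models are stored elsewhere
model_path = osp.expanduser("~/.insightface/models/buffalo_l/det_10g.onnx")
session = onnxruntime.InferenceSession(model_path, providers=["CPUExecutionProvider"])

for inp in session.get_inputs():
    print("input :", inp.name, inp.shape)
for out in session.get_outputs():
    print("output:", out.name, out.shape)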

Analyzing the FaceAnalysis Class

Let’s look at the code of the FaceAnalysis class. This class drives face authentication, but what does it actually contain? The code below is the constructor of the FaceAnalysis class, which runs when an instance is created. The AI models are downloaded inside the ensure_available() call (line 6). You can jump to the ensure_available() code by Ctrl+clicking it in VSCode; a short usage example follows the listing.

face_analysis.py
1
2 class FaceAnalysis:
3     def __init__(self, name=DEFAULT_MP_NAME, root='~/.insightface', allowed_modules=None, **kwargs):
4         onnxruntime.set_default_logger_severity(3)
5         self.models = {}
6         self.model_dir = ensure_available('models', name, root=root)
7         onnx_files = glob.glob(osp.join(self.model_dir, '*.onnx'))
8         onnx_files = sorted(onnx_files)
9         for onnx_file in onnx_files:
10             model = model_zoo.get_model(onnx_file, **kwargs)
11             if model is None:
12                 print('model not recognized:', onnx_file)
13             elif allowed_modules is not None and model.taskname not in allowed_modules:
14                 print('model ignore:', onnx_file, model.taskname)
15                 del model
16             elif model.taskname not in self.models and (allowed_modules is None or model.taskname in allowed_modules):
17                 print('find model:', onnx_file, model.taskname, model.input_shape, model.input_mean, model.input_std)
18                 self.models[model.taskname] = model
19             else:
20                 print('duplicated model task type, ignore:', onnx_file, model.taskname)
21                 del model
22         assert 'detection' in self.models
23         self.det_model = self.models['detection']
24
25     def prepare(self, ctx_id, det_thresh=0.5, det_size=(640, 640)):
26         self.det_thresh = det_thresh
27         assert det_size is not None
28         print('set det-size:', det_size)
29         self.det_size = det_size
30         for taskname, model in self.models.items():
31             if taskname == 'detection':
32                 model.prepare(ctx_id, input_size=det_size, det_thresh=det_thresh)
33             else:
34                 model.prepare(ctx_id)
35     # omitted below
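As the constructor shows, the allowed_modules argument lets you load only some of the models. For example, a minimal sketch that keeps just detection and recognition (the taskname strings are the ones the constructor prints):

from insightface.app import FaceAnalysis

# load only the detection and recognition models; the landmark and attribute models are skipped
app = FaceAnalysis(name="buffalo_l", allowed_modules=["detection", "recognition"])
app.prepare(ctx_id=-1, det_size=(640, 640))
print(app.models.keys())  # expected: dict_keys(['detection', 'recognition'])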

Part Where ONNX Files Are Loaded

Reading the code above, you will notice that the downloaded ONNX files are loaded in the loop starting at line 9. Looking at model.taskname already gives a rough idea of what each model does. The ONNX file paths are collected into the variable onnx_files, and each one is passed to model_zoo.get_model(). Looking inside model_zoo.get_model() reveals more detail, so let's do that next. You can open model_zoo.py by Ctrl+clicking model_zoo.get_model() in VSCode.

Relevant Part of model_zoo.py

model_zoo.py
def get_model(name, **kwargs):
    root = kwargs.get('root', '~/.insightface')
    root = os.path.expanduser(root)
    model_root = osp.join(root, 'models')
    allow_download = kwargs.get('download', False)
    download_zip = kwargs.get('download_zip', False)
    if not name.endswith('.onnx'):
        model_dir = os.path.join(model_root, name)
        model_file = find_onnx_file(model_dir)
        if model_file is None:
            return None
    else:
        model_file = name
    if not osp.exists(model_file) and allow_download:
        model_file = download_onnx('models', model_file, root=root, download_zip=download_zip)
    assert osp.exists(model_file), 'model_file %s should exist'%model_file
    assert osp.isfile(model_file), 'model_file %s should be a file'%model_file
    router = ModelRouter(model_file)
    providers = kwargs.get('providers', get_default_providers())
    provider_options = kwargs.get('provider_options', get_default_provider_options())
    model = router.get_model(providers=providers, provider_options=provider_options)
    return model

Let’s look at the relevant part of model_zoo.py. Inside get_model(), the path to the ONNX file is resolved (and the file is downloaded if necessary), then wrapped in a class called ModelRouter. The model instance returned by router.get_model() is what comes back to the caller.
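Since get_model() forwards keyword arguments such as providers, you can also call it directly on a single file and force a particular execution provider. A small sketch (the path is wherever your copy of the file lives):

import insightface

# load a single model directly, forcing CPU execution
model = insightface.model_zoo.get_model(
    "models/buffalo_l/w600k_r50.onnx",   # adjust to your own path
    providers=["CPUExecutionProvider"],
)
print(type(model).__name__, model.taskname)  # expected: ArcFaceONNX recognition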

Relevant Part of the ModelRouter Class

model_zoo.py
1
2 class ModelRouter:
3     def __init__(self, onnx_file):
4         self.onnx_file = onnx_file
5
6     def get_model(self, **kwargs):
7         session = PickableInferenceSession(self.onnx_file, **kwargs)
8         print(f'Applied providers: {session._providers}, with options: {session._provider_options}')
9         inputs = session.get_inputs()
10         input_cfg = inputs[0]
11         input_shape = input_cfg.shape
12         outputs = session.get_outputs()
13
14         if len(outputs)>=5:
15             return RetinaFace(model_file=self.onnx_file, session=session)
16         elif input_shape[2]==192 and input_shape[3]==192:
17             return Landmark(model_file=self.onnx_file, session=session)
18         elif input_shape[2]==96 and input_shape[3]==96:
19             return Attribute(model_file=self.onnx_file, session=session)
20         elif len(inputs)==2 and input_shape[2]==128 and input_shape[3]==128:
21             return INSwapper(model_file=self.onnx_file, session=session)
22         elif input_shape[2]==input_shape[3] and input_shape[2]>=112 and input_shape[2]%16==0:
23             return ArcFaceONNX(model_file=self.onnx_file, session=session)
24         else:
25             #raise RuntimeError('error on model routing')
26             return None

Finally, we have arrived at this code. Look at the code from line 14 onwards. You can see that InsightFace routes each ONNX file to one of five model classes based on its inputs and outputs, and returns an instance of the matching class. Tracing the conditions shows which of the downloaded files ends up in which class.
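To see the routing in action, one possible check is to loop over the downloaded files and print which class each one ends up in. A sketch, assuming the default download directory:

import glob
import os.path as osp
import insightface

model_dir = osp.expanduser("~/.insightface/models/buffalo_l")  # default download location
for onnx_file in sorted(glob.glob(osp.join(model_dir, "*.onnx"))):
    model = insightface.model_zoo.get_model(onnx_file)
    print(osp.basename(onnx_file), "->", type(model).__name__, model.taskname)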

Let's summarize the roles of each model class.

Roles of Each Model Class

  1. 1k3d68.onnx : 3D Facial Landmark Detection AI (Landmark class)
  2. 2d106det.onnx : 2D Facial Landmark Detection AI (Landmark class)
  3. det_10g.onnx : Face Detection AI (RetinaFace class)
  4. genderage.onnx : Facial Attribute Detection AI (gender/age, Attribute class)
  5. w600k_r50.onnx : Face Authentication AI (ArcFaceONNX class)

Dependencies between these models

These models are used for inference in the FaceAnalysis class's get() method in face_analysis.py, and the dependencies between them are visible there. If you read the code, you will see that only face detection is independent; all the other models depend on its output. The face detection AI first finds the faces in the given image, and then the facial landmark detection AI, facial attribute detection AI, face authentication AI, and so on each run independently on every detected face. The relevant code is shown below, followed by a short usage sketch.

face_analysis.py
    def get(self, img, max_num=0):
        bboxes, kpss = self.det_model.detect(img,
                                             max_num=max_num,
                                             metric='default')
        if bboxes.shape[0] == 0:
            return []
        ret = []
        for i in range(bboxes.shape[0]):
            bbox = bboxes[i, 0:4]
            det_score = bboxes[i, 4]
            kps = None
            if kpss is not None:
                kps = kpss[i]
            face = Face(bbox=bbox, kps=kps, det_score=det_score)
            for taskname, model in self.models.items():
                if taskname == 'detection':
                    continue
                model.get(img, face)
            ret.append(face)
        return ret
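To see this flow from the caller's side, here is a minimal usage sketch of get() through the FaceAnalysis class, reading the fields that the detection and recognition models fill in on each Face object (the image path is arbitrary):

import cv2
from insightface.app import FaceAnalysis

app = FaceAnalysis(name="buffalo_l")
app.prepare(ctx_id=-1, det_size=(640, 640))

img = cv2.imread("data/images/path_to_image/some_face_1.jpg")  # adjust to your own path
faces = app.get(img)

for face in faces:
    # detection fills bbox, kps and det_score; the other models add their own fields
    print(face.bbox, face.det_score)
    print(face.embedding.shape)  # 512-dimensional vector from the recognition model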

Now, I would like to take a quick look at each model. The code for each model can be followed from the get_model method in the ModelRouter class in model_zoo.py that we have already reviewed above.

Analyzing RetinaFace (with face detection code)

Let's start by looking at face detection, which is the first step in face recognition and other face analysis processes. Face detection is implemented as a class called RetinaFace, which loads det_10g.onnx. The RetinaFace code is shown below.

retinaface.py
class RetinaFace:
    def __init__(self, model_file=None, session=None):
        import onnxruntime
        self.model_file = model_file
        self.session = session
        self.taskname = 'detection'
        if self.session is None:
            assert self.model_file is not None
            assert osp.exists(self.model_file)
            self.session = onnxruntime.InferenceSession(self.model_file, None)
        self.center_cache = {}
        self.nms_thresh = 0.4
        self.det_thresh = 0.5
        self._init_vars()

    def _init_vars(self):
        input_cfg = self.session.get_inputs()[0]
        input_shape = input_cfg.shape
        # print(input_shape)
        if isinstance(input_shape[2], str):
            self.input_size = None
        else:
            self.input_size = tuple(input_shape[2:4][::-1])
        #print('image_size:', self.image_size)
        input_name = input_cfg.name
        self.input_shape = input_shape
        outputs = self.session.get_outputs()
        output_names = []
        for o in outputs:
            output_names.append(o.name)
        self.input_name = input_name
        self.output_names = output_names
        self.input_mean = 127.5
        self.input_std = 128.0
        # print(self.output_names)
        #assert len(outputs)==10 or len(outputs)==15
        self.use_kps = False
        self._anchor_ratio = 1.0
        self._num_anchors = 1
        if len(outputs) == 6:
            self.fmc = 3
            self._feat_stride_fpn = [8, 16, 32]
            self._num_anchors = 2
        elif len(outputs) == 9:
            self.fmc = 3
            self._feat_stride_fpn = [8, 16, 32]
            self._num_anchors = 2
            self.use_kps = True
        elif len(outputs) == 10:
            self.fmc = 5
            self._feat_stride_fpn = [8, 16, 32, 64, 128]
            self._num_anchors = 1
        elif len(outputs) == 15:
            self.fmc = 5
            self._feat_stride_fpn = [8, 16, 32, 64, 128]
            self._num_anchors = 1
            self.use_kps = True

    def prepare(self, ctx_id, **kwargs):
        if ctx_id < 0:
            self.session.set_providers(['CPUExecutionProvider'])
        nms_thresh = kwargs.get('nms_thresh', None)
        if nms_thresh is not None:
            self.nms_thresh = nms_thresh
        det_thresh = kwargs.get('det_thresh', None)
        if det_thresh is not None:
            self.det_thresh = det_thresh
        input_size = kwargs.get('input_size', None)
        if input_size is not None:
            if self.input_size is not None:
                print('warning: det_size is already set in detection model, ignore')
            else:
                self.input_size = input_size

    def forward(self, img, threshold):
        scores_list = []
        bboxes_list = []
        kpss_list = []
        input_size = tuple(img.shape[0:2][::-1])
        blob = cv2.dnn.blobFromImage(img, 1.0/self.input_std, input_size, (self.input_mean, self.input_mean, self.input_mean), swapRB=True)
        net_outs = self.session.run(self.output_names, {self.input_name: blob})

        input_height = blob.shape[2]
        input_width = blob.shape[3]
        fmc = self.fmc
        for idx, stride in enumerate(self._feat_stride_fpn):
            scores = net_outs[idx]
            bbox_preds = net_outs[idx+fmc]
            bbox_preds = bbox_preds * stride
            if self.use_kps:
                kps_preds = net_outs[idx+fmc*2] * stride
            height = input_height // stride
            width = input_width // stride
            K = height * width
            key = (height, width, stride)
            if key in self.center_cache:
                anchor_centers = self.center_cache[key]
            else:
                # solution-1, c style:
                #anchor_centers = np.zeros( (height, width, 2), dtype=np.float32 )
                #for i in range(height):
                #    anchor_centers[i, :, 1] = i
                #for i in range(width):
                #    anchor_centers[:, i, 0] = i

                # solution-2:
                #ax = np.arange(width, dtype=np.float32)
                #ay = np.arange(height, dtype=np.float32)
                #xv, yv = np.meshgrid(np.arange(width), np.arange(height))
                #anchor_centers = np.stack([xv, yv], axis=-1).astype(np.float32)

                # solution-3:
                anchor_centers = np.stack(np.mgrid[:height, :width][::-1], axis=-1).astype(np.float32)
                # print(anchor_centers.shape)

                anchor_centers = (anchor_centers * stride).reshape((-1, 2))
                if self._num_anchors > 1:
                    anchor_centers = np.stack([anchor_centers]*self._num_anchors, axis=1).reshape((-1, 2))
                if len(self.center_cache) < 100:
                    self.center_cache[key] = anchor_centers

            pos_inds = np.where(scores >= threshold)[0]
            bboxes = distance2bbox(anchor_centers, bbox_preds)
            pos_scores = scores[pos_inds]
            pos_bboxes = bboxes[pos_inds]
            scores_list.append(pos_scores)
            bboxes_list.append(pos_bboxes)
            if self.use_kps:
                kpss = distance2kps(anchor_centers, kps_preds)
                #kpss = kps_preds
                kpss = kpss.reshape((kpss.shape[0], -1, 2))
                pos_kpss = kpss[pos_inds]
                kpss_list.append(pos_kpss)
        return scores_list, bboxes_list, kpss_list

    def detect(self, img, input_size=None, max_num=0, metric='default'):
        assert input_size is not None or self.input_size is not None
        input_size = self.input_size if input_size is None else input_size

        im_ratio = float(img.shape[0]) / img.shape[1]
        model_ratio = float(input_size[1]) / input_size[0]
        if im_ratio > model_ratio:
            new_height = input_size[1]
            new_width = int(new_height / im_ratio)
        else:
            new_width = input_size[0]
            new_height = int(new_width * im_ratio)
        det_scale = float(new_height) / img.shape[0]
        resized_img = cv2.resize(img, (new_width, new_height))
        det_img = np.zeros((input_size[1], input_size[0], 3), dtype=np.uint8)
        det_img[:new_height, :new_width, :] = resized_img

        scores_list, bboxes_list, kpss_list = self.forward(det_img, self.det_thresh)

        scores = np.vstack(scores_list)
        scores_ravel = scores.ravel()
        order = scores_ravel.argsort()[::-1]
        bboxes = np.vstack(bboxes_list) / det_scale
        if self.use_kps:
            kpss = np.vstack(kpss_list) / det_scale
        pre_det = np.hstack((bboxes, scores)).astype(np.float32, copy=False)
        pre_det = pre_det[order, :]
        keep = self.nms(pre_det)
        det = pre_det[keep, :]
        if self.use_kps:
            kpss = kpss[order, :, :]
            kpss = kpss[keep, :, :]
        else:
            kpss = None
        if max_num > 0 and det.shape[0] > max_num:
            area = (det[:, 2] - det[:, 0]) * (det[:, 3] - det[:, 1])
            img_center = img.shape[0] // 2, img.shape[1] // 2
            offsets = np.vstack([
                (det[:, 0] + det[:, 2]) / 2 - img_center[1],
                (det[:, 1] + det[:, 3]) / 2 - img_center[0]
            ])
            offset_dist_squared = np.sum(np.power(offsets, 2.0), 0)
            if metric == 'max':
                values = area
            else:
                values = area - offset_dist_squared * 2.0  # some extra weight on the centering
            bindex = np.argsort(values)[::-1]  # some extra weight on the centering
            bindex = bindex[0:max_num]
            det = det[bindex, :]
            if kpss is not None:
                kpss = kpss[bindex, :]
        return det, kpss

    def nms(self, dets):
        thresh = self.nms_thresh
        x1 = dets[:, 0]
        y1 = dets[:, 1]
        x2 = dets[:, 2]
        y2 = dets[:, 3]
        scores = dets[:, 4]

        areas = (x2 - x1 + 1) * (y2 - y1 + 1)
        order = scores.argsort()[::-1]

        keep = []
        while order.size > 0:
            i = order[0]
            keep.append(i)
            xx1 = np.maximum(x1[i], x1[order[1:]])
            yy1 = np.maximum(y1[i], y1[order[1:]])
            xx2 = np.minimum(x2[i], x2[order[1:]])
            yy2 = np.minimum(y2[i], y2[order[1:]])

            w = np.maximum(0.0, xx2 - xx1 + 1)
            h = np.maximum(0.0, yy2 - yy1 + 1)
            inter = w * h
            ovr = inter / (areas[i] + areas[order[1:]] - inter)

            inds = np.where(ovr <= thresh)[0]
            order = order[inds + 1]

        return keep

This is rather dense code. RetinaFace detects the face bounding box and keypoints; roughly speaking, it uses a ResNet-like backbone with an SSD-style detection head. Building something like this from scratch is hard, so here we will use the face detector as it is. If you are interested in how object detection works, I suggest studying how SSD works first.
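To get a feel for the post-processing in forward(), here is a small numpy sketch of the idea behind distance2bbox: each anchor center plus the predicted (left, top, right, bottom) distances gives one box. This is an illustration of the decoding step, not the library's exact function:

import numpy as np

def distance2bbox_sketch(points, distance):
    # points: (N, 2) anchor centers (x, y); distance: (N, 4) predicted (l, t, r, b)
    x1 = points[:, 0] - distance[:, 0]
    y1 = points[:, 1] - distance[:, 1]
    x2 = points[:, 0] + distance[:, 2]
    y2 = points[:, 1] + distance[:, 3]
    return np.stack([x1, y1, x2, y2], axis=-1)

centers = np.array([[80.0, 80.0]])            # one anchor center on the stride-8 grid
dists = np.array([[16.0, 20.0, 16.0, 12.0]])  # distances already multiplied by the stride
print(distance2bbox_sketch(centers, dists))   # [[64. 60. 96. 92.]]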

Let's try to detect faces in an image using RetinaFace.

While we are at it, let's write a quick piece of code that uses only RetinaFace for face detection. If you feel up to it, try writing it yourself before looking at the answer below.

face detection code
import cv2
import numpy as np
import insightface

# load the detection model (change the path to wherever det_10g.onnx was downloaded)
detector = insightface.model_zoo.get_model("models/buffalo_l/det_10g.onnx")
detector.prepare(ctx_id=-1, input_size=(640, 640))

# prepare the input image (change the path to your own image)
# note: cv2.imread returns a BGR image, which is what the detector's preprocessing expects
# (it swaps the channels to RGB internally)
img = cv2.imread("data/images/path_to_image/some_face_1.jpg")

# run detection
bboxes, kpss = detector.detect(img)

What a convenient library: face detection in just a few lines. RetinaFace detects not only the bounding box of each face but also keypoints for the eyes, nose, and mouth at the same time, which is very handy. After running the code above, print bboxes and kpss to see what they contain (see the sketch below).
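For example, continuing from the img, bboxes and kpss variables above, a minimal sketch that prints the shapes and draws the results with OpenCV (the output file name is arbitrary):

print(bboxes.shape)  # (num_faces, 5): x1, y1, x2, y2, detection score
print(kpss.shape)    # (num_faces, 5, 2): five (x, y) keypoints per face

vis = img.copy()
for bbox, kps in zip(bboxes, kpss):
    x1, y1, x2, y2, score = bbox
    cv2.rectangle(vis, (int(x1), int(y1)), (int(x2), int(y2)), (0, 255, 0), 2)
    for x, y in kps:
        cv2.circle(vis, (int(x), int(y)), 2, (0, 0, 255), -1)
cv2.imwrite("detection_result.jpg", vis)  # arbitrary output path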

Checking the Landmark model

The Landmark model detects detailed facial landmarks beyond the basic keypoints (eyes, nose, mouth) found by the detector. Facial landmarks are useful, for example, for retouching the face or adding effects in image processing. We will not use the Landmark model for face recognition here, so we won't go through its internals. Its code lives in landmark.py and can be reached from the ModelRouter class in model_zoo.py, so take a look if you are interested; a rough usage sketch follows.
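For reference only, a rough sketch of driving the 2d106det model directly; the model path and the way the Face object is constructed here are my assumptions based on the routing and landmark.py code, not an official example:

import insightface
from insightface.app.common import Face

landmarker = insightface.model_zoo.get_model("models/buffalo_l/2d106det.onnx")  # adjust to your own path
landmarker.prepare(ctx_id=-1)

# reuse img, bboxes and kpss from the detection example above
face = Face(bbox=bboxes[0, 0:4], kps=kpss[0], det_score=bboxes[0, 4])
landmarker.get(img, face)
print(face.landmark_2d_106.shape)  # should be 106 (x, y) landmark points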

Checking the Attribute model

The Attribute model infers attributes such as the gender and age of detected faces. It is also not used for face recognition here, so we won't go through its internals either. Its code lives in attribute.py and can be reached from the ModelRouter class in model_zoo.py; it could be handy if, for example, you want to analyze visitor demographics. A small usage sketch follows.
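Similarly, a rough sketch of driving genderage.onnx directly; the path and the gender/age fields are assumptions based on attribute.py, so treat it as a starting point rather than a reference implementation:

import insightface
from insightface.app.common import Face

attr = insightface.model_zoo.get_model("models/buffalo_l/genderage.onnx")  # adjust to your own path
attr.prepare(ctx_id=-1)

# reuse img, bboxes and kpss from the detection example above
face = Face(bbox=bboxes[0, 0:4], kps=kpss[0], det_score=bboxes[0, 4])
attr.get(img, face)
print(face.gender, face.age)  # in InsightFace's convention, gender 1 = male, 0 = female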

Analyzing the ArcFaceONNX model

Finally, let's take a look at the ArcFaceONNX face recognition model. Here is the code for ArcFaceONNX.

arcface_onnx.py
class ArcFaceONNX:
    def __init__(self, model_file=None, session=None):
        assert model_file is not None
        self.model_file = model_file
        self.session = session
        self.taskname = 'recognition'
        find_sub = False
        find_mul = False
        model = onnx.load(self.model_file)
        graph = model.graph
        for nid, node in enumerate(graph.node[:8]):
            #print(nid, node.name)
            if node.name.startswith('Sub') or node.name.startswith('_minus'):
                find_sub = True
            if node.name.startswith('Mul') or node.name.startswith('_mul'):
                find_mul = True
        if find_sub and find_mul:
            #mxnet arcface model
            input_mean = 0.0
            input_std = 1.0
        else:
            input_mean = 127.5
            input_std = 127.5
        self.input_mean = input_mean
        self.input_std = input_std
        #print('input mean and std:', self.input_mean, self.input_std)
        if self.session is None:
            self.session = onnxruntime.InferenceSession(self.model_file, None)
        input_cfg = self.session.get_inputs()[0]
        input_shape = input_cfg.shape
        input_name = input_cfg.name
        self.input_size = tuple(input_shape[2:4][::-1])
        self.input_shape = input_shape
        outputs = self.session.get_outputs()
        output_names = []
        for out in outputs:
            output_names.append(out.name)
        self.input_name = input_name
        self.output_names = output_names
        assert len(self.output_names)==1
        self.output_shape = outputs[0].shape

    def prepare(self, ctx_id, **kwargs):
        if ctx_id<0:
            self.session.set_providers(['CPUExecutionProvider'])

    def get(self, img, face):
        aimg = face_align.norm_crop(img, landmark=face.kps, image_size=self.input_size[0])
        face.embedding = self.get_feat(aimg).flatten()
        return face.embedding

    def compute_sim(self, feat1, feat2):
        from numpy.linalg import norm
        feat1 = feat1.ravel()
        feat2 = feat2.ravel()
        sim = np.dot(feat1, feat2) / (norm(feat1) * norm(feat2))
        return sim

    def get_feat(self, imgs):
        if not isinstance(imgs, list):
            imgs = [imgs]
        input_size = self.input_size

        blob = cv2.dnn.blobFromImages(imgs, 1.0 / self.input_std, input_size,
                                      (self.input_mean, self.input_mean, self.input_mean), swapRB=True)
        net_out = self.session.run(self.output_names, {self.input_name: blob})[0]
        return net_out

    def forward(self, batch_data):
        blob = (batch_data - self.input_mean) / self.input_std
        net_out = self.session.run(self.output_names, {self.input_name: blob})[0]
        return net_out

Earlier, we used RetinaFace only to detect faces; for face recognition, the detected face information is fed into the ArcFaceONNX model. Reading through the code, inference is run by the get or forward method of the ArcFaceONNX class. The output of get is stored in the embedding attribute of the InsightFace Face object, so this is the embedding used for face recognition, and compute_sim compares two embeddings by cosine similarity. In the next session, we will combine this ArcFaceONNX class with RetinaFace to build our own face recognition code, and we will also implement the model in PyTorch to deepen our understanding of ArcFaceONNX. The result will be the same as the quick face authentication with the insightface app shown in Session 1, but this time you will understand what the code is doing and be able to reuse it at the module level.
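As a preview of the next session, here is a minimal sketch that chains the two models and compares two embeddings with compute_sim. The model paths follow the earlier examples and the image file names are hypothetical:

import cv2
import insightface
from insightface.app.common import Face

# adjust the paths to wherever the buffalo_l models were downloaded
detector = insightface.model_zoo.get_model("models/buffalo_l/det_10g.onnx")
recognizer = insightface.model_zoo.get_model("models/buffalo_l/w600k_r50.onnx")
detector.prepare(ctx_id=-1, input_size=(640, 640))
recognizer.prepare(ctx_id=-1)

def embed(path):
    # detect the first face in the image and return its 512-dimensional embedding
    img = cv2.imread(path)
    bboxes, kpss = detector.detect(img)
    face = Face(bbox=bboxes[0, 0:4], kps=kpss[0], det_score=bboxes[0, 4])
    return recognizer.get(img, face)

emb1 = embed("person_a_1.jpg")  # hypothetical image paths
emb2 = embed("person_a_2.jpg")
print(recognizer.compute_sim(emb1, emb2))  # higher cosine similarity = more likely the same person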