Learning the Fundamentals of Practical AI Development and Facial Recognition with 'InsightFace' Library (2)
Recap of the Previous Session
The previous session (session 1) is here. Last time, we quickly implemented face authentication using the sample code for the FaceAnalysis class, and it was astonishing how easily it worked. We also opened the Python implementation from the InsightFace library's GitHub page and looked through the code of the FaceAnalysis class.
InsightFace Official Repository
The relevant page of the InsightFace official repository is here. The same Python implementation is copied into your Python environment when you install the insightface library via pip. Following the code on GitHub can be cumbersome, so open the corresponding directory in a code editor such as VSCode on your PC to trace the code more comfortably.
We also confirmed that files with the onnx extension are automatically downloaded when using the sample code. The downloaded files were the following five.
Downloaded onnx files
- 1k3d68.onnx
- 2d106det.onnx
- det_10g.onnx
- genderage.onnx
- w600k_r50.onnx
These onnx files are the 'AI models' used internally by InsightFace, and in this session we will analyze them. ONNX (Open Neural Network Exchange) is an open format for exchanging machine learning models between different frameworks (such as TensorFlow, PyTorch, and MXNet). An ONNX file stores both the model's architecture and its weights.
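As a quick check, here is a minimal sketch (the file path is just an example; point it at any of the downloaded files) showing that an ONNX file really does carry both the network graph and the trained weights:

```python
import onnx

# an .onnx file carries the architecture (a graph of operators) and the
# trained weights (stored as "initializer" tensors)
model = onnx.load("models/buffalo_l/w600k_r50.onnx")    # example path
print(onnx.helper.printable_graph(model.graph)[:500])   # first part of the graph definition
print("number of weight tensors:", len(model.graph.initializer))
```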
Analyzing the FaceAnalysis Class
Let's look at the code of the FaceAnalysis class. This class is used for face authentication, but what does it contain? The following code is the constructor of the FaceAnalysis class, which is called when an instance of the class is created. The AI models are downloaded inside the ensure_available() call near the top of the constructor. You can check the code of ensure_available() by Ctrl+clicking it in VSCode.
```python
class FaceAnalysis:
    def __init__(self, name=DEFAULT_MP_NAME, root='~/.insightface', allowed_modules=None, **kwargs):
        onnxruntime.set_default_logger_severity(3)
        self.models = {}
        self.model_dir = ensure_available('models', name, root=root)
        onnx_files = glob.glob(osp.join(self.model_dir, '*.onnx'))
        onnx_files = sorted(onnx_files)
        for onnx_file in onnx_files:
            model = model_zoo.get_model(onnx_file, **kwargs)
            if model is None:
                print('model not recognized:', onnx_file)
            elif allowed_modules is not None and model.taskname not in allowed_modules:
                print('model ignore:', onnx_file, model.taskname)
                del model
            elif model.taskname not in self.models and (allowed_modules is None or model.taskname in allowed_modules):
                print('find model:', onnx_file, model.taskname, model.input_shape, model.input_mean, model.input_std)
                self.models[model.taskname] = model
            else:
                print('duplicated model task type, ignore:', onnx_file, model.taskname)
                del model
        assert 'detection' in self.models
        self.det_model = self.models['detection']

    def prepare(self, ctx_id, det_thresh=0.5, det_size=(640, 640)):
        self.det_thresh = det_thresh
        assert det_size is not None
        print('set det-size:', det_size)
        self.det_size = det_size
        for taskname, model in self.models.items():
            if taskname == 'detection':
                model.prepare(ctx_id, input_size=det_size, det_thresh=det_thresh)
            else:
                model.prepare(ctx_id)

    # omitted below
```
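For reference, here is a minimal usage sketch of this constructor (the model pack name and the task name strings are taken from the code above; treat them as assumptions for your own environment). Passing allowed_modules lets you load only some of the models, for example detection and recognition:

```python
import insightface

# load only the detection and recognition models ('detection' and 'recognition'
# are taskname values that appear in the library code)
app = insightface.app.FaceAnalysis(name="buffalo_l",
                                   allowed_modules=["detection", "recognition"])
app.prepare(ctx_id=-1, det_size=(640, 640))  # ctx_id=-1 -> run on CPU
print(app.models.keys())                     # should list only the two allowed tasks
```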
Part Where ONNX Files Are Loaded
If you read the code above in detail, you will notice that the downloaded ONNX files are loaded in the for loop over the onnx files. It also looks like you can get an overview of each model by looking at model.taskname. The paths of the onnx files are collected into the onnx_files variable and passed one by one to model_zoo.get_model(). Looking at model_zoo.get_model() will reveal more detail, so let's take a look at it right away. You can open model_zoo.py by Ctrl+clicking the model_zoo.get_model() call in VSCode.
Relevant Part of model_zoo.py
```python
def get_model(name, **kwargs):
    root = kwargs.get('root', '~/.insightface')
    root = os.path.expanduser(root)
    model_root = osp.join(root, 'models')
    allow_download = kwargs.get('download', False)
    download_zip = kwargs.get('download_zip', False)
    if not name.endswith('.onnx'):
        model_dir = os.path.join(model_root, name)
        model_file = find_onnx_file(model_dir)
        if model_file is None:
            return None
    else:
        model_file = name
    if not osp.exists(model_file) and allow_download:
        model_file = download_onnx('models', model_file, root=root, download_zip=download_zip)
    assert osp.exists(model_file), 'model_file %s should exist'%model_file
    assert osp.isfile(model_file), 'model_file %s should be a file'%model_file
    router = ModelRouter(model_file)
    providers = kwargs.get('providers', get_default_providers())
    provider_options = kwargs.get('provider_options', get_default_provider_options())
    model = router.get_model(providers=providers, provider_options=provider_options)
    return model
```
Let's look at the relevant part of model_zoo.py. Inside get_model(), the path to the onnx file is resolved (and the file is downloaded if downloading is allowed). The path is then wrapped in a class called ModelRouter, whose get_model() loads the onnx file and returns the appropriate model object.
Relevant Part of the ModelRouter Class
```python
class ModelRouter:
    def __init__(self, onnx_file):
        self.onnx_file = onnx_file

    def get_model(self, **kwargs):
        session = PickableInferenceSession(self.onnx_file, **kwargs)
        print(f'Applied providers: {session._providers}, with options: {session._provider_options}')
        inputs = session.get_inputs()
        input_cfg = inputs[0]
        input_shape = input_cfg.shape
        outputs = session.get_outputs()

        if len(outputs)>=5:
            return RetinaFace(model_file=self.onnx_file, session=session)
        elif input_shape[2]==192 and input_shape[3]==192:
            return Landmark(model_file=self.onnx_file, session=session)
        elif input_shape[2]==96 and input_shape[3]==96:
            return Attribute(model_file=self.onnx_file, session=session)
        elif len(inputs)==2 and input_shape[2]==128 and input_shape[3]==128:
            return INSwapper(model_file=self.onnx_file, session=session)
        elif input_shape[2]==input_shape[3] and input_shape[2]>=112 and input_shape[2]%16==0:
            return ArcFaceONNX(model_file=self.onnx_file, session=session)
        else:
            #raise RuntimeError('error on model routing')
            return None
```
Finally, we have arrived at this code. Look at the if/elif chain in get_model(). You can see that InsightFace routes AI models to five classes: RetinaFace, Landmark, Attribute, INSwapper, and ArcFaceONNX. If you trace the conditions carefully, you can work out which of the downloaded ONNX files ends up in which class (the INSwapper face-swapping model is not among the files downloaded by the sample code), and the matching model object is returned for each type.
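If you want to see the routing criteria for yourself, a small sketch like the following (the path is an example) prints exactly the properties ModelRouter looks at, namely the number of outputs and the input shape:

```python
import onnxruntime

# inspect one of the downloaded files with onnxruntime, the same way ModelRouter
# does, and see which branch of the if/elif chain it would hit
session = onnxruntime.InferenceSession("models/buffalo_l/genderage.onnx",
                                       providers=["CPUExecutionProvider"])
print("number of outputs:", len(session.get_outputs()))     # >= 5 -> RetinaFace (detection)
print("input shape      :", session.get_inputs()[0].shape)  # e.g. [1, 3, 96, 96] -> Attribute
```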
Let's summarize the roles of each model class. (A small sketch to confirm this mapping follows the list.)
Roles of Each Model Class
- 1k3d68.onnx ➡ 3D Facial Landmark Detection AI (Landmark class)
- 2d106det.onnx ➡ 2D Facial Landmark Detection AI (Landmark class)
- det_10g.onnx ➡ Face Detection AI (RetinaFace class)
- genderage.onnx ➡ Facial Attribute (gender/age) Detection AI (Attribute class)
- w600k_r50.onnx ➡ Face Authentication AI (ArcFaceONNX class)
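A quick way to confirm this mapping on your own machine is to build a FaceAnalysis instance and print which file ended up behind which task name. This is only a sketch; the exact task name strings come from the library and may differ slightly between versions:

```python
import insightface

app = insightface.app.FaceAnalysis(name="buffalo_l")
app.prepare(ctx_id=-1, det_size=(640, 640))
for taskname, model in app.models.items():
    # each model wrapper keeps the path of the onnx file it was built from
    print(taskname, "->", model.model_file)
```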
Dependencies Between These Models
These models are used for inference in the get() method of the FaceAnalysis class in face_analysis.py, and you can see the dependencies between them there. Reading the code, only the face detection model is independent; all of the other models depend on its output. That is, the face detection AI first detects the faces in the given image, and then the facial landmark detection AI, facial attribute detection AI, face authentication AI, and so on are each run independently on every detected face. The relevant code is the following part.
```python
    def get(self, img, max_num=0):
        bboxes, kpss = self.det_model.detect(img,
                                             max_num=max_num,
                                             metric='default')
        if bboxes.shape[0] == 0:
            return []
        ret = []
        for i in range(bboxes.shape[0]):
            bbox = bboxes[i, 0:4]
            det_score = bboxes[i, 4]
            kps = None
            if kpss is not None:
                kps = kpss[i]
            face = Face(bbox=bbox, kps=kps, det_score=det_score)
            for taskname, model in self.models.items():
                if taskname == 'detection':
                    continue
                model.get(img, face)
            ret.append(face)
        return ret
```
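To see what get() actually hands back, here is a small sketch (assuming an app prepared as in the earlier sketches; the image path is an example). Each returned Face object carries the detection results plus whatever the other models attached to it:

```python
import cv2

img = cv2.imread("data/images/path_to_image/some_face_1.jpg")  # example path
faces = app.get(img)
for face in faces:
    print("bbox      :", face.bbox.astype(int))   # face rectangle from the detector
    print("det_score :", float(face.det_score))   # detection confidence
    print("kps       :", face.kps.shape)          # 5 keypoints (eyes, nose, mouth corners)
    print("embedding :", face.embedding.shape)    # recognition vector from ArcFaceONNX
```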
Now, I would like to take a quick look at each model. The code for each model can be followed from the get_model method in the ModelRouter class in model_zoo.py that we have already reviewed above.
Analyzing RetinaFace (with face detection code)
Let's start by looking at face detection, the first step in face recognition and other face analysis pipelines. Face detection is implemented as a class called RetinaFace, which loads det_10g.onnx. The RetinaFace code is shown below.
```python
class RetinaFace:
    def __init__(self, model_file=None, session=None):
        import onnxruntime
        self.model_file = model_file
        self.session = session
        self.taskname = 'detection'
        if self.session is None:
            assert self.model_file is not None
            assert osp.exists(self.model_file)
            self.session = onnxruntime.InferenceSession(self.model_file, None)
        self.center_cache = {}
        self.nms_thresh = 0.4
        self.det_thresh = 0.5
        self._init_vars()

    def _init_vars(self):
        input_cfg = self.session.get_inputs()[0]
        input_shape = input_cfg.shape
        #print(input_shape)
        if isinstance(input_shape[2], str):
            self.input_size = None
        else:
            self.input_size = tuple(input_shape[2:4][::-1])
        #print('image_size:', self.image_size)
        input_name = input_cfg.name
        self.input_shape = input_shape
        outputs = self.session.get_outputs()
        output_names = []
        for o in outputs:
            output_names.append(o.name)
        self.input_name = input_name
        self.output_names = output_names
        self.input_mean = 127.5
        self.input_std = 128.0
        #print(self.output_names)
        #assert len(outputs)==10 or len(outputs)==15
        self.use_kps = False
        self._anchor_ratio = 1.0
        self._num_anchors = 1
        if len(outputs) == 6:
            self.fmc = 3
            self._feat_stride_fpn = [8, 16, 32]
            self._num_anchors = 2
        elif len(outputs) == 9:
            self.fmc = 3
            self._feat_stride_fpn = [8, 16, 32]
            self._num_anchors = 2
            self.use_kps = True
        elif len(outputs) == 10:
            self.fmc = 5
            self._feat_stride_fpn = [8, 16, 32, 64, 128]
            self._num_anchors = 1
        elif len(outputs) == 15:
            self.fmc = 5
            self._feat_stride_fpn = [8, 16, 32, 64, 128]
            self._num_anchors = 1
            self.use_kps = True

    def prepare(self, ctx_id, **kwargs):
        if ctx_id < 0:
            self.session.set_providers(['CPUExecutionProvider'])
        nms_thresh = kwargs.get('nms_thresh', None)
        if nms_thresh is not None:
            self.nms_thresh = nms_thresh
        det_thresh = kwargs.get('det_thresh', None)
        if det_thresh is not None:
            self.det_thresh = det_thresh
        input_size = kwargs.get('input_size', None)
        if input_size is not None:
            if self.input_size is not None:
                print('warning: det_size is already set in detection model, ignore')
            else:
                self.input_size = input_size

    def forward(self, img, threshold):
        scores_list = []
        bboxes_list = []
        kpss_list = []
        input_size = tuple(img.shape[0:2][::-1])
        blob = cv2.dnn.blobFromImage(img, 1.0/self.input_std, input_size, (self.input_mean, self.input_mean, self.input_mean), swapRB=True)
        net_outs = self.session.run(self.output_names, {self.input_name: blob})

        input_height = blob.shape[2]
        input_width = blob.shape[3]
        fmc = self.fmc
        for idx, stride in enumerate(self._feat_stride_fpn):
            scores = net_outs[idx]
            bbox_preds = net_outs[idx+fmc]
            bbox_preds = bbox_preds * stride
            if self.use_kps:
                kps_preds = net_outs[idx+fmc*2] * stride
            height = input_height // stride
            width = input_width // stride
            K = height * width
            key = (height, width, stride)
            if key in self.center_cache:
                anchor_centers = self.center_cache[key]
            else:
                # solution-1, c style:
                #anchor_centers = np.zeros( (height, width, 2), dtype=np.float32 )
                #for i in range(height):
                #    anchor_centers[i, :, 1] = i
                #for i in range(width):
                #    anchor_centers[:, i, 0] = i

                # solution-2:
                #ax = np.arange(width, dtype=np.float32)
                #ay = np.arange(height, dtype=np.float32)
                #xv, yv = np.meshgrid(np.arange(width), np.arange(height))
                #anchor_centers = np.stack([xv, yv], axis=-1).astype(np.float32)

                # solution-3:
                anchor_centers = np.stack(np.mgrid[:height, :width][::-1], axis=-1).astype(np.float32)
                #print(anchor_centers.shape)

                anchor_centers = (anchor_centers * stride).reshape((-1, 2))
                if self._num_anchors > 1:
                    anchor_centers = np.stack([anchor_centers]*self._num_anchors, axis=1).reshape((-1, 2))
                if len(self.center_cache) < 100:
                    self.center_cache[key] = anchor_centers

            pos_inds = np.where(scores >= threshold)[0]
            bboxes = distance2bbox(anchor_centers, bbox_preds)
            pos_scores = scores[pos_inds]
            pos_bboxes = bboxes[pos_inds]
            scores_list.append(pos_scores)
            bboxes_list.append(pos_bboxes)
            if self.use_kps:
                kpss = distance2kps(anchor_centers, kps_preds)
                #kpss = kps_preds
                kpss = kpss.reshape((kpss.shape[0], -1, 2))
                pos_kpss = kpss[pos_inds]
                kpss_list.append(pos_kpss)
        return scores_list, bboxes_list, kpss_list

    def detect(self, img, input_size=None, max_num=0, metric='default'):
        assert input_size is not None or self.input_size is not None
        input_size = self.input_size if input_size is None else input_size

        im_ratio = float(img.shape[0]) / img.shape[1]
        model_ratio = float(input_size[1]) / input_size[0]
        if im_ratio > model_ratio:
            new_height = input_size[1]
            new_width = int(new_height / im_ratio)
        else:
            new_width = input_size[0]
            new_height = int(new_width * im_ratio)
        det_scale = float(new_height) / img.shape[0]
        resized_img = cv2.resize(img, (new_width, new_height))
        det_img = np.zeros((input_size[1], input_size[0], 3), dtype=np.uint8)
        det_img[:new_height, :new_width, :] = resized_img

        scores_list, bboxes_list, kpss_list = self.forward(det_img, self.det_thresh)

        scores = np.vstack(scores_list)
        scores_ravel = scores.ravel()
        order = scores_ravel.argsort()[::-1]
        bboxes = np.vstack(bboxes_list) / det_scale
        if self.use_kps:
            kpss = np.vstack(kpss_list) / det_scale
        pre_det = np.hstack((bboxes, scores)).astype(np.float32, copy=False)
        pre_det = pre_det[order, :]
        keep = self.nms(pre_det)
        det = pre_det[keep, :]
        if self.use_kps:
            kpss = kpss[order, :, :]
            kpss = kpss[keep, :, :]
        else:
            kpss = None
        if max_num > 0 and det.shape[0] > max_num:
            area = (det[:, 2] - det[:, 0]) * (det[:, 3] - det[:, 1])
            img_center = img.shape[0] // 2, img.shape[1] // 2
            offsets = np.vstack([
                (det[:, 0] + det[:, 2]) / 2 - img_center[1],
                (det[:, 1] + det[:, 3]) / 2 - img_center[0]
            ])
            offset_dist_squared = np.sum(np.power(offsets, 2.0), 0)
            if metric == 'max':
                values = area
            else:
                values = area - offset_dist_squared * 2.0  # some extra weight on the centering
            bindex = np.argsort(values)[::-1]  # some extra weight on the centering
            bindex = bindex[0:max_num]
            det = det[bindex, :]
            if kpss is not None:
                kpss = kpss[bindex, :]
        return det, kpss

    def nms(self, dets):
        thresh = self.nms_thresh
        x1 = dets[:, 0]
        y1 = dets[:, 1]
        x2 = dets[:, 2]
        y2 = dets[:, 3]
        scores = dets[:, 4]

        areas = (x2 - x1 + 1) * (y2 - y1 + 1)
        order = scores.argsort()[::-1]

        keep = []
        while order.size > 0:
            i = order[0]
            keep.append(i)
            xx1 = np.maximum(x1[i], x1[order[1:]])
            yy1 = np.maximum(y1[i], y1[order[1:]])
            xx2 = np.minimum(x2[i], x2[order[1:]])
            yy2 = np.minimum(y2[i], y2[order[1:]])

            w = np.maximum(0.0, xx2 - xx1 + 1)
            h = np.maximum(0.0, yy2 - yy1 + 1)
            inter = w * h
            ovr = inter / (areas[i] + areas[order[1:]] - inter)

            inds = np.where(ovr <= thresh)[0]
            order = order[inds + 1]

        return keep
```
This is fairly involved code. RetinaFace detects the face bounding box and the facial keypoints at the same time; it uses a ResNet-like backbone with an SSD-style detection head. Building something like this from scratch is quite difficult, so in this series we will use the face detector as it is. If you are interested in how object detection works, I suggest studying how SSD works first.
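One detail worth understanding even without building the model yourself is how the raw network outputs are decoded. For each anchor center, the model predicts the distances to the four edges of the box, and distance2bbox turns those distances back into (x1, y1, x2, y2) coordinates. Roughly, it works like the following simplified sketch (without the optional clipping the library applies):

```python
import numpy as np

def distance2bbox(points, distance):
    """Decode per-anchor (left, top, right, bottom) distances into boxes."""
    x1 = points[:, 0] - distance[:, 0]  # left edge
    y1 = points[:, 1] - distance[:, 1]  # top edge
    x2 = points[:, 0] + distance[:, 2]  # right edge
    y2 = points[:, 1] + distance[:, 3]  # bottom edge
    return np.stack([x1, y1, x2, y2], axis=-1)

# one anchor at (100, 100) predicting a face extending 40 px in every direction
print(distance2bbox(np.array([[100., 100.]]), np.array([[40., 40., 40., 40.]])))
# -> [[ 60.  60. 140. 140.]]
```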
Let's try to detect faces in an image using RetinaFace.
While we are here, let's write a small piece of code that uses only RetinaFace for face detection. If you feel up to it, try writing it yourself before looking at the answer below.
```python
import cv2
import numpy as np
import insightface

# load detection model
detector = insightface.model_zoo.get_model("models/buffalo_l/det_10g.onnx")  # change to your own path
detector.prepare(ctx_id=-1, input_size=(640, 640))

# prepare the input image
rgb_img = cv2.cvtColor(cv2.imread("data/images/path_to_image/some_face_1.jpg"), cv2.COLOR_BGR2RGB)  # change to your own path

# detect
bboxes, kpss = detector.detect(rgb_img)
```
What a convenient library! Face detection was done with hardly any effort. RetinaFace detects not only the bounding box of the face but also the keypoints of the eyes, nose, and mouth at the same time, which is very handy. After running the code above, print bboxes and kpss to see what they contain.
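For example, a small visualization sketch like the following (continuing from the detection code above) draws what is inside bboxes and kpss:

```python
import cv2

# detect() was run on an RGB image above, so convert back to BGR for OpenCV drawing
vis = cv2.cvtColor(rgb_img, cv2.COLOR_RGB2BGR)
for bbox, kps in zip(bboxes, kpss):
    x1, y1, x2, y2, score = bbox  # each bbox row is (x1, y1, x2, y2, score)
    cv2.rectangle(vis, (int(x1), int(y1)), (int(x2), int(y2)), (0, 255, 0), 2)
    for (x, y) in kps:            # 5 keypoints per face
        cv2.circle(vis, (int(x), int(y)), 2, (0, 0, 255), -1)
cv2.imwrite("detected_faces.jpg", vis)
```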
Checking the Landmark Model
The Landmark model detects not just basic keypoints such as the eyes, nose, and mouth, but more detailed facial landmarks. Facial landmarks are useful, for example, when processing a face in an image or adding effects to it. We will not use this Landmark model for face recognition, so we will not go into its internal code. It can be followed from the ModelRouter class in model_zoo.py into a file called landmark.py, so take a look if you are interested.
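If you do want to try it, a hedged sketch along these lines should work (the path and the result attribute name are assumptions based on the library code; the Landmark wrapper stores its result on the Face object under its taskname). It reuses rgb_img, bboxes, and kpss from the detection code earlier:

```python
import insightface
from insightface.app.common import Face

landmarker = insightface.model_zoo.get_model("models/buffalo_l/2d106det.onnx")  # example path
landmarker.prepare(ctx_id=-1)
# build a Face object from the first detection result
face = Face(bbox=bboxes[0][:4], kps=kpss[0], det_score=bboxes[0][4])
landmarker.get(rgb_img, face)
print(landmarker.taskname, getattr(face, landmarker.taskname).shape)  # e.g. (106, 2)
```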
Checking the Attribute Model
The Attribute model infers attributes such as the gender and age of a detected face. It is also not used for face recognition, so we will not go into its internal code either. It can be followed from the ModelRouter class in model_zoo.py into a file called attribute.py, which may be handy if, for example, you want to analyze visitors.
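A similar hedged sketch for the attribute model (again, the path is an example, and the attribute names are taken from the library code; it reuses rgb_img and the Face object from the Landmark sketch above):

```python
import insightface

attr_model = insightface.model_zoo.get_model("models/buffalo_l/genderage.onnx")  # example path
attr_model.prepare(ctx_id=-1)
attr_model.get(rgb_img, face)
# gender is an index (reportedly 1 = male, 0 = female), age is an estimate in years
print("gender:", face.gender, "age:", face.age)
```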
Analyzing the ArcFaceONNX Model
Finally, let's take a look at ArcFaceONNX, the face recognition model. Here is the code for ArcFaceONNX.
```python
class ArcFaceONNX:
    def __init__(self, model_file=None, session=None):
        assert model_file is not None
        self.model_file = model_file
        self.session = session
        self.taskname = 'recognition'
        find_sub = False
        find_mul = False
        model = onnx.load(self.model_file)
        graph = model.graph
        for nid, node in enumerate(graph.node[:8]):
            #print(nid, node.name)
            if node.name.startswith('Sub') or node.name.startswith('_minus'):
                find_sub = True
            if node.name.startswith('Mul') or node.name.startswith('_mul'):
                find_mul = True
        if find_sub and find_mul:
            #mxnet arcface model
            input_mean = 0.0
            input_std = 1.0
        else:
            input_mean = 127.5
            input_std = 127.5
        self.input_mean = input_mean
        self.input_std = input_std
        #print('input mean and std:', self.input_mean, self.input_std)
        if self.session is None:
            self.session = onnxruntime.InferenceSession(self.model_file, None)
        input_cfg = self.session.get_inputs()[0]
        input_shape = input_cfg.shape
        input_name = input_cfg.name
        self.input_size = tuple(input_shape[2:4][::-1])
        self.input_shape = input_shape
        outputs = self.session.get_outputs()
        output_names = []
        for out in outputs:
            output_names.append(out.name)
        self.input_name = input_name
        self.output_names = output_names
        assert len(self.output_names)==1
        self.output_shape = outputs[0].shape

    def prepare(self, ctx_id, **kwargs):
        if ctx_id<0:
            self.session.set_providers(['CPUExecutionProvider'])

    def get(self, img, face):
        aimg = face_align.norm_crop(img, landmark=face.kps, image_size=self.input_size[0])
        face.embedding = self.get_feat(aimg).flatten()
        return face.embedding

    def compute_sim(self, feat1, feat2):
        from numpy.linalg import norm
        feat1 = feat1.ravel()
        feat2 = feat2.ravel()
        sim = np.dot(feat1, feat2) / (norm(feat1) * norm(feat2))
        return sim

    def get_feat(self, imgs):
        if not isinstance(imgs, list):
            imgs = [imgs]
        input_size = self.input_size

        blob = cv2.dnn.blobFromImages(imgs, 1.0 / self.input_std, input_size,
                                      (self.input_mean, self.input_mean, self.input_mean), swapRB=True)
        net_out = self.session.run(self.output_names, {self.input_name: blob})[0]
        return net_out

    def forward(self, batch_data):
        blob = (batch_data - self.input_mean) / self.input_std
        net_out = self.session.run(self.output_names, {self.input_name: blob})[0]
        return net_out
```
Earlier, we used RetinaFace only to detect faces. Now we feed the detected face information into the ArcFaceONNX model to perform face recognition. Reading through the code, inference runs in the get() or forward() method of the ArcFaceONNX class. The get() method stores its output in the embedding attribute of the InsightFace Face object, so this is indeed the embedding used for face recognition. In the next session, we will combine this ArcFaceONNX class with RetinaFace to build face recognition code. We will also implement the model in PyTorch to deepen our understanding of ArcFaceONNX. The result will match the quick face recognition we did with the insightface sample in Session 1, but the code will be far easier to understand and reuse at the module level than the Session 1 sample code, where it was hard to tell what was going on.
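As a small preview of the next session, the two models can already be wired together by hand. The following is a minimal sketch under the same assumptions as the earlier detection code (example paths, first detected face only); it extracts one embedding per image and compares them with compute_sim:

```python
import cv2
import insightface
from insightface.app.common import Face

detector = insightface.model_zoo.get_model("models/buffalo_l/det_10g.onnx")      # example path
detector.prepare(ctx_id=-1, input_size=(640, 640))
recognizer = insightface.model_zoo.get_model("models/buffalo_l/w600k_r50.onnx")  # example path
recognizer.prepare(ctx_id=-1)

def embed(path):
    # detect the first face in the image and return its ArcFace embedding
    img = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2RGB)
    bboxes, kpss = detector.detect(img)
    face = Face(bbox=bboxes[0][:4], kps=kpss[0], det_score=bboxes[0][4])
    return recognizer.get(img, face)

sim = recognizer.compute_sim(embed("data/images/person_a.jpg"),   # example paths
                             embed("data/images/person_b.jpg"))
print("cosine similarity:", sim)  # higher means more likely the same person
```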