# Abbreviations
Abbreviation | Definition |
---|---|
1x1 Filter | A 1x1 filter, also known as a pointwise convolution, applies a linear transformation to the input feature map, combining information from different channels or reducing the number of channels (see the sketch below the table). |
Anchor | In object detection, an anchor is a predefined bounding box that is used as a reference for detecting objects in an image. Anchors are typically defined at different scales and aspect ratios to handle objects of various sizes and shapes. During training, the network learns to adjust the anchors to better fit the objects in the image. Anchors play a crucial role in the region proposal network (RPN) of object detection models such as Faster R-CNN and RetinaNet. |
Anchor-free | In object detection, anchor-free methods do not rely on predefined anchors to detect objects in an image. Instead, they directly predict the bounding boxes and class probabilities without the need for anchor boxes. Anchor-free methods have gained popularity due to their simplicity and flexibility in handling objects of various sizes and aspect ratios. They have been successfully applied in object detection models such as CenterNet and FCOS. |
AP | Average Precision: measures detection accuracy as the area under the precision-recall curve (see the sketch below the table). |
AP@[0.5:0.95] | Average Precision averaged over IoU thresholds from 0.5 to 0.95 in steps of 0.05. |
APlarge | AP for large objects (area > 96² pixels) |
APmedium | AP for medium objects (32² < area < 96² pixels) |
APsmall | AP for small objects (area < 32² pixels) |
Backbone | The backbone of a neural network refers to the main architecture or structure of the network. It typically consists of multiple layers or modules that extract features from the input data. |
Batch Normalization | Batch normalization is a technique that normalizes the activations of each batch in a neural network. It helps to stabilize and speed up the training process by reducing internal covariate shift and allowing higher learning rates. Batch normalization is commonly used in deep neural networks. |
BoF | Bag of Freebies: methods that change the training procedure, not the inference cost. |
Bottleneck | In machine learning, a bottleneck refers to a layer or a set of layers in a neural network that has a smaller number of units compared to the preceding and succeeding layers. Bottlenecks are often used in architectures like ResNet to reduce the computational complexity and memory requirements of the network. They can also act as a bottleneck for information flow, forcing the network to learn more compact and informative representations. |
CBN | Cross-Iteration Batch Normalization: estimates the mean and standard deviation from the last few (e.g. four) iterations instead of a single batch. |
CIoU | Complete Intersection over Union (CIoU) is an extension of the Intersection over Union (IoU) metric used in object detection. It not only measures the overlap between two bounding boxes but also takes into account the distance between their center points and their aspect ratios, so it can distinguish between different relative positions and aspect ratios of the boxes (see the sketch below the table). |
CmBN | Cross mini-Batch Normalization: a modification of CBN that collects statistics only between mini-batches within a single batch. |
CSP | Cross-Stage Partial Connection |
CSPNet | CSPNet (Cross Stage Partial Network) is a convolutional neural network architecture that improves the performance of object detection tasks. It introduces a cross stage partial connection module, which enhances the information flow between different stages of the network. CSPNet has been shown to achieve state-of-the-art results on various object detection benchmarks. |
DenseNet | DenseNet is a convolutional neural network architecture that connects each layer to every other layer in a feed-forward fashion. It is known for its dense connectivity pattern, where each layer receives feature maps from all preceding layers. DenseNet has been shown to improve gradient flow, encourage feature reuse, and reduce the number of parameters compared to traditional convolutional neural networks. |
DFL | Distribution Focal Loss: extends Focal Loss, which balances training by increasing the loss on hard-to-classify examples, to bounding-box regression by learning box coordinates as a discrete probability distribution. |
Dropout | Dropout is a regularization technique that randomly sets a fraction of the input units to zero during training. It helps to prevent overfitting by reducing the co-adaptation of neurons and encouraging the network to learn more robust features. Dropout is commonly used in deep neural networks. |
Head | The head of a neural network refers to the final layers or modules that are responsible for producing the output predictions. It takes the features extracted by the backbone and processes them to generate the desired output. |
IID | Independent and identically distributed data |
Image Segmentation | Encompasses Instance Segmentation (things) and Semantic Segmentation (stuff) |
Instance Segmentation | Studies things (e.g. Mask R-CNN, Faster R-CNN, PANet, YOLACT); measured with AP |
IoU | Intersection over Union: the area of overlap between two bounding boxes divided by the area of their union (see the sketch below the table) |
mAP | mean Average Precision: AP averaged over all classes |
Mosaic Data Augmentation | Mosaic data augmentation is a technique used in computer vision tasks, such as object detection, to improve the performance of deep learning models. It involves combining multiple images into a single mosaic image and using it as training data. Mosaic data augmentation helps to increase the diversity and complexity of the training data, leading to better generalization and robustness of the model. |
Neck | The neck of a neural network refers to an intermediate set of layers or modules that connect the backbone and the head. It is responsible for further refining the features extracted by the backbone before passing them to the head for final processing. |
NICO | Non-IID Image dataset with Contexts |
NMS | Non-Maximum Suppression is a technique used in object detection to eliminate redundant and overlapping bounding boxes. It selects the most probable bounding box and eliminates any box that has a high overlap (as measured by the Intersection over Union (IoU) metric) with the chosen box (see the sketch below the table). |
Non-IID | Non-independent and identically distributed data |
Padding | Padding is the process of adding extra pixels around the input image or feature map. It is commonly used in convolutional neural networks to preserve spatial dimensions and prevent information loss at the edges of the image. Padding can be done with zeros (zero-padding) or with values from the original image (reflective padding or symmetric padding). |
PANet | Path Aggregation Network |
Panoptic Segmentation | Studies both things and stuff (most models are based on Mask R-CNN) |
Pooling | Pooling is a downsampling operation that reduces the spatial dimensions of the input feature map. It is commonly used to reduce the computational complexity of the network and to extract the most important features. The most common types of pooling are max pooling and average pooling, which take the maximum or average value within a pooling window, respectively. |
Precision | Relevant retrieved / retrieved = TP / (TP + FP) |
Recall | Relevant retrieved / relevant = TP / (TP + FN) |
Region Proposal | Region proposal is a technique used in object detection to generate potential bounding boxes around objects in an image. It helps to narrow down the search space for the object detector by proposing regions that are likely to contain objects. Region proposal methods, such as Selective Search and EdgeBoxes, use various algorithms to generate these potential regions based on image features and similarity measures. |
Residual | \( y = F(x) + x \) |
SAM | Spatial Attention Module |
SAT | Self-Adversarial Training |
Semantic Segmentation | Studies stuff (e.g. SegNet, U-Net, DeconvNet); measured with IoU |
SENet | Squeeze-and-Excitation Network: learns which channels of a feature map are more or less important |
SiLU | Sigmoid Linear Unit, a.k.a. Swish |
Skip Connection | Skip connection, also known as residual connection, is a technique that adds the input of a layer to the output of a subsequent layer. It allows the network to learn residual functions, which can help to alleviate the vanishing gradient problem and improve the flow of gradients during training. Skip connections are commonly used in deep residual networks (ResNet). |
SPP | Spatial Pyramid Pooling: pools features at multiple scales, increasing the receptive field |
Stride | Stride refers to the number of pixels the convolutional kernel moves at each step during the convolution operation. A stride of 1 means the kernel moves one pixel at a time, while a stride of 2 means the kernel moves two pixels at a time. Stride affects the output size of the feature map, as well as the amount of computation required (see the output-size sketch below the table). |
Upsampling | Upsampling is the process of increasing the spatial dimensions of an image or feature map. It is commonly used in tasks such as image super-resolution, semantic segmentation, and generative modeling. Upsampling can be done using techniques such as transposed convolution, nearest-neighbor interpolation, or bilinear interpolation. |
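The sketches below make a few of the entries above concrete using plain NumPy. All function names (`iou`, `nms`, and so on) are illustrative choices for this page, not the API of any particular library. First, IoU and greedy NMS: keep the highest-scoring box, then drop every remaining box whose overlap with it exceeds a threshold.

```python
import numpy as np

def iou(box_a, box_b):
    """Intersection over Union for boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy non-maximum suppression; returns indices of the kept boxes."""
    order = np.argsort(scores)[::-1].tolist()  # indices, best score first
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # drop every remaining box that overlaps the kept one too much
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_threshold]
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
print(nms(boxes, scores))  # [0, 2] -- box 1 overlaps box 0 too much (IoU ~ 0.68)
```

Real pipelines run this per class and vectorize the IoU computation, but the logic is the same.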
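CIoU augments plain IoU with two penalties: the squared distance between box centers normalized by the squared diagonal of the smallest enclosing box, and an aspect-ratio consistency term. A minimal sketch of that formula, again assuming (x1, y1, x2, y2) boxes:

```python
import numpy as np

def ciou(box_a, box_b):
    """Complete IoU: IoU minus center-distance and aspect-ratio penalties."""
    # plain IoU
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    wa, ha = box_a[2] - box_a[0], box_a[3] - box_a[1]
    wb, hb = box_b[2] - box_b[0], box_b[3] - box_b[1]
    iou_val = inter / (wa * ha + wb * hb - inter)
    # squared center distance over squared diagonal of the enclosing box
    rho2 = ((box_a[0] + box_a[2] - box_b[0] - box_b[2]) / 2) ** 2 \
         + ((box_a[1] + box_a[3] - box_b[1] - box_b[3]) / 2) ** 2
    cw = max(box_a[2], box_b[2]) - min(box_a[0], box_b[0])
    ch = max(box_a[3], box_b[3]) - min(box_a[1], box_b[1])
    c2 = cw ** 2 + ch ** 2
    # aspect-ratio consistency term
    v = (4 / np.pi ** 2) * (np.arctan(wb / hb) - np.arctan(wa / ha)) ** 2
    alpha = v / ((1 - iou_val) + v + 1e-9)
    return iou_val - rho2 / c2 - alpha * v

print(ciou([0, 0, 10, 10], [0, 0, 10, 10]))  # 1.0 for identical boxes
```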
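Precision and recall reduce to ratios of true positives (TP), false positives (FP), and false negatives (FN); AP is then the area under the precision-recall curve traced out by sweeping the detector's score threshold. A toy example with made-up numbers:

```python
import numpy as np

def precision_recall(tp, fp, fn):
    """Precision = relevant retrieved / retrieved; recall = relevant retrieved / relevant."""
    return tp / (tp + fp), tp / (tp + fn)

# A detector returns 8 boxes, 6 of which match a ground-truth object,
# and misses 2 objects entirely:
p, r = precision_recall(tp=6, fp=2, fn=2)
print(p, r)  # 0.75 0.75

# AP: trapezoidal area under an (illustrative) precision-recall curve
recalls    = np.array([0.00, 0.25, 0.50, 0.75])
precisions = np.array([1.00, 1.00, 0.80, 0.75])
ap = np.sum(np.diff(recalls) * (precisions[1:] + precisions[:-1]) / 2)
print(ap)  # ~0.67
```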
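A 1x1 (pointwise) filter is nothing more than a per-pixel linear map across channels. The sketch below makes that explicit with a single `einsum`, reducing 64 channels to 16 while leaving the spatial dimensions untouched; in a real network the weights would be learned rather than random:

```python
import numpy as np

def pointwise_conv(x, w):
    """1x1 convolution: apply the same (C_out, C_in) linear map at every pixel.
    x: feature map of shape (C_in, H, W); returns shape (C_out, H, W)."""
    return np.einsum('oc,chw->ohw', w, x)

x = np.random.rand(64, 32, 32)  # 64 input channels, 32x32 spatial
w = np.random.rand(16, 64)      # project 64 channels down to 16
print(pointwise_conv(x, w).shape)  # (16, 32, 32)
```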
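Stride, padding, and kernel size together fix a layer's spatial output size via the standard formula `out = (in + 2*padding - kernel) // stride + 1`, which holds for convolutions and pooling alike:

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1

print(conv_output_size(32, kernel=3, stride=1, padding=1))  # 32 ("same" padding)
print(conv_output_size(32, kernel=3, stride=2, padding=1))  # 16 (strided downsampling)
print(conv_output_size(32, kernel=2, stride=2))             # 16 (2x2 max pooling)
```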