Classical Vision
Edge Detection
Canny Edge Detection
Structured Forests
Crisp Boundary Detection
Holistically Nested Edge Detection
Corner Detection
Harris Corner Detector
Keypoint Detection & Feature Extraction
SIFT
SURF
Image Matching
Multiscale Oriented Matching
Image Segmentation
Normalized Cuts
Mean Shift
Line Detection
Hough Transform
Hough Transform Survey
Image Representation
Log Polar Representation
Maximally Stable Extremal Regions
Histogram of Oriented Gradients
Spatial Pyramid Kernel
Text Retrieval for Image Matching
Bag of Keypoints
Vector of Locally Aggregated Descriptors (VLAD)
Feature/Descriptor Matching
Optical Flow
RANSAC
Scalable Recognition
Selective Match Kernels
Pyramid Matching
Color Indexing
Fast Image Retrieval via Embeddings
Approximate Correspondences
Earth Mover's Distance Matching
Spatial Pyramid Matching
Hough Pyramid Matching
Deep Vision
Convolutional Neural Networks
LeCun CNN (1998)
Backpropagation in CNNs
Spatial Pyramid Pooling
Spectral Representations for CNNs
AlexNet (2012)
ZFNet
VGGNet
GoogLeNet
ResNets
Identity Mappings in ResNets
Wide Residual Networks
ResNeXt
Stochastic Depth Networks
DenseNets
MobileNet
EfficientNet
SENet
Neural Architecture Search
Methods for Explaining CNNs
One Weird Trick to Parallelize CNNs
Visualizing and Understanding CNNs
Deep Inside CNNs
Striving for Simplicity: The All Convolutional Net
Class Activation Maps (CAM)
Grad-CAM
Grad-CAM++
DeepLIFT
Integrated Gradients
SmoothGrad
XRAI: Better Attributions Through Regions
Local Interpretable Model-agnostic Explanations (LIME)
SHAP (SHapley Additive exPlanations)
DeepDream
Neural Style Transfer
Object Detection
OverFeat
R-CNN
Fast R-CNN
Faster R-CNN
YOLOv1
Single Shot MultiBox Detector (SSD)
Feature Pyramid Networks
Focal Loss for Dense Object Detection
Segmentation
Fully Convolutional Networks (FCNs)
SegNet
U-Net
Pyramid Scene Parsing Network (PSPNet)
DeepLab
DeepLabV3
Mask R-CNN
Face Recognition
Deep Face Recognition: A Survey
Siamese Networks
DeepFace
Deep Learning Face Representation by Joint Identification-Verification
FaceNet
SphereFace
Human Pose and Crowds
DeepPose
Human Pose Estimation with Iterative Error Feedback
Efficient Object Localization Using Convolutional Networks
CNN-based Density Estimation and Crowd Counting
Deep People Counting in Extremely Dense Crowds
Single-Image Crowd Counting via Multi-Column CNN
Switching Convolutional Neural Network for Crowd Counting
Crowd Counting via Scale-Adaptive CNN
Scale Pyramid Network for Crowd Counting
Depth Estimation
Depth Map Prediction from a Single Image
GeoNet
Super Resolution
Image Super-Resolution Using CNNs
Anomaly Detection
Enhancing the Reliability of Out-of-Distribution Image Detection
Video Understanding
3D CNNs for Human Action Recognition
Two-Stream CNNs for Action Recognition
Long-term Recurrent CNNs for Video Recognition
Attention Models in Vision
Neural Machine Translation with Attention
Effective Approaches to Attention-based Neural Machine Translation
Show, Attend, and Tell: Neural Image Captioning
DRAW: A Recurrent Neural Network for Image Generation
Spatial Transformer Networks
Attention is All You Need
Vision Transformers
Transformer-Based Models
How to Train Your ViT?
Data-Efficient Image Transformers (DeiT)
Swin Transformers
Transformer-Based Object Detection
DETR: End-to-End Object Detection with Transformers
Deformable DETR
Dynamic Anchor Boxes DETR
Denoising DETR
Conditional DETR
Cascade DETR
DINO
Grounding DINO
Transformer-Based Image Segmentation
Segment Anything
Reviving Iterative Training with Mask Guidance
Grounded SAM
Tag2Text
Recognize Anything
Vision-Language Models
CLIP: Connecting Text and Images
BLIP: Bootstrapped Language-Image Pretraining
BLIP-2
GLIP: Grounded Language-Image Pretraining
CoCa: Contrastive Captioners
PaLI: Pathways Language-Image Model
Flamingo: Visual-Language Models
FLAVA: Foundational Language and Vision Alignment
Image Generation
Score Matching
Noise-Contrastive Estimation
GANs: Generative Adversarial Networks
DCGAN: Deep Convolutional GANs
Fréchet Inception Distance (FID)
AutoEncoders
Variational Autoencoders (VAEs)
Adversarial Autoencoder
VAE-GAN
NICE: Non-linear Independent Components Estimation
Real NVP
Pixel RNNs
StackGAN
Progressive GANs
StyleGAN
Semantic Image Synthesis with Spatially-Adaptive Normalization
Large Scale GAN Training for High Fidelity Natural Image Synthesis
Self-Attention GANs
Pix2Pix
CycleGAN
UNIT-GAN
Multimodal UNIT-GAN
Beta-VAE
Isolating Sources of Disentanglement in VAEs
IcGAN
Super-Resolution GAN
3D Object Generation using GANs
Patch-Based Image Inpainting with GANs
Generating Videos with Scene Dynamics
The Pose Knows
Everybody Dance Now
Zero-Shot and Few-Shot Learning
A Closer Look at Few-Shot Classification
Generalizing from a Few Examples: A Survey on Few-Shot Learning
Meta-Learning
Matching Networks for One-Shot Learning
Feature Generating Networks for Zero-Shot Learning
Adversarial Robustness
Adversarial Examples for Semantic Segmentation and Object Detection
Fast Gradient Sign Method (FGSM)
Projected Gradient Descent (PGD)
DeepFool
Carlini & Wagner (C&W) Attack
Jacobian-based Saliency Map Attack
Spatially Transformed Adversarial Examples
Functional Adversarial Attacks
Zeroth-Order Optimization
Simple Black-Box Adversarial Perturbations
Mitigating Adversarial Effects Through Randomization
Towards Robust Neural Networks via Random Self-Ensemble
Defense GAN
Distillation as a Defense to Adversarial Perturbations
Adversarial Logit Pairing
Theoretically Principled Trade-off Between Robustness and Accuracy
Benchmarking Neural Network Robustness
Self-Supervised Learning
Context Encoders
Unsupervised Learning by Solving Jigsaw Puzzles
Unsupervised Learning by Predicting Image Rotations
Colorful Image Colorization
Momentum Contrast (MoCo)
SimCLR
Improved Baselines with Momentum Contrastive Learning
Bootstrap Your Own Latent (BYOL)
Pruning and Model Compression
A Survey of Model Compression and Acceleration
Deep Compression
Knowledge Distillation
The Lottery Ticket Hypothesis
One Ticket to Win Them All
Drawing Early-Bird Tickets