CV Paper Collection

▶ Classical Vision

▶ Edge Detection

Canny Edge Detection
Structured Forests
Crisp Boundary Detection
Holistically Nested Edge Detection

▶ Corner Detection

Harris Corner Detector

▶ Keypoint Detection & Feature Extraction

SIFT
SURF

▶ Image Matching

Multiscale Oriented Matching

▶ Image Segmentation

Normalized Cuts
Mean Shift

▶ Line Detection

Hough Transform
Hough Transform Survey

▶ Image Representation

Log Polar Representation
Maximally Stable Extremal Regions
Histogram of Oriented Gradients
Spatial Pyramid Kernel
Text Retrieval for Image Matching
Bag of Keypoints
Vector of Locally Aggregated Descriptors (VLAD)

▶ Feature/Descriptor Matching

Optical Flow
RANSAC
Scalable Recognition
Selective Match Kernels
Pyramid Matching
Color Indexing
Fast Image Retrieval via Embeddings
Approximate Correspondences
Earth Mover's Distance Matching
Spatial Pyramid Matching
Hough Pyramid Matching

▶ Deep Vision

▶ Convolutional Neural Networks

LeCun CNN (1998)
Backpropagation in CNNs
Spatial Pyramid Pooling
Spectral Representations for CNNs
AlexNet (2012)
ZFNet
VGGNet
GoogLeNet
ResNets
Identity Mapping in ResNets
Wide Residual Networks
ResNeXt
Stochastic Depth Networks
DenseNets
MobileNet
EfficientNet
SENet
Neural Architecture Search

▶ Methods for Explaining CNNs

One Weird Trick to Parallelize CNNs
Visualizing and Understanding CNNs
Deep Inside CNNs
Striving for Simplicity: The All Convolutional Net
Class Activation Maps (CAM)
Grad-CAM
Grad-CAM++
DeepLIFT
Integrated Gradients
SmoothGrad
XRAI: Better Attributions Through Regions
Local Interpretable Model-agnostic Explanations (LIME)
SHAP (SHapley Additive exPlanations)
DeepDream
Neural Style Transfer

▶ Object Detection

OverFeat
R-CNN
Fast R-CNN
Faster R-CNN
YOLOv1
Single Shot MultiBox Detector (SSD)
Feature Pyramid Networks
Focal Loss for Dense Object Detection

▶ Segmentation

Fully Convolutional Networks (FCNs)
SegNet
U-Net
Pyramid Scene Parsing Network (PSPNet)
DeepLab
DeepLabV3
Mask R-CNN

▶ Face Recognition

Deep Face Recognition: A Survey
Siamese Networks
DeepFace
Deep Learning Face Representation by Joint Identification-Verification
FaceNet
SphereFace

▶ Human Pose and Crowds

DeepPose
Human Pose Estimation with Iterative Error Feedback
Efficient Object Localization Using Convolutional Networks
CNN-based Density Estimation and Crowd Counting
Deep People Counting in Extremely Dense Crowds
Single-Image Crowd Counting via Multi-Column CNN
Switching Convolutional Neural Network for Crowd Counting
Crowd Counting via Scale-Adaptive CNN
Scale Pyramid Network for Crowd Counting

▶ Depth Estimation

Depth Map Prediction from a Single Image
GeoNet

▶ Super Resolution

Image Super-Resolution Using CNNs

▶ Anomaly Detection

Enhancing Reliability of Out-of-Distribution Detection

▶ Video Understanding

3D CNNs for Human Action Recognition
Two-Stream CNNs for Action Recognition
Long-term Recurrent CNNs for Video Recognition

▶ Attention Models in Vision

Neural Machine Translation with Attention
Effective Attention-based Neural Machine Translation
Show, Attend, and Tell: Neural Image Captioning
DRAW: A Recurrent Neural Network for Image Generation
Spatial Transformer Networks
Attention is All You Need
Vision Transformers

▶ Transformer-Based Models

How to Train Your ViT?
Data-Efficient Transformers (DeiT)
Swin Transformers

▶ Transformer-Based Object Detection

DETR: End-to-End Object Detection with Transformers
Deformable DETR
Dynamic Anchor Boxes DETR
Denoising DETR
Conditional DETR
Cascade DETR
DINO
Grounding DINO

▶ Transformer-Based Image Segmentation

Segment Anything
Reviving Iterative Training with Mask Guidance
Grounded SAM
Tag2Text
Recognize Anything

▶ Vision-Language Models

CLIP: Connecting Text and Images
BLIP: Bootstrapped Language-Image Pretraining
BLIP-2
GLIP: Grounded Language-Image Pretraining
CoCa: Contrastive Captioners
PaLI: Pathways Language-Image Model
Flamingo: Visual-Language Models
FLAVA: Foundational Language and Vision Alignment

▶ Image Generation

Score Matching
Noise-Contrastive Estimation
GANs: Generative Adversarial Networks
DCGAN: Deep Convolutional GANs
Fréchet Inception Distance (FID)
Autoencoders
Variational Autoencoders (VAEs)
Adversarial Autoencoder
VAE-GAN
NICE: Non-linear Independent Components Estimation
Real NVP
Pixel RNNs
StackGAN
Progressive GANs
StyleGAN
Semantic Image Synthesis with Spatially-Adaptive Normalization
Large Scale GAN for High Fidelity Natural Image Synthesis
Self-Attention GANs
Pix2Pix
CycleGAN
UNIT-GAN
Multimodal UNIT-GAN
Beta-VAE
Isolating Sources of Disentanglement in VAEs
IcGAN
Super-Resolution GAN
3D Object Generation using GANs
Patch-Based Image Inpainting with GANs
Generating Videos with Scene Dynamics
The Pose Knows
Everybody Dance Now

▶ Zero-Shot and Few-Shot Learning

A Closer Look at Few-Shot Classification
Generalizing from a Few Examples: A Survey on Few-Shot Learning
Meta-Learning
Matching Networks for One-Shot Learning
Feature Generating Networks for Zero-Shot Learning

▶ Adversarial Robustness

Adversarial Examples for Semantic Segmentation and Object Detection
Fast Gradient Sign Method (FGSM)
Projected Gradient Descent (PGD)
DeepFool
Carlini & Wagner (C&W) Attack
Jacobian-based Saliency Map Attack
Spatially Transformed Adversarial Examples
Functional Adversarial Attacks
Zeroth-Order Optimization
Simple Black-Box Adversarial Perturbations
Mitigating Adversarial Effects Through Randomization
Towards Robust Neural Networks via Random Self-Ensemble
Defense GAN
Distillation as a Defense to Adversarial Perturbations
Adversarial Logit Pairing
Theoretically Principled Trade-off Between Robustness and Accuracy
Benchmarking Neural Network Robustness

▶ Self-Supervised Learning

Context Encoders
Unsupervised Learning by Solving Jigsaw Puzzles
Unsupervised Learning by Predicting Image Rotations
Colorful Image Colorization
Momentum Contrast (MoCo)
SimCLR
Improved Baselines with Momentum Contrastive Learning
Bootstrap Your Own Latent (BYOL)

▶ Pruning and Model Compression

A Survey of Model Compression and Acceleration
Deep Compression
Knowledge Distillation
The Lottery Ticket Hypothesis
One Ticket to Win Them All
Drawing Early-Bird Tickets
