What is med-DECILE

Data Efficient Learning with Less Data
State-of-the-art AI and deep learning are very data hungry. This comes at significant cost, including larger resource costs (multiple expensive GPUs and cloud costs), long training times (often multiple days), and human labeling costs and time. med-DECILE attempts to address this by answering the following questions. Can we train state-of-the-art deep models with only a sample (say 5 to 10%) of massive datasets, while having negligible impact on accuracy? Can we do this while reducing training time and cost by an order of magnitude, and/or significantly reducing the amount of labeled data required?

Need for med-DECILE

Staggering Training Costs of Deep Learning

Labeling Large Datasets is Expensive

Noise and Imbalance in Data

Human Consumption and Data Overload

Modules

Reduce end-to-end training time from days to hours, and from hours to minutes, using coresets and data selection. CORDS implements a number of state-of-the-art data subset selection and coreset algorithms. Algorithms currently implemented in CORDS include GLISTER, GradMatchOMP, GradMatchFixed, CRAIG, SubmodularSelection, and RandomSelection, among others.

DISTIL is a library that features many state-of-the-art active learning algorithms. Implemented in PyTorch, it provides fast and efficient implementations of these algorithms. It lets users insert active learning selection into their existing training loops in a modular way, with minimal changes. Most importantly, it shows promising results in achieving high model performance with less labeled data. If you are looking to cut down on labeling costs, DISTIL should be your go-to for getting the most out of your data.
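
As a rough illustration of the selection step that an active learning loop performs, the sketch below ranks unlabeled points by the entropy of the model's predicted class probabilities and picks the most uncertain ones for labeling. It is plain Python and does not use DISTIL's API; the function names `entropy` and `select_for_labeling` and the sample probabilities are assumptions made for this example, and entropy sampling is only one of many possible strategies.

```python
import math


def entropy(probs):
    """Shannon entropy of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_for_labeling(unlabeled_probs, budget):
    """Return indices of the `budget` most uncertain unlabeled points."""
    ranked = sorted(
        range(len(unlabeled_probs)),
        key=lambda i: entropy(unlabeled_probs[i]),
        reverse=True,
    )
    return ranked[:budget]


# Hypothetical model outputs (class probabilities) for four unlabeled points.
probs = [
    [0.98, 0.02],  # confident prediction
    [0.55, 0.45],  # uncertain prediction
    [0.50, 0.50],  # maximally uncertain prediction
    [0.90, 0.10],  # fairly confident prediction
]

picked = select_for_labeling(probs, budget=2)
print(picked)  # indices to send to human annotators
```

In a real loop, the selected points would be labeled, added to the training set, and the model retrained before the next selection round.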

Summarize massive datasets using submodular optimization
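
To make the idea of submodular summarization concrete, here is a minimal sketch of greedy maximization of a facility-location function, a common submodular objective that rewards subsets whose members are close to every item in the dataset. It is plain Python rather than any library's API; the names `facility_location` and `greedy_select`, the toy one-dimensional points, and the similarity measure are all assumptions chosen for illustration.

```python
def facility_location(similarity, subset):
    """Sum, over all items, of the similarity to the closest selected item."""
    if not subset:
        return 0.0
    return sum(max(row[j] for j in subset) for row in similarity)


def greedy_select(similarity, budget):
    """Greedily add the item with the largest marginal gain, `budget` times."""
    selected = []
    remaining = set(range(len(similarity)))
    for _ in range(min(budget, len(similarity))):
        current = facility_location(similarity, selected)
        # Iterate in sorted order so ties are broken deterministically.
        best = max(
            sorted(remaining),
            key=lambda j: facility_location(similarity, selected + [j]) - current,
        )
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy dataset: two tight clusters and one outlier on a line.
points = [0.0, 0.1, 0.2, 5.0, 5.1, 9.0]

# Assumed similarity: decays with distance, equals 1 for identical points.
similarity = [[1.0 / (1.0 + abs(a - b)) for b in points] for a in points]

summary = greedy_select(similarity, budget=3)
print(summary)  # indices of the selected representative points
```

With a budget of three, the greedy procedure tends to pick one representative per region of the data rather than several near-duplicates, which is the behavior a summary needs.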

SPEAR is a Python library that reduces data labeling effort using data programming. It implements several recent approaches such as Snorkel, ImplyLoss, and Learning to Reweight. In addition to data labeling, it integrates semi-supervised approaches for training and inference.
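
The sketch below illustrates the basic idea behind data programming: several noisy, hand-written labeling functions vote on each example, and their votes are combined into a weak label. It does not use SPEAR's API. The labeling functions, the spam/ham task, and the `weak_label` helper are hypothetical, and simple majority voting stands in for the learned label models that approaches such as Snorkel use.

```python
from collections import Counter

ABSTAIN = None  # a labeling function returns this when it has no opinion


def lf_contains_free(text):
    return "spam" if "free" in text.lower() else ABSTAIN


def lf_contains_meeting(text):
    return "ham" if "meeting" in text.lower() else ABSTAIN


def lf_many_exclamations(text):
    return "spam" if text.count("!") >= 2 else ABSTAIN


LABELING_FUNCTIONS = [lf_contains_free, lf_contains_meeting, lf_many_exclamations]


def weak_label(text):
    """Combine labeling-function votes by majority; abstain on ties or no votes."""
    votes = [lf(text) for lf in LABELING_FUNCTIONS]
    votes = [v for v in votes if v is not ABSTAIN]
    if not votes:
        return ABSTAIN
    counts = Counter(votes).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return ABSTAIN  # conflicting votes with no clear winner
    return counts[0][0]


for message in ["Free prize!! Click now", "Team meeting at 10", "hello"]:
    print(message, "->", weak_label(message))
```

The weak labels produced this way can then be used to train a downstream model without labeling every example by hand.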

Targeted subset selection

Team

Ganesh Ramakrishnan

Institute Chair Professor, Dept of CSE, IIT Bombay

Kshitij Jadhav

Assistant Professor, Koita Centre for Digital Health

Pankaj Singh

Director, Aify Innovation Labs

Raghavendran L.

Health Information Manager, Koita Centre for Digital Health

Researchers and Code contributors

Research Publications

CORDS

Submodularity in data subset selection and active learning

Kai Wei, Rishabh Iyer, Jeff Bilmes

International Conference on Machine Learning (ICML) 2015

Learning From Less Data: A Unified Data Subset Selection and Active Learning Framework for Computer Vision

Vishal Kaushal, Rishabh Iyer, Suraj Kothawade, Rohan Mahadev, Khoshrav Doctor, and Ganesh Ramakrishnan

7th IEEE Winter Conference on Applications of Computer Vision (WACV), 2019, Hawaii, USA

GLISTER: Generalization based Data Subset Selection for Efficient and Robust Learning

Krishnateja Killamsetty, Durga Sivasubramanian, Ganesh Ramakrishnan, and Rishabh Iyer

35th AAAI Conference on Artificial Intelligence, AAAI 2021

Fast multi-stage submodular maximization

Kai Wei, Rishabh K. Iyer, Jeff A. Bilmes

International Conference on Machine Learning (ICML) 2014

Submodular subset selection for large-scale speech training data

Wei, Kai, et al

2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, 2014

Coresets for Data-efficient Training of Machine Learning Models

Baharan Mirzasoleiman, Jeff Bilmes, and Jure Leskovec

International Conference on Machine Learning (ICML), July 2020

Coresets for Robust Training of Deep Neural Networks against Noisy Labels

Baharan Mirzasoleiman, Kaidi Cao, Jure Leskovec

In Proc. Advances in Neural Information Processing Systems (NeurIPS), 2020

DISTIL

Submodularity in data subset selection and active learning

Kai Wei, Rishabh Iyer, Jeff Bilmes

International Conference on Machine Learning (ICML) 2015

Deep batch active learning by diverse, uncertain gradient lower bounds.

Ash, Jordan T., et al.

8th International Conference on Learning Representations (ICLR), 2020

GLISTER: Generalization based Data Subset Selection for Efficient and Robust Learning

Krishnateja Killamsetty, Durga Sivasubramanian, Ganesh Ramakrishnan, and Rishabh Iyer

In Proceedings of the 35th AAAI Conference on Artificial Intelligence, AAAI 2021

An Interactive Multi-Label Consensus Labeling Model for Multiple Labeler Judgments

Ashish Kulkarni, Narasimha Raju Uppalapati, Pankaj Singh, Ganesh Ramakrishnan

In Proceedings of the 32nd AAAI Conference on Artificial Intelligence, AAAI 2018

Learning From Less Data: Diversified Subset Selection and Active Learning in Image Classification Tasks

Vishal Kaushal, Rishabh Iyer, Anurag Sahoo, Khoshrav Doctor, Narasimha Raju, Ganesh Ramakrishnan

In Proceedings of The 7th IEEE Winter Conference on Applications of Computer Vision (WACV), 2019, Hawaii, USA

A New Active Labeling Method for Deep Learning

Dan Wang, Yi Shang

International Joint Conference on Neural Networks (IJCNN), 2014

Deep Bayesian Active Learning with Image Data

Yarin Gal, Riashat Islam, Zoubin Ghahramani

34th International Conference on Machine Learning (ICML), 2017

Active Learning for Convolutional Neural Networks: A Core-Set Approach

Ozan Sener, Silvio Savarese

6th International Conference on Learning Representations (ICLR), 2018

Adversarial Active Learning for Deep Networks: a Margin Based Approach

Melanie Ducoffe, Frederic Precioso

arXiv, 2018.

SUBMODLIB

A Framework towards Domain Specific Video Summarization

Vishal Kaushal, Sandeep Subramanian, Suraj Kothawade, Rishabh Iyer, Ganesh Ramakrishnan

In Proceedings of The 7th IEEE Winter Conference on Applications of Computer Vision (WACV), 2019, Hawaii, USA.

Demystifying Multi-Faceted Video Summarization: Tradeoff Between Diversity, Representation, Coverage and Importance

Vishal Kaushal, Rishabh Iyer, Anurag Sahoo, Pratik Dubal, Suraj Kothawade, Rohan Mahadev, Kunal Dargan, Ganesh Ramakrishnan

In Proceedings of The 7th IEEE Winter Conference on Applications of Computer Vision (WACV), 2019, Hawaii, USA.

Synthesis of Programs from Multimodal Datasets

Shantanu Thakoor, Simoni Shah, Ganesh Ramakrishnan, Amitabha Sanyal

In Proceedings of the 32nd AAAI Conference on Artificial Intelligence (AAAI-18), New Orleans, Louisiana, USA.

Beyond clustering: Sub-DAG Discovery for Categorising Documents

Ramakrishna Bairi, Mark Carman and Ganesh Ramakrishnan

In Proceedings of the 25th International Conference on Information and Knowledge Management (CIKM 2016), Indianapolis, USA

Building Compact Lexicons for Cross-Domain SMT by mining near-optimal Pattern Sets

Pankaj Singh, Ashish Kulkarni, Himanshu Ojha, Vishwajeet Kumar, Ganesh Ramakrishnan

In Proceedings of the 20th Pacific Asia Conference on Knowledge Discovery and Data Mining (PAKDD) 2016.

SPEAR

Data Programming using Continuous and Quality-Guided Labeling Function

Oishik Chatterjee, Ganesh Ramakrishnan, Sunita Sarawagi

In Proceedings of The Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI 2020), New York, USA.

An Interactive Multi-Label Consensus Labeling Model for Multiple Labeler Judgments

Ashish Kulkarni, Narasimha Raju Uppalapati, Pankaj Singh, Ganesh Ramakrishnan

In Proceedings of the 32nd AAAI Conference on Artificial Intelligence (AAAI-18), New Orleans, Louisiana, USA.

Synthesis of Programs from Multimodal Datasets

Shantanu Thakoor, Simoni Shah, Ganesh Ramakrishnan, Amitabha Sanyal

In Proceedings of the 32nd AAAI Conference on Artificial Intelligence (AAAI-18), New Orleans, Louisiana, USA.

Comparison between Explicit Learning and Implicit Modeling of Relational Features in Structured Output Spaces

Ajay Nagesh, Naveen Nair and Ganesh Ramakrishnan

In Proceedings of the 23rd International Conference on Inductive Logic Programming (ILP), 2013, Rio de Janeiro, Brazil.

Towards Efficient Named-Entity Rule Induction for Customizability

Ajay Nagesh, Ganesh Ramakrishnan, Laura Chiticariu, Rajasekar Krishnamurthy, Ankush Dharkar, Pushpak Bhattacharyya

In Proceedings of the 2012 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2012, Jeju, Korea.

Rule Ensemble Learning Using Hierarchical Kernels in Structured Output Spaces

Naveen Nair, Amrita Saha, Ganesh Ramakrishnan, Shonali Krishnaswamy

In Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence (AAAI), 2012, Toronto, Canada.

What Kinds of Relational Features are Useful for Statistical Learning?

Amrita Saha, Ashwin Srinivasan, Ganesh Ramakrishnan

In Proceedings of the 22nd International Conference on Inductive Logic Programming (ILP), 2012, Dubrovnik

Probing the Space of Optimal Markov Logic Networks for Sequence Labeling

Naveen Nair, Ajay Nagesh, Ganesh Ramakrishnan

In Proceedings of the 22nd International Conference on Inductive Logic Programming (ILP), 2012

Efficient Rule Ensemble Learning using Hierarchical Kernels

Pratik Jawanpuria, Saketha Nath and Ganesh Ramakrishnan

In Proceedings of the 28th International Conference on Machine Learning, 2011

Pruning Search Space for Weighted First Order Horn Clause Satisfiability

Naveen Nair, Chander Jayaraman, Kiran TVS and Ganesh Ramakrishnan

In Proceedings of the 20th International Conference on Inductive Logic Programming (ILP), Florence, Italy

BET: An Inductive Logic Programming Workbench

Srihari Kalgi, Chirag Gosar, Prasad Gawde, Ganesh Ramakrishnan, Chander Iyer, Kiran T V S, Kekin Gada and Ashwin Srinivasan

In Proceedings of the 20th International Conference on Inductive Logic Programming (ILP), Florence, Italy

Parameter Screening and Optimisation for ILP using Designed Experiments

Ashwin Srinivasan, Ganesh Ramakrishnan

In the Journal of Machine Learning Research 11 (2010) 3481-3516

An Investigation into Feature Construction to Assist Word Sense Disambiguation

Lucia Specia, Ashwin Srinivasan, Ganesh Ramakrishnan, Sachindra Joshi and Maria das Gracas Volpe Nunes

In Machine Learning 76(1): 109-136 (2009)

Feature Construction using Theory-Guided Sampling and Randomised Search

Sachindra Joshi, Ganesh Ramakrishnan, and Ashwin Srinivasan

In Proceedings of the 18th International Conference on Inductive Logic Programming (ILP 2008), Prague, Czech Republic, September 10-12, 2008