A searchable list of some of my publications is below. You can also access my publications from the following sites.
My ORCID is
Publications:
Erik Wijmans, Manolis Savva, Irfan Essa, Stefan Lee, Ari S. Morcos, Dhruv Batra
Emergence of Maps in the Memories of Blind Navigation Agents Best Paper Proceedings Article
In: Proceedings of International Conference on Learning Representations (ICLR), 2023.
Tags: awards, best paper award, computer vision, google, ICLR, machine learning, robotics
@inproceedings{2023-Wijmans-EMMBNA,
title = {Emergence of Maps in the Memories of Blind Navigation Agents},
author = {Erik Wijmans and Manolis Savva and Irfan Essa and Stefan Lee and Ari S. Morcos and Dhruv Batra},
url = {https://arxiv.org/abs/2301.13261
https://wijmans.xyz/publication/eom/
https://openreview.net/forum?id=lTt4KjHSsyl
https://blog.iclr.cc/2023/03/21/announcing-the-iclr-2023-outstanding-paper-award-recipients/},
doi = {10.48550/ARXIV.2301.13261},
year = {2023},
date = {2023-05-01},
urldate = {2023-05-01},
booktitle = {Proceedings of International Conference on Learning Representations (ICLR)},
abstract = {Animal navigation research posits that organisms build and maintain internal spatial representations, or maps, of their environment. We ask if machines -- specifically, artificial intelligence (AI) navigation agents -- also build implicit (or 'mental') maps. A positive answer to this question would (a) explain the surprising phenomenon in recent literature of ostensibly map-free neural networks achieving strong performance, and (b) strengthen the evidence of mapping as a fundamental mechanism for navigation by intelligent embodied agents, whether they be biological or artificial. Unlike animal navigation, we can judiciously design the agent's perceptual system and control the learning paradigm to nullify alternative navigation mechanisms. Specifically, we train 'blind' agents -- with sensing limited to only egomotion and no other sensing of any kind -- to perform PointGoal navigation ('go to Δx, Δy') via reinforcement learning. Our agents are composed of navigation-agnostic components (fully-connected and recurrent neural networks), and our experimental setup provides no inductive bias towards mapping. Despite these harsh conditions, we find that blind agents are (1) surprisingly effective navigators in new environments (~95% success); (2) they utilize memory over long horizons (remembering ~1,000 steps of past experience in an episode); (3) this memory enables them to exhibit intelligent behavior (following walls, detecting collisions, taking shortcuts); (4) there is emergence of maps and collision detection neurons in the representations of the environment built by a blind agent as it navigates; and (5) the emergent maps are selective and task dependent (e.g. the agent 'forgets' exploratory detours). Overall, this paper presents no new techniques for the AI audience, but a surprising finding, an insight, and an explanation.},
keywords = {awards, best paper award, computer vision, google, ICLR, machine learning, robotics},
pubstate = {published},
tppubtype = {inproceedings}
}
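A minimal sketch (not the authors' code) of the kind of "blind" agent the abstract describes: the only inputs are the (Δx, Δy) goal vector and the previous action, and all memory lives in a recurrent core. The layer sizes, the 4-way action space, and every name below are illustrative assumptions.

# Hypothetical sketch of a "blind" PointGoal navigation policy: the inputs are
# the goal vector (Δx, Δy) and the previous action, with long-horizon memory
# provided by a GRU. Sizes and the action space are illustrative assumptions.
import torch
import torch.nn as nn

class BlindPointGoalPolicy(nn.Module):
    def __init__(self, hidden_size=512, num_actions=4):
        super().__init__()
        # Encode the goal vector and a one-hot encoding of the previous action.
        self.encoder = nn.Linear(2 + num_actions, hidden_size)
        # Recurrent core: this is where any "map-like" memory would have to live.
        self.rnn = nn.GRU(hidden_size, hidden_size, batch_first=True)
        self.action_head = nn.Linear(hidden_size, num_actions)  # policy logits
        self.value_head = nn.Linear(hidden_size, 1)              # value estimate for RL

    def forward(self, goal, prev_action, hidden=None):
        # goal: (B, T, 2), prev_action: (B, T, num_actions) one-hot
        x = torch.relu(self.encoder(torch.cat([goal, prev_action], dim=-1)))
        x, hidden = self.rnn(x, hidden)
        return self.action_head(x), self.value_head(x), hidden

# Toy rollout step for a batch of 3 agents over 1 timestep.
policy = BlindPointGoalPolicy()
goal = torch.randn(3, 1, 2)
prev_action = torch.zeros(3, 1, 4)
logits, value, h = policy(goal, prev_action)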
Edison Thomaz, Cheng Zhang, Irfan Essa, Gregory Abowd
Inferring Meal Eating Activities in Real World Settings from Ambient Sounds: A Feasibility Study Best Paper Proceedings Article
In: ACM Conference on Intelligent User Interfaces (IUI), 2015.
Tags: ACM, activity recognition, AI, awards, behavioral imaging, best paper award, computational health, IUI, machine learning
@inproceedings{2015-Thomaz-IMEARWSFASFS,
title = {Inferring Meal Eating Activities in Real World Settings from Ambient Sounds: A Feasibility Study},
author = {Edison Thomaz and Cheng Zhang and Irfan Essa and Gregory Abowd},
url = {https://dl.acm.org/doi/10.1145/2678025.2701405},
doi = {10.1145/2678025.2701405},
year = {2015},
date = {2015-05-01},
urldate = {2015-05-01},
booktitle = {ACM Conference on Intelligent User Interfaces (IUI)},
abstract = {Dietary self-monitoring has been shown to be an effective method for weight-loss, but it remains an onerous task despite recent advances in food journaling systems. Semi-automated food journaling can reduce the effort of logging, but often requires that eating activities be detected automatically. In this work we describe results from a feasibility study conducted in-the-wild where eating activities were inferred from ambient sounds captured with a wrist-mounted device; twenty participants wore the device during one day for an average of 5 hours while performing normal everyday activities. Our system was able to identify meal eating with an F-score of 79.8% in a person-dependent evaluation, and with 86.6% accuracy in a person-independent evaluation. Our approach is intended to be practical, leveraging off-the-shelf devices with audio sensing capabilities in contrast to systems for automated dietary assessment based on specialized sensors.},
keywords = {ACM, activity recognition, AI, awards, behavioral imaging, best paper award, computational health, IUI, machine learning},
pubstate = {published},
tppubtype = {inproceedings}
}
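The abstract distinguishes person-dependent from person-independent evaluation. Below is a hypothetical sketch of the person-independent (leave-one-participant-out) protocol using scikit-learn, with random placeholder features standing in for the paper's ambient-audio pipeline; the classifier choice and all numbers are assumptions.

# Hypothetical leave-one-participant-out evaluation, a stand-in for the
# person-independent protocol mentioned in the abstract. Features are random
# placeholders; a real pipeline would extract audio features from the wrist device.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import LeaveOneGroupOut
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)
n_windows, n_features, n_participants = 600, 40, 20
X = rng.normal(size=(n_windows, n_features))               # placeholder audio features
y = rng.integers(0, 2, size=n_windows)                     # 1 = eating, 0 = other activity
groups = rng.integers(0, n_participants, size=n_windows)   # participant id per window

scores = []
for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups):
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    clf.fit(X[train_idx], y[train_idx])
    scores.append(f1_score(y[test_idx], clf.predict(X[test_idx])))

print(f"person-independent F1: {np.mean(scores):.3f}")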
Vinay Bettadapura, Irfan Essa, Caroline Pantofaru
Egocentric Field-of-View Localization Using First-Person Point-of-View Devices Honorable Mention Proceedings Article
In: IEEE Winter Conference on Applications of Computer Vision (WACV), IEEE Computer Society, 2015.
Tags: awards, best paper award, computer vision, WACV, wearable computing
@inproceedings{2015-Bettadapura-EFLUFPD,
title = {Egocentric Field-of-View Localization Using First-Person Point-of-View Devices},
author = {Vinay Bettadapura and Irfan Essa and Caroline Pantofaru},
url = {https://ieeexplore.ieee.org/document/7045943
http://www.vbettadapura.com/egocentric/localization/},
doi = {10.1109/WACV.2015.89},
year = {2015},
date = {2015-01-01},
urldate = {2015-01-01},
booktitle = {IEEE Winter Conference on Applications of Computer Vision (WACV)},
publisher = {IEEE Computer Society},
abstract = {We present a technique that uses images, videos and sensor data taken from first-person point-of-view devices to perform egocentric field-of-view (FOV) localization. We define egocentric FOV localization as capturing the visual information from a person's field-of-view in a given environment and transferring this information onto a reference corpus of images and videos of the same space, hence determining what a person is attending to. Our method matches images and video taken from the first-person perspective with the reference corpus and refines the results using the first-person's head orientation information obtained using the device sensors. We demonstrate single and multi-user egocentric FOV localization in different indoor and outdoor environments with applications in augmented reality, event understanding and studying social interactions.
},
keywords = {awards, best paper award, computer vision, WACV, wearable computing},
pubstate = {published},
tppubtype = {inproceedings}
}
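As a loose illustration of the match-then-refine idea in the abstract, this hypothetical sketch ranks reference views by descriptor similarity and discards candidates whose stored heading disagrees with the wearer's compass reading; the descriptors, the 45-degree tolerance, and all names are invented for illustration.

# Hypothetical sketch of egocentric FOV localization: rank reference images by
# descriptor similarity, then refine using the wearer's head orientation.
import numpy as np

def cosine_similarity(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)

def localize(query_desc, query_heading_deg, reference):
    """reference: list of (descriptor, heading_deg, image_id) tuples."""
    candidates = []
    for desc, heading, image_id in reference:
        # Refinement step: reject reference views pointing in a very different direction.
        diff = abs((query_heading_deg - heading + 180) % 360 - 180)
        if diff > 45:
            continue
        candidates.append((cosine_similarity(query_desc, desc), image_id))
    return max(candidates, default=None)  # best-matching reference view, or None

# Toy usage with random global descriptors for a ring of reference views.
rng = np.random.default_rng(1)
reference = [(rng.normal(size=128), float(h), f"ref_{i}")
             for i, h in enumerate(range(0, 360, 30))]
query_desc, query_heading = rng.normal(size=128), 95.0
print(localize(query_desc, query_heading, reference))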
Yachna Sharma, Vinay Bettadapura, Thomas Ploetz, Nils Hammerla, Sebastian Mellor, Roisin McNaney, Patrick Olivier, Sandeep Deshmukh, Andrew Mccaskie, Irfan Essa
Video Based Assessment of OSATS Using Sequential Motion Textures Best Paper Proceedings Article
In: Proceedings of Workshop on Modeling and Monitoring of Computer Assisted Interventions (M2CAI), 2014.
Tags: activity assessment, awards, best paper award, computer vision, medical imaging, surgical training
@inproceedings{2014-Sharma-VBAOUSMT,
title = {Video Based Assessment of OSATS Using Sequential Motion Textures},
author = {Yachna Sharma and Vinay Bettadapura and Thomas Ploetz and Nils Hammerla and Sebastian Mellor and Roisin McNaney and Patrick Olivier and Sandeep Deshmukh and Andrew Mccaskie and Irfan Essa},
url = {https://smartech.gatech.edu/bitstream/handle/1853/53651/2014-Sharma-VBAOUSMT.pdf
https://www.semanticscholar.org/paper/Video-Based-Assessment-of-OSATS-Using-Sequential-Sharma-Bettadapura/1dde770faa24d4e04306ca6fb85e76dc78876c49},
year = {2014},
date = {2014-09-01},
urldate = {2014-09-01},
booktitle = {Proceedings of Workshop on Modeling and Monitoring of Computer Assisted Interventions (M2CAI)},
abstract = {A fully automated framework for video-based surgical skill assessment is presented that incorporates the sequential and qualitative aspects of surgical motion in a data-driven manner. The Objective Structured Assessment of Technical Skills (OSATS) is replicated, which provides both an overall and in-detail evaluation of basic suturing skills required for surgeons. Video analysis techniques are introduced that incorporate sequential motion aspects into motion textures. Significant performance improvement over standard bag-of-words and motion analysis approaches is demonstrated. The framework is evaluated in a case study that involved medical students with varying levels of expertise performing basic surgical tasks in a surgical training lab setting.
},
keywords = {activity assessment, awards, best paper award, computer vision, medical imaging, surgical training},
pubstate = {published},
tppubtype = {inproceedings}
}
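The abstract's central claim is that preserving the temporal order of motion features beats order-less bag-of-words pooling. The sketch below illustrates that contrast with generic windowed motion histograms; it is a stand-in for, not a reimplementation of, the paper's sequential motion textures, and every parameter is an assumption.

# Illustrative contrast between order-less and sequential pooling of windowed
# motion features, in the spirit of (but not identical to) the paper's argument.
import numpy as np

def motion_histograms(frames, window=10, bins=8):
    """frames: (T, H, W) grayscale video. Returns one motion histogram per window."""
    motion = np.abs(np.diff(frames.astype(np.float32), axis=0))  # frame-to-frame change
    hists = []
    for start in range(0, motion.shape[0] - window + 1, window):
        chunk = motion[start:start + window].ravel()
        hist, _ = np.histogram(chunk, bins=bins, range=(0, 255), density=True)
        hists.append(hist)
    return np.array(hists)                      # (num_windows, bins)

def bag_of_words_feature(hists):
    return hists.mean(axis=0)                   # window order is discarded

def sequential_feature(hists):
    return hists.ravel()                        # window order is preserved

video = np.random.default_rng(2).integers(0, 256, size=(60, 32, 32))
hists = motion_histograms(video)
print(bag_of_words_feature(hists).shape, sequential_feature(hists).shape)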
Glenn Hartmann, Matthias Grundmann, Judy Hoffman, David Tsai, Vivek Kwatra, Omid Madani, Sudheendra Vijayanarasimhan, Irfan Essa, James Rehg, Rahul Sukthankar
Weakly Supervised Learning of Object Segmentations from Web-Scale Videos Best Paper Proceedings Article
In: Proceedings of ECCV 2012 Workshop on Web-scale Vision and Social Media, 2012.
Tags: awards, best paper award, computer vision, ECCV, machine learning
@inproceedings{2012-Hartmann-WSLOSFWV,
title = {Weakly Supervised Learning of Object Segmentations from Web-Scale Videos},
author = {Glenn Hartmann and Matthias Grundmann and Judy Hoffman and David Tsai and Vivek Kwatra and Omid Madani and Sudheendra Vijayanarasimhan and Irfan Essa and James Rehg and Rahul Sukthankar},
url = {https://link.springer.com/chapter/10.1007/978-3-642-33863-2_20
https://research.google.com/pubs/archive/40735.pdf
},
doi = {10.1007/978-3-642-33863-2_20},
year = {2012},
date = {2012-10-01},
urldate = {2012-10-01},
booktitle = {Proceedings of ECCV 2012 Workshop on Web-scale Vision and Social Media},
abstract = {We propose to learn pixel-level segmentations of objects from weakly labeled (tagged) internet videos. Specifically, given a large collection of raw YouTube content, along with potentially noisy tags, our goal is to automatically generate spatiotemporal masks for each object, such as “dog”, without employing any pre-trained object detectors. We formulate this problem as learning weakly supervised classifiers for a set of independent spatio-temporal segments. The object seeds obtained using segment-level classifiers are further refined using graphcuts to generate high-precision object masks. Our results, obtained by training on a dataset of 20,000 YouTube videos weakly tagged into 15 classes, demonstrate automatic extraction of pixel-level object masks. Evaluated against a ground-truthed subset of 50,000 frames with pixel-level annotations, we confirm that our proposed methods can learn good object masks just by watching YouTube.
},
keywords = {awards, best paper award, computer vision, ECCV, machine learning},
pubstate = {published},
tppubtype = {inproceedings}
}
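A toy sketch of the weak-supervision setup the abstract describes: every spatio-temporal segment inherits its video's noisy tag as a label, a segment-level classifier is trained on those labels, and its per-segment scores serve as object seeds (the graph-cut refinement is omitted). The features and tags below are synthetic placeholders.

# Hypothetical sketch of segment-level weak supervision: segments from tagged
# videos are treated as noisy positives, segments from untagged videos as
# negatives. The graph-cut refinement from the paper is omitted here.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
n_videos, segments_per_video, n_features = 40, 25, 64

video_has_tag = rng.integers(0, 2, size=n_videos).astype(bool)   # noisy video-level tags
X, y_weak = [], []
for v in range(n_videos):
    feats = rng.normal(loc=1.0 if video_has_tag[v] else 0.0,
                       size=(segments_per_video, n_features))
    X.append(feats)
    y_weak.append(np.full(segments_per_video, video_has_tag[v], dtype=int))
X, y_weak = np.vstack(X), np.concatenate(y_weak)

# Train on the noisy, video-level labels; score each segment individually.
clf = LogisticRegression(max_iter=1000).fit(X, y_weak)
segment_scores = clf.predict_proba(X)[:, 1]     # seeds for a later refinement step
print("top-scoring segments:", np.argsort(segment_scores)[-5:])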
Matthias Grundmann, Vivek Kwatra, Daniel Castro, Irfan Essa
Calibration-Free Rolling Shutter Removal Best Paper Proceedings Article
In: IEEE Conference on Computational Photography (ICCP), IEEE Computer Society, 2012.
Tags: awards, best paper award, computational photography, computational video, computer graphics, computer vision, ICCP
@inproceedings{2012-Grundmann-CRSR,
title = {Calibration-Free Rolling Shutter Removal},
author = {Matthias Grundmann and Vivek Kwatra and Daniel Castro and Irfan Essa},
url = {http://www.cc.gatech.edu/cpl/projects/rollingshutter/
https://research.google.com/pubs/archive/37744.pdf
https://youtu.be/_Pr_fpbAok8},
doi = {10.1109/ICCPhot.2012.6215213},
year = {2012},
date = {2012-01-01},
urldate = {2012-01-01},
booktitle = {IEEE Conference on Computational Photography (ICCP)},
publisher = {IEEE Computer Society},
abstract = {We present a novel algorithm for efficient removal of rolling shutter distortions in uncalibrated streaming videos. Our proposed method is calibration free as it does not need any knowledge of the camera used, nor does it require calibration using specially recorded calibration sequences. Our algorithm can perform rolling shutter removal under varying focal lengths, as in videos from CMOS cameras equipped with an optical zoom. We evaluate our approach across a broad range of cameras and video sequences demonstrating robustness, scalability, and repeatability. We also conducted a user study, which demonstrates preference for the output of our algorithm over other state-of-the-art methods. Our algorithm is computationally efficient, easy to parallelize, and robust to challenging artifacts introduced by various cameras with differing technologies.
},
keywords = {awards, best paper award, computational photography, computational video, computer graphics, computer vision, ICCP},
pubstate = {published},
tppubtype = {inproceedings}
}
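The abstract does not spell out the correction model, so the sketch below handles only the simplest imaginable case: a known, constant horizontal velocity, with each image row shifted back in proportion to its readout delay. It is a toy stand-in that shows the row-by-row nature of the distortion, not an implementation of the paper's calibration-free method; the velocity and image sizes are assumed.

# Toy rolling-shutter correction under an assumed constant horizontal velocity:
# each row is captured slightly later than the one above, so each row is shifted
# back in proportion to its readout delay.
import numpy as np

def correct_rolling_shutter(frame, pixels_per_row_delay):
    """frame: (H, W) image; pixels_per_row_delay: horizontal motion accumulated
    between the readout of consecutive rows (an assumed, known constant)."""
    corrected = np.empty_like(frame)
    for row in range(frame.shape[0]):
        shift = int(round(row * pixels_per_row_delay))
        corrected[row] = np.roll(frame[row], -shift)  # undo this row's skew
    return corrected

# Toy usage: a vertical bar skewed by a simulated rolling shutter, then straightened.
frame = np.zeros((120, 160), dtype=np.uint8)
for row in range(120):
    offset = int(round(row * 0.2))
    frame[row, 60 + offset: 70 + offset] = 255          # skewed bar
fixed = correct_rolling_shutter(frame, pixels_per_row_delay=0.2)
print(np.count_nonzero(fixed[:, 60:70]) / fixed[:, 60:70].size)  # ≈ 1.0 when straightened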
Other Publication Sites
A few more sites that aggregate research publications: Academia.edu, Bibsonomy, CiteULike, Mendeley.
Copyright/About
[Please see the Copyright Statement that may apply to the content listed here.]
This list of publications is produced using the teachPress plugin for WordPress.