A searchable list of some of my publications is below. You can also access my publications from the following sites.
My ORCID is
Publications:
Erik Wijmans, Irfan Essa, Dhruv Batra
How to Train PointGoal Navigation Agents on a (Sample and Compute) Budget Proceedings Article
In: International Conference on Autonomous Agents and Multi-Agent Systems, 2022.
Tags: computer vision, embodied agents, navigation
@inproceedings{2022-Wijmans-TPNASCB,
title = {How to Train PointGoal Navigation Agents on a (Sample and Compute) Budget},
author = {Erik Wijmans and Irfan Essa and Dhruv Batra},
url = {https://arxiv.org/abs/2012.06117
https://ifaamas.org/Proceedings/aamas2022/pdfs/p1762.pdf},
doi = {10.48550/arXiv.2012.06117},
year = {2022},
date = {2022-12-01},
urldate = {2020-12-01},
booktitle = {International Conference on Autonomous Agents and Multi-Agent Systems},
journal = {arXiv},
number = {arXiv:2012.06117},
abstract = {PointGoal navigation has seen significant recent interest and progress, spurred on by the Habitat platform and associated challenge. In this paper, we study PointGoal navigation under both a sample budget (75 million frames) and a compute budget (1 GPU for 1 day). We conduct an extensive set of experiments, cumulatively totaling over 50,000 GPU-hours, that let us identify and discuss a number of ostensibly minor but significant design choices -- the advantage estimation procedure (a key component in training), visual encoder architecture, and a seemingly minor hyper-parameter change. Overall, these design choices lead to considerable and consistent improvements over the baselines presented in Savva et al. Under a sample budget, performance for RGB-D agents improves 8 SPL on Gibson (14% relative improvement) and 20 SPL on Matterport3D (38% relative improvement). Under a compute budget, performance for RGB-D agents improves by 19 SPL on Gibson (32% relative improvement) and 35 SPL on Matterport3D (220% relative improvement). We hope our findings and recommendations will serve to make the community's experiments more efficient.},
keywords = {computer vision, embodied agents, navigation},
pubstate = {published},
tppubtype = {inproceedings}
}
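The abstract above singles out the advantage estimation procedure as one of the impactful design choices. The exact variant studied in the paper is not reproduced here, but the standard baseline in this setting is Generalized Advantage Estimation (GAE); the following is a minimal NumPy sketch under that assumption, with the usual hyper-parameters gamma (discount) and lam (GAE decay).

import numpy as np

def generalized_advantage_estimation(rewards, values, gamma=0.99, lam=0.95):
    # rewards: shape (T,); values: shape (T + 1,), where values[T] is the
    # bootstrap value of the state reached after the final step.
    # (Episode-boundary masking is omitted for brevity.)
    T = len(rewards)
    advantages = np.zeros(T)
    gae = 0.0
    for t in reversed(range(T)):
        # One-step TD residual at time t.
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        # GAE: exponentially weighted sum of future TD residuals.
        gae = delta + gamma * lam * gae
        advantages[t] = gae
    return advantages

With lam = 0 this reduces to one-step TD advantages; with lam = 1 it becomes a Monte Carlo return minus the value baseline, which is the usual bias-variance trade-off such design-choice studies sweep over.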
Erik Wijmans, Abhishek Kadian, Ari Morcos, Stefan Lee, Irfan Essa, Devi Parikh, Manolis Savva, Dhruv Batra
Decentralized Distributed PPO: Solving PointGoal Navigation Proceedings Article
In: Proceedings of International Conference on Learning Representations (ICLR), 2020.
Tags: embodied agents, ICLR, navigation, systems for ML
@inproceedings{2020-Wijmans-DDSPN,
title = {Decentralized Distributed PPO: Solving PointGoal Navigation},
author = {Erik Wijmans and Abhishek Kadian and Ari Morcos and Stefan Lee and Irfan Essa and Devi Parikh and Manolis Savva and Dhruv Batra},
url = {https://arxiv.org/abs/1911.00357
https://paperswithcode.com/paper/decentralized-distributed-ppo-solving},
year = {2020},
date = {2020-04-01},
urldate = {2020-04-01},
booktitle = {Proceedings of International Conference on Learning Representations (ICLR)},
abstract = {We present Decentralized Distributed Proximal Policy Optimization (DD-PPO), a method for distributed reinforcement learning in resource-intensive simulated environments. DD-PPO is distributed (uses multiple machines), decentralized (lacks a centralized server), and synchronous (no computation is ever stale), making it conceptually simple and easy to implement. In our experiments on training virtual robots to navigate in Habitat-Sim, DD-PPO exhibits near-linear scaling -- achieving a speedup of 107x on 128 GPUs over a serial implementation. We leverage this scaling to train an agent for 2.5 billion steps of experience (the equivalent of 80 years of human experience) -- over 6 months of GPU-time training in under 3 days of wall-clock time with 64 GPUs.
This massive-scale training not only sets the state of the art on the Habitat Autonomous Navigation Challenge 2019, but essentially solves the task -- near-perfect autonomous navigation in an unseen environment without access to a map, directly from an RGB-D camera and a GPS+Compass sensor. Fortuitously, error vs. computation exhibits a power-law-like distribution; thus, 90% of peak performance is obtained relatively early (at 100 million steps) and relatively cheaply (under 1 day with 8 GPUs). Finally, we show that the scene understanding and navigation policies learned can be transferred to other navigation tasks -- the analog of ImageNet pre-training + task-specific fine-tuning for embodied AI. Our model outperforms ImageNet pre-trained CNNs on these transfer tasks and can serve as a universal resource (all models and code are publicly available).},
keywords = {embodied agents, ICLR, navigation, systems for ML},
pubstate = {published},
tppubtype = {inproceedings}
}
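As a rough illustration of the decentralized, synchronous design described in the abstract (every worker keeps its own model replica; gradients are averaged with an all-reduce, with no parameter server), here is a minimal sketch using torch.distributed. It is not the authors' released implementation, and it omits DD-PPO's straggler mitigation; the function name is illustrative.

import torch
import torch.distributed as dist

def average_gradients(model: torch.nn.Module) -> None:
    # Each worker computes gradients on its own locally collected
    # rollouts; a synchronous all-reduce then leaves every replica
    # holding the same averaged gradient, so no update is ever stale
    # and no central server is needed.
    world_size = dist.get_world_size()
    for param in model.parameters():
        if param.grad is not None:
            dist.all_reduce(param.grad, op=dist.ReduceOp.SUM)
            param.grad /= world_size

A call like this would sit between loss.backward() and optimizer.step() in each worker's PPO update loop, after initializing the default process group.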
Erik Wijmans, Julian Straub, Dhruv Batra, Irfan Essa, Judy Hoffman, Ari Morcos
Analyzing Visual Representations in Embodied Navigation Tasks Technical Report
no. arXiv:2003.05993, 2020.
Tags: arXiv, embodied agents, navigation
@techreport{2020-Wijmans-AVRENT,
title = {Analyzing Visual Representations in Embodied Navigation Tasks},
author = {Erik Wijmans and Julian Straub and Dhruv Batra and Irfan Essa and Judy Hoffman and Ari Morcos},
url = {https://arxiv.org/abs/2003.05993
https://arxiv.org/pdf/2003.05993},
doi = {10.48550/arXiv.2003.05993},
year = {2020},
date = {2020-03-01},
urldate = {2020-03-01},
journal = {arXiv},
number = {arXiv:2003.05993},
abstract = {Recent advances in deep reinforcement learning require a large amount of training data and generally result in representations that are often overspecialized to the target task. In this work, we present a methodology to study the potential underlying causes for this specialization. We use the recently proposed projection weighted Canonical Correlation Analysis (PWCCA) to measure the similarity of visual representations learned in the same environment by performing different tasks.
We then leverage our proposed methodology to examine the task dependence of visual representations learned on related but distinct embodied navigation tasks. Surprisingly, we find that slight differences in task have no measurable effect on the visual representation for both SqueezeNet and ResNet architectures. We then empirically demonstrate that visual representations learned on one task can be effectively transferred to a different task.},
howpublished = {arXiv:2003.05993},
keywords = {arXiv, embodied agents, navigation},
pubstate = {published},
tppubtype = {techreport}
}
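For readers unfamiliar with the similarity measure named in the abstract, the following is an illustrative NumPy sketch of a PWCCA-style score between two activation matrices, following the projection-weighting idea of Morcos et al.; it is a simplified assumption-based version, not code from this report.

import numpy as np

def pwcca(X, Y):
    # X: (n_examples, d1) and Y: (n_examples, d2) activation matrices.
    # Returns a scalar in [0, 1]; higher means more similar.
    X = X - X.mean(axis=0)  # center each neuron's activations
    Y = Y - Y.mean(axis=0)

    # Orthonormalize with QR; CCA then reduces to an SVD of Qx^T Qy.
    qx, _ = np.linalg.qr(X)
    qy, _ = np.linalg.qr(Y)
    u, rho, _ = np.linalg.svd(qx.T @ qy)
    rho = np.clip(rho, 0.0, 1.0)  # canonical correlations

    # One canonical variate of X per canonical correlation.
    variates = qx @ u[:, : len(rho)]

    # Weight each direction by how much of X it accounts for, then
    # take the weighted mean of the canonical correlations.
    alpha = np.abs(variates.T @ X).sum(axis=1)
    alpha = alpha / alpha.sum()
    return float((alpha * rho).sum())

Unlike a plain mean of canonical correlations, the projection weighting emphasizes CCA directions that actually carry the representation's variance, which is what makes the measure usable for comparing learned visual encoders.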
Kihwan Kim, Jay Summet, Thad Starner, Dan Ashbrook, M. Kapade, Irfan Essa
Localization and 3D Reconstruction of Urban Scenes Using GPS Proceedings Article
In: Proceedings of IEEE International Symposium on Wearable Computers (ISWC), pp. 11–14, IEEE Computer Society, 2008.
Tags: IMWUT, navigation, wearable computing
@inproceedings{2008-Kim-LRUSU,
title = {Localization and 3D Reconstruction of Urban Scenes Using GPS},
author = {Kihwan Kim and Jay Summet and Thad Starner and Dan Ashbrook and M. Kapade and Irfan Essa},
year = {2008},
date = {2008-09-01},
urldate = {2008-09-01},
booktitle = {Proceedings of IEEE International Symposium on Wearable Computers (ISWC)},
pages = {11--14},
publisher = {IEEE Computer Society},
keywords = {IMWUT, navigation, wearable computing},
pubstate = {published},
tppubtype = {inproceedings}
}
Other Publication Sites
A few more sites that aggregate research publications: Academia.edu, Bibsonomy, CiteULike, Mendeley.
Copyright/About
[Please see the Copyright Statement that may apply to the content listed here.]
This list of publications is produced by using the teachPress plugin for WordPress.