Title: LTRON: Using LEGO for Interactive Assembly, Disassembly and Scene Understanding
Abstract: We have developed LTRON: a new learning environment for interactive scene understanding and construction tasks using LEGO bricks. Our environment provides access to over 1700 high-quality fan-made reproductions of actual LEGO products, along with visual and symbolic interaction modalities that allow learning agents to assemble, disassemble and modify LEGO models. We also show work in progress toward completing challenging scene understanding and assembly tasks in this environment.
Bio: Aaron Walsman is a 7th-year Ph.D. student working in Robotics and Computer Vision advised by Dieter Fox and Ali Farhadi.
Title: Pushing it out of the Way: Interactive Visual Navigation
Abstract: We have observed significant progress in visual navigation for embodied agents. However, intelligent navigation may involve interacting with the environment beyond just moving forward/backward and turning left/right. Sometimes, the best way to navigate is to push something out of the way.
In this project, we study the problem of interactive navigation where agents learn to change the environment to navigate more efficiently to their goals. To this end, we introduce the Neural Interaction Engine (NIE) to explicitly predict the change in the environment caused by the agent's actions. By modeling the changes while planning, we find that agents exhibit significant improvements in their navigational capabilities.
Bio: Kuo-Hao Zeng is a 4th-year Ph.D. student in the RAIVN Lab, advised by Ali Farhadi and Roozbeh Mottaghi. His current research interests are in Learning through Interaction and utilizing Visual Reasoning for Robot Learning. Web: https://homes.cs.washington.edu/~khzeng/
Title: LLC: Accurate, Multi-purpose Learnt Low-dimensional Binary Codes
Abstract: Learning binary representations of instances and classes is a classical problem with several high-potential applications. In this work, we propose a novel method for Learning Low-dimensional binary Codes (LLC) for instances as well as classes. Our method does not require any side information, such as annotated attributes or label meta-data, and learns extremely low-dimensional binary codes (~20 bits for ImageNet-1K). The learnt codes are highly efficient while still ensuring nearly optimal performance for image classification and retrieval at sub-linear cost. Finally, we demonstrate that the learnt codes capture intrinsically important features in the data by discovering an intuitive taxonomy over classes.
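To illustrate why such short binary codes enable efficient classification and retrieval: once every class has a code, prediction reduces to finding the class code nearest to the instance code in Hamming distance, which is a single XOR and popcount per comparison. The sketch below is a toy illustration with made-up 20-bit codes, not the learned LLC codes or the paper's implementation:

```python
# Toy sketch: classification with 20-bit binary codes via Hamming distance.
# The codes below are arbitrary illustrative values, not learned LLC codes.

def hamming(a: int, b: int) -> int:
    """Number of differing bits between two binary codes (XOR + popcount)."""
    return bin(a ^ b).count("1")

def classify(instance_code: int, class_codes: dict) -> str:
    """Return the class whose code is closest to the instance code."""
    return min(class_codes, key=lambda name: hamming(instance_code, class_codes[name]))

class_codes = {
    "cat": 0b10110010101100101011,
    "dog": 0b10110010101100101000,  # 2 bits away from "cat"
    "car": 0b01001101010011010100,  # far from both in Hamming space
}

# An instance code one bit away from "cat" is still assigned to "cat".
instance = class_codes["cat"] ^ (1 << 19)
print(classify(instance, class_codes))
```

With ~20-bit codes the entire class codebook fits in a few kilobytes, which is what makes retrieval at sub-linear cost practical (e.g., via multi-index hashing over code prefixes).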
Bio: Aditya Kusupati is a 3rd-year Ph.D. student working with Ali Farhadi and Sham Kakade. He focuses on designing fundamental Machine Learning algorithms with strong empirical performance and real-world deployability. Web: http://www.adityakusupati.com/
Title: Robust fine-tuning of zero-shot models
Abstract: Large pre-trained models such as CLIP offer consistent accuracy across a range of data distributions when performing zero-shot inference (i.e., without fine-tuning on a specific dataset). Although existing fine-tuning approaches substantially improve accuracy in-distribution, they also reduce out-of-distribution robustness. We address this tension by introducing a simple and effective method for improving robustness: ensembling the weights of the zero-shot and fine-tuned models (WiSE-FT). Compared to standard fine-tuning, WiSE-FT provides large accuracy improvements out-of-distribution, while matching or improving in-distribution accuracy. On ImageNet (in-distribution) and five derived distribution shifts, WiSE-FT improves out-of-distribution accuracy by 2 to 10 percentage points (pp) while increasing in-distribution accuracy by nearly 1 pp relative to standard fine-tuning. WiSE-FT achieves similarly large robustness improvements (2 to 15 pp) on a diverse set of six further distribution shifts, and in-distribution accuracy gains of 0.8 to 3.3 pp compared to standard fine-tuning on seven commonly used transfer learning datasets. These improvements come at no additional computational cost during fine-tuning or inference.
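The weight-space ensembling at the core of WiSE-FT is a per-parameter linear interpolation between the zero-shot and fine-tuned weights. The minimal sketch below uses toy scalar "weights" and a hypothetical `wise_ft` helper of ours for illustration; the actual method applies the same interpolation to every tensor in a model such as CLIP:

```python
# Minimal sketch of weight-space ensembling (WiSE-FT-style interpolation).
# theta_zs / theta_ft map parameter names to values; real models would map
# names to weight tensors instead of scalars.

def wise_ft(theta_zs: dict, theta_ft: dict, alpha: float = 0.5) -> dict:
    """Interpolate each parameter between the zero-shot and fine-tuned model.

    alpha=0 recovers the zero-shot weights, alpha=1 the fine-tuned weights.
    """
    return {
        name: (1 - alpha) * theta_zs[name] + alpha * theta_ft[name]
        for name in theta_zs
    }

zs = {"w": 0.0, "b": 1.0}   # toy zero-shot weights
ft = {"w": 1.0, "b": 3.0}   # toy fine-tuned weights
mixed = wise_ft(zs, ft, alpha=0.5)
print(mixed)  # {'w': 0.5, 'b': 2.0}
```

Because the ensemble is formed in weight space rather than by averaging predictions, inference runs a single model of the original size, which is why the robustness gains come at no extra inference cost.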
Bio: Mitchell Wortsman is a 3rd-year Ph.D. student advised by Ali Farhadi. His research focuses on understanding and building more reliable machine learning systems. Web: https://mitchellnw.github.io/