| Time | Topic |
| 9:00-9:50 | Registration, Poster Setup & Breakfast |
| 9:50-10:00 | Opening Remarks |
| 10:00-11:30 | Oral Session I |
| [10:00] | Not All Birds Look The Same: Identity-Preserving Generation For Birds, Aaron Sun, UMass Amherst |
| [10:15] | Generative Action Tell-Tales: Assessing Human Motion in Synthesized Videos, Xavier Thomas, Boston University |
| [10:30] | CObL: Toward Zero-Shot Ordinal Layering without User Prompting, Aneel Damaraju, Harvard University |
| [10:45] | Consensus-Driven Active Model Selection, Justin Kay, MIT |
| [11:00] | Struct2D: A Perception-Guided Framework for Spatial Reasoning in MLLMs, Hanhui Wang, Northeastern University |
| [11:15] | Structured Light with a Million Light Planes per Second, Dhawal Sirikonda, Dartmouth College |
| 11:30-12:30 | Coffee & Poster Session I |
| [1] | Tell the Story, Not the Frames: Narrative-Aware Retrieval for Audio Description, Seung Hyun Hahm, Dartmouth College |
| [2] | Relational Representation Learning, Ian Hajra, Brown University |
| [3] | Progressive Stereo Edge Correspondences and Refinement, Chiang-Heng Chien, Brown University |
| [4] | Learning and Stabilizing Isometries for Robust Vision, Javid Lakha, Harvard University |
| [5] | stable-worldmodel: An Ecosystem For World Model Research, Lucas Maes, Mila |
| [6] | Augmented Reality Active Area Labels for Dynamic Scenes, Lana Yang-Maccini, Brown University |
| [7] | Exploring Texture Guidance in Diffusion Models, Eric Yee, MIT |
| [8] | PackUV: Packed Gaussian UV Maps for 4D Volumetric Video, Aashish Rai, Brown University |
| [9] | Compositional Targeted Multi-Label Universal Perturbations, Hassan Mahmood, Northeastern University |
| [10] | Blind to Shape, Bound to Semantics: A VLM’s Dilemma, Zachary Meurer, Boston University |
| [11] | Curvature Tuning: Provable Training-free Model Steering From a Single Parameter, Leyang Hu, Brown University |
| [12] | FLIGHT: Fibonacci Lattice-based Inference for Geometric Heading in real-Time, Dave Dirnfeld, UMass Amherst |
| [13] | ID-Sim: An Identity-Focused Perceptual Similarity Metric, Nayoung Chae, MIT |
| [14] | Enhancing Autonomous Navigation by Imaging Hidden Objects using Single-Photon LiDAR, Nevindu Batagoda, Dartmouth College |
| [15] | Audio Geolocation: An Investigation with Natural Sounds, Wuao Liu, UMass Amherst |
| [16] | Some Modalities are More Equal Than Others: Decoding and Architecting Multimodal Integration in MLLMs, Tianle Chen, Boston University |
| [17] | Do VLMs see texture like humans and CNNs? Evidence from slant-from-texture, Qian Zhang, Brown University |
| [18] | A Monte Carlo Rendering Framework for Simulating Optical Heterodyne Detection, Juhyeon Kim, Dartmouth College |
| [19] | LVT: Large-Scale Scene Reconstruction via Local View Transformers, Tooba Imtiaz, Northeastern University |
| [20] | LayerCraft: Enhancing Text-to-Image Generation with CoT Reasoning and Layered Object Integration, Yuyao Zhang, Dartmouth College |
| [21] | Iris: Integrating Language into Diffusion-based Monocular Depth Estimation, Ziyao Zeng, Yale University |
| [22] | Underwater Optical Backscatter Communication using Acousto-Optic Beam Steering, Dhawal Sirikonda, Dartmouth College |
| [23] | BabyVLM-V2: Toward Developmentally Grounded Pretraining and Benchmarking of Vision Foundation Models, Shengao Wang, Boston University |
| [24] | DevCV Toolbox: Toward Developmentally Grounded Benchmarking of Vision Foundation Models, Max Whitton, Boston University |
| [25] | PRISM: Controllable Diffusion for Compound Image Restoration with Scientific Fidelity, Rupa Kurinchi-Vendhan, MIT |
| [26] | Words That Make Language Models Perceive, Sophie Wang, MIT |
| [27] | Active Measurement: Efficient Estimation at Scale, Max Hamilton, UMass Amherst |
| [28] | Spatially-Varying Autofocus, Yingsi Qin, Carnegie Mellon University |
| [29] | VisReason: A Large-Scale Dataset for Visual Chain-of-Thought Reasoning, Lingxiao Li, Boston University |
| [30] | Residual Primitive Fitting of 3D Shapes with SuperFrusta, Aditya Ganeshan, Brown University |
| [31] | CHAIR: An interpretable pipeline for AI-expert collaboration on elephant Re-identification, Antoine Salaun, MIT |
| [32] | The LLM Bottleneck: Why Open-Source Vision LLMs Struggle with Hierarchical Visual Recognition, Yuwen Tan, Boston University |
| 12:30-2:00 | Lunch |
| 2:00-3:00 | Coffee & Poster Session II |
| [1] | Generative Action Tell-Tales: Assessing Human Motion in Synthesized Videos, Xavier Thomas, Boston University |
| [2] | Looking at the Sky, Shrenik Borad, George Washington University |
| [3] | Not All Birds Look The Same: Identity-Preserving Generation For Birds, Aaron Sun, UMass Amherst |
| [4] | Super-Resolution with Structured Motion, Gabby Litterio, Brown University |
| [5] | PLLM: Pseudo-Labeling Large Language Models for CAD Program Synthesis, Yuanbo Li, Brown University |
| [6] | Unsafe2Safe: Controllable Image Anonymization for Downstream Utility, Minh Dinh, Dartmouth College |
| [7] | Exploring Efficient and Practical Unified Multimodal Model, Xu Ma, Northeastern University |
| [8] | Scale-DiT: Ultra-High-Resolution Image Generation with Hierarchical Local Attention, Yuyao Zhang, Dartmouth College |
| [9] | HouseCrafter: Lifting Floorplans to 3D Scenes with 2D Diffusion Models, Yiwen Chen, Northeastern University |
| [10] | Outlier-Aware Post-Training Quantization for Image Super-Resolution, Hailing Wang, Northeastern University |
| [11] | Does learning about time improve out-of-distribution generalization in object detection?, Kai Van Brunt, MIT |
| [12] | Vision Masked Image Modeling Transfers Across Domains, Pranav Sankar, Brown University |
| [13] | Can LVLMs Harness Visual Contexts to Untangle Ambiguity In Language?, Heejeong Nam, Brown University |
| [14] | SNAP: Towards Segmenting Anything in Any Point Cloud, Aniket Gupta, Northeastern University |
| [15] | Trace Anything: Representing Any Video in 4D via Trajectory Fields, Xinhang Liu, Dartmouth College |
| [16] | LASER: Layer-wise Scale Alignment for Training-Free Streaming 4D Reconstruction, Tianye Ding, Northeastern University |
| [17] | RealBirdID: Benchmarking Bird Species Identification in the Era of MLLMs, Logan Lawrence, UMass Amherst |
| [18] | Combining Translation with Magnification to Resolve Ambiguity in Super-Resolution, Daniel Fu, Brown University |
| [19] | DIET-CP: Lightweight and Data Efficient Self Supervised Continued Pretraining, Jakob Ambsdorf, Brown University |
| [20] | Potion Brewing Laboratory: An Environment for Continual Learning in World Models, Taj Gillin, Brown University |
| [21] | CObL: Toward Zero-Shot Ordinal Layering without User Prompting, Aneel Damaraju, Harvard University |
| [22] | Consensus-Driven Active Model Selection, Justin Kay, MIT |
| [23] | Struct2D: A Perception-Guided Framework for Spatial Reasoning in MLLMs, Hanhui Wang, Northeastern University |
| [24] | Attribution Robustness via Implicit Curvature Regularization, Matteo Gamba, Brown University |
| [25] | Discontinuous 2D Neural Fields without Meshing, Javid Lakha, Harvard University |
| [26] | S3: Learnable Spline-Wavelets for State Space Models, Daniel Cai, Brown University |
| [27] | SimpleCall: A Lightweight Image Restoration Agent in Label-Free Environments with MLLM Perceptual Feedback, Jianglin Lu, Northeastern University |
| [28] | SuperRivolution: Fine-Scale Rivers from Coarse Temporal Satellite Imagery, Rangel Daroya, UMass Amherst |
| [29] | Coffee: Controllable Diffusion Fine-tuning, Ziyao Zeng, Yale University |
| [30] | Structured Light with a Million Light Planes per Second, Dhawal Sirikonda, Dartmouth College |
| [31] | Arbitrary-Scale 3D Gaussian Super-Resolution, Huimin Zeng, Northeastern University |
| [32] | 3D Curvix: From Multiview 2D Edges to 3D Curve Segments, Chiang-Heng Chien, Brown University |
| 3:00-4:30 | Oral Session II |
| [3:00] | Tell the Story, Not the Frames: Narrative-Aware Retrieval for Audio Description, Seung Hyun Hahm, Dartmouth College |
| [3:15] | Curvature Tuning: Provable Training-free Model Steering From a Single Parameter, Leyang Hu, Brown University |
| [3:30] | Iris: Integrating Language into Diffusion-based Monocular Depth Estimation, Ziyao Zeng, Yale University |
| [3:45] | BabyVLM-V2: Toward Developmentally Grounded Pretraining and Benchmarking of Vision Foundation Models, Shengao Wang, Boston University |
| [4:00] | Words That Make Language Models Perceive, Sophie Wang, MIT |
| [4:15] | Residual Primitive Fitting of 3D Shapes with SuperFrusta, Aditya Ganeshan, Brown University |
| 4:30-4:45 | Closing Remarks |