Meta AI releases JEPA-WMs, a joint embedding predictive world model for physical planning

ME News reported on April 3 (UTC+8): Meta AI Research released JEPA-WMs, a joint embedding predictive world model for physical planning, along with the accompanying research. The study analyzes the key factors behind the model's success and provides a complete PyTorch implementation, datasets, and pretrained models. The release includes the core JEPA-WM as well as the baseline models DINO-WM and V-JEPA-2-AC (fixed), covering multiple robot manipulation and navigation environments such as DROID & RoboCasa, Metaworld, Push-T, PointMaze, and Wall. The models use visual encoders such as DINOv3 ViT-L/16, DINOv2 ViT-S/14, and V-JEPA-2 ViT-G/16, with input image resolutions of primarily 224×224 or 256×256. The project also provides an optional VM2M decoder head for visualization and trajectory decoding, but emphasizes that this decoder is not required for training the world models or for running planning evaluations. All resources have been made public on GitHub, Hugging Face, and arXiv. (Source: InfoQ)
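
The reason the VM2M decoder is optional follows from the joint embedding predictive design: a frozen pretrained visual encoder maps frames to latent states, and an action-conditioned predictor is trained to match the next frame's latent, so the loss lives entirely in embedding space and no pixel reconstruction is needed. The sketch below illustrates that pattern in PyTorch. It is not the released JEPA-WMs code; the encoder stand-in, the `LatentPredictor` module, and all dimensions are hypothetical placeholders chosen only to make the idea concrete and runnable.

```python
# Illustrative sketch of a JEPA-style world-model training step (hypothetical,
# not the released JEPA-WMs implementation).
import torch
import torch.nn as nn

class LatentPredictor(nn.Module):
    """Hypothetical predictor: (current latent state, action) -> next latent state."""
    def __init__(self, latent_dim: int, action_dim: int, hidden: int = 512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim + action_dim, hidden),
            nn.GELU(),
            nn.Linear(hidden, latent_dim),
        )

    def forward(self, z: torch.Tensor, a: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([z, a], dim=-1))

latent_dim, action_dim = 384, 7

# Stand-in for a frozen pretrained visual encoder (DINOv2/DINOv3/V-JEPA-2 in
# the actual release); a fixed random projection keeps the sketch self-contained.
frozen_encoder = nn.Linear(3 * 224 * 224, latent_dim).requires_grad_(False)

predictor = LatentPredictor(latent_dim, action_dim)
optimizer = torch.optim.AdamW(predictor.parameters(), lr=3e-4)

# Fake batch of transitions: frame o_t, action a_t, next frame o_{t+1}.
obs_t = torch.randn(8, 3 * 224 * 224)
action_t = torch.randn(8, action_dim)
obs_next = torch.randn(8, 3 * 224 * 224)

with torch.no_grad():                       # the encoder stays frozen
    z_t = frozen_encoder(obs_t)
    z_next_target = frozen_encoder(obs_next)

z_next_pred = predictor(z_t, action_t)      # predict next latent from (z_t, a_t)
loss = nn.functional.mse_loss(z_next_pred, z_next_target)  # loss in latent space

optimizer.zero_grad()
loss.backward()
optimizer.step()
print(f"latent prediction loss: {loss.item():.4f}")
```

Because both training and planning operate on these latent rollouts, a decoder back to pixels (like the release's VM2M head) is only needed when one wants to visualize what the model imagines.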
