NAVER LABS Dominates Computer Vision Conference with Cutting-Edge Spatial Intelligence Technology | NAVER Corp.

- "MASt3R," an upgrade of "DUSt3R," which can build a 3D model from a single photo, wins first place among 12 teams in the Map-free Visual Re-localization Challenge

- NAVER LABS also secures first place in the BOP Challenge, demonstrating superior speed and accuracy in object pose estimation using only RGB images

- Success attributed to "CroCo," NAVER LABS' 3D vision foundation model (VFM), reflecting eight years of accumulated spatial intelligence expertise

- NAVER CLOUD also has 11 papers accepted at ECCV 2024, demonstrating advances in vision AI technology

October 8, 2024

NAVER LABS (CEO Seok Sang-ok) took first place in two challenges at the 2024 European Conference on Computer Vision (ECCV), demonstrating world-class capabilities in spatial intelligence.

ECCV, held every two years, is one of the most prestigious conferences in AI and computer vision. It brings together leading researchers from global tech companies and academia to present cutting-edge work in image and video processing and to shape the direction of future research. At ECCV 2024, NAVER LABS took first place in both the Map-free Visual Re-localization Challenge and the BOP (Benchmark for 6D Object Pose Estimation) Challenge.

[Photo] NAVER LABS researchers present MASt3R at ECCV 2024, which opened in Milan, Italy, on September 28.

The Map-free Visual Re-localization Challenge evaluates how accurately a camera's position can be determined without relying on precise maps. Conventional visual localization estimates position by matching images against pre-built 3D or HD maps. Map-free re-localization is therefore crucial in situations where such maps are unavailable, such as disaster areas or construction sites.

NAVER LABS entered the challenge with MASt3R, an AI tool for reconstructing 3D scenes from images. MASt3R was recognized for delivering sufficiently accurate localization even without precise maps, taking first place among 12 participating teams, including teams from Google, Apple, and Meta. MASt3R is an upgraded version of "DUSt3R," which was built on "CroCo," a 3D Vision Foundation Model (VFM) developed by NAVER LABS Europe.

NAVER LABS also won first place in the BOP (Benchmark for 6D Object Pose Estimation) Challenge, which evaluates how accurately methods can estimate the three-dimensional rotation and position (together, the "6D pose") of objects in an image. NAVER LABS' model was named both the most accurate method using only RGB images (The Best RGB-Only Method) and the fastest method (The Best Fast Method). This is NAVER LABS' second consecutive win in the BOP Challenge, and this model was also built on the 3D VFM CroCo.
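For readers unfamiliar with the term, a 6D pose combines a 3D rotation and a 3D translation. The minimal Python sketch below, using only the standard library, shows how such a pose maps an object's points into the camera frame and how an estimated pose might be compared with ground truth using a rotation angle error and a translation distance. The function names are hypothetical, translations are assumed to be in meters, and these simple error measures are illustrative only; they are not the official BOP scoring protocol or NAVER LABS' method.

```python
import math

# Illustrative sketch only: these helpers and error measures are hypothetical
# and are NOT the official BOP evaluation protocol.

def rot_z(theta):
    """3x3 rotation matrix (list of rows) for a rotation of theta radians about the z-axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def apply_pose(R, t, p):
    """Map a 3D point p from object coordinates to camera coordinates: R @ p + t."""
    return [sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)]

def rotation_error_deg(R_est, R_gt):
    """Angle in degrees of the relative rotation R_est^T @ R_gt."""
    # trace(R_est^T @ R_gt) equals the element-wise sum of R_est * R_gt.
    tr = sum(R_est[i][j] * R_gt[i][j] for i in range(3) for j in range(3))
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cos_angle = max(-1.0, min(1.0, (tr - 1.0) / 2.0))
    return math.degrees(math.acos(cos_angle))

def translation_error(t_est, t_gt):
    """Euclidean distance between estimated and ground-truth translations."""
    return math.dist(t_est, t_gt)

# Example: ground truth rotated 30 degrees about z; estimate off by 5 degrees and 1 cm.
R_gt, t_gt = rot_z(math.radians(30)), [0.0, 0.0, 0.5]
R_est, t_est = rot_z(math.radians(35)), [0.01, 0.0, 0.5]
print(round(rotation_error_deg(R_est, R_gt), 3))  # 5.0
print(round(translation_error(t_est, t_gt), 3))   # 0.01
print(apply_pose(R_gt, t_gt, [1.0, 0.0, 0.0]))    # object point expressed in camera coordinates
```

Separating rotation and translation errors like this makes it clear why a method must get both right: a pose can be well positioned yet badly rotated, or vice versa.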

[Photo] NAVER LABS' MASt3R, which won first place at ECCV 2024, was built on the 3D VFM CroCo.

Since its launch as a separate entity in 2017, NAVER LABS has led technical discussion in the field of spatial intelligence through continuous research. Drawing on vision expertise accumulated at NAVER LABS Europe, an AI research institute acquired by NAVER, NAVER LABS ranked first at the 2019 Conference on Computer Vision and Pattern Recognition (CVPR) with R2D2, its Visual Localization (VL) component technology, outperforming global IT companies. NAVER LABS went on to introduce the 3D VFM CroCo last year and unveiled MASt3R at this year's CVPR, drawing attention from researchers around the world.

NAVER LABS CEO Seok Sang-ok emphasized the importance of spatial intelligence technology: "Spatial intelligence, which we've been developing since our inception through advancements in robotics and autonomous driving, is not just a core competency but the very essence of our competitive edge. We will continue to focus on R&D to expand its applications in robotics, autonomous driving, digital twins, and even new global milestones such as our projects in the Middle East."

NAVER CLOUD’s AI Papers Also Accepted: 11 Papers Showcasing Advanced Vision AI Technology

Meanwhile, NAVER CLOUD had 11 AI research papers accepted at ECCV 2024, showcasing its work in computer vision, a field gaining attention with the recent rise of multimodal AI. Notable papers include a study proposing a method for building high-quality training datasets by effectively filtering the large volumes of image and text data needed to train AI models^[1], and research introducing a method to improve the performance of image recognition models^[2].

These advances in computer vision AI are expected to enhance NAVER's generative AI services. In August, NAVER unveiled visual information processing technology based on its hyperscale generative AI, HyperCLOVA X, and incorporated it into its conversational AI service, CLOVA X. According to NAVER, HyperCLOVA X's image comprehension is on par with top global models, reaching 99.94% of the best models' performance across eight benchmarks, including ChartQA, DocVQA, MathVista, and MMMU. The company plans to keep improving its performance.

About DUSt3R and MASt3R

DUSt3R is an AI tool that can create 3D models of specific buildings or interior spaces from only one or two photos. MASt3R, an advanced version of DUSt3R, can process thousands of images at once, quickly and accurately building 3D models not only of building interiors but also of complex urban environments.

^[1] HYPE: Hyperbolic Entailment Filtering for Underspecified Images and Texts

^[2] Model Stock: All we need is just a few fine-tuned models
