A real-world robotics challenge with a $180K prize pool, where innovation and industry impact collide. We’re standing at an inflection point in robotics: electronics assembly, especially dexterous manipulation, remains one of the biggest open problems in industry today. Tasks like handling flexible cables or inserting connectors during electronics assembly are still exceedingly hard for robots…
Picture an industrial robot that doesn’t wait for button presses or predefined programs, but instead reacts instantly to your presence. As you move, the robot’s tool gently adjusts its position, tracking your face in real time and responding with smooth, deliberate motion. This kind of interaction, where vision directly drives robotic behavior, highlights how computer…
Simultaneous Localization & Mapping (SLAM) is one of the most active and contentious areas of CV & robotics. Should you use purely visual SLAM? Do you need LiDAR? What about indoor vs. outdoor use cases? We’ll cover all of these and more with OpenCV community member Ali Pahlevani of SLAMbotics in the final episode of this…
This year the Low-Power Computer Vision Challenge (LPCV) has three tracks with serious prize money: Image-to-Text Retrieval, Action Recognition in Video, and AI Generated Images Detection. Each track has over $10,000 in prizes up for grabs and is open for participation! On this week’s episode we welcome back the LPCV organizers to give us…
Imagine a robot rolling through a building, a car driving through city streets, or a drone flying over a campus. Hours later, it reaches a familiar-looking spot and silently asks a crucial question: “Have I been here before?” This deceptively simple question is at the heart of Visual Place Recognition (VPR). Visual Place Recognition is…
Counting overlapping or touching objects in images is a common challenge in computer vision. Simple thresholding and contour detection often fail when objects are in contact, treating multiple items as a single blob. The Watershed algorithm provides a solution to this problem by treating the image as a topographic surface and “flooding” it to separate…
Imagine capturing the perfect landscape photo on a sunny day, only to find harsh shadows obscuring key details and distorting colors. Similarly, in computer vision projects, shadows can interfere with object detection algorithms, leading to inaccurate results. Shadows are a common nuisance in image processing, introducing uneven illumination that compromises both aesthetic quality and functional…
Imagine uploading an image of a document into your browser and watching it automatically detect page boundaries, correct perspective distortion, extract searchable text, and generate a clean, professional PDF, all without transmitting a single byte to a remote server. This isn’t science fiction; it’s the result of modern, high-performance web technologies running entirely on the…
If you’ve ever used OpenCV to process live video from webcams, IP cameras, or recorded streams, you know the pattern: a loop pulling frames and a growing chain of image-processing calls. It works, but it often feels like assembling IKEA furniture without the right tools: doable, yet increasingly inefficient as complexity grows. What if you…
EgoX introduces a novel framework for translating third-person (exocentric) videos into realistic first-person (egocentric) videos using only a single input video. The work tackles a highly challenging problem of extreme viewpoint transformation with minimal view overlap, leveraging pretrained video diffusion models and explicit geometric reasoning to generate coherent, high-fidelity egocentric videos. Key Highlights: Single Exocentric…