 
It's common in computer vision projects for bounding boxes to jump or flicker from frame to frame, especially when objects move quickly or lighting is challenging. This happens because the model's predictions vary slightly from one frame to the next, and when visualized, those small differences show up as jitter.
You can reduce jittery and flickering detections using the Supervision Detections Smoother. With the jumpy movements removed, your videos show smooth, stable detections. This isn't only a visual change: because the smoother blends past positions into a cleaner, steadier bounding box, your model's detections also become easier to work with.
The result? Smoother visual output, nicer-looking videos, and more reliable inputs for downstream tasks like counting, event detection, or analytics.
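To make the idea of "blending past positions" concrete, here is a minimal sketch of a per-object moving average over recent boxes. It is a simplified illustration, not Supervision's implementation; the class name `BoxSmoother`, the `length` window size, and the sample coordinates are all assumptions made for this example.

```python
from collections import defaultdict, deque


class BoxSmoother:
    """Toy per-track moving average over the last `length` boxes (hypothetical helper)."""

    def __init__(self, length=5):
        # One fixed-size history per tracker ID; the oldest box drops off automatically.
        self.history = defaultdict(lambda: deque(maxlen=length))

    def update(self, tracker_id, box):
        """Store the new (x1, y1, x2, y2) box and return the averaged box."""
        window = self.history[tracker_id]
        window.append(box)
        # Average each coordinate across the boxes currently in the window.
        return tuple(sum(coords) / len(window) for coords in zip(*window))


toy_smoother = BoxSmoother(length=3)
# Made-up raw boxes for one object: the third frame jumps 14 px to the left.
raw_boxes = [(100, 50, 200, 150), (110, 50, 210, 150), (96, 50, 196, 150)]
smoothed = [toy_smoother.update(tracker_id=1, box=b) for b in raw_boxes]
```

In this made-up sequence the raw left edge moves from 110 to 96 between the last two frames, while the averaged left edge only moves from 105 to 102. Note that the history is keyed by a tracker ID, which is why a tracker is needed before smoothing.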
For this guide, we'll be making a production-style video with smooth detections of a bike moving across a trail.
Let's get started!
Detection Smoothing Implementation
For this guide, I'll be using my own bike detection model to run detections on a video, but the process is the same for whatever video you would like to create.
Save the video you plan to run detection on in a new project folder, and add a file called main.py. Update main.py to:
import supervision as sv
import numpy as np
from roboflow import Roboflow
import cv2
import os
from dotenv import load_dotenv
# Load environment variables
load_dotenv()
# Initialize Roboflow
rf = Roboflow(api_key=os.getenv("ROBOFLOW_API_KEY"))
# Load your custom model
project = rf.workspace(os.getenv("ROBOFLOW_WORKSPACE")).project(os.getenv("ROBOFLOW_PROJECT"))
model = project.version(os.getenv("ROBOFLOW_VERSION")).model
video_info = sv.VideoInfo.from_video_path(video_path="sample.mp4")
frame_generator = sv.get_video_frames_generator(source_path="sample.mp4")
tracker = sv.ByteTrack(frame_rate=video_info.fps)
box_annotator = sv.BoxAnnotator(thickness=3) 
label_annotator = sv.LabelAnnotator(text_padding=3, text_scale=1.0, text_thickness=2)
This code loads the model we created, the sample video, the annotators for the frames, and a ByteTrack tracker. The tracker is important because smoothing relies on the tracking functionality that ByteTrack provides in Supervision. Additionally, store all of the values read with os.getenv (ROBOFLOW_API_KEY, ROBOFLOW_WORKSPACE, ROBOFLOW_PROJECT, and ROBOFLOW_VERSION) in a .env file in the project.
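Because the script reads all of its Roboflow settings from the environment, a missing value only shows up later as a confusing API error. A small check like the following can catch that earlier. This is a sketch: the helper name `missing_env_vars` and the example values are hypothetical, and only the four variable names come from main.py.

```python
import os

# Names that main.py reads via os.getenv.
REQUIRED_VARS = (
    "ROBOFLOW_API_KEY",
    "ROBOFLOW_WORKSPACE",
    "ROBOFLOW_PROJECT",
    "ROBOFLOW_VERSION",
)


def missing_env_vars(required=REQUIRED_VARS, environ=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in required if not env.get(name)]


# Example with a hypothetical, partially filled environment.
example_env = {"ROBOFLOW_API_KEY": "placeholder-key", "ROBOFLOW_PROJECT": "bikes"}
missing = missing_env_vars(environ=example_env)
```

In main.py, you could call `missing_env_vars()` with no arguments right after `load_dotenv()` and stop with a clear message if the returned list is not empty.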
Next, update main.py to:
import supervision as sv
import numpy as np
from roboflow import Roboflow
import cv2
import os
from dotenv import load_dotenv
# Load environment variables
load_dotenv()
# Initialize Roboflow
rf = Roboflow(api_key=os.getenv("ROBOFLOW_API_KEY"))
# Load your custom model
project = rf.workspace(os.getenv("ROBOFLOW_WORKSPACE")).project(os.getenv("ROBOFLOW_PROJECT"))
model = project.version(os.getenv("ROBOFLOW_VERSION")).model
video_info = sv.VideoInfo.from_video_path(video_path="sample.mp4")
frame_generator = sv.get_video_frames_generator(source_path="sample.mp4")
tracker = sv.ByteTrack(frame_rate=video_info.fps)
smoother = sv.DetectionsSmoother()
box_annotator = sv.BoxAnnotator(thickness=3) 
label_annotator = sv.LabelAnnotator(text_padding=3, text_scale=1.0, text_thickness=2)
# Use OpenCV VideoWriter
fourcc = cv2.VideoWriter_fourcc(*'mp4v')
out = cv2.VideoWriter('output_video.mp4', fourcc, video_info.fps, (video_info.width, video_info.height))
frame_count = 0
for frame in frame_generator:
    frame_count += 1
    print(f"Processing frame {frame_count}...")
    
    try:
        # Use Roboflow model for inference
        result = model.predict(frame, confidence=60, overlap=30).json()
        print(f"  Successfully processed frame {frame_count}")
        
        # Convert Roboflow predictions to supervision Detections
        if result['predictions']:
            num_detections = len(result['predictions'])
            print(f"  Found {num_detections} bike in frame {frame_count}")
            
            # Extract bounding boxes, confidence scores, and class IDs
            boxes = []
            confidences = []
            class_ids = []
            
            for prediction in result['predictions']:
                x = prediction['x']
                y = prediction['y']
                width = prediction['width']
                height = prediction['height']
                
                # Convert to xyxy format
                x1 = x - width / 2
                y1 = y - height / 2
                x2 = x + width / 2
                y2 = y + height / 2
                
                boxes.append([x1, y1, x2, y2])
                confidences.append(prediction['confidence'])
                class_ids.append(prediction['class_id'])
            
            detections = sv.Detections(
                xyxy=np.array(boxes),
                confidence=np.array(confidences),
                class_id=np.array(class_ids)
            )
        else:
            print(f"  No bike detected in frame {frame_count}")
            detections = sv.Detections.empty()
            
    except Exception as e:
        print(f"  Error processing frame {frame_count}: {str(e)}")
        detections = sv.Detections.empty()
    
    detections = tracker.update_with_detections(detections)
    detections = smoother.update_with_detections(detections)
    # Annotate with bounding boxes and labels
    annotated_frame = box_annotator.annotate(frame.copy(), detections)
    
    # Add custom labels
    if len(detections) > 0:
        for i in range(len(detections)):
            confidence = detections.confidence[i]
            label = f"bike {confidence:.1%}"
            
            # Get bounding box coordinates
            x1, y1, x2, y2 = detections.xyxy[i]
            
            # Draw custom label
            cv2.putText(
                annotated_frame,
                label,
                (int(x1), int(y1) - 10),
                cv2.FONT_HERSHEY_SIMPLEX,
                0.8,
                (255, 255, 255), 
                2,  
                cv2.LINE_AA
            )
    out.write(annotated_frame)
out.release()
print(f"\nFinished processing {frame_count} frames!")
print("Output saved as: output_video.mp4")
This code uses OpenCV to write a new video with the smoothed detections. On every frame, it updates the tracker, then the Detections Smoother, and writes the annotated frame to the output video.
Additionally, it logs the status and detections for every frame so you can see which frame is being processed and how far along the program is (videos can take up to 10 minutes to process depending on size and complexity).
Finally, run main.py. The smoothed result is saved as output_video.mp4.
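One step in the loop that is easy to get wrong is the box conversion: the predictions in main.py give a box as a center point plus width and height, while sv.Detections expects corner coordinates. The sketch below isolates that arithmetic in a helper. The function name `prediction_to_xyxy` and the sample values are made up for illustration.

```python
def prediction_to_xyxy(prediction):
    """Convert a center-format prediction dict to (x1, y1, x2, y2) corners."""
    half_w = prediction["width"] / 2
    half_h = prediction["height"] / 2
    return (
        prediction["x"] - half_w,  # left edge
        prediction["y"] - half_h,  # top edge
        prediction["x"] + half_w,  # right edge
        prediction["y"] + half_h,  # bottom edge
    )


# Made-up prediction: a 100x60 box centered at (320, 240).
sample_prediction = {"x": 320, "y": 240, "width": 100, "height": 60}
box = prediction_to_xyxy(sample_prediction)
```

Here `box` comes out as (270.0, 210.0, 370.0, 270.0), which matches the x1, y1, x2, y2 values the loop in main.py appends to `boxes`.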
Conclusion
Congratulations! You can now smooth detections in your next vision project. Smoother detections are helpful for sharing demos and making your detection data less noisy.
If you have any questions about the project, you can check out the GitHub repository.
Cite this Post
Use the following entry to cite this post in your research:
Aryan Vasudevan. (Aug 6, 2025). Reduce Jittery and Flickering Detections in Computer Vision. Roboflow Blog: https://blog.roboflow.com/jittery-flickering-detections-computer-vision/