Seeing the City: Real-Time Traffic Insights at Scale

Author: Denis Avetisyan


A new edge-cloud framework unlocks real-time analysis of thousands of video streams, providing unprecedented visibility into urban traffic patterns.

A scalable AIITS testbed architecture has been developed to process and analyze live traffic camera feeds from Bengaluru, enabling comprehensive traffic monitoring and intelligent transportation system applications.

This review details a scalable system leveraging edge computing, graph neural networks, and federated learning to forecast traffic conditions in complex city-scale camera networks like those found in Bengaluru.

Processing the immense data streams from city-scale camera networks presents a significant challenge for real-time traffic analytics due to limitations in latency, bandwidth, and compute resources. This paper, ‘Scaling Real-Time Traffic Analytics on Edge-Cloud Fabrics for City-Scale Camera Networks’, introduces a scalable AI-driven Intelligent Transportation System leveraging an edge-cloud fabric to address these constraints. By transforming multi-camera feeds into dynamic traffic graphs and employing Spatio-Temporal Graph Neural Networks, we demonstrate the ability to process thousands of frames per second and accurately forecast traffic conditions. Will this approach pave the way for truly responsive and adaptive urban traffic management systems capable of handling the demands of future smart cities?
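The core idea of transforming camera feeds into a traffic graph and forecasting over it can be illustrated with a toy propagation step. The paper's Spatio-Temporal GNN is not reproduced here; this is a minimal sketch using a hand-written four-camera adjacency matrix and synthetic vehicle counts, where one normalized-adjacency propagation mixes each camera's count history with its neighbors' before a crude temporal aggregation.

```python
# Minimal sketch of spatio-temporal graph propagation over a camera graph.
# Nodes are cameras, edges connect cameras on adjacent road segments, and the
# node signal is a short history of vehicle counts. One normalized-adjacency
# propagation step mixes each camera's history with its neighbors', and a
# simple temporal average stands in for the forecast head. (Illustrative
# only; the actual ST-GNN is far richer.)
import numpy as np

adj = np.array([            # hypothetical 4-camera road graph
    [0, 1, 0, 0],
    [1, 0, 1, 1],
    [0, 1, 0, 1],
    [0, 1, 1, 0],
], dtype=float)
adj_hat = adj + np.eye(4)                       # add self-loops
deg_inv = np.diag(1.0 / adj_hat.sum(axis=1))    # row-normalize
prop = deg_inv @ adj_hat

counts = np.array([                              # vehicle counts, last 3 steps
    [12, 14, 15],
    [30, 33, 37],
    [ 8,  9,  9],
    [21, 20, 24],
], dtype=float)

spatial = prop @ counts                          # spatial mixing step
forecast = spatial.mean(axis=1)                  # crude temporal aggregation
print(np.round(forecast, 1))
```

A real model would replace the temporal average with recurrent or convolutional layers over the time axis, but the graph-propagation structure is the same.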


The Urban Mobility Challenge: A System Under Strain

The relentless march of urbanization is fundamentally reshaping the daily experience for billions, and nowhere is this more acutely felt than in increasingly congested city streets. As populations concentrate in urban centers, the sheer volume of vehicles strains existing infrastructure beyond its capacity, resulting in lost time, heightened stress, and diminished air quality. This isn’t merely an inconvenience; traffic congestion represents a significant drag on economic productivity, impeding the movement of goods and services and increasing operational costs for businesses. Studies consistently demonstrate a direct correlation between traffic delays and reduced gross domestic product, highlighting the urgent need for effective solutions to mitigate the negative impacts of urban sprawl and ensure sustainable mobility for a growing global population.

Conventional traffic management approaches, designed for predictable commuter flows, are increasingly overwhelmed by the dynamism of modern urban environments. These systems typically rely on static scheduling and limited sensor data, proving inadequate when confronted with the sheer volume of vehicles, the unpredictability of incidents, and the growing prevalence of ride-sharing and delivery services. The inherent limitations in processing capacity and analytical capabilities hinder their ability to respond effectively to real-time fluctuations, leading to cascading delays and reduced road network efficiency. Consequently, cities experience not only increased congestion but also heightened pollution levels and economic losses as commuters and goods are impeded by inefficient traffic flow. The escalating complexity demands a paradigm shift toward intelligent systems capable of adaptive control and predictive analysis.

Bengaluru’s sprawling urban landscape has become a stark illustration of the challenges facing rapidly growing cities worldwide. The sheer volume of vehicles navigating its roadways routinely pushes traditional traffic management systems to their limits, resulting in significant delays and economic losses. To address this complexity, researchers have developed a real-time analysis platform capable of concurrently processing data from over 1000 video streams. This innovative system doesn’t simply monitor traffic; it actively interprets patterns, identifies congestion points, and facilitates a more dynamic and responsive approach to urban mobility – a capability recently showcased in a live demonstration, suggesting a path toward alleviating gridlock and improving quality of life for Bengaluru’s residents and potentially other megacities facing similar pressures.

A deep neural network pipeline successfully detects and counts vehicles within complex traffic scenes captured in Bengaluru.

Decentralized Intelligence: The Rise of Edge Computing

Edge computing in traffic analysis implements a distributed computing architecture where data processing is performed on devices located in close proximity to traffic cameras, rather than relying solely on centralized cloud infrastructure. This decentralization minimizes data transmission distances, shifting computational tasks – such as object detection, classification, and tracking – to the “edge” of the network. By processing video streams directly at the source, the system reduces the volume of data that needs to be transferred, enabling faster response times and more efficient bandwidth utilization compared to traditional centralized models.

The implementation of localized data processing via edge computing significantly minimizes both latency and bandwidth demands for real-time traffic analysis. By processing video streams directly at the source – traffic cameras – the need to transmit large volumes of data to a centralized server is reduced. This architecture enables the processing of over 2000 frames per second using the current edge cluster configuration, facilitating immediate detection of incidents and enabling rapid responses. Reduced latency is critical for applications such as adaptive traffic signal control and autonomous vehicle support, while decreased bandwidth usage lowers transmission costs and network congestion.

The deployment of edge-based analytics is facilitated by utilizing readily available and cost-effective hardware such as Jetson Accelerators and Raspberry Pi devices. Specifically, the Jetson Orin AGX offers a processing capability of approximately 200 frames per second (FPS), allowing for substantial localized data analysis. This distributed approach enables scalable deployments, as multiple units can be networked to handle increased data streams and larger geographical areas. The comparatively low cost and energy consumption of these devices, relative to traditional server infrastructure, contribute to a significantly reduced total cost of ownership for real-time traffic analysis systems.
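The headline numbers above imply a simple capacity budget. The sketch below takes the stated ~200 FPS per Jetson Orin AGX and the 2000 FPS cluster-wide figure from the text; the per-camera ingest rate is an assumption chosen for illustration.

```python
# Back-of-the-envelope capacity budget for the edge cluster, using the
# per-device and cluster-wide throughput figures quoted in the text.
# The per-camera frame rate is an assumed value, not from the paper.
PER_DEVICE_FPS = 200       # Jetson Orin AGX, from the text
CLUSTER_TARGET_FPS = 2000  # cluster-wide target, from the text
CAMERA_FPS = 10            # assumed ingest rate per RTSP stream

devices_needed = -(-CLUSTER_TARGET_FPS // PER_DEVICE_FPS)  # ceiling division
streams_per_device = PER_DEVICE_FPS // CAMERA_FPS
cluster_streams = devices_needed * streams_per_device

print(devices_needed, streams_per_device, cluster_streams)  # 10 20 200
```

Under these assumptions, ten Orin-class devices comfortably cover a couple of hundred concurrent camera streams, which is consistent with the scale-out story the article describes.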

The performance of Raspberry Pis decreases as the number of hosted RTSP streams increases.

The Analytical Engine: DeepStream and YOLO in Concert

The Nvidia DeepStream SDK is a streaming analytics toolkit designed to simplify the development and deployment of intelligent video analytics (IVA) applications. It provides a production-ready pipeline for processing video streams, leveraging the computational capabilities of Nvidia GPUs. DeepStream integrates elements such as GStreamer, a multimedia framework, with deep learning inference engines, enabling developers to ingest, decode, process, understand, and act on real-time video data. The SDK includes pre-built components and APIs for tasks including video decoding, pre-processing, inference, post-processing, and streaming, reducing development time and complexity. It supports multiple inference engines, including TensorRT, and is optimized for Nvidia hardware, allowing for scalable and efficient IVA solutions.
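A DeepStream application is, at bottom, a GStreamer pipeline. The sketch below assembles a typical gst-launch-style pipeline description as a string: per-camera RTSP branches feed a batching `nvstreammux`, which drives a single `nvinfer` detection stage. The element names are real DeepStream plugins, but the RTSP URLs and the config file path are placeholders, and a production system would construct the pipeline through the GStreamer bindings rather than a string.

```python
# Hedged sketch: assembling a DeepStream-style gst-launch pipeline string for
# N camera feeds batched through nvstreammux into one YOLO inference stage
# (nvinfer). Element names are standard DeepStream plugins; the URLs and
# config-file path are hypothetical placeholders.
def build_pipeline(camera_urls, config="yolo_config.txt"):
    n = len(camera_urls)
    main = (
        f"nvstreammux name=mux batch-size={n} width=1280 height=720 "
        f"! nvinfer config-file-path={config} "
        f"! nvvideoconvert ! nvdsosd ! fakesink"  # nvdsosd draws boxes
    )
    branches = [
        f"rtspsrc location={url} ! rtph264depay ! h264parse "
        f"! nvv4l2decoder ! mux.sink_{i}"          # one branch per camera
        for i, url in enumerate(camera_urls)
    ]
    return " ".join([main] + branches)

print(build_pipeline(["rtsp://cam-01/live", "rtsp://cam-02/live"]))
```

Batching multiple decoded streams into one inference call is what lets a single GPU serve many cameras at once, which is central to the throughput numbers quoted earlier.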

The YOLO (You Only Look Once) object detection model functions as a core component within the Nvidia DeepStream SDK for identifying vehicles in video streams. This model employs a single convolutional neural network to simultaneously predict bounding boxes and class probabilities, enabling real-time processing speeds. By analyzing entire frames at once, rather than processing regions individually, YOLO minimizes computational overhead and maximizes throughput. The architecture prioritizes speed without significant compromise to accuracy, making it suitable for deployment on embedded systems and edge devices, and allowing for efficient video analytics applications focused on vehicle detection and tracking.
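Because YOLO predicts many candidate boxes per frame in a single pass, a non-maximum-suppression (NMS) step is the standard postprocess that reduces them to final detections. The sketch below shows that step in isolation, with made-up box coordinates; it illustrates the technique, not the paper's code.

```python
# Minimal non-maximum suppression: keep the highest-confidence box, drop any
# overlapping box above an IoU threshold, repeat. Boxes are
# (x1, y1, x2, y2, confidence); the values are invented for illustration.
def iou(a, b):
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def nms(boxes, thresh=0.5):
    boxes = sorted(boxes, key=lambda b: b[4], reverse=True)
    kept = []
    for b in boxes:
        if all(iou(b, k) < thresh for k in kept):
            kept.append(b)
    return kept

dets = [(10, 10, 50, 50, 0.9), (12, 12, 52, 52, 0.8), (80, 80, 120, 120, 0.7)]
print(nms(dets))  # two boxes survive; the near-duplicate is suppressed
```

On hardware, this step typically runs fused into the inference engine, but the logic is exactly this greedy filter.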

The YOLO object detection model, utilized within the DeepStream SDK, achieves enhanced performance in Indian traffic scenarios through specialized training data. While initially trained on the general COCO dataset, accuracy is significantly improved by supplementing this with the large-scale UVH-26 dataset, which is specifically curated for the complexities of Indian roadways. This combined training methodology yields an accuracy improvement ranging from 8.4% to 31.5% compared to models trained solely on the COCO dataset, demonstrating the critical impact of localized data for effective object detection in diverse environments.

Persistent Understanding: BoT-SORT for Robust Tracking

The BoT-SORT algorithm addresses a critical need in automated video analysis: reliable vehicle tracking. Building upon established object detection techniques, it doesn’t simply identify vehicles in each frame, but actively maintains their identities as they move through a video sequence. This is achieved through a sophisticated system that associates detections across frames, even when faced with common challenges like occlusion – where vehicles are temporarily hidden – or changes in lighting and perspective. By robustly linking detections over time, BoT-SORT provides a continuous record of each vehicle’s trajectory, enabling more accurate and meaningful analysis of traffic patterns and events than would be possible with isolated detections alone. This persistence of identity is crucial for applications ranging from intelligent transportation systems to autonomous driving, where understanding individual vehicle behavior is paramount.

The ability to persistently identify individual vehicles across multiple video frames is central to the BoT-SORT algorithm’s utility. This consistent tracking isn’t merely about following moving objects; it facilitates a detailed understanding of traffic dynamics. By maintaining unique vehicle identities, the system can accurately calculate traffic flow rates, pinpoint the onset and extent of congestion, and automatically identify potential incidents like stalled vehicles or accidents. This granular level of analysis moves beyond simple vehicle counts, providing data essential for proactive traffic management, optimized route planning, and improved emergency response times, ultimately contributing to safer and more efficient transportation networks.
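The identity-persistence idea can be shown with a heavily stripped-down associator. BoT-SORT itself combines Kalman-filter motion prediction with appearance (re-identification) features; the greedy IoU matcher below captures only the core mechanism of carrying a track ID forward when a new detection overlaps a previously tracked box.

```python
# Toy frame-to-frame identity association (NOT BoT-SORT itself, which adds
# Kalman motion models and appearance features). A detection that overlaps
# an existing track inherits its ID; leftovers start new tracks.
def iou(a, b):
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def associate(tracks, detections, thresh=0.3, next_id=[0]):
    """tracks: {track_id: box}. Returns the updated {track_id: box}."""
    updated, unmatched = {}, list(detections)
    for tid, tbox in tracks.items():
        if not unmatched:
            break
        best = max(unmatched, key=lambda d: iou(tbox, d))
        if iou(tbox, best) >= thresh:
            updated[tid] = best
            unmatched.remove(best)
    for det in unmatched:            # unmatched detections start new tracks
        updated[next_id[0]] = det
        next_id[0] += 1
    return updated

frame1 = associate({}, [(10, 10, 40, 40), (100, 100, 140, 140)])
frame2 = associate(frame1, [(14, 12, 44, 42)])   # first vehicle moved slightly
print(frame2)  # the moved box keeps track ID 0
```

Once IDs persist, per-vehicle trajectories fall out directly, which is what enables the flow-rate and incident statistics described above.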

The system’s efficiency extends beyond tracking, incorporating capacity-aware scheduling to dynamically allocate resources across edge devices. This optimization is crucial for maintaining scalability and responsiveness even under heavy workloads, as demonstrated by the manageable forecast latency achieved with four concurrent clients. Utilizing Jetson Orin AGX 64GB devices significantly enhances data collection capabilities, providing 1.2 to 5 times more data than the 32GB version. This increased throughput supports more detailed analysis while maintaining a practical per-image annotation latency of 4.0 seconds using SAM3, ensuring real-time or near-real-time processing for demanding applications.
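Capacity-aware placement of streams onto heterogeneous edge devices can be sketched as a greedy bin-packing pass. The device capacities and stream rates below are invented for illustration; the paper's actual scheduler is not reproduced here.

```python
# Illustrative capacity-aware stream placement: assign each camera stream to
# the edge device with the most remaining frame-rate headroom, refusing
# placement when no device can absorb it. All numbers are made up.
def schedule(streams, capacities):
    """streams: {name: fps}; capacities: {device: max_fps}."""
    remaining = dict(capacities)
    placement = {}
    for name, fps in sorted(streams.items(), key=lambda s: -s[1]):
        device = max(remaining, key=remaining.get)  # most headroom first
        if remaining[device] < fps:
            placement[name] = None                  # no headroom anywhere
            continue
        placement[name] = device
        remaining[device] -= fps
    return placement

caps = {"orin-agx-64": 200, "orin-agx-32": 120, "rpi-5": 30}
cams = {"cam-a": 60, "cam-b": 60, "cam-c": 25, "cam-d": 90}
print(schedule(cams, caps))
```

Placing the heaviest streams first onto the device with the most headroom is a standard heuristic; a real scheduler would also weigh latency targets and network locality.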

The presented framework underscores a crucial point about complex systems – infrastructure should evolve without rebuilding the entire block. Just as a city’s transport network benefits from incremental improvements rather than wholesale reconstruction, this edge-cloud architecture prioritizes scalability through distributed processing. This mirrors Dijkstra’s observation: “It’s always possible to do things differently, and usually better.” The ability to ingest and analyze thousands of video streams, coupled with the use of graph neural networks for traffic forecasting, exemplifies a design philosophy that favors adaptation and continuous refinement over monolithic solutions. The system’s success hinges on understanding the interplay between edge and cloud resources, acknowledging that structural integrity dictates behavioral effectiveness.

Beyond the Horizon

The presented framework, while demonstrating scalability across a complex urban landscape, ultimately highlights the inherent fragility of distributed systems. If the system survives on duct tape – cleverly orchestrated message passing and adaptive resource allocation – it’s probably overengineered. The true challenge isn’t simply processing more streams, but extracting meaning from the noise. Current metrics focus on prediction accuracy, but a truly intelligent transportation system anticipates not just congestion, but the why behind it – the subtle shifts in collective behavior that precede gridlock.

Modularity, so often lauded, risks becoming an illusion of control. Decomposing the problem into edge-cloud components is useful, but without a holistic understanding of the information flow, its dependencies and feedback loops, it’s akin to rearranging deck chairs. The current reliance on centralized model updates, even within a federated learning paradigm, suggests a lingering discomfort with true decentralization. The network’s intelligence remains tethered to a core, creating a single point of failure and limiting its adaptability.

Future work must move beyond incremental improvements in processing speed. The focus should shift towards developing systems capable of learning the city itself – understanding its rhythms, anticipating its needs, and responding with genuine autonomy. This demands a move away from purely data-driven models and towards systems that incorporate causal reasoning and contextual awareness. The architecture must evolve from a pipeline for prediction to a living, breathing model of the urban environment.


Original article: https://arxiv.org/pdf/2603.05217.pdf

Contact the author: https://www.linkedin.com/in/avetisyan/


2026-03-07 14:26