Predicting Overtakes in Trucks Using CAN Data

Safe overtakes in trucks are crucial to prevent accidents, reduce congestion, and ensure efficient traffic flow, making early prediction essential for timely and informed driving decisions. Accordingly, we investigate the detection of truck overtakes from CAN data. Three classifiers, Artificial Neural Networks (ANN), Random Forest, and Support Vector Machines (SVM), are employed for the task. Our analysis covers up to 10 seconds before the overtaking event, using an overlapping sliding window of 1 second to extract CAN features. We observe that the prediction scores of the overtake class tend to increase as we approach the overtake trigger, while those of the no-overtake class remain stable or oscillate depending on the classifier. Thus, the best accuracy is achieved when approaching the trigger, making early overtaking prediction challenging. The classifiers show good accuracy in classifying overtakes (Recall/TPR>93%), but accuracy is suboptimal in classifying no-overtakes (TNR typically 80-90% and below 60% for one SVM variant). We further combine two classifiers (Random Forest and linear SVM) by averaging their output scores. The fusion is observed to improve no-overtake classification (TNR>92%) at the expense of reducing overtake accuracy (TPR). However, the latter is kept above 91% near the overtake trigger. Therefore, the fusion balances TPR and TNR, providing more consistent performance than individual classifiers.


INTRODUCTION
The development of Advanced Driver Assistance Systems (ADAS) has emerged as one of the most popular areas of research in artificial intelligence. Through several sensors, ADAS is designed to alert the driver of potential hazards or control the vehicle to ultimately avoid collisions or accidents. For those tasks, the vehicle must gather information about its surroundings to decide what to do and how to do it. Knowing the driver's intention is an integral part of the system, to determine if the ADAS should activate, providing opportune aids or alerts, or even overriding the driver's inputs [1].
Overtaking is among the most important driving manoeuvres. Lane changes, acceleration and deceleration, and estimation of the speed and distance of the vehicle ahead or in the lane it is travelling in are all part of the process. Though there is a lot of work in the literature that aims at predicting driving manoeuvres, very few works address overtaking [2,3,4], and no real-world dataset is available due to the risk associated with overtaking [5]. Most works address the estimation of lane change [1] or turning intention at intersections [6]. In doing so, different data sources are typically used, including information from the driver (via cameras or biosensors capturing EEG, ECG, etc.), from the vehicle (CAN bus signals), or from the traffic (GPS position, or relative position or velocity of surrounding vehicles via cameras or lidar).
In this paper, we present ongoing work on overtake detection, in particular for trucks. Trucks carry heavier loads than cars, so a truck accident can be [...] This also avoids privacy concerns related to cameras looking inside or outside the cabin, or sensors capturing data from the driver. We employ real CAN data from real operating trucks provided by Volvo Group, a participant in this research. The contribution of this paper is that, to the best of our knowledge, we are the first to study overtake detection in trucks, particularly from real CAN bus data. We also demonstrate that the fusion of classifiers can help to obtain a balanced performance in detecting the two classes (overtake, no-overtake).
To avoid running out of storage, the data logger is programmed to record only when a precondition trigger to detect potential overtakes is met. Such a trigger is activated based on specific thresholds on certain signals: signal 8 (active), signal 5 (more than 50 km/h), signal 2 (less than 200 m), and signal 4 (more than 0.1 km/h). When the trigger is activated, the logger saves the CAN signals from 20 seconds before the trigger up to 45 seconds thereafter. Data also includes video from a camera on the dashboard looking ahead of the vehicle. Afterwards, a person manually labels each file by watching the video and determining whether it is an overtake or not.
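The precondition trigger and the recording window described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the logger's actual implementation: the field names (signal_8_active, signal_5_kmh, and so on) and the function names are hypothetical, and only the thresholds and the 20 s / 45 s window lengths come from the text.

```python
from dataclasses import dataclass

# Thresholds and window lengths as stated in the text.
MIN_SIGNAL_5_KMH = 50.0   # signal 5 must be above 50 km/h
MAX_SIGNAL_2_M = 200.0    # signal 2 must be below 200 m
MIN_SIGNAL_4_KMH = 0.1    # signal 4 must be above 0.1 km/h
PRE_TRIGGER_S = 20.0      # seconds saved before the trigger
POST_TRIGGER_S = 45.0     # seconds saved after the trigger


@dataclass
class CanSample:
    """One time-stamped CAN reading (field names are hypothetical)."""
    t: float                # time stamp in seconds
    signal_8_active: bool
    signal_5_kmh: float
    signal_2_m: float
    signal_4_kmh: float


def precondition_met(s: CanSample) -> bool:
    """True when all four threshold conditions hold at the same time."""
    return (
        s.signal_8_active
        and s.signal_5_kmh > MIN_SIGNAL_5_KMH
        and s.signal_2_m < MAX_SIGNAL_2_M
        and s.signal_4_kmh > MIN_SIGNAL_4_KMH
    )


def recording_window(samples):
    """Return (start, end) times around the first trigger, or None."""
    for s in samples:
        if precondition_met(s):
            return (s.t - PRE_TRIGGER_S, s.t + POST_TRIGGER_S)
    return None
```

For example, a sample at t = 30 s that satisfies all four conditions would yield the window (10.0, 75.0); a stream in which the conditions are never met together yields None, so nothing is stored.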

EXPERIMENTAL FRAMEWORK
With this procedure, we obtained 264 no-overtake files and 448 overtake files. Notice that the precondition trigger is designed to detect when the vehicle is about to change lane (signal 8), is sufficiently close to the vehicle ahead (signal 2), and is moving laterally to the left (signal 4), which are indicative signs of an overtake. However, this is not always the case, since around 37% of the obtained files (264 of 712) correspond to other driving situations. From the videos, such no-overtake situations are seen to occur, for example, when turning left at an intersection or when passing a stopped vehicle. Looking at the left turn indicator (signal 9) would produce false positives as well. Also, the minimum speed condition (signal 5) is designed to filter out situations that can occur in city traffic at low speeds but are not really overtakes. As a result, our files contain data mostly from highways or non-urban roads.

Classifiers
To detect overtakes, we have used three classifiers: Artificial Neural Networks (ANN), Random Forest (RF), and Support Vector Machines (SVM, with linear and rbf kernels). They are based on different strategies and are a popular choice in the related literature [7]. An ANN consists of several interconnected neurons that are arranged in layers (i.e., input, hidden, and output layers). Nodes in one layer are connected to all nodes in the neighbouring layers. Two design parameters of ANNs are the number of intermediate layers and the number of neurons per layer. An extension of the standard classification tree algorithm, the RF algorithm is an ensemble method where the results of many decision trees are combined. This helps to reduce overfitting and to improve generalization capabilities. The trees in the ensemble are grown by using bootstrap samples of the data. Finally, SVM searches for an optimal hyperplane in a high-dimensional space that separates the data into two classes. SVM can use different kernel functions, such as linear, Gaussian or polynomial, to transform the data into the space where the hyperplane is formed.
In this work, the available files are cropped [...]
• ANN and SVM use standardization (subtract the mean and divide by the standard deviation of the training data)
• The ANN iteration limit is raised to 1e6 (from 1e3) to facilitate convergence
• Similarly, the SVMrbf iteration limit is raised to 1e8 (from 1e6)
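The standardization step used for ANN and SVM can be illustrated as follows. This is a minimal sketch using only the Python standard library; the function names are hypothetical, and the use of the population standard deviation and the guard for constant features are assumptions, since the text does not specify them. The key point it shows is that the mean and standard deviation come from the training data only and are then reused unchanged on test data.

```python
from statistics import mean, pstdev


def fit_standardizer(train_rows):
    """Compute per-feature mean and std from the training rows only."""
    columns = list(zip(*train_rows))
    means = [mean(c) for c in columns]
    # Assumption: a constant feature keeps a divisor of 1.0 to avoid
    # dividing by zero.
    stds = [pstdev(c) or 1.0 for c in columns]
    return means, stds


def standardize(rows, means, stds):
    """Subtract the training mean and divide by the training std."""
    return [
        [(x - m) / s for x, m, s in zip(row, means, stds)]
        for row in rows
    ]
```

A typical use would be to call fit_standardizer once on the training features and then apply standardize with the returned statistics to both the training and the test features.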

RESULTS
In Figure 1, we present the boxplots of the decision scores of each classifier towards the two classes.
Notice that the classifiers are set to produce the probability that a sample belongs to a specific class (i.e. a value in [0,1]). It can be observed that the output probability of class1 (overtake) usually increases as the precondition trigger approaches (x-axis=0), whereas class0 keeps a stable or oscillating probability, depending on the classifier. Thus, from the right plot of Figure 1, it can be seen that it will be easier to detect overtakes closer to the trigger. We then report in Figure 2 the Precision-Recall (PR) curves of the classifiers at different moments before the precondition trigger. In choosing the metrics to report our accuracy results, we follow related studies on driver intention prediction [7,9,6]. We also provide results considering all samples of the files at any given instant from -10 seconds to +1 seconds around the trigger. Table 2 gives the AUC values. Precision measures the proportion of detected positives which are actually overtakes, quantified as:
$$P = \frac{TP}{TP + FP}$$
Recall, on the other hand, measures the proportion of overtakes that are actually detected, as:
$$R = \frac{TP}{TP + FN}$$
A summarizing measure of P and R is the F1-score, defined as:
$$F_1 = 2\,\frac{P \cdot R}{P + R}$$
where TP, FP and FN denote the number of true positives, false positives and false negatives, respectively. Figure 3 provides the F1-score for different values of the threshold applied to the decision scores. The mentioned curves confirm the observation that "the closer to the trigger, the better". It can be seen that orange curves (0s before the trigger) and red curves (1s before the trigger) usually appear above the others. The black curves (which use samples in the entire range of -10 seconds to +1 seconds around the trigger) always show the worst behaviour. This suggests that samples earlier than 3 seconds before the trigger provide poorer detection capabilities, making it more difficult to predict overtakes early.
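As a minimal sketch of how precision, recall and the F1-score can be computed from decision scores, and of the threshold sweep behind an F1-versus-threshold curve, consider the following Python fragment. The function names, the convention of predicting an overtake when the score is at or above the threshold, and returning 0 when a ratio is undefined are assumptions made for illustration; they are not taken from the study's implementation.

```python
def precision_recall_f1(scores, labels, threshold):
    """Labels: 1 = overtake, 0 = no-overtake.

    A sample is predicted as overtake when score >= threshold.
    """
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if s >= threshold and y == 1)
    fp = sum(1 for s, y in pairs if s >= threshold and y == 0)
    fn = sum(1 for s, y in pairs if s < threshold and y == 1)
    # Assumption: undefined ratios (empty denominators) are reported as 0.
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1


def best_f1_threshold(scores, labels, candidates):
    """Return the candidate threshold with the highest F1-score.

    On ties, the first such candidate in the given order is returned.
    """
    return max(
        candidates,
        key=lambda th: precision_recall_f1(scores, labels, th)[2],
    )
```

Evaluating precision_recall_f1 over a grid of thresholds gives one F1 curve per classifier and moment; best_f1_threshold then picks the operating point of maximum F1.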
We then select the threshold of each classifier and moment that provides the highest F1-score. Table 3 reports P, R and F1, whereas Table 4 reports the true positive rate (TPR) and true negative rate (TNR), calculated as follows:
$$\mathrm{TPR} = \frac{TP}{TP + FN}, \qquad \mathrm{TNR} = \frac{TN}{TN + FP}$$
TPR measures the proportion of overtakes that are actually labelled as overtakes, whereas TNR measures the proportion of no-overtakes that are actually labelled as no-overtakes. Notice that TPR = R. The bold values in the tables show that Random Forest (RF) usually stands out as the best individual classifier, consistently obtaining the highest F1 at any given moment in time. To better observe the evolution of TPR/TNR, we graphically show in Figure 4 their values at different moments before the trigger. TPR stands above 90% for all classifiers, even when using all samples within 10 seconds before the trigger, meaning that actual overtakes can be well detected. Random Forest gives the best accuracy (>98% at t-1), although its performance is somewhat more erratic across time. ANN is the classifier with the most stable TPR at any time (above 94%). Interestingly, not all classifiers have their best TPR at t (exact moment of the trigger). As was observed in the boxplots of Figure 1, the score towards the positive class (right columns) tends to decrease abruptly exactly at the trigger. This could be because the window is capturing a portion of samples after the trigger, which is shown to actually be detrimental to the detection.
Table 3: Precision, recall and F1-score (values in %) of the classifiers at different moments before the overtake manoeuvre starts (t corresponds to the precondition trigger, t-1 to one second earlier, and so on). We use the threshold (th) which gives the maximum F1-score (Figure 3). The row variation shows the difference between RF+SVML and the best of the RF and SVML classifiers. The bold number in each column indicates the results of the best individual classifier. If the fusion RF+SVML improves the best individual classifier, such a cell is also marked in bold.
Regarding TNR (left plot of Figure 4), its values can diminish to as low as the 50-60% range, meaning that a substantial percentage of no-overtakes would be actually labelled as overtakes. Here, RF and ANN show better numbers (TNR above 70-80%). Also, in this case, it is actually observed that the farther away from the trigger, the lower the TNR.
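The two rates discussed here can be computed from decision scores as in the following sketch. It is an illustration only: the function name and the rule of predicting an overtake when the score is at or above the threshold are assumptions, and an undefined rate (no samples of that class) is reported as 0.

```python
def tpr_tnr(scores, labels, threshold):
    """Return (TPR, TNR) for labels 1 = overtake, 0 = no-overtake."""
    tp = fn = tn = fp = 0
    for score, label in zip(scores, labels):
        predicted = 1 if score >= threshold else 0
        if label == 1 and predicted == 1:
            tp += 1   # overtake labelled as overtake
        elif label == 1:
            fn += 1   # overtake missed
        elif predicted == 0:
            tn += 1   # no-overtake labelled as no-overtake
        else:
            fp += 1   # no-overtake labelled as overtake
    tpr = tp / (tp + fn) if tp + fn else 0.0
    tnr = tn / (tn + fp) if tn + fp else 0.0
    return tpr, tnr
```

With two overtakes scored 0.9 and 0.6, three no-overtakes scored 0.7, 0.2 and 0.1, and a threshold of 0.5, both overtakes are detected (TPR = 1) while one no-overtake is mislabelled (TNR = 2/3), which mirrors the imbalance between the two rates reported above.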
From the results above, we observe that TNR is not as high as TPR, so the classifiers are not as good at classifying no-overtakes. Also, ANN and SVMrbf show some strange behaviour, such as a threshold of maximum F1 that is too low (Table 3) or P-R curves that are too "shaky". This suggests that the default values of these classifiers may not be the best choice. We thus take RF and linear SVM further and fuse their output scores by taking their mean. The AUC, P, R, F1, TPR and TNR of the fusion are also provided in Tables 2-4. It can be observed that AUC, Precision, F1 and True Negative Rates improve for all moments before the trigger. On the other hand, Recall and True Positive Rates are seen to decrease. The observed effect of the fusion is that the ability to classify no-overtakes is increased, at the cost of reducing overtake detection capabilities. However, the increase in TNR is much bigger than the decrease in TPR (Table 4). Overall, the fusion provides a more balanced accuracy of these two metrics, situating them beyond 91%. For example, at t-1 or earlier, TNR was below 80%, but after the fusion, as early as 3 seconds before the trigger, both classes have an accuracy of 87% or higher. Such stability and well-balanced accuracy can also be observed in Figure 4.
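The score-level fusion described above (taking the mean of the RF and linear SVM outputs) can be sketched as follows. The function name is hypothetical, and the sketch assumes that both classifiers output a probability in [0,1] for the overtake class for the same samples in the same order.

```python
def fuse_scores(rf_scores, svm_scores):
    """Average two lists of overtake probabilities sample by sample."""
    if len(rf_scores) != len(svm_scores):
        raise ValueError("both classifiers must score the same samples")
    return [(a + b) / 2 for a, b in zip(rf_scores, svm_scores)]
```

The averaged score is then thresholded like any single-classifier score. A sample that only one of the two classifiers scores highly ends up with a moderate fused score, which is one plausible reading of why the fusion trades some TPR for a higher TNR.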
Fig. 4: Graphical plot of TPR/TNR at different moments before the overtake manoeuvre starts (t corresponds to the precondition trigger, t-1 to one second earlier, and so on).

CONCLUSIONS
We demonstrate the suitability of CAN bus data to detect overtakes in trucks. We do so via traditional, widely used classifiers [7], including Artificial Neural Networks (ANN), Random Forest (RF), and Support Vector Machines (SVM). To the best of our knowledge, we are the first to apply machine learning techniques for overtake detection of trucks from CAN bus data. The classifiers employed performed well for the overtake class (TPR ≥ 93%), although their performance is not as good in the no-overtake class. With the help of classifier fusion, the accuracy of the latter class is observed to increase, at the cost of some decrease in the overtake class. Overall, the fusion balances TPR and TNR, providing more consistent performance than individual classifiers.
As future work, we are exploring the optimization of classifiers beyond their default values [10]. Parameters like the size of the sliding window employed or the time ahead of the precondition trigger are also subject to discussion in the literature [1,7]. Volvo Group, a participant in this research, offers the possibility of capturing large amounts of continuous unlabelled data. We are also considering the improvement of the developed classifiers by training them on a larger dataset obtained via pseudo-labelled data [11], for example, selecting samples with high prediction probability as given by the classifiers trained with labelled data. This would avoid the time-consuming manual labelling issue. A bigger dataset would also enable the use of popular data-hungry models such as Long Short-Term Memory (LSTM) networks [12].
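The pseudo-labelling idea mentioned above, keeping only unlabelled samples that the trained classifiers score with high confidence, could be sketched as follows. This is a hypothetical illustration of planned work, not a description of an implemented method: the function name and the confidence level of 0.95 are assumptions.

```python
def select_pseudo_labels(probabilities, confidence=0.95):
    """Pick confidently scored samples from unlabelled data.

    probabilities: predicted P(overtake) for each unlabelled sample.
    Returns a list of (sample_index, pseudo_label) pairs, where the
    pseudo-label is 1 (overtake) or 0 (no-overtake).
    """
    selected = []
    for index, p in enumerate(probabilities):
        if p >= confidence:
            selected.append((index, 1))
        elif p <= 1.0 - confidence:
            selected.append((index, 0))
        # Samples with intermediate probability are left unlabelled.
    return selected
```

The selected samples, together with their pseudo-labels, would then be added to the manually labelled training set before retraining.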

Table 1: Files employed per truck and class for training and testing.
Fig. 2: Precision-Recall curves of the classifiers at different moments before the overtake manoeuvre starts. AUC (Area under the curve) values are given in Table 2.
Fig. 3: F1-score vs. threshold at different moments before the overtake manoeuvre starts.

Table 2: AUC-PR of the classifiers at different moments before the overtake manoeuvre starts (t corresponds to the precondition trigger, t-1 to one second earlier, and so on).

Table 4: TPR/TNR of the classifiers at different moments before the overtake manoeuvre starts (t corresponds to the precondition trigger, t-1 to one second earlier, and so on). The row variation shows the difference between RF+SVML and the best of the RF and SVML classifiers. The bold number in each column indicates the results of the best individual classifier. If the fusion RF+SVML improves the best individual classifier, such a cell is also marked in bold.