In the grid search performed, we can see that Decision Tree and Random Forest have been configured to build shallow trees.
Regarding the minimum number of samples required to split an internal node, the Random Forest used a large value: in the Random Forest model that we built, 80 samples are needed to split a node. For the Max depth parameter, which limits the maximum depth of each tree, the two models were configured with n, where n is the number of training samples. For the Adaptive Boosting and Gradient Boosting models, we tuned the N estimators and learning rate parameters.
The N estimators parameter defines the number of boosting stages to perform. Boosting models are fairly robust to overfitting, so a large number of stages usually results in better performance. The learning rate defines the contribution of each classifier: the closer the learning rate is to zero, the smaller the individual contribution of each classifier to the ensemble.
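As a rough sketch of this tuning step, assuming scikit-learn's GridSearchCV (the paper names Scikit-learn but does not list its grids here), the block below searches tree depth and split size for the tree models, and the number of estimators and learning rate for the boosting models. The data are synthetic and every grid value is a hypothetical choice, except 80 for min_samples_split, which mirrors the value reported for the Random Forest.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the driver/passenger feature matrix (binary labels).
X, y = make_classification(n_samples=600, n_features=12, random_state=0)

# Hypothetical grids: small depths favour shallow trees; 80 mirrors the text.
tree_grid = {"max_depth": [2, 4, 6], "min_samples_split": [2, 20, 80]}
# n_estimators = number of boosting stages; learning_rate shrinks each stage.
boost_grid = {"n_estimators": [50, 100, 200], "learning_rate": [0.05, 0.1, 0.5]}

candidates = {
    "DecisionTree": (DecisionTreeClassifier(random_state=0), tree_grid),
    "RandomForest": (RandomForestClassifier(n_estimators=50, random_state=0), tree_grid),
    "AdaBoost": (AdaBoostClassifier(random_state=0), boost_grid),
    "GradientBoosting": (GradientBoostingClassifier(random_state=0), boost_grid),
}

best = {}
for name, (model, grid) in candidates.items():
    # Exhaustive search over the grid, scored by cross-validated accuracy.
    search = GridSearchCV(model, grid, cv=3, scoring="accuracy")
    search.fit(X, y)
    best[name] = search.best_params_
    print(name, search.best_params_, round(search.best_score_, 3))
```

Because the text reports that Gradient Boosting also had its Min samples split and Max depth configured, its grid could be extended with the max_depth and min_samples_split keys used for the tree models.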
For the Gradient Boosting model, the Min samples split and Max depth parameters were also configured; they serve the same purpose as in the Random Forest and Decision Tree models. Because the CNN is a neural network, we had to configure it in a hybrid way. The number of convolutional layers and the number of filters in each layer were defined empirically, while the size of each filter was selected by grid search. In the grid search for the best configuration of the convolutional filters, we considered filter sizes from 1 to 5.
Within each layer, all filters have the same size. Since we defined four convolutional layers, combinations of filter sizes across the layers were tested to determine the architecture shown in Figure 7. The design of this architecture took into account the observations made by Szegedy and collaborators [65].
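Assuming every combination of kernel sizes was evaluated (the text does not say whether the search was exhaustive), the size of this search space is easy to count: four layers, each with one kernel size chosen from 1 to 5, give 5^4 candidate architectures.

```python
from itertools import product

KERNEL_SIZES = range(1, 6)   # candidate filter sizes 1..5, as in the text
N_CONV_LAYERS = 4            # one shared kernel size per convolutional layer

# Every tuple assigns one kernel size to each of the four layers.
candidates = list(product(KERNEL_SIZES, repeat=N_CONV_LAYERS))
print(len(candidates))                  # 625
print(candidates[0], candidates[-1])    # (1, 1, 1, 1) (5, 5, 5, 5)
```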
In their work on CNN architectures, Szegedy and collaborators [65] show that small kernels are more efficient than larger ones. Besides reducing the computational load, they also emphasize that stacking multiple small filters can match the representational power of larger filters. Representational power here refers to the ability of the convolution to detect structural changes in the analyzed object.
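The efficiency argument can be checked with simple arithmetic. This sketch, using a hypothetical channel count of 64 and ignoring biases, compares one 5 x 5 convolution with two stacked 3 x 3 convolutions: both cover a 5 x 5 input region, but the stack needs 18C² weights instead of 25C².

```python
def receptive_field(kernel_sizes):
    """Receptive field of stacked stride-1, undilated convolutions."""
    rf = 1
    for k in kernel_sizes:
        rf += k - 1
    return rf

def conv_weights(k, channels):
    """Weights of one k x k convolution mapping `channels` to `channels` (no bias)."""
    return k * k * channels * channels

C = 64  # hypothetical number of input and output channels
one_5x5 = conv_weights(5, C)
two_3x3 = 2 * conv_weights(3, C)

print(receptive_field([5]), receptive_field([3, 3]))   # 5 5
print(one_5x5, two_3x3)                                # 102400 73728
print(round(1 - two_3x3 / one_5x5, 2))                 # 0.28
```

The same receptive field is obtained with about 28% fewer weights, and the stack also inserts an extra nonlinearity between the two layers.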
In this context, we decided to use four convolutional layers with 60, 40, 40, and 60 filters, respectively. As already explained, the filter size of each layer was defined by the grid search process. The construction, configuration, training, testing, and validation of all models were performed using the Scikit-learn API and its dependencies.
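The text does not give the input shape or the selected kernel sizes, so the following sketch assumes a 1-D input of length 32 with one channel, 'valid' padding, stride 1, and a kernel size of 3 in every layer (all hypothetical). It applies the usual convolution parameter formula, (kernel_size × in_channels + 1) × filters, which matches the count Keras reports for a Conv1D layer with bias, to the 60-40-40-60 filter stack.

```python
FILTERS = [60, 40, 40, 60]   # filters per layer, as reported in the text
KERNELS = [3, 3, 3, 3]       # hypothetical kernel sizes (grid-searched in the text)

def conv1d_stack(seq_len, in_channels, filters, kernels):
    """Output length and parameter count per Conv1D layer ('valid' padding, stride 1)."""
    layers = []
    for n_filters, k in zip(filters, kernels):
        params = (k * in_channels + 1) * n_filters  # kernel weights + one bias per filter
        seq_len = seq_len - k + 1                   # 'valid' padding shrinks the sequence
        layers.append((n_filters, k, seq_len, params))
        in_channels = n_filters                     # next layer sees one channel per filter
    return layers

layers = conv1d_stack(seq_len=32, in_channels=1, filters=FILTERS, kernels=KERNELS)
for n_filters, k, out_len, params in layers:
    print(f"{n_filters} filters, kernel {k}: output length {out_len}, {params} parameters")
print("total:", sum(layer[3] for layer in layers))   # total: 19580
```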
For the neural network, we also used the Keras API and its dependencies. The objective is to verify whether any model achieves good classification performance using only the features of a single sensor. If so, the amount of information needed to distinguish passengers from drivers in the event of reading while driving can be reduced.
In all the experimental phases, we set aside two thirds of the dataset for training and validating the models with k-fold cross-validation. Additionally, to assess the stability of the models, in each phase we repeated the k-fold training and validation process ten times. To evaluate the performance of the models, we used accuracy, precision, recall, F1-score, and Cohen's kappa, which are well-known metrics in machine learning.
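A minimal sketch of this protocol with scikit-learn, assuming synthetic data and Logistic Regression (the baseline model) as a stand-in classifier: two thirds of the samples form the training/validation set and the remaining third is held out for testing. Ten folds are assumed because the text later reports one hundred validation scores from ten repetitions, which implies ten folds per repetition.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import (RepeatedStratifiedKFold, cross_val_score,
                                     train_test_split)

X, y = make_classification(n_samples=900, n_features=12, random_state=0)

# Hold out one third (300 of 900 samples) as the test set.
X_dev, X_test, y_dev, y_test = train_test_split(
    X, y, test_size=300, stratify=y, random_state=0)

# 10 folds x 10 repetitions -> 100 validation scores for the model.
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=10, random_state=0)
model = LogisticRegression(max_iter=1000)
val_scores = cross_val_score(model, X_dev, y_dev, cv=cv, scoring="accuracy")

# Refit on all development data, then check generalization on the held-out third.
model.fit(X_dev, y_dev)
test_accuracy = model.score(X_test, y_test)
print(len(val_scores), round(val_scores.mean(), 3), round(test_accuracy, 3))
```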
The accuracy metric measures the proportion of correctly classified samples and therefore evaluates the overall performance of the classifier. However, when classes are unbalanced or the problem is multiclass, high accuracy may not reflect the actual performance of the model. A better evaluation therefore complements the global view given by accuracy with metrics such as precision and recall, which evaluate each class separately. For a class C_x, precision is the proportion of samples classified as C_x that actually belong to C_x.
In a classification problem, high precision indicates that the classifier's predictions for that class are correct. For a class C_x, recall is the proportion of samples that actually belong to C_x that were classified as C_x. In the task of distinguishing passengers from drivers in the event of reading while driving, a high precision for the driver class indicates that passengers are rarely misclassified as drivers and are therefore not inconvenienced. A high recall for the driver class indicates the safety of the system, since few drivers are misclassified as passengers.
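To make the driver-class reading of these metrics concrete, the following sketch computes accuracy, precision, and recall by hand on ten hypothetical predictions. The two passengers predicted as drivers are false positives (inconvenienced passengers), and the one driver predicted as a passenger is a false negative (a safety miss).

```python
def driver_metrics(y_true, y_pred, positive="driver"):
    """Accuracy plus precision and recall for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # drivers caught
    fp = sum(t != positive and p == positive for t, p in pairs)  # passengers flagged as drivers
    fn = sum(t == positive and p != positive for t, p in pairs)  # drivers missed
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall

# Hypothetical labels: 6 drivers, 4 passengers.
y_true = ["driver"] * 6 + ["passenger"] * 4
y_pred = ["driver"] * 5 + ["passenger"] + ["driver"] * 2 + ["passenger"] * 2

accuracy, precision, recall = driver_metrics(y_true, y_pred)
print(round(accuracy, 3), round(precision, 3), round(recall, 3))  # 0.7 0.714 0.833
```

scikit-learn's precision_score and recall_score with pos_label="driver" give the same values.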
To check the balance between passenger convenience and system safety, we used the F1-score, which is the harmonic mean of precision and recall. The last metric used was Cohen's kappa, a robust statistic for testing inter-rater reliability. From a machine-learning perspective, kappa measures how closely the labels assigned by a classifier match the ground-truth labels, after accounting for the agreement expected by chance. Based on the methodology described in the Machine-Learning Models section, Figure 8 and Figure 9 show the performance of the models in the first phase of the experiment.
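Using ten hypothetical predictions (6 drivers, 4 passengers; 5 true positives, 2 false positives, 1 false negative), this sketch computes the F1-score and Cohen's kappa from their definitions, kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected by chance from the label frequencies. It then cross-checks both values against scikit-learn.

```python
from sklearn.metrics import cohen_kappa_score, f1_score

def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

def cohen_kappa(y_true, y_pred):
    """Agreement between predictions and ground truth, corrected for chance."""
    n = len(y_true)
    labels = set(y_true) | set(y_pred)
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n       # observed agreement
    p_e = sum((y_true.count(c) / n) * (y_pred.count(c) / n)      # chance agreement
              for c in labels)
    return (p_o - p_e) / (1 - p_e)

y_true = ["driver"] * 6 + ["passenger"] * 4
y_pred = ["driver"] * 5 + ["passenger"] + ["driver"] * 2 + ["passenger"] * 2

# Driver-class precision = 5/7 and recall = 5/6 (tp=5, fp=2, fn=1).
f1_driver = f1(5 / 7, 5 / 6)
kappa = cohen_kappa(y_true, y_pred)
print(round(f1_driver, 3), round(kappa, 3))        # 0.769 0.348
print(round(f1_score(y_true, y_pred, pos_label="driver"), 3),
      round(cohen_kappa_score(y_true, y_pred), 3))
```

Accuracy is 0.70 in this example, but kappa is only about 0.35, because 54% agreement would already be expected by chance from the label frequencies.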
We built the boxplot from the validation performance obtained in the training phase. Each boxplot contains one hundred validation scores obtained from the ten repetitions of k-fold cross-validation. The Gradient Boosting and CNN models obtained the best performance in all analyzed metrics. The Decision Tree (DT) model behaved similarly to our baseline model, Logistic Regression. Also, for almost all metrics, DT presented the longest tail, which indicates that it was the model with the greatest instability in classifying drivers and passengers.
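The "tail" discussed above corresponds to the whiskers and outliers of a boxplot. As a sketch, assuming matplotlib's default convention (whiskers reach the most extreme scores within 1.5 × IQR of the quartiles), the statistics behind one box can be computed from 100 hypothetical validation accuracies.

```python
import numpy as np

def box_stats(scores, whis=1.5):
    """Quartiles, whisker ends, and outlier count, using the 1.5 x IQR rule."""
    scores = np.asarray(scores, dtype=float)
    q1, median, q3 = np.percentile(scores, [25, 50, 75])
    iqr = q3 - q1
    low_fence, high_fence = q1 - whis * iqr, q3 + whis * iqr
    inside = scores[(scores >= low_fence) & (scores <= high_fence)]
    return {
        "q1": q1, "median": median, "q3": q3,
        "whisker_low": inside.min(),       # most extreme scores still within the fences
        "whisker_high": inside.max(),
        "n_outliers": int(scores.size - inside.size),
    }

rng = np.random.default_rng(0)
# Hypothetical: 100 validation accuracies (10 folds x 10 repetitions).
scores = np.clip(rng.normal(0.85, 0.02, size=100), 0.0, 1.0)
scores[:3] = [0.70, 0.72, 0.74]   # a few poor folds create a long lower tail
stats = box_stats(scores)
print({k: round(float(v), 3) for k, v in stats.items()})
```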
In the specific case of the SVM model, compared with the best models, we observe a small gap in the recall metric and a considerable gap in the precision metric. This shows that the SVM model was better at ensuring driver safety than at avoiding passenger inconvenience. Figure 9 shows the radar chart of the test phase (left) and the validation phase (right).
As we can see, all models maintained their performance for all metrics across the test and validation phases. This shows that no model overfitted the training set and that each one, at its own level of performance, was able to generalize the problem. Regarding generalization capacity, we can classify the models into three levels.
At the first level are the Gradient Boosting and CNN models, which have the highest generalization capacity. Finally, at the third level are the Decision Tree and Logistic Regression models, which have the lowest generalization capacity. In the second experiment, we analyzed the performance of the models using only the features collected by a single sensor. The first subset contains the features collected by the accelerometer, the second contains those obtained from the gyroscope, and the third contains those collected by the magnetometer.
Table 6 shows the parameters belonging to each subset. At this stage of the experiment, we did not use the speed feature computed from the GPS data, because it was the only information coming from the GPS and we judged that speed alone is not enough to distinguish passengers from drivers in the event of reading while driving.
Figure 10, Figure 11, and Figure 12 show the performance of the models for each subset of parameters specified in Table 6. For each subset, we plot the boxplot of the validation data and the radar chart of the validation and test data. In this phase, the models were also trained by running k-fold cross-validation ten times. Figure 10 plots the performance of the models with the features collected by the accelerometer (subset 1). Figure 11 shows the performance of the models with the features collected by the gyroscope (subset 2).
Figure 12 shows the performance of the models with the data collected by the magnetometer (subset 3). In the context of driver behavior, combining the accelerometer and gyroscope allows us to identify changes in acceleration, curves, and braking patterns. However, when the data from these sensors are analyzed separately, neither the accelerometer nor the gyroscope alone is enough to distinguish passengers from drivers in the event of reading while driving.
McHugh [66] suggests that kappa values less than 0. Thus, in the analysis of subsets 1 and 2, all models have kappa values below the expected threshold. Figure 12 shows the performance of the models for the magnetometer data. The boxplot shows that the performance of the Gradient Boosting and CNN models was similar to that observed in the first experiment. The Decision Tree model, although it presents the longest tail in all metrics, improved considerably compared with its performance in the first experiment.
On the other hand, the SVM model, which was at the second performance level in the first experiment, this time presented results close to those of our baseline model.
In the radar chart of validation and test, the Gradient Boosting and CNN models kept the same pentagon shape in both phases, which indicates their generalization capacity. Given the similar performance of GB and CNN in both the first and second phases of the research, we decided at this point to apply a statistical significance test to verify the equivalence between the models.
To verify whether the developed models are statistically equivalent, the null hypothesis H0 of the test states that the models are equal, and the alternative hypothesis H1 states that they are different.
If the null hypothesis is not rejected, there is no significant difference between the compared models. We applied the test to all pairwise combinations of the models; since we are comparing four models, we performed six hypothesis tests. Table 7 shows the p-value for each test and whether H0 was accepted or rejected. Table 7 shows that only the comparison between the two Gradient Boosting models rejected H0.
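The six pairwise comparisons can be enumerated with itertools.combinations. This section does not name the test used, so the sketch below swaps in a two-sided Mann-Whitney U test from SciPy at a 0.05 significance level, applied to synthetic accuracy samples; the model names follow the text, but all values are invented. Repeated cross-validation scores are not fully independent, which tests of this kind assume, so the p-values should be read with some caution.

```python
from itertools import combinations

import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(42)
# Hypothetical accuracy samples (100 per model); names follow the text, values are synthetic.
acc = {
    "GB": rng.normal(0.90, 0.010, 100),
    "CNN": rng.normal(0.90, 0.010, 100),
    "GB_2": rng.normal(0.88, 0.010, 100),
    "CNN_2": rng.normal(0.90, 0.010, 100),
}
ALPHA = 0.05

results = []
for a, b in combinations(acc, 2):          # 4 models -> C(4, 2) = 6 pairs
    p_value = mannwhitneyu(acc[a], acc[b], alternative="two-sided").pvalue
    decision = "H0 rejected" if p_value < ALPHA else "H0 not rejected"
    results.append((a, b, p_value, decision))
    print(f"{a} vs {b}: p = {p_value:.4f} -> {decision}")
```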
Taking into account the performance of the models, the amount of information required, and statistical equivalence, we can say that the CNN_2 model is the most suitable for classifying drivers and passengers in the event of reading while driving. CNN_2 is the best model because it is statistically equivalent to all the others, presents similar performance for all analyzed metrics, and requires only magnetometer data to perform the classification.
In a random forest, the impurity reduction used to choose the split at each node of a decision tree can also be used to evaluate the relative importance of the feature tested at that node for predicting the target variable. In this sense, features used near the top of the trees are more important to the final prediction, since they affect the decision for a larger fraction of the samples. In scikit-learn, the library used in this research, the mean decrease in impurity (MDI) is used to estimate feature importance in a random forest. Equation 8 calculates the MDI of a feature. We then trained and compared the performance of the models using only the six most important features.
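In scikit-learn, the impurity-based (MDI) importances of a fitted forest are exposed as the feature_importances_ attribute, normalized to sum to one. A minimal sketch of the top-k selection, on synthetic data with hypothetical feature names f0 to f9, ranks the features and builds the top 6, top 5, and top 4 subsets used in the third experiment.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=10, n_informative=4,
                           random_state=0)
feature_names = [f"f{i}" for i in range(X.shape[1])]  # hypothetical names

forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
mdi = forest.feature_importances_              # impurity-based (MDI), sums to 1
ranking = np.argsort(mdi)[::-1]                # most important feature first

# Nested subsets: each smaller subset is a prefix of the larger one.
subsets = {k: [feature_names[i] for i in ranking[:k]] for k in (6, 5, 4)}
X_subsets = {k: X[:, ranking[:k]] for k in (6, 5, 4)}
for k in (6, 5, 4):
    print(f"top {k}:", subsets[k])
```

Each X_subsets[k] can then be fed to the same training and cross-validation protocol as the full feature set.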
We then repeated this procedure for the five and then the four most important features. We did not evaluate a top 3 subset because performance had already dropped progressively across the top 6, top 5, and top 4 experiments. Figure 14 shows the radar chart of the models. Overall, model performance decreases as the feature set shrinks. In the radar charts of the top 6 and top 5 subsets, we can observe the same performance hierarchy seen in the previous experiments.
However, for the top 4 subset, the performance hierarchy disappears. This behavior, together with the performance drop across all metrics, shows that the models cannot generalize the problem using only the top 4 subset as input. In the boxplot of the top 6 subset, the quartiles of the two best models are closer together than in the previous experiments.
This means that the Gradient Boosting and Convolutional Neural Network models show less performance variation when using the top 6 features as input. In the radar chart, the GB and CNN models show similar behavior in all five metrics analyzed. To confirm this similarity, we applied the statistical test to verify the equivalence between the models. We also compared them statistically with the best models of the previous experiments.
Radar chart with results from the third evaluation, considering the validation data. Using the top 6 features, the overall results are better than using 5 or 4.
Table 8 shows the results of the statistical equivalence test applied to the models. We used kernel density estimation of the accuracy to compare the models. The null hypothesis H0 of the test states that the models are equal, and the alternative hypothesis H1 states that they are different. Although CNN_top6 and GB_top6 show similar performance for all analyzed metrics, the test indicates that these models are statistically different.
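As an illustration of the kernel density estimation step, assuming SciPy's gaussian_kde (Scott's rule bandwidth by default) and two synthetic samples of 100 accuracies standing in for GB_top6 and CNN_top6, the sketch below evaluates both densities on a grid, checks that each integrates to about one, and reports where each density peaks. How these densities are turned into a test decision is not specified in this section, so the sketch covers only the estimation itself.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(7)
# Hypothetical accuracy samples (100 validation scores per model).
samples = {
    "GB_top6": rng.normal(0.89, 0.010, 100),
    "CNN_top6": rng.normal(0.87, 0.012, 100),
}

grid = np.linspace(0.80, 0.96, 801)
dx = grid[1] - grid[0]

densities = {}
for name, acc in samples.items():
    kde = gaussian_kde(acc)          # Gaussian kernels, Scott's rule bandwidth
    densities[name] = kde(grid)      # estimated density evaluated on the grid
    area = densities[name].sum() * dx
    peak = grid[np.argmax(densities[name])]
    print(f"{name}: area={area:.3f}, peak accuracy={peak:.3f}")
```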
However, when these models are compared with the best models obtained in the first two phases of the experiments, both are statistically equivalent in almost all comparisons. GB_top6, a Gradient Boosting model, was considered equivalent to all the best models of the first two experiments.
Thus, taking into account the analysis of the three experiments, we can point to the following evidence:
- The problem of classifying passengers and drivers in the event of reading while driving can be solved non-intrusively with a machine-learning approach, as demonstrated by the performance of the developed models.
- According to the analysis and statistical test, the Convolutional Neural Network was the best model to work with this subset.
- According to the analysis and statistical test, Gradient Boosting was the best model to work with this subset.

Studies show that driver distraction is an increasing cause of road traffic injuries and deaths.
In this context, this research presented a machine-learning analysis to address the problem of reading while driving in a non-intrusive way. To analyze the effects of distraction, we collected data from the smartphone's GPS, accelerometer, magnetometer, and gyroscope sensors. We built an app to label the dataset.
The application generated random messages to be read by the user. During reading, we stored the sensor values and labeled whether a driver or a passenger read the message. In total, we built a dataset with 11, samples. We split the dataset into three disjoint sets: training, validation, and testing. The training and validation sets were used in the k-fold cross-validation process.
After each phase of the experiment, we used the test set to verify the generalization capability of the models. Three experiments were carried out. In the second experiment, we analyzed the performance of the models using as input only the features collected by a single sensor: the accelerometer, the magnetometer, or the gyroscope. The results show that the models could not generalize the problem using only accelerometer or only gyroscope information.
Regarding the magnetometer data, the results show that CNN had the best generalization. The statistical tests also indicated the equivalence between this CNN model and the two best models of the first experiment. In the third experiment, we used the Random Forest model to estimate feature importance. With the subset of the six most important features obtained by the Random Forest model, it was possible to build a Gradient Boosting model statistically equivalent to the best models obtained in the previous experiments.
Taking into account the results presented in this research, we conclude that CNN and GB are efficient models for distinguishing passengers from drivers in the event of reading while driving. The fact that our models work with non-intrusive data indicates that commercial solutions could be built to provide safety for drivers and convenience for passengers.
As future work, we intend to analyze the effects of reading while driving when the device is attached to the vehicle. With this approach, we can apply the analysis to other devices such as the onboard computer.
With Jetson TX2 we were able to build, train, and test machine-learning models more efficiently. We also want to thank the 18 volunteers who participated in the data collection phase of this project. Individual contributions presented in this research: conceptualization, R.
Sensors (Basel). Published online Jul. Received Apr 24; Accepted Jul 2.
Abstract: Driver distraction is one of the major causes of traffic accidents.
Keywords: driver distraction, reading while driving, machine learning, deep learning.
Introduction: According to the World Health Organization, the number of road traffic deaths continues to increase, reaching 1. For example, consider a solution that detects texting while driving: if it does not distinguish passengers from drivers, two situations can happen. The first is that typing is blocked even when it is a passenger who is texting.
Table 1. Number of messages read by each volunteer.
The systematic review was conducted to answer the following research questions:
Question 1: What types of methods are used for the analysis or detection of driver distraction?
Question 3: If so, what are the sensors used for data collection?
Table 4. List of machine-learning (ML) approaches filtered in the literature review.
AdaBoost
The central idea of Adaptive Boosting is to create a strong classifier from a set of weak classifiers.
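To illustrate how weak classifiers combine into a strong one, here is a minimal from-scratch sketch of discrete AdaBoost with decision stumps (labels in {-1, +1}), written with NumPy. It is not the scikit-learn implementation used in the paper. On a hypothetical diagonal decision boundary, which no single axis-aligned stump can represent, the weighted vote of 50 stumps fits the training data better than the best single stump.

```python
import numpy as np

def fit_stump(X, y, w):
    """Weighted decision stump; returns (feature, threshold, polarity, weighted error)."""
    best = (0, 0.0, 1, np.inf)
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for polarity in (1, -1):
                pred = np.where(polarity * (X[:, j] - thr) >= 0, 1, -1)
                err = w[pred != y].sum()
                if err < best[3]:
                    best = (j, thr, polarity, err)
    return best

def stump_predict(X, j, thr, polarity):
    return np.where(polarity * (X[:, j] - thr) >= 0, 1, -1)

def adaboost(X, y, n_rounds=50):
    """Discrete AdaBoost for labels in {-1, +1}; returns a list of weighted stumps."""
    w = np.full(len(y), 1.0 / len(y))          # start with uniform sample weights
    ensemble = []
    for _ in range(n_rounds):
        j, thr, pol, err = fit_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid division by zero and log(0)
        alpha = 0.5 * np.log((1 - err) / err)  # vote weight of this stump
        pred = stump_predict(X, j, thr, pol)
        w = w * np.exp(-alpha * y * pred)      # up-weight misclassified samples
        w /= w.sum()
        ensemble.append((alpha, j, thr, pol))
    return ensemble

def predict(ensemble, X):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(X, j, thr, pol) for alpha, j, thr, pol in ensemble)
    return np.where(score >= 0, 1, -1)

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)    # hypothetical diagonal boundary

j, thr, pol, _ = fit_stump(X, y, np.full(len(y), 1.0 / len(y)))
stump_acc = np.mean(stump_predict(X, j, thr, pol) == y)
ensemble = adaboost(X, y, n_rounds=50)
boost_acc = np.mean(predict(ensemble, X) == y)
print(f"single stump: {stump_acc:.2f}, AdaBoost (50 stumps): {boost_acc:.2f}")
```

In scikit-learn, the corresponding library route is AdaBoostClassifier, whose default base estimator is a depth-1 decision tree.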
Figure 10. Performance of the models with the features collected by the accelerometer.
Figure 11. Performance of the models with the features collected by the gyroscope.
Figure 12. Performance of the models when analyzed with the data collected by the magnetometer.
Funding: This research received no external funding.
Conflicts of Interest: The authors declare no conflict of interest.
References
1. Chan M. Global Status Report on Road Safety. World Health Organization; Geneva, Switzerland:
Lipovac K. Mobile phone use while driving-literary review. Part F Traffic Psychol.
Ambev S. Technical Report.
Carney C.
Maya S.
Relationships among smartphone addiction, stress, academic performance, and satisfaction with life. Elsevier Comput.