Jacob Ayers - Electrical Engineering (Senior)
Ian Singson - Aerospace Engineering (Senior)
Shoh Mollenkamp - Computer Engineering (Junior)
The base plate has a grid of M3 holes spaced 1.5 cm apart horizontally and 2.5 cm apart vertically. There is an extended hole in the bottom right of the base plate to accommodate mounting to the vehicle and attaching the Jetson Nano at an angle, leaving more room for the wiring.
The camera mount was designed with two legs, right and left, shown in the images below respectively. Each leg attaches to the base plate via two M3 screws. The camera plate holds the camera with four M2 screws and attaches to the two legs with a single M3 screw that passes through the back of the plate and across both legs at their topmost hole. The camera plate was designed to sit flush against the two legs, preventing any rotation, so that only one screw is needed. We found that the camera did not need to be angled downward because it was mounted high enough to see both directly in front of the robot and far down the track. The lidar plate attaches to the camera legs via two M3 screws in the middle of the legs; the two screws support the weight of the lidar.
Donkeycar Autonomous Laps
ROS2 Autonomous Laps
Our final project aimed to use image detection to autonomously cue the robot to play a song while it drives around the track. Images of musicians are placed on the side of the track and detected by a second webcam on the robot that faces outward with respect to the track.
We were able to do this with a webcam training-data collection script and transfer learning from ResNET18. The script allowed us to take images of each class of musician faces around the track in order to train a model that recognizes whether there is a musician face on the side of the track, and which musician it is. We took roughly 1200 images around the track for 5 classes: Beethoven, Frank Sinatra, Freddie Mercury, John Lennon, and a None class for when no musician is present. The ResNET18 model was then used in a script that writes the class detected by the webcam to a text file from a PyTorch prediction pipeline. The pipeline only writes to the text file when the prediction confidence is above a threshold of 0.95. The text file is continuously read by another script, which plays a song by the artist written to the file. The None class keeps the previous artist's song playing, but when another artist shows up, their song starts, ending the current song. This script uses a background Linux process to shuffle a song by the artist.
Our original idea was to point the camera directly perpendicular to the track to view the images on its side, but we realized that angling the camera, we chose 45 degrees, was more effective because it gave the camera more time to detect each image. Additionally, with the camera perpendicular to the track, the images were very blurry and hard to recognize. Angling the camera 45 degrees from the front solved this issue.
Second Camera Holder
Our second camera is a webcam meant to clip around a monitor, so a different design was needed. This design allows the webcam to be securely wrapped around the camera holder. The webcam already had some degrees of rotation built in, so rotation did not have to be a design factor.
First and foremost, as with all deep learning projects, it is necessary to collect a large amount of data. To do this, we created a cardboard box with faces of the musicians we wanted to classify. We then wrote a script (webcam_training.py) that collects jpg images directly from a webcam attached to our robot. We used that script to collect ~140 training images and ~40 validation images for each class, and ~350 training images and ~100 validation images for our "None" class. The larger None set is because we want our model to avoid changing classes too often.
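The collection loop can be sketched roughly as follows. The folder layout, file naming, and function names here are illustrative assumptions, not the exact contents of webcam_training.py; the idea is simply to grab frames from the webcam and save them as jpgs under the current class's directory:

```python
import os

# Classes we collect data for; "none" covers frames with no musician.
CLASSES = ["beethoven", "frank_sinatra", "freddie_mercury", "john_lennon", "none"]

def image_path(root, split, class_name, index):
    """Build the save path for one captured frame, e.g. data/train/beethoven/0003.jpg."""
    return os.path.join(root, split, class_name, f"{index:04d}.jpg")

def collect(class_name, count, root="data", split="train"):
    """Capture `count` frames from the webcam and save them as jpgs (requires OpenCV)."""
    import cv2  # imported here so the path helper works without OpenCV installed
    cap = cv2.VideoCapture(0)
    os.makedirs(os.path.join(root, split, class_name), exist_ok=True)
    saved = 0
    while saved < count:
        ok, frame = cap.read()
        if not ok:
            continue
        cv2.imwrite(image_path(root, split, class_name, saved), frame)
        saved += 1
    cap.release()

if __name__ == "__main__":
    collect("beethoven", 140)  # ~140 training images per musician class
```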
We used transfer learning from the Convolutional Neural Network model ResNET18. The model choice and the modest number of images allowed us to train the model, using the training_pipeline.py script, in about 30 minutes on an old laptop with an i3 processor, 4 GB of RAM, and an integrated GPU.
We created a live prediction pipeline (live_predictions.py) that reads images from a webcam at a rate of 3 Hz, applying the relevant transformations so the images can be fed into the prediction pipeline one at a time. Once the 5 class predictions have been produced, we take the maximum, and if it is above a certain threshold, we write that class to a text file.
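The thresholding step can be sketched in isolation. The class names and the 0.95 cutoff follow the description above, while the function names are illustrative:

```python
CLASSES = ["Beethoven", "Frank Sinatra", "Freddie Mercury", "John Lennon", "None"]
THRESHOLD = 0.95

def gate_prediction(probs, classes=CLASSES, threshold=THRESHOLD):
    """Return the predicted class name if the max probability clears
    the threshold, otherwise None (nothing gets written)."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    if probs[best] >= threshold:
        return classes[best]
    return None

def append_prediction(path, probs):
    """Append the gated prediction as a new line in the shared text file."""
    label = gate_prediction(probs)
    if label is not None:
        with open(path, "a") as f:
            f.write(label + "\n")
```

Gating on a high threshold like this trades a little latency for far fewer spurious class switches while the robot is moving.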
While the live predictions are occurring, we run a script that constantly looks at the last line of the aforementioned text file, and shuffles a song from a playlist whenever the most recent artist changes in the text file.
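The switching logic can be sketched as below. Treating a "None" detection as "keep the current song playing" matches the behavior described above; the `mpg123` command is an assumption standing in for whatever player play_songs.py launches in the background:

```python
import random
import subprocess

def should_switch(current_artist, new_label):
    """Switch songs only when a *different* real artist appears;
    a 'None' detection keeps the current song playing."""
    return new_label != "None" and new_label != current_artist

def play_random_song(artist, playlists, player=None):
    """Launch a background process playing a random song by `artist`,
    stopping the previous song first. `mpg123` is an illustrative
    player, not necessarily what play_songs.py uses."""
    song = random.choice(playlists[artist])
    if player is not None:
        player.terminate()
    return subprocess.Popen(["mpg123", song])
```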
More details can be found in the Github Repository: https://github.com/JacobGlennAyers/ECE148_Final_Project
For our use, once we had trained a model, we would run some code from the course, such as the ROS linerunner or Donkey manual driving, then start live_predictions.py, and then start the play_songs.py script, each in a separate terminal.
Extra (Purchased) Hardware
Image Recognition Demo
ROS2 Linerunner + Image Recognition Demo
Work for Those Who Come Later
Make the system do something useful for the Triton AI team, such as calling out actions, reacting to a speed sign, or even avoiding obstacles. Another improvement would be to tie our system into ROS2 so that only one webcam is necessary. We noticed our system slowing down once ROS2 and our system ran together, so running two Jetsons to split up the tasks could be another thing to work on.
Some problems came in the form of ease of use, due to the need to have 3-4 SSH terminals open, depending on whether we were using our system along with Donkey's manual driving feature or the ROS2 Linerunner autonomous driving. It would be nice to run a single script to get everything up and running. Our system also had a fair number of false positives for the Freddie Mercury class. To address this, the confidence threshold could be raised, more training data could be acquired, the images could be modified to be more distinct from one another, more data augmentation could be used, or the model could be trained for more epochs.
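One way to cut down the terminal juggling would be a small launcher along these lines. The script names come from the repository, but supervising them from one process is a suggested sketch, not something we implemented:

```python
import subprocess

SCRIPTS = ["live_predictions.py", "play_songs.py"]

def commands(scripts=SCRIPTS):
    """Build the command line for each component."""
    return [["python3", s] for s in scripts]

def launch_all(scripts=SCRIPTS):
    """Start every component as a child process, so one terminal
    (and one Ctrl+C) controls the whole pipeline."""
    return [subprocess.Popen(cmd) for cmd in commands(scripts)]

def shutdown(procs):
    """Terminate and reap every child process."""
    for p in procs:
        p.terminate()
    for p in procs:
        p.wait()
```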
We did run into some power issues with the Jetson Nano, especially when running the linerunner and our code base concurrently, as well as when training the model. This could likely be mitigated by using PyTorch's GPU acceleration capabilities.
Progress Reports and Final Presentation
Thank you to Dom and Harou for being great TAs, and huge thanks to Jack for setting up this class and making it possible.