- 1 About Our Team Members
- 2 Final Project Description
- 3 Hardware
- 4 Repositories w/ Instructions
- 5 ROS
- 6 (PII) Pedestrian Intention Interpretation
About Our Team Members
Team 7 is composed of UCSD undergraduate students.
- Nicholas Compton - Mechanical Engineering Senior
- Jesse Liang - Computer Engineering Senior
- Wen Tian - Computer Engineering Senior
Final Project Description
For this project, we use ROS to implement basic lane detection and pose estimation. The car drives around the track and stops when it predicts that a person is crossing or wants to cross.
Hardware
- Intel RealSense 435 RGB-D
- Steering Servo Motor
- Throttle DC motor
- 1/10 scale RC car chassis with Ackermann steering
- 12V-5V DC-DC voltage converter
- 433MHz remote relay
- 11.1V 3S LiPo Battery
- Battery Voltage Sensor
- Intel RealSense 435 RGB-D Case
Repositories w/ Instructions
ROS
ROS (Robot Operating System) is a framework that provides tools and libraries for writing robot software. Our team used ROS to manage the programmatic operation of the robot. Tasks that relied on ROS included fetching the video feed from an RGB-D camera and making the imagery available for computer vision processing, determining the correct throttle and steering decisions, and relaying digital instructions to the car's steering and throttle.
OpenCV is a computer vision library with functions for object detection, edge finding, color filtering, and more. We used OpenCV to detect the yellow center lines of the track.
For brevity, I will only summarize the ROS driving implementation, since we based it on Dominic Nightingale's ROS implementation.
Because of compatibility issues, we had to use the Intel RealSense RGB-D camera as our "driving camera". Using its image and computer vision, our AI isolates the yellow lines on the road and follows them, based on how far the line is from the middle. Both throttle and steering are controlled this way. Here is a demo.
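The line-following idea described above can be sketched in a few lines. This is a minimal illustration in plain Python, not the team's actual ROS/OpenCV node: the HSV thresholds, the gain, the throttle limits, and the function names are all assumptions chosen for the example, and a real implementation would run the same logic on OpenCV image arrays.

```python
import colorsys

# Assumed HSV window for "yellow" (hue in degrees, saturation/value in 0..1).
# Illustrative values only; a real track would need tuned thresholds.
YELLOW_HUE_RANGE = (40.0, 70.0)
MIN_SATURATION = 0.4
MIN_VALUE = 0.4


def is_yellow(r, g, b):
    """Return True if an 8-bit RGB pixel falls inside the assumed yellow window."""
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    hue_deg = h * 360.0
    in_hue = YELLOW_HUE_RANGE[0] <= hue_deg <= YELLOW_HUE_RANGE[1]
    return in_hue and s >= MIN_SATURATION and v >= MIN_VALUE


def line_offset(image):
    """Return the yellow line's horizontal offset from the image center in [-1, 1].

    `image` is a list of rows, each row a list of (r, g, b) tuples.
    Returns None when no yellow pixel is found.
    """
    width = len(image[0])
    columns = [x for row in image
               for x, (r, g, b) in enumerate(row) if is_yellow(r, g, b)]
    if not columns:
        return None
    center_x = (width - 1) / 2.0
    if center_x == 0:
        return 0.0  # a one-pixel-wide image has no left or right
    centroid_x = sum(columns) / len(columns)
    return (centroid_x - center_x) / center_x


def drive_command(offset, steering_gain=1.0, max_throttle=0.3, min_throttle=0.1):
    """Map the offset to (steering, throttle): steer toward the line, slow in turns."""
    if offset is None:
        return 0.0, 0.0  # assumption: stop when the line is lost
    steering = max(-1.0, min(1.0, steering_gain * offset))
    throttle = max_throttle - (max_throttle - min_throttle) * abs(steering)
    return steering, throttle
```

Using the centroid of the yellow pixels keeps the controller proportional: the further the line drifts from the middle, the harder the car steers and the more it slows down.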
GPU-Accelerated Pose Estimation
This part required upgrading JetPack to version 4.5.1; please follow the instructions in our repo. In summary, for this to work on the Jetson, we need TensorRT 7.x and DeepStream 5.x, both of which require JetPack 4.5.1.
After upgrading JetPack and installing DeepStream, we have to install trt_pose, again following the linked instructions. Included is a live_demo.ipynb Jupyter notebook that I modified to help us collect data for our AI. Running the notebook shows a live demo of pose estimation.
(PII) Pedestrian Intention Interpretation
A major goal of this project was to develop an AI that predicts a pedestrian's intention and makes decisions based on it. Mainly, the AI tries to predict whether a person is crossing the road or not, then decides whether to continue driving or to stop.
With our previous DonkeyCar implementation, we trained a convolutional neural network to race around the track. Although we achieved a certain level of success, the AI could sometimes get confused by noise or unfamiliar stimuli, such as new environments or lighting.
For our PII implementation, we opted to minimize the noise by integrating real-time Pose Estimation, which tracks a person's body and returns their location. With pose estimation, we can reduce our input data to several points (shoulder, arm, etc.). Then, we tested several ML models to find the best model for our purpose.
We trained our AI to stop for pedestrians when they look like they might cross the street. Drivers usually stop for pedestrians who are walking towards the road or facing the road. A driver will not stop if the pedestrian is walking along the sidewalk, facing the car, etc. In our rudimentary ML models, we tried to train a classifier to make the same distinction.
Our original implementation used SVM classification on the currently available pose estimation points, with a separate model per case (one for when only a leg is detected, one for when all points are detected, etc.). Because of our limited computing resources, we wanted to see whether a lighter-weight ML model like an SVM would suffice. Initially, our model yielded favorable results, as shown in the model training scores below. (The best score is 1.0; the worst is 0.0.)
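The "one model per set of detected points" idea can be sketched as a small dispatcher. This is a hypothetical illustration, not the team's code: the keypoint names, the rule of preferring the largest fully detected subset, and the `select_model` and `classify_pose` helpers are assumptions, and the models are plain callables standing in for trained SVMs.

```python
def select_model(detected_parts, models):
    """Return the key of the model whose required keypoints are all detected.

    `models` maps a frozenset of required keypoint names to a classifier.
    Assumption: when several models qualify, prefer the one using the most points.
    """
    detected = set(detected_parts)
    usable = [required for required in models if required <= detected]
    if not usable:
        return None
    return max(usable, key=len)


def classify_pose(keypoints, models):
    """Classify a pose given `keypoints`, a dict of name -> (x, y).

    Returns None when no model matches the detected keypoints.
    """
    required = select_model(keypoints, models)
    if required is None:
        return None
    # Sort names so every model always sees its features in the same order.
    features = [coord for name in sorted(required) for coord in keypoints[name]]
    return models[required](features)
```

With a scikit-learn SVM, each callable would typically wrap something like `model.predict([features])`. Keeping several models loaded at once costs memory, which matters on a small board.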
However, after further testing, we found that the SVM was insufficient for our needs, with real-world scores of 0.60 or less. We went back to deep learning to solve this problem. Based on this PyTorch Classification Tutorial, we managed to get an extremely accurate model. Because of computational limitations, we could only load a single model, which therefore had to accept a fixed input length. This means our model only works when a certain number of features are detected by pose estimation. Here are our results.
Our results were extremely promising. One caveat of this model is that it can only produce an output when it receives the correct number of inputs. We chose to use all body points except those of the face/head, so the model only works when every point outside the face/head is detected (whether or not the face/head is detected does not matter). This also means our training data consists only of frames in which all of those points are detected, which may not represent real-life situations. Regardless, we got the model to work pretty well in this demo. Note: our car was broken at the time and drove in reverse.
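The fixed-input-length constraint can be illustrated with a small helper that either returns a complete feature vector or reports that no prediction is possible. This is a sketch under assumptions: the keypoint names below follow the common COCO-style naming and may not match the names used in the actual pipeline, and `to_feature_vector` is a hypothetical helper.

```python
# Assumed body keypoints (COCO-style names plus a neck point); head points are excluded.
BODY_PARTS = (
    "neck",
    "left_shoulder", "right_shoulder",
    "left_elbow", "right_elbow",
    "left_wrist", "right_wrist",
    "left_hip", "right_hip",
    "left_knee", "right_knee",
    "left_ankle", "right_ankle",
)


def to_feature_vector(keypoints):
    """Flatten body keypoints into a fixed-length list of floats.

    `keypoints` maps a keypoint name to an (x, y) pair, or omits the name
    (or maps it to None) when that point was not detected. Head/face points
    are ignored. Returns None if any body point is missing, because the
    classifier only accepts the full-length input.
    """
    features = []
    for name in BODY_PARTS:
        point = keypoints.get(name)
        if point is None:
            return None
        x, y = point
        features.extend((float(x), float(y)))
    return features
```

A caller would skip classification for frames where this returns None, which is exactly the case where the single fixed-length model cannot be used.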
Hardware Limitations/Issues
The NVIDIA Jetson Nano's CUDA support makes it great for AI and robotics applications. However, we still encountered problems caused by the limitations of the hardware.
While the Jetson Nano excelled in GPU-heavy use cases like deep learning, it struggled to meet our computational and memory needs. When we ran the pose estimation implementation that NVIDIA provides for the Nano, our process was killed due to low memory. Only after we increased the swap file size to 8 GB did the script begin working. Even then, the script took upwards of 20 minutes to initialize and finally display the real-time pose estimation. While loading, the Jetson was unusable, probably due to low memory.
This was a major issue because testing whether our code worked took an unreasonable amount of time. We figured out that the script was spending this time optimizing the model during initialization. By saving the optimized model and loading it on later runs, we saved a substantial amount of time, but the script still takes several minutes to start.
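The save-and-reload trick is a general caching pattern: run the expensive optimization once, write the result to disk, and load it on later runs. The sketch below shows the pattern in plain Python; `load_or_build` is a hypothetical helper, and `pickle` stands in for whatever serialization the real optimized model uses.

```python
import os
import pickle


def load_or_build(cache_path, build):
    """Return (result, loaded_from_cache).

    If `cache_path` exists, load the stored result from it. Otherwise call
    `build()` (the slow step), store its result at `cache_path`, and return it.
    """
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            return pickle.load(f), True

    result = build()
    # Write to a temporary file first so an interrupted run never leaves
    # a half-written cache behind.
    tmp_path = cache_path + ".tmp"
    with open(tmp_path, "wb") as f:
        pickle.dump(result, f)
    os.replace(tmp_path, cache_path)
    return result, False
```

The cache must be deleted whenever the model or the optimization settings change, otherwise a stale result would be loaded silently.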
Given an image, pose estimation returns the points of the body that it can detect (shoulder, head, etc.). Our previous ROS implementation used a camera and OpenCV to follow road lines. Our first integration attempt reused this existing camera ROS topic, but it failed with type errors. The issue may have been the data type: ROS topics carry images as image messages (imgmsg, i.e. sensor_msgs/Image), as opposed to other image formats like numpy.ndarray. While this error is potentially fixable, we did not succeed after spending many hours on it. The unreasonable load times described in Hardware Limitations/Issues above made debugging these type compatibility issues very difficult.
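One common way to bridge this type gap is to convert each incoming image message to an array inside the subscriber callback, for example with the `cv_bridge` package. The sketch below is an assumption about how that could look, not a fix we verified on the car: `make_image_callback` and `process_frame` are hypothetical names, and the converter is injectable so the wiring can be exercised without ROS installed.

```python
def make_image_callback(process_frame, to_array=None):
    """Build a subscriber callback that converts image messages before processing.

    `process_frame` is a hypothetical function expecting a numpy.ndarray (BGR).
    `to_array` converts a sensor_msgs/Image message to an array; when omitted,
    cv_bridge's CvBridge.imgmsg_to_cv2 is used.
    """
    if to_array is None:
        # Imported lazily so this sketch can be loaded on machines without ROS.
        from cv_bridge import CvBridge
        bridge = CvBridge()

        def to_array(msg):
            return bridge.imgmsg_to_cv2(msg, desired_encoding="bgr8")

    def callback(msg):
        frame = to_array(msg)  # image message -> numpy.ndarray
        return process_frame(frame)

    return callback


# Hypothetical wiring inside a ROS node (topic name is an assumption):
#   rospy.Subscriber("/camera/color/image_raw", Image,
#                    make_image_callback(run_pose_estimation))
```

Keeping the conversion in one place means the pose estimation code only ever sees arrays, which makes type mismatches easier to spot.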
Our solution was to use two cameras. One camera would act as our original lane guidance camera, and the other camera would act as our pose estimation camera.