Jay Tapadiya - Electrical Engineering (Senior)
Raini Wu - Electrical Engineering (Senior)
Chen Yang - Computer Science (Senior)
For our project, we set out to implement a system of object detection using only a LIDAR. We came up with a novel autonomous driving system written in ~2000 lines of C++/CUDA, which implements a supervised reinforcement learning (RL) algorithm to navigate around obstacles and follow targets. Our system implements real-time learning with no pre-training required for full-speed operation - and full-speed means full-speed, with a blazingly fast 100Hz control/learning loop that we found to perform even better at our robot's redline speeds. The end result is a very configurable RL framework which can be applied to many navigation problems through intelligent modification of our provided configuration files.
Our goal with this project was to use a LIDAR as the primary environment sensor for a robot and to enable intelligent decisions from the robot based on the input. Our project has real-world relevance because autonomous driving is a quickly growing field. A lot of newer vehicles are using LIDAR as one of their many sensors to determine how to drive.
Although there is an existing LIDAR library and steering/throttle functionality in ROS, we are choosing to implement our solution in C/C++. We believe ROS has too many unnecessary abstractions for our use and they will create overhead in the program’s performance. In addition to using C/C++, we will be using the CUDA library to fully utilize our Jetson Nano’s resources. The LIDAR will produce significant amounts of data, and the Jetson Nano will benefit from the performance in C/C++. This project is different from previous class works because this is a new LIDAR system and other groups have not developed a C/C++ implementation for driving the vehicle.
We built a real-time supervised reinforcement learning based system from scratch using C++ and CUDA. To do this, we decompose LIDAR object detection problem into a finite Markov decision process. Take a look at how it works!
To transform our complicated LIDAR object detection problem into a neat problem which can be easily solved using off-the-shelf reinforcement learning algorithms, we simplify the continuous (floating point) data collected from the LIDAR into some integer number of regions, each with a possible integer state. The state which the reinforcement learning algorithm makes decisions on is then the array of states, with one state for each region.
With this state defined, we can explore our RL algorithm. We take advantage of an off-the-shelf RL algorithm known as Q-learning. Essentially, Q-learning performs actions, then evaluates the result of this action based on some reward policy. We define a reward policy based exclusively on the center region - the change from one state to a next after some action is evaluated only for the center region, and the reward for each action is determined from there. Although we do not utilize the additional regions for the reward decision process, the RL algorithm recognizes this as a separate state, and can therefore make decisions informed by the state of these adjacent regions.
The probability that each action results in a positive reward is stored in a massive table known as a Q-table. In this Q-table, there exists an entry for every single possible state - in our case, (number of states per region) ^ (number of regions). Each entry contains four floating point numbers, corresponding to the probability of maximizing reward for each of the four actions we implement (throttle up, throttle down, turn left, turn right). This table is updated with every action, and actions are determined based on this Q-table. We implement an action policy known as epsilon-greedy; The action with the highest probability of reward is taken, except with probability epsilon a completely random action may be taken. Higher levels of epsilon are recommended if the robot is found to be prone to being stuck in behavior loops.
Since our environment is not very well suited for unsupervised reinforcement learning (mistakes usually mean the robot is in pieces), we implement a pre-weighting to this Q-table to allow our robot to head down the appropriate path as soon as possible. Our currently implemented Q-table pre-weighting is not very heavy, and incorrect weightings could be erased by the robot RL behavior extremely quickly. If a prebiased entry (ranging from 0.9 to 1) is incorrect and constantly receives a negative reward, it would be eliminated in n = 1/(D*1/LR) iterations of the loop, where D is discount factor and LR is learning rate. With our default values, this corresponds to approximately 100 iterations, or 1 second of operation. However, this mild pre-biasing is enough to allow our robot to make correct decisions and progressively learn while not immediately failing on start. This pre-biasing also allows our robot to function immediately out of the gate.
Outside of our RL algorithm, other significant design decisions include the choice to interface directly with our robot's PWM controller and LIDAR using the respective C-based drivers. This further increases our robot's responsiveness as we do not need to worry about significant system hang from expensive abstractions like ROS. Note that the LIDAR does not refresh at the rate at which our RL algorithm refreshes (60Hz vs 100Hz). We found very early on that the LIDAR does not return values at a consistent rate - instead, every call to the LIDAR value getter would get some arbitrary number of values which have changed since the last call, which could result in major loss of data depending on the rate of call. As sensor synchronization is out of the scope of the project, we simply decide to have the control loop sample the LIDAR data at some frequency twice the frequency on paper (Nyquist rate).
Peruse our system diagram here:
Our software is primarily designed to execute our RL design. To enable this, we require the ability to perform actions quickly, evaluate the result of those actions, as well as feed the appropriate data into the RL algorithm to make decisions quickly. As our software stack is rather substantial, this will only provide a high-level overview - for more detail, we encourage you to explore our Github repository.
To abstract away the interfacing with the PWM driver and LIDAR driver, we implement two C++ classes - Drive and Lidar. These classes set PWM values and obtain LIDAR data respectively. In order to maintain responsiveness, we directly send the LIDAR data to CUDA and perform all cleaning operations (e.g. state extraction, region generation) on the CUDA device. However, we recognized that an additional Robot class was required to translate the output delta actions (throttle up, throttle down, turn left, turn right) to the PWM data.
The most substantial class is Control and its CUDA processing stack. Control runs the control loop and manages all data used for the CUDA processing, including allocation, deallocation, and efficient transfer. In object-oriented terms, Control has-a Lidar, and Control also has-a Robot. Control interfaces with the environment exclusively through these two classes. To perform processing efficiently, Control reads directly from CUDA device memory locations to obtain the calculated action, as well as writes directly to CUDA device memory the requisite LIDAR data. Note that the current implementation removes extraneous LIDAR data during this memory copy process from host to CUDA device through pointer arithmetic - possibly a source of bugs when the target center exceeds the defined range. More detail can be found in the Control::control() function located in our source code (src/rc/Control.cpp).
All of these components are required to perform processing asynchronously from each other. Naturally, this would require threading to implement this CPU-level parallelism in addition to the CUDA device-level massive parallelism. Although one of our dependencies is pthread (required by the LIDAR software driver), we choose to use the C++17 <threads> standard template library for our CPU-level multithreading. The primary control threads include the Control thread and the LIDAR thread. Robot and Drive operations are run on the Control thread, as actions should only be taken when Control assigns the appropriately determined actions.
Due to our use of threads for the Control loop, note that proper use assumes the Control user keeps the application alive while the Control loop is running. The parent thread is responsible for calling the Control::start() and Control::stop() functions to manage the Control thread. An example could be seen in our source code (src/main.cpp).
The Lidar class also implements a debug graph of found Lidar values. This can be called using Lidar::graph(std::ostream) (implemented in src/iface/Lidar.cpp).
Finally, there is a massive list of configuration options - including RL parameters, state cleaning options, action availability, LIDAR maximum range, LIDAR target distance, LIDAR graph options, etc. All configuration options could be found in the `hdr/Common.h` folder of our source code. Appropriate modification can drastically change the behavior of the robot - definitely try changing these values to see if they move to your needs.
Note that all code was written under extreme time pressure - please excuse any instances of poor coding style and lack of in-line documentation. We spent time in our high level architecture to ensure components are understandable contextually even if some specific implementations are done in dirtier code. If there is any bug, feel free to open a Github issue, we'll try to fix it if there is demand.
Here is our hardware wiring diagram:
One limitation of our project is that, as three software focused engineers, we really did not focus on the hardware aspect. The only major modification made to our wiring diagram from the traditional setup found in the course is the removal of the camera unit and the addition of the LIDAR. All components were mounted mechanically using copious amounts of tape. As a result, our robot is not particularly aesthetically pleasing - unless you really like tape, in which case it is extremely aesthetically pleasing. Still, our robot performed exceptionally well despite the suspicious mechanical engineering involved - leaving us in the opinion that tape is king when rapid prototyping. The ability to move components at will was very helpful when working with the LIDAR. As the LIDAR sometimes has some blind spots - either due to the hacky software sampling or a real mechanical failure of the device - we can simply rotate the LIDAR as needed. Additional tape was used to secure the external LIDAR USB board, which we found had a tendency to rise up and block some directions of the LIDAR.
Due to the 360-degree nature of the LIDAR, we were required to mount it at the highest point on our robot. However, this exemplified an iceberg effect where tapered objects (such as cones) were seen as much smaller than they actually were. This is a result of the fact that the LIDAR data produced is only information about a single horizontal plane, greatly limiting the data collected. We found that the iceberg effect was acceptable given that the only tapered objects encountered were cones, which do not result in much consequence when hit. However, future modifications to LIDAR mounting may include simply sticking it in the front and removing the data that collides with the vehicle, or designing some robot that's so incredibly flat that the highest point is essentially the same plane as the rest of the vehicle.
Our complete codebase can be found here.
The prerequisites and install instructions may be found in the README.md within the repository. Note that nvidia-cuda-toolkit is apparently not available on the Jetson Nano image which we tested on - however, the preinstalled CUDA libraries worked with some finicking with the CMake files. This finicking is included with the repository.
Our repository also takes advantage of Github Actions for some basic continuous integration. Every push was checked using Github's automated tools for whether or not it could be compiled. This was very helpful, especially in the early stages of our project where we were yet to begin rapid iteration on physical hardware. Dependencies could be truly verified, and any changes to CUDA code could be checked via compiler even on machines without nvcc (nvidia cuda compiler) installed.
If you have any questions or bugs you really need to be solved, feel free to open a Github issue, or email me at firstname.lastname@example.org