From MAE/ECE 148 - Introduction to Autonomous Vehicles


Our project aims to demonstrate the feasibility of adaptive cruising for autonomous driving. We wanted to train our car to detect a change in its environment while driving autonomously; in our case, a tennis ball was used to signify that change. Using two cameras, two time-of-flight sensors, and OpenCV, we trained our car to detect and follow a tennis ball while driving autonomously.

Team Members

Jiayun Ju - MAE
Sivleng Kouv - BENG
Chunyao Liu - ECE
Yangting Sun - ECE


Hardware Design

Mounting Plate

The design of the mounting plate provided ample surface area for attaching components. Moreover, the opening in the middle was crucial for easy access to the other components and easy wiring.


We used AutoCAD to design the mounting plate and a laser cutter to cut out the plate.

Camera Mount

For our camera mount, we wanted flexibility in both the height and angle of the camera. To that end, the handles attached to the base of the camera mount are movable, so the height can be adjusted. Likewise, the flat piece directly attached to the camera is adjustable, so the camera angle can be modified for optimal viewing.



The webcam was positioned below the mounting plate because that location offered the most secure attachment and kept the webcam from intruding on the PiCamera's peripheral view.


A USB webcam is used to detect the tennis ball. The webcam's output is processed with OpenCV into a mask of the tennis ball. From this mask, the Donkey Car can locate the tennis ball using the ball's radius and x-y position in the mask.
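In the project itself the trained network maps the mask to driving commands, but purely as an illustration of how the ball's radius and x-y position encode bearing and range, here is a hand-written sketch. The function name, the gains, and the target radius are our own assumptions, not part of the project's code:

```python
def ball_to_controls(x, y, radius, image_w, target_radius=30.0):
    """Hypothetical mapping from a detected ball to (steering, throttle).

    steering in [-1, 1]: negative = ball left of image center, positive = right.
    throttle in [0, 1]: shrinks as the ball's apparent radius grows,
    i.e. as the car closes in on it.
    """
    if radius == 0:  # no ball detected: stop
        return 0.0, 0.0
    # Horizontal offset of the ball from image center, normalized to [-1, 1]
    steering = (x - image_w / 2) / (image_w / 2)
    # Slow down as the apparent radius approaches the target radius
    throttle = max(0.0, 1.0 - radius / target_radius)
    return steering, throttle
```

A rule like this could follow a ball in open space, but the project's learned approach also handles the track, which is why the mask is fed to the network instead.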

Distance Sensor

We used two Pololu VL53L1X time-of-flight (ToF) sensors to measure distance. Since both sensors power up with the same default I2C address, 0x29, on the Raspberry Pi, their XSHUT pins were used to bring them up one at a time and re-address them. We used the 'change_address' function in the VL53L1X2 library to assign the sensors two different addresses (0x29 and 0x30).
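The bring-up order matters here: every sensor must first be held in reset via XSHUT, and then each one is enabled and re-addressed before the next is released, since any two enabled sensors would otherwise collide at 0x29. The helper below only computes that sequence of steps as data (the actual GPIO writes and library calls are hardware-side and omitted); the pin numbers in the usage example are illustrative:

```python
def xshut_sequence(sensors):
    """Return the ordered bring-up steps for several VL53L1X sensors.

    `sensors` is a list of (xshut_pin, new_address) pairs. All sensors
    power up at the default address 0x29, so each one must be enabled
    and re-addressed before the next is released from reset.
    """
    steps = []
    # 1. Hold every sensor in reset so none responds at 0x29 yet
    for pin, _ in sensors:
        steps.append(("xshut_low", pin))
    # 2. Release and re-address the sensors one at a time
    for pin, addr in sensors:
        steps.append(("xshut_high", pin))             # enable this sensor only
        steps.append(("change_address", 0x29, addr))  # move it off the default
    return steps

# e.g. xshut_sequence([(20, 0x30), (21, 0x31)]) with hypothetical GPIO pins 20, 21
```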




Software Process


OpenCV is used to create a filter for the image captured by the webcam. The lower and upper bounds for the filter in HSV color space are (30, 45, 70) and (64, 255, 255). The detailed code for this step is shown below:

  • WebCam Picture
           snapshot = self.cam.get_image()
           snapshot1 = pygame.transform.scale(snapshot, self.resolution)
           frame1 = pygame.surfarray.pixels3d(
               pygame.transform.rotate(pygame.transform.flip(snapshot1, True, False), -90))
  • Filter
           greenLower = (30, 45, 70)  # Set boundary for green (HSV Color Space)
           greenUpper = (64, 255, 255)
           img = frame1.copy()  # Import image
           blurred = cv2.GaussianBlur(img, (11, 11), 0)
           hsv = cv2.cvtColor(blurred, cv2.COLOR_RGB2HSV)  # Transfer to HSV color space
           mask = cv2.inRange(hsv, greenLower, greenUpper) # Create mask to find green areas
           mask = cv2.erode(mask, None, iterations=2)   # Erode to remove small noise blobs
           mask = cv2.dilate(mask, None, iterations=2)  # Dilate to restore the ball's area
           cnts = cv2.findContours(mask.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE) #Find all the contours in the mask
  • Find the tennis ball
           if cnts[1] == []:  # In case no tennis ball is found
               x, y, radius = 0, 0, 0
           else:
               c = max(cnts[1], key=cv2.contourArea)  # Select the contour with the maximum area as the contour of the ball
               ((y, x), radius) = cv2.minEnclosingCircle(c)  # Get center and radius of the ball
               # img = cv2.circle(img, (int(y), int(x)), int(radius), (255, 0, 0), 2)
               for row in range(mask.shape[0]):
                   for col in range(mask.shape[1]):
                       if (row - x)**2 + (col - y)**2 > radius**2:
                           mask[row][col] = 0    # Black out everything outside the ball
                       else:
                           mask[row][col] = 255  # Keep the ball region white

Fusion Image

  • Data sources

-In order to use the existing neural network, we feed it a single image at a time as training input. To achieve this, we replace the three channels of that image with data from three different sources (PiCamera, webcam, and ToF sensors).

  • Fusion Picture

-We created a new class called MutilCamera in camera.py to produce the fusion image. For the PiCamera, we convert the image from RGB to grayscale in order to compress three channels into one. For the webcam, we use the mask described above. For the sensors, the outputs are the distances measured by the two sensors; we expand these two numbers into a matrix of the image's size so they can be combined with the other two pictures.

Fusion pic.jpg
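The fusion step described above can be sketched with NumPy alone. Shapes and names here are illustrative; the project's actual implementation lives in the MutilCamera class in camera.py:

```python
import numpy as np

def build_fusion(gray_picam, webcam_mask, d1_mm, d2_mm):
    """Pack three data sources into one 3-channel image.

    gray_picam  : (H, W) grayscale PiCamera frame, values 0-255
    webcam_mask : (H, W) binary tennis-ball mask from the webcam
    d1_mm, d2_mm: distances (mm) from the two ToF sensors
    """
    h, w = gray_picam.shape
    # Map each distance onto 0-255 (1 m and beyond saturates at 255)
    d1 = min(d1_mm / 1000 * 255, 255)
    d2 = min(d2_mm / 1000 * 255, 255)
    dist = np.zeros((h, w))
    dist[:, : w // 2] = int(round(d1))  # left half encodes sensor 1
    dist[:, w // 2 :] = int(round(d2))  # right half encodes sensor 2
    fusion = np.zeros((h, w, 3), dtype=np.uint8)
    fusion[:, :, 0] = gray_picam    # PiCamera in channel 1
    fusion[:, :, 1] = webcam_mask   # webcam mask in channel 2
    fusion[:, :, 2] = dist          # distance plane in channel 3
    return fusion
```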

  • Reason for this

-We process the data this way because it fits the car's existing data pipeline: we can store the data and train on it exactly as we did before. In future work, we will change the structure of the neural network so it can be trained with 7 channels: 3 for the PiCamera picture, 3 for the webcam picture, and 1 for the ToF sensor data.
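As a shape check for that future direction, the 7-channel input could be assembled by concatenating the sources along the channel axis. This is purely illustrative; the 7-channel network itself is not implemented in the project:

```python
import numpy as np

def stack_seven_channels(picam_rgb, webcam_rgb, tof_channel):
    """Concatenate two RGB frames and one ToF plane into an (H, W, 7) tensor.

    picam_rgb, webcam_rgb : (H, W, 3) uint8 camera frames
    tof_channel           : (H, W) distance plane scaled to 0-255
    """
    return np.concatenate(
        [picam_rgb, webcam_rgb, tof_channel[:, :, np.newaxis]], axis=2
    )
```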


Demo of Convolution Operation [1]

  • Code sample:
           # PiCamera: frame2
           f = next(self.stream)
           frame2 = rgb2gray(f.array)
           # Sensor info
           Dist_M = np.zeros((self.image_h, self.image_w))
           distance_in_mm1 = self.tof.get_distance(1) # Get the output of the sensors
           distance_in_mm2 = self.tof.get_distance(2)
           d1 = np.round(min(distance_in_mm1 / 1000 * 255, 255)) # map the distance to 0-255
           d2 = np.round(min(distance_in_mm2 / 1000 * 255, 255))
           Dist_M[:, :int(self.image_w / 2)] = int(d1) # Fill the matrix with the output number
           Dist_M[:, int(self.image_w / 2):] = int(d2)
           # Fusion of both cameras, testing
           self.fusion[:, :, 0] = frame2 #Picam for channel 1
           self.fusion[:, :, 1] = mask #Webcam for channel 2
           self.fusion[:, :, 2] = Dist_M #Distance Sensor for channel 3
           self.frame = self.fusion #Get final fusion image

The full code for the camera can be found here.

Training & Testing

At the very beginning, we tried to follow the tennis ball only. For this purpose, we did not care about the surroundings, and thus we used only the mask images produced by the webcam for training. The Donkey Car was placed on its stand facing a wall during data collection to reduce noise. The results turned out to be acceptable, as shown in the video below.


Next, we used both cameras and trained the car on the outdoor track at EBU2. Although the outdoor track was more susceptible to weather, it was available to us at any time, and the white track could easily be distinguished from the tennis ball. Two datasets were collected for training: one following the track only, and the other following both the track and the tennis ball.


Here is the link to our Testing with Outdoor Track demo

After the training process, we took the car outside to test its ability to recognize a tennis ball while driving autonomously. Although the performance is poor, in that the car is slow to detect the change in its environment, our result nevertheless serves as a proof of concept that adaptive cruising is possible. Optimizing this project, however, would require more time, training, and other resources.


[1] Convolutional Neural Networks (CNNs / ConvNets). (n.d.). Retrieved from http://cs231n.github.io/convolutional-networks/