From MAE/ECE 148 - Introduction to Autonomous Vehicles

Project Outline

The original objective was to make the DJI Tello drone land autonomously on a moving car. So far, we have been able to land it on a stationary object.

  1. Obtain Useful Video Input
  2. Image Processing
  3. Drone follow car
  4. Drone land on car

Team Members

  • Ali Zolfaghari - Electrical Engineering
  • Amir Levine - Computational Physics
  • Dylan Vizcarra - Computer Engineering
  • Wei-hsiu (Matthew) Hung - Mechanical Engineering

Step 1. Obtaining Video Input

Since we are landing the drone on the car, the drone needs to be able to see the car; that is, we wanted a bird's-eye-view video stream.


The drone has a camera on the front and one on the bottom, but only the front camera was accessible to us. The solution was straightforward: mount a mirror in front of the drone. The hard part was that almost all of the drone's surfaces are curved, so we took off the lid to find mounting points. The CAD model of the drone also wasn't available, so the mount took a lot of hand fitting and many trial-and-error prints before we got one that worked.

CAD of mirror mount
Assembly CAD of mirror mount parts
Putting it on
She's looking at you


We initially began with open-source Python, the OpenCV-Python package, and the TelloPy package to interface with the drone. TelloPy presented difficulties during implementation: we were unable to obtain video feedback from the drone.

To overcome this, we switched to the Go programming language, which the Gobot framework is written in and which offers simple interfacing with the Tello drone. To implement HSV-based object detection, we needed the open-source package GoCV (download shown below).
To begin interfacing with the Tello and receiving video feedback, the following tutorial can be used: https://gobot.io/blog/2018/04/20/hello-tello-hacking-drones-with-go/

Step 2. Image Processing

The input is the video stream, and the output is the coordinates of the car within the video frame. We pursued two methods in parallel to detect where the car is. Using plain image processing (HSV filters), we got good object detection because our object was a distinctive blue and there were no other blue objects nearby to interfere. To take things one step further, we also trained a cascade classifier on our object in several different ways to make detection more applicable to real-world scenarios, where we would expect an uncontrolled environment and unexpected interference. The hope was that each method would compensate for the other's disadvantages.

Object detection with HSV Filters (Color Tracking)

One of the methods we used for object detection was tracking by color. To do so, we put a big blue plate on top of our vehicle, which stood out from its surroundings. This method works extremely well if the tracked color appears only once in the frame; if there are two blue objects in the camera view, tracking one of them becomes complicated.

To detect the blue plate, we first convert the camera feed from the BGR (Blue Green Red) color space to the HSV (Hue Saturation Value) color space, because thresholding a color is easier in HSV than in BGR. We then define two arrays holding the lower and upper limits of the color we are looking for. Using those bounds, we create a binary image in which a pixel is white if its original color is within range and black otherwise, so all blue pixels become white and everything else black.

Once the colored object was detected, we had to determine its center. We did this by computing the moments of the binary image. Moment00 gives the total area, i.e. the total number of white pixels. Moment10 and Moment01 give the sums of the x and y coordinates of the white pixels, respectively. Dividing Moment10 and Moment01 by the area gives the x and y coordinates of the center. Together they form the center of intensity of the frame, which in theory is the center of the object we are trying to follow.
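The thresholding and moment steps described above can be sketched without OpenCV. This is a minimal illustration in plain Python, not the project code: the function names `blue_mask` and `centroid`, the tiny test image, and the conversion of the hue limits to degrees (OpenCV hue 100 to 140 on its 0 to 179 scale corresponds to 200 to 280 degrees) are assumptions made for the example.

```python
import colorsys

def blue_mask(pixels, h_lo=200.0, h_hi=280.0, s_min=50 / 255, v_min=50 / 255):
    """Binary mask (1 = in range) for a 2-D grid of (r, g, b) tuples in 0..255."""
    mask = []
    for row in pixels:
        out = []
        for r, g, b in row:
            h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
            in_range = h_lo <= h * 360 <= h_hi and s >= s_min and v >= v_min
            out.append(1 if in_range else 0)
        mask.append(out)
    return mask

def centroid(mask, min_area=1):
    """Return (x, y) of the white pixels' center, or None if too few pixels."""
    m00 = m10 = m01 = 0
    for y, row in enumerate(mask):
        for x, value in enumerate(row):
            if value:
                m00 += 1   # area: number of white pixels
                m10 += x   # sum of x coordinates
                m01 += y   # sum of y coordinates
    if m00 < min_area:
        return None        # treat small blobs as noise
    return m10 / m00, m01 / m00

# Hypothetical 5x4 frame: gray background with a 2x2 blue plate.
GRAY, BLUE = (128, 128, 128), (0, 0, 255)
frame = [[GRAY] * 5 for _ in range(4)]
for y in (1, 2):
    for x in (2, 3):
        frame[y][x] = BLUE

print(centroid(blue_mask(frame)))  # center of the blue block
```

The `min_area` argument plays the same role as the noise filter in the detector: a handful of stray in-range pixels should not be reported as the target.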

How detector codes work.

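Pseudocode:

// Runs inside the video-processing function. ffmpegOut, frameSize, frameX,
// frameY, window, area, posX, posY and detected are defined elsewhere.
for {
               //reads one raw frame from the ffmpeg pipe
               buf := make([]byte, frameSize)
               if _, err := io.ReadFull(ffmpegOut, buf); err != nil {
                   fmt.Println(err)
                   continue
               }
               //reads image
               img, _ := gocv.NewMatFromBytes(frameY, frameX, gocv.MatTypeCV8UC3, buf)
               if img.Empty() {
                   continue
               }
               //copy of original feed, kept for display
               img2 := gocv.NewMat()
               img.CopyTo(&img2)
               address := &img
               //Blur/Smoothing picture
               //gocv.MedianBlur(img, &img, 5)
               //changes the img from BGR to HSV
               gocv.CvtColor(img, address, gocv.ColorBGRToHSV)
               //HSV lower and upper limits to detect the color blue
               low := gocv.NewScalar(100.0, 50.0, 50.0, 0.0)
               high := gocv.NewScalar(140.0, 255.0, 255.0, 0.0)
               //makes a binary image if the color is in range
               gocv.InRangeWithScalar(img, low, high, address)
               //finds the moments of the picture
               moments := gocv.Moments(img, true)
               moment10 := moments["m10"]
               moment01 := moments["m01"]
               area = moments["m00"]
               //filter out noise
               if area < 200 {
                   area = 0
                   detected = false
               } else {
                   detected = true
                   //center of intensity
                   posX = int(moment10 / area)
                   posY = int(moment01 / area)
                   //fmt.Println(posX," ",posY)
                   //draw center on the display copy
                   gocv.Circle(&img2, image.Pt(posX, posY), 5, color.RGBA{255, 0, 0, 0}, 2)
               }
               //fmt.Println(detected, "\n")
               window.IMShow(img2)
               img.Close()
               img2.Close()
               if window.WaitKey(1) >= 0 {
                   //time.Sleep(5000 * time.Millisecond)
                   break
               }
}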


Object Detection with HAAR Cascade Classifier and GoCV

To get started with Object Detection, there are multiple ways to go about it. If you have the full version of OpenCV installed (not just OpenCV-Python) then it should have included a HAAR cascade trainer in the download. With this, there is the capability to run the commands listed below directly from the command line. Check out the following subtabs opencv_createsamples and opencv_traincascade.

If you don’t want to download the full version or have trouble downloading it (like in my case), there’s a way to download the HAAR cascade trainer as a standalone program following this tutorial: https://www.cs.auckland.ac.nz/~m.rezaei/Tutorials/Creating_a_Cascade_of_Haar-Like_Classifiers_Step_by_Step.pdf

Because the executables in that tutorial are prebuilt, they only accept images in BMP format. To overcome this, we wrote a simple Python converter that takes all JPG images (or any other type you need) in the "rawdata - Copy" directory and converts them to BMP files. This is extremely helpful for large datasets:

import glob
import os
import cv2

# Convert every JPG in the directory to a BMP with the same base name
# (the output naming is an assumption; adjust as needed).
for path in glob.glob("rawdata - Copy/*.jpg"):
   img = cv2.imread(path)
   if img is None:
      continue  # skip files OpenCV cannot read
   cv2.imwrite(os.path.splitext(path)[0] + ".bmp", img)

With this conversion, we were able to run the commands as given in the tutorial. With the full OpenCV, you shouldn't have this problem.

Given below is a basic data flow diagram of how the cascades are generated and used. The diagram on the right shows the flow of generating the cascade classifier. The diagram on the left shows the flow of using the trained detector.

How detector codes work.
How classifiers are generated.


opencv_createsamples

This step is fairly straightforward. It rotates the picture you took and scales it down to the size you want. It also adds a background if you specify one.
The input is a photo.
The output is a vector file, which can be used by the cascade classifier training program.
An example command:

opencv_createsamples -vec pos_ebu2_shade/e2sh_pos25_wh24_$i.vec -img pos_ebu2_shade/img$i.jpg -bg pos_ebu2_shade/bg.txt -bgthresh 0 -num 25 -maxxangle 0.3 -maxyangle 0.3 -maxzangle 3.1 -w 24 -h 24 -rngseed 736637 

The following are the parameters I worked with; some of them are not well described in the tutorial. Hopefully this list helps:

-vec <vec_file_name> : Name of the output vector file containing the positive samples for training.
-img <image_file_name> : Source object image (e.g., a company logo).
-bg <background_file_name> : Background description file; contains a list of images which are used as a background for randomly distorted versions of the object.
-num <number_of_samples> : Number of positive samples to generate.
-w <sample_width> : Width (in pixels) of the output samples.
-h <sample_height> : Height (in pixels) of the output samples.
  • Observations 1. Square pictures are better
The dimensions of the sample don't change while the image is being rotated. A long image that is turned 90 degrees and then scaled back to the original dimensions will be badly distorted.
-show : Useful debugging option. If specified, each sample will be shown. Pressing Esc will continue the sample creation process without showing each sample.
-maxxangle <max_x_rotation_angle> : Maximal rotation angle towards the x-axis; must be given in radians.
-maxyangle <max_y_rotation_angle> : Maximal rotation angle towards the y-axis; must be given in radians.
-maxzangle <max_z_rotation_angle> : Maximal rotation angle towards the z-axis; must be given in radians.
  • Observations 2. Rotation Limits
I believe too much rotation introduces too much noise and gives a bad classifier. For my purpose, the drone is always looking from above, so I didn't need to rotate the samples much.
-bgcolor <background_color>
-bgthresh <background_color_threshold> :
Background color (currently grayscale images are assumed); the background color denotes the transparent color. Since there might be compression artifacts, the amount of color tolerance can be specified by -bgthresh. All pixels within the bgcolor-bgthresh to bgcolor+bgthresh range are interpreted as transparent. If you want the entire image as it is, simply set -bgthresh to zero.
-rngseed <some integer> : Seed for the random number generator.
  • Observations 4. Random Seed
If not specified, the default value is 12345 and it doesn't change. That means every source will go through the same series of rotations. I'm not sure how this would affect the training, but I used a different value for every source anyway.
-inv : If specified, colors will be inverted.
-randinv : If specified, colors will be inverted randomly.
-maxidev <max_intensity_deviation> : Maximal intensity deviation of pixels in foreground samples.

Using the -show option, you can see them. Example results are:

Some big rotation
If bg are specified
  • Observations 5. Merge vectors files from different sources
In the beginning, I generated 500 samples from one single source and trained on them. Later on, I generated 10 samples from each of 50 different sources and trained on these 500 samples. This worked better for me. The code required to merge different vector files into one is here: https://github.com/wulfebw/mergevec . The instructions are clear and simple.


opencv_traincascade

First of all, a very useful tutorial on classifier training: https://docs.opencv.org/3.4/dc/d88/tutorial_traincascade.html
The input is the positive samples and the negative samples, plus a few parameters to specify.
The output is the cascade classifier .xml file, which can be used later for object detection.
An example command:

opencv_traincascade -data train_ebu2_550from22_wh40_p500n1000_FAR0.5 -vec pos_e2_550from22 -bg neg_ebu2.txt -numPos 500 -numNeg 1000 -numStages 25 -precalcValBufSize 5120 -precalcIdxBufSize 5120 -acceptanceRatioBreakValue 0.00001  -w 24 -h 24 -minHitRate 0.99 -maxFalseAlarmRate 0.5 -weightTrimRate 0.95 -maxDepth 1 -maxWeakCount 100

The following are the parameters I worked with; some of them are not well described in the tutorial. Hopefully this list helps:

-data <cascade_dir> : Where the trained classifier should be stored. This folder should be created manually beforehand.
-vec <vec_file> : vec file with positive samples (created by the opencv_createsamples utility).
-bg <background_file> : Background description file. This is the file containing the paths to the negative sample images.
-numPos <number_of_positive_samples> : Number of positive samples used in training for every classifier stage.
-numNeg <number_of_negative_samples> : Number of negative windows grabbed from your set of negative images. These are not the images themselves but windows the size of your model window, randomly grabbed from the negatives dataset. So you can, for example, use 3 images of 1000x1000 pixels and derive over 1000 negative windows of 15x15 pixels.
  • Observations 1. Background in the Positive Samples
In my first attempts at generating positive samples, I did my best to remove everything except the object itself, hoping to minimize false detections. However, this turned out to be a bad move: the boundary between the object and the background was unnatural.
detects nothing
the classifier was generated from positive samples like this
-w <sampleWidth> : Width of training samples (in pixels)
-h <sampleHeight> : Height of training samples (in pixels)
  • Observations 2. Sample Sizes
I started with -w 24 -h 24 because this is roughly the sample size I saw in all the example code. When the classifier didn't work for me, I suspected that making the samples small might cause me to lose information within the image, so I bumped it up to -w 40 -h 40. It didn't give me a better classifier, but the training time increased from 5 min to 2.5 hours. The detector code also takes much longer to run the classifier, which is very bad for our goal of real-time detection while the drone is flying.
-numStages <number_of_stages> : Number of cascade stages to be trained. If the acceptance ratio break value is reached sooner, the training will stop early.
-minHitRate <min_hit_rate> : Minimal desired hit rate for each stage of the classifier. Overall hit rate may be estimated as (min_hit_rate ^ number_of_stages)
-maxFalseAlarmRate <max_false_alarm_rate> : Maximal desired false alarm rate for each stage of the classifier. Overall false alarm rate may be estimated as (max_false_alarm_rate ^ number_of_stages)
  • Observations 3. maxFalseAlarmRate
Occasionally, a lower max false alarm rate does help.
False Alarm Rate set at 0.5
False Alarm Rate set at 0.25
-acceptanceRatioBreakValue <break_value> : This argument determines how precisely your model should keep learning and when to stop. A good guideline is to train no further than 10e-5, to ensure the model does not overtrain on your training data.
  • Observations 4. Acceptance Ratio
I actually went past the suggested value and obtained better results.
training stops at 0.001
training stops at 0.0001
training stops at 0.00001
training stops at 0.000002
-weightTrimRate <weight_trim_rate> : Specifies whether trimming should be used and its weight. A decent choice is 0.95.
-maxDepth <max_depth_of_weak_tree> : Maximal depth of a weak tree. A decent choice is 1, which is the case of stumps.
-maxWeakCount <max_weak_tree_count> : Maximal count of weak trees for every cascade stage. The boosted classifier (stage) will have as many weak trees (<=maxWeakCount) as needed to achieve the given -maxFalseAlarmRate.
-mode <BASIC (default) | CORE | ALL> : Selects the set of Haar features used in training. BASIC uses only upright features, while ALL uses the full set of upright and 45-degree rotated features. There are three modes to try out.
-precalcValBufSize <precalculated_vals_buffer_size_in_Mb> : Size of buffer for precalculated feature values (in Mb).
-precalcIdxBufSize <precalculated_idxs_buffer_size_in_Mb> : Size of buffer for precalculated feature indices (in Mb).
* * *  The more memory you assign the faster the training process, but keep in mind that -precalcValBufSize and -precalcIdxBufSize combined should not exceed your available system memory.  * * *
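
The overall-rate estimates given for -minHitRate and -maxFalseAlarmRate are easy to check before starting a long training run. The following is a small sketch of that arithmetic only; the helper name `overall_rates` is hypothetical, and the inputs are the values from the example command above. Because training may stop before -numStages is reached, these are estimates for a fully trained cascade.

```python
def overall_rates(min_hit_rate, max_false_alarm_rate, num_stages):
    """Estimate whole-cascade rates from the per-stage targets."""
    overall_hit = min_hit_rate ** num_stages
    overall_false_alarm = max_false_alarm_rate ** num_stages
    return overall_hit, overall_false_alarm

# Values from the example command: -minHitRate 0.99 -maxFalseAlarmRate 0.5 -numStages 25
hit, false_alarm = overall_rates(0.99, 0.5, 25)
print(f"estimated overall hit rate:         {hit:.3f}")
print(f"estimated overall false alarm rate: {false_alarm:.2e}")
```

With these settings the estimated overall hit rate is roughly 0.78, which shows why -minHitRate is usually kept very close to 1: the per-stage losses compound over all stages.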

Detector Codes: detectMultiScale()

With a classifier generated, detectMultiScale() can be called just as in regular OpenCV; it returns an iterable of the detected objects. This is the most important function call for object detection. The image can be converted in different ways before running the function to try to get better results. For example, converting to grayscale with OpenCV and then performing detection:

import cv2

target_cascade = cv2.CascadeClassifier("yourClassifier.xml")
img = cv2.imread("someImage.jpg")
gray_img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
detected_objects = target_cascade.detectMultiScale(gray_img)

Overlapping Different Cascades

The idea behind cascade overlapping is straightforward. We were able to run multiple trained cascades simultaneously, each returning an iterable of detections in the form (x,y,w,h). In this form, (x,y) is the coordinate of the upper-left corner of the detected object and (w,h) are its width and height, respectively. Given two rectangles, a simple function can be written to determine whether they overlap:

func is_overlap(rect1 image.Rectangle, rect2 image.Rectangle) bool {
   if rect1.Min.X < rect2.Min.X {
     // rect1 starts to the left of rect2
     if rect1.Min.Y < rect2.Min.Y {
        return rect1.Max.X > rect2.Min.X && rect1.Max.Y > rect2.Min.Y
     } else {
        return rect1.Max.X > rect2.Min.X && rect1.Min.Y < rect2.Max.Y
     }
   } else if rect1.Min.X > rect2.Min.X {
     // rect2 starts to the left of rect1
     if rect1.Min.Y < rect2.Min.Y {
       return rect2.Max.X > rect1.Min.X && rect2.Max.Y > rect1.Min.Y
     } else {
       return rect2.Max.X > rect1.Min.X && rect2.Min.Y < rect1.Max.Y
     }
   } else {
     // same left edge: they overlap only if their Y ranges overlap
     return rect1.Min.Y < rect2.Max.Y && rect2.Min.Y < rect1.Max.Y
   }
}

Rectangles can be drawn over the image when the result is true, giving results similar to what is shown below, where the red and blue squares represent the object detected by separate cascade classifiers:

How detector codes work.

Knowing where two different cascades overlap can be used to give a more accurate determination of the detected object, further reducing false positives. There are multiple ways this can be implemented: either use the same data to train multiple cascades, or use different data when training each cascade. When using the same data, it's important to train under different parameters to develop a slightly different cascade, as explained above. When using different data, each cascade should be generated using parameters that optimize object detection for the given data set. For us, this was done by trial and error to train what we thought to be an efficient cascade classifier.
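
One way to use the overlap to reduce false positives is to keep only the detections that a second cascade confirms. The sketch below is an illustration under assumptions, not the project code: detections are plain (x, y, w, h) tuples as returned by detectMultiScale(), and the helper names `is_overlap`, `confirmed` and `center` are made up for the example.

```python
def is_overlap(a, b):
    """True if two (x, y, w, h) rectangles share any area."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah

def confirmed(dets_a, dets_b):
    """Keep detections from cascade A that overlap some detection from cascade B."""
    return [a for a in dets_a if any(is_overlap(a, b) for b in dets_b)]

def center(rect):
    """Center point of an (x, y, w, h) rectangle, e.g. as a tracking target."""
    x, y, w, h = rect
    return x + w / 2, y + h / 2

# Hypothetical detections from two cascades on the same frame.
cascade_a = [(10, 10, 20, 20), (100, 100, 10, 10)]
cascade_b = [(25, 15, 20, 20)]

kept = confirmed(cascade_a, cascade_b)
print(kept)                       # only the detection both cascades agree on
print([center(r) for r in kept])  # candidate target coordinates
```

Requiring agreement trades some recall for precision: an object that only one cascade finds is dropped, so this works best when both cascades detect the real target reliably.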

From OpenCV to GoCV

GoCV is a wrapper that makes OpenCV usable from the Go programming language. The GoCV installation listed below will include OpenCV.
There's no major reason why we prefer Go (with GoCV) over Python (with OpenCV). We just happened to get video streaming working more easily with Gobot than with Python.
Example code that does the object detection:

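myClassifier := gocv.NewCascadeClassifier()
defer myClassifier.Close()
myClassifier.Load("yourClassifier.xml") // path to the trained cascade .xml file
obj_coords := myClassifier.DetectMultiScale(img) // returns []image.Rectangle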

Step 3. Algorithm of Following

Tello turn.jpg

Step 4. Algorithm of Landing


In short, when the drone believes that it is right on top of the car, it drops altitude. Based on how big the landing pad is in the image, we can tell roughly how close we are. When the drone is close enough, it cuts power with drive.halt().
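
The landing logic above can be sketched as a small decision function. This is only an illustration of the idea, not the code that ran on the drone: the function name `landing_action`, the 960x720 frame size, and both thresholds (`center_tol`, `land_area_frac`) are assumptions that would need tuning in flight.

```python
def landing_action(cx, cy, pad_area, frame_w, frame_h,
                   center_tol=0.1, land_area_frac=0.4):
    """Decide the next step from the landing pad's position and apparent size.

    cx, cy   -- detected pad center in pixels
    pad_area -- detected pad area in pixels (e.g. moment m00)
    Returns "align", "descend" or "land".
    """
    # Offset from the frame center, as a fraction of the frame size.
    dx = abs(cx - frame_w / 2) / frame_w
    dy = abs(cy - frame_h / 2) / frame_h
    if dx > center_tol or dy > center_tol:
        return "align"    # not above the car yet: keep following
    if pad_area / (frame_w * frame_h) >= land_area_frac:
        return "land"     # pad fills enough of the frame: close enough to cut power
    return "descend"      # centered but still high: drop altitude

print(landing_action(480, 360, 20000, 960, 720))   # centered, pad looks small
print(landing_action(100, 360, 20000, 960, 720))   # pad far off-center
print(landing_action(480, 360, 300000, 960, 720))  # centered, pad looks large
```

Using the pad's fraction of the frame rather than a raw pixel count keeps the threshold independent of the video resolution.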

IMPLEMENTATION: Consolidate All into Codes


Code for Drone Tracking Car using Color Detection can be found here: https://github.com/mohammadali1923/ECE148Fall2018Team7AutonomousDrone.git


Because we did not want object detection to interfere with the drone controls, we needed to implement simple multi-threading in Go. Although it wasn't used in our case, Go supports communication across threads using "Channels", explained here: https://gobyexample.com/channels


Drone tracking Car

video1: https://youtu.be/d611BlFMB5E
video2: https://youtu.be/nUQHuwqCUlY
video3: https://youtu.be/nKBSIBvuE3Q

Drone Landing on stationary Object


Drone view


TUTORIALS for the next team: Software setup

GO Language

To get started with GO, download the appropriate package from: https://golang.org/doc/install#install


To learn how to use GoCV, this is a very good example code to work with: https://gocv.io/writing-code/face-detect/


To get started with Gobot, use the following tutorial: Gobot Tutorial



For Mac users, it is rather simple. Install Homebrew on your computer, then run the following in a terminal window:

User$ brew install opencv