I had a chance to play with TensorFlow for one of my projects. I had never had any machine learning experience before, nor any experience with Python itself (🔥). In this article I will explain how I trained my first model on a custom image set for object recognition purposes. For some of you this article may seem very basic, so adjust accordingly.
First of all, this tutorial was written for UNIX-like systems, so if you are running Windows, some steps may be quite different.
The problem 🤔
I have a small sticky note with chessboard-style rectangular shapes. My imaginary task is to track/recognise this particular sticker in any given image. Here is the sticker for reference:
The main idea is to find this sticker in any other image, like this one:
Project setup 👩‍💻
For this task we need to install or download a couple of mandatory tools and repositories.
TensorFlow's main API is in Python, and almost all of this tutorial relies on it. Please install the latest version (3.7 at the time of writing), since some of the supporting TensorFlow libraries are unavailable before 3.6.
pip install tensorflow
to install it through the pip package manager, or see the official TensorFlow installation guide to get a different version.
3. TensorFlow models
git clone https://github.com/tensorflow/models.git
Once everything is set up, it is time to prepare some test data!
Preparing test data
Our training data will be a set of labeled pictures, with the labels stored in CSV format. Simply put, we are using a supervised ML approach, where the model is trained on data with known correct outcomes. Imagine that we are going to show it a number of pictures with the sticker highlighted in them :)
For this purpose I used LabelImg, which has a simple UI for highlighting regions and exporting them as XML. Let's install it!
Go to https://github.com/tzutalin/labelImg and follow the installation instructions for your system. For me it was a Python 3 setup on macOS High Sierra (from the author):
brew install qt  # will install qt-5.x.x
brew install libxml2
make qt5py3
python3 labelImg.py

As a side note, if pyrcc5 or lxml is missing, try:

pip3 install pyqt5 lxml
Once the tool has started, highlight the desired area (the sticker in our case) and save it as XML. In the end, your output file should look like this:
<annotation>
  <folder>data</folder>
  <filename>1.jpg</filename>
  <path>/data/1.jpg</path>
  <source>
    <database>Unknown</database>
  </source>
  <size>
    <width>1080</width>
    <height>720</height>
    <depth>3</depth>
  </size>
  <segmented>0</segmented>
  <object>
    <name>skate_tag</name>
    <pose>Unspecified</pose>
    <truncated>0</truncated>
    <difficult>0</difficult>
    <bndbox>
      <xmin>226</xmin>
      <ymin>443</ymin>
      <xmax>528</xmax>
      <ymax>532</ymax>
    </bndbox>
  </object>
</annotation>
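To see what information such a file actually carries, here is a minimal sketch that reads one LabelImg annotation with Python's standard xml.etree.ElementTree module. It assumes integer pixel coordinates as in the file above; the function name parse_annotation is my own, not part of LabelImg.

```python
import xml.etree.ElementTree as ET


def parse_annotation(xml_text):
    """Return image metadata and bounding boxes from a LabelImg (Pascal VOC style) XML string."""
    root = ET.fromstring(xml_text)
    size = root.find("size")
    record = {
        "filename": root.findtext("filename"),
        "width": int(size.findtext("width")),
        "height": int(size.findtext("height")),
        "objects": [],
    }
    # Every highlighted region is stored as an <object> with a <bndbox> child.
    for obj in root.findall("object"):
        box = obj.find("bndbox")
        record["objects"].append({
            "class": obj.findtext("name"),
            "xmin": int(box.findtext("xmin")),
            "ymin": int(box.findtext("ymin")),
            "xmax": int(box.findtext("xmax")),
            "ymax": int(box.findtext("ymax")),
        })
    return record


SAMPLE = """<annotation>
  <filename>1.jpg</filename>
  <size><width>1080</width><height>720</height><depth>3</depth></size>
  <object>
    <name>skate_tag</name>
    <bndbox><xmin>226</xmin><ymin>443</ymin><xmax>528</xmax><ymax>532</ymax></bndbox>
  </object>
</annotation>"""

print(parse_annotation(SAMPLE))
```

Only the filename, image size, class name and box corners matter for training; fields like pose or truncated are ignored here.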
Now we should repeat the same process... for every test picture (in my case I had only 5, so it was not a big deal).
Basically, we have marked the sticker area in each picture to show what we expect the model to recognise.
But there is another thing!
TensorFlow's object detection tooling doesn't read these XML files directly, so as an intermediate step we need to convert all generated XML files into CSV. Thankfully, a Python script that converts labeled XML to CSV is already available on GitHub.
You can run this script in the same folder as all of your XML label files.
Don't forget to change the output file name in the script from raccoon_labels.csv to training.csv.
As output, you will get training.csv, which aggregates all XML files into a single CSV:
filename,width,height,class,xmin,ymin,xmax,ymax
1.jpg,1080,720,skate_tag,226,443,528,532
2.jpg,1080,720,skate_tag,168,285,273,457
3.jpg,1080,720,skate_tag,548,182,997,474
4.jpg,1080,720,skate_tag,526,326,703,386
5.jpg,1080,720,skate_tag,570,194,708,335
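If you would rather not depend on an external script, the same aggregation can be sketched with the standard library alone. This is a minimal stand-in, not the script mentioned above: it assumes all LabelImg XML files sit in one folder and use integer coordinates, and the names xml_to_rows and xml_dir_to_csv are my own.

```python
import csv
import glob
import os
import xml.etree.ElementTree as ET

# Same column order as the training.csv example above.
COLUMNS = ["filename", "width", "height", "class", "xmin", "ymin", "xmax", "ymax"]


def xml_to_rows(xml_path):
    """Yield one CSV row per labeled object in a LabelImg XML file."""
    root = ET.parse(xml_path).getroot()
    size = root.find("size")
    width, height = int(size.findtext("width")), int(size.findtext("height"))
    for obj in root.findall("object"):
        box = obj.find("bndbox")
        yield [
            root.findtext("filename"),
            width,
            height,
            obj.findtext("name"),
            int(box.findtext("xmin")),
            int(box.findtext("ymin")),
            int(box.findtext("xmax")),
            int(box.findtext("ymax")),
        ]


def xml_dir_to_csv(labels_dir, output_csv):
    """Aggregate every *.xml file in labels_dir into one CSV and return the row count."""
    rows = []
    # Sorting keeps the output order stable between runs.
    for xml_path in sorted(glob.glob(os.path.join(labels_dir, "*.xml"))):
        rows.extend(xml_to_rows(xml_path))
    with open(output_csv, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(COLUMNS)
        writer.writerows(rows)
    return len(rows)
```

Calling xml_dir_to_csv(".", "data/training.csv") from the folder with the XML files would write one row per highlighted region, so an image with two stickers would produce two rows.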
Since we are already generating many files, let's agree on a folder structure:
./
└── data
    └── training.csv
Here, training.csv is our CSV aggregation of the XML labels.
What's next? CSV is awesome, but the TensorFlow object detection tooling expects its input as TFRecord files. We need to convert the aggregated dataset into a TensorFlow Record 🔥
Converting CSV aggregated file into TensorFlow Record (TFR)
Luckily, there is already a script that transforms CSV into TFR:
This script takes our CSV input path and the TFR output path. We can launch it by running:
python generate_tfRecord.py --csv_input=data/training.csv --output_path=data/training.record
This record file will be used as the input for the training process.
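What such a conversion does before serialising is worth seeing: the TensorFlow Object Detection API stores one example per image with box coordinates normalised to [0, 1], so CSV rows have to be grouped by filename and divided by the image width and height. The sketch below shows only that pre-processing step in plain Python, without TensorFlow. LABEL_MAP and group_rows are hypothetical names, and mapping skate_tag to id 1 is an assumption (ids start at 1 because 0 is reserved for the background class).

```python
import csv
import io
from collections import defaultdict

# Hypothetical label map: class name -> integer id (0 is reserved for background).
LABEL_MAP = {"skate_tag": 1}


def group_rows(csv_file):
    """Group CSV rows by filename and normalise box coordinates to [0, 1]."""
    examples = defaultdict(lambda: {"xmins": [], "xmaxs": [], "ymins": [], "ymaxs": [],
                                    "classes_text": [], "classes": []})
    for row in csv.DictReader(csv_file):
        width, height = float(row["width"]), float(row["height"])
        ex = examples[row["filename"]]
        ex["width"], ex["height"] = int(width), int(height)
        # x coordinates are divided by width, y coordinates by height.
        ex["xmins"].append(float(row["xmin"]) / width)
        ex["xmaxs"].append(float(row["xmax"]) / width)
        ex["ymins"].append(float(row["ymin"]) / height)
        ex["ymaxs"].append(float(row["ymax"]) / height)
        ex["classes_text"].append(row["class"])
        ex["classes"].append(LABEL_MAP[row["class"]])
    return dict(examples)


SAMPLE_CSV = """filename,width,height,class,xmin,ymin,xmax,ymax
1.jpg,1080,720,skate_tag,226,443,528,532
2.jpg,1080,720,skate_tag,168,285,273,457
"""

for filename, ex in group_rows(io.StringIO(SAMPLE_CSV)).items():
    print(filename, ex["classes"], round(ex["xmins"][0], 3), round(ex["ymaxs"][0], 3))
```

In a real conversion script, each grouped entry would then be packed together with the encoded image bytes into a tf.train.Example under feature keys such as image/object/bbox/xmin. Since the class name has to be mapped to an id, a ready-made script may need its class mapping adjusted to skate_tag.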
Now our directory looks like this:
./
└── data
    ├── training.csv
    └── training.record
Oh, almost forgot!
We will also need to create a testing record file, which is fed to the training process alongside the training record so that the model can be evaluated. I was lazy enough to literally copy-paste the training dataset, but I definitely don't recommend doing that: the testing set should contain images the model never sees during training. Ideally, the training dataset should also be much bigger than the testing one.
And also don't forget our images! We will need them... The final directory should look like this:
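Instead of copy-pasting, the labels can be split into separate training and testing CSVs before generating the records. A minimal sketch, assuming all labels first live in one combined CSV (the file name labels.csv below is hypothetical) with the columns shown earlier; split_labels is my own helper name.

```python
import csv
import random


def split_labels(input_csv, train_csv, test_csv, test_fraction=0.2, seed=42):
    """Split a labels CSV into train/test files by image; returns the test filenames."""
    with open(input_csv, newline="") as f:
        reader = csv.DictReader(f)
        fieldnames = reader.fieldnames
        rows = list(reader)

    # Split on filenames, not rows: an image with several boxes must stay in one set.
    filenames = sorted({row["filename"] for row in rows})
    random.Random(seed).shuffle(filenames)
    n_test = max(1, round(len(filenames) * test_fraction))
    test_files = set(filenames[:n_test])

    train_rows = [r for r in rows if r["filename"] not in test_files]
    test_rows = [r for r in rows if r["filename"] in test_files]
    for path, subset in ((train_csv, train_rows), (test_csv, test_rows)):
        with open(path, "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(subset)
    return sorted(test_files)
```

For example, split_labels("data/labels.csv", "data/training.csv", "data/testing.csv") would hold out one of five images; each CSV is then converted to its own record file with the command above. The fixed seed makes the split reproducible between runs.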
./
├── data
│   ├── testing.csv
│   ├── testing.record
│   ├── training.csv
│   └── training.record
└── images
    ├── 1.jpg
    ├── 2.jpg
    ├── 3.jpg
    ├── 4.jpg
    └── 5.jpg
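Before training, a quick sanity check can catch a label that points at a missing or misnamed image. This is a small sketch assuming the layout above (CSVs in data/, pictures in images/); missing_images is a hypothetical helper, not part of TensorFlow.

```python
import csv
from pathlib import Path


def missing_images(csv_path, images_dir):
    """Return the filenames listed in csv_path that have no matching file in images_dir."""
    images = Path(images_dir)
    with open(csv_path, newline="") as f:
        # A set, because an image with several boxes appears on several rows.
        names = {row["filename"] for row in csv.DictReader(f)}
    return sorted(name for name in names if not (images / name).is_file())


# Example, run from the project root:
#   for split in ("training", "testing"):
#       print(split, missing_images(f"data/{split}.csv", "images"))
```

An empty list for both splits means every labeled filename has a matching picture in images/.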
Alright, we are done with the data! 🎉 Let's train!