Friday 9 March 2018

Adding vision to your AIY Project in 4 easy steps (and 1 tricky one)

Back in May 2017, The MagPi came with a Google Voice HAT and instructions for turning a Raspberry Pi into a Google Assistant. Initially it was triggered by a button press, but it was soon updated to be voice activated, allowing you to ask it questions and give it commands in a similar manner to a Google Home or Amazon Echo device.

After putting together the kit and playing with it for a while, I decided to look at adding a camera to the device and connecting it up to Google's Vision APIs. This was covered on the Raspberry Pi Blog back in 2016, but it looks like the APIs have changed slightly since then. So after a bit of hunting around and testing, here are the steps I took to set up and extend the base Google AIY package to include vision support.

Install Google AIY image

Follow the instructions on the official Google Voice Kit page (assuming you haven't already gone through these steps). For reference, the software image I used was aiyprojects-2018-01-03.img.xz.

You'll need to complete the 'Custom Voice User Interface' section of the instructions to enable the Cloud Speech APIs as well (this should end with a cloud_speech.json file in the home directory of the Raspberry Pi).

Configure Camera

Follow the official instructions on how to set up and connect the Raspberry Pi Camera.
If you're reading through these instructions before following them (which of course you should be!), it's almost certainly worth connecting the camera cable to the Raspberry Pi first, so it can be fed through the slot on the Google Voice Kit HAT.

The cardboard case included in the kit has a convenient hole for the camera lens to poke through, with the flaps holding the camera in place without needing to tape or screw it into place. Almost as if it was meant to have a camera installed in it!

The camera is held in place between the two pieces of cardboard. The lens of the camera pokes through the hole.

Every time we run 'raspistill' to take a photo, the camera performs various calibration tasks, setting up the hardware, working out the light level and so on, which usually takes 5 or more seconds to complete.

To avoid having this delay every time we ask the Raspberry Pi to identify an object, we want raspistill to be constantly running in the background, ready to take a photo at any time. Luckily raspistill already supports this with the '-s' (signal) option, which makes it wait and only capture when it receives a SIGUSR1 signal.

raspistill -rot 180 -o /tmp/aiyimage.jpg -s -t 0 -w 640 -h 480

We want this to execute every time the Raspberry Pi starts up, so edit the crontab using 'crontab -e' and add the following line to the end of the file.

@reboot raspistill -rot 180 -o /tmp/aiyimage.jpg -s -t 0 -w 640 -h 480 &

To test that the above command is working, reboot the Raspberry Pi and then run

kill -s SIGUSR1 `pidof raspistill`

If all has gone well then the Raspberry Pi will take a picture and store it at '/tmp/aiyimage.jpg'.
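Since the rest of the project is driven from Python, the same trigger can be sent from a script rather than the shell. Below is a minimal sketch, assuming raspistill was started with the crontab entry above and writes to /tmp/aiyimage.jpg; the helper names are my own, not part of any library.

```python
import os
import signal
import subprocess
import time

# Path matches the -o argument in the crontab entry above.
CAPTURE_PATH = '/tmp/aiyimage.jpg'

def wait_for_file(path, timeout=5.0, poll=0.1):
    """Poll until `path` exists, for up to `timeout` seconds."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        if os.path.exists(path):
            return True
        time.sleep(poll)
    return False

def take_photo():
    """Ask the long-running raspistill process for a capture via SIGUSR1."""
    pids = subprocess.check_output(['pidof', 'raspistill']).split()
    if os.path.exists(CAPTURE_PATH):
        os.remove(CAPTURE_PATH)  # remove the old capture so we know the next one is fresh
    os.kill(int(pids[0]), signal.SIGUSR1)
    if not wait_for_file(CAPTURE_PATH):
        raise RuntimeError('raspistill did not produce an image in time')
    return CAPTURE_PATH
```

Calling take_photo() then returns almost immediately with a fresh image, instead of paying the multi-second camera start-up cost on every request.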

Enable Vision API in Google account

This is, potentially, the tricky step, as it requires having a credit card to enable billing on your Google account, as well as the service being available in your country.
Just follow Google's instructions for enabling the Cloud Vision API, making sure you enable the API on the correct project (aiyproject if you followed the Google Voice setup instructions above).

Install Vision API Python libraries

To utilise the Google Vision APIs we need to install the Python libraries (as detailed in Google's Cloud Vision documentation). However, the Google application runs within a Python virtual environment to keep its selection of Python libraries separate from any others installed on the Raspberry Pi. This means we have to take the extra step of entering the virtual environment before installation. This can easily be achieved by launching the 'Start dev terminal' shortcut from the Desktop, or by running '~/bin/' from a normal terminal (e.g. if connecting via SSH).

Once inside the virtual environment, run the following to install the libraries:

pip3 install google-cloud-vision
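As a sketch of how the installed library gets used: the snippet below sends an image to the label-detection endpoint and filters the results by confidence. The exact client API has shifted between google-cloud-vision releases (older versions expose the Image type as vision.types.Image), so treat this as a template rather than a drop-in script; the helper names are my own.

```python
import io

def confident_labels(labels, threshold=0.8):
    """Filter (description, score) pairs, keeping those at or above threshold."""
    return [(desc, score) for desc, score in labels if score >= threshold]

def describe_image(path):
    """Send a captured image to the Cloud Vision label-detection endpoint."""
    # Imported here so the filtering helper above works without the library
    # installed; on older releases use vision.types.Image instead.
    from google.cloud import vision
    client = vision.ImageAnnotatorClient()
    with io.open(path, 'rb') as f:
        image = vision.Image(content=f.read())
    response = client.label_detection(image=image)
    labels = [(l.description, l.score) for l in response.label_annotations]
    return confident_labels(labels)
```

The client picks up credentials from the environment, so the cloud_speech.json setup from earlier (or a GOOGLE_APPLICATION_CREDENTIALS variable) needs to be in place before this will run.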

Using the Vision APIs

I've written two scripts that exercise the Vision APIs: '' which calls the Vision APIs themselves, and '' which talks to the Voice APIs. The scripts should be placed in the /home/pi/AIY-projects-python/src/examples/voice folder, and can be easily fetched using the 'wget' command.

cd /home/pi/AIY-projects-python/src/examples/voice

Launch the script from inside the 'dev environment' (in the same way as you'd launch the regular demo).


Then all you have to do is point the camera at something, press the button, and say one of the following voice commands.

'What is that?' - Requests a list of labels from Google and reads out any with over 80% confidence.
'What logo is that?' - Requests Google to identify the logo in the picture.
'What does that say?' - Reads out any text detected in the picture.
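Internally, the script needs to map each recognised phrase to the Vision API feature it should call. A minimal sketch of that dispatch is below; the feature names returned are placeholders for illustration, not the actual function names in my scripts.

```python
def route_command(transcript):
    """Map a recognised voice command to the Vision feature it triggers."""
    # Normalise case and trailing punctuation so 'What is that?' matches.
    text = transcript.lower().strip(' ?!.')
    commands = {
        'what is that': 'label_detection',
        'what logo is that': 'logo_detection',
        'what does that say': 'text_detection',
    }
    return commands.get(text)  # None for unrecognised phrases
```

Keeping the mapping in a dictionary makes it easy to add new phrases later without touching the recognition code.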

Example of use.

Below is a short demonstration video of the scripts in action. I've demonstrated this at a couple of Raspberry Jams, and it drew interest from both kids and adults. The kids especially were trying different items for it to identify, one even taking his shoe off to see what it would say ('blue trainer' being the result).
I do have an idea or two for what I can do with this script, nothing especially useful, but something that is a little more interactive. Hopefully, with the help of this guide, other people will come up with fun and interesting projects!