Nov 30, 2016

A hands-on look at the Amazon Rekognition API

Amazon Rekognition is a Deep Learning based image analysis service. Don't worry though, you won't have to wade through Machine Learning / Deep Learning mumbo jumbo to work with Rekognition. Quite the contrary, as Rekognition provides a very easy-to-use API.

It allows developers to:
  • detect thousands of objects and scenes; 
  • analyze faces; 
  • compare two faces to measure similarity;
  • build face collections and match faces against these collections.
As usual, this service can be used with the AWS CLI (as in 'aws rekognition'), or with one of our language SDKs. I'll show you some CLI examples first and then we'll use the popular Python SDK, aka boto3.

First things first: how do we send images for processing? Two options: send the image as a byte blob or put it in S3. I suspect most of us will use the second option, so that's what I'll use. Time to play!
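For the Python-minded, here's a minimal sketch of those two options with boto3. The helper names make_image and detect_faces_in are my own inventions for illustration, not part of the SDK; the 'Image' structure itself (either 'Bytes' or 'S3Object') is what the Rekognition operations accept.

```python
def make_image(bucket=None, name=None, path=None):
    """Build the 'Image' argument passed to Rekognition operations.

    Hypothetical helper (not part of boto3): give it either an S3
    location (bucket + name) or a local file path to send raw bytes.
    """
    if bucket and name:
        return {"S3Object": {"Bucket": bucket, "Name": name}}
    if path:
        with open(path, "rb") as f:
            return {"Bytes": f.read()}
    raise ValueError("need either bucket and name, or a local path")


def detect_faces_in(image):
    """Call DetectFaces on an 'Image' dict (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so make_image has no dependencies

    client = boto3.client("rekognition")
    return client.detect_faces(Image=image)
```

With credentials configured, detect_faces_in(make_image("jsimon-public", "julien1.jpg")) should return the same kind of JSON document as the CLI.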

$ aws rekognition detect-faces --image '{"S3Object":{"Bucket":"jsimon-public","Name":"julien1.jpg"}}'

{
  "FaceDetails": [
    {
      "BoundingBox": { "Width": 0.3883333206176758, "Top": 0.12222222238779068, "Left": 0.33666667342185974, "Height": 0.2588889002799988 },
      "Landmarks": [
        { "Y": 0.23426248133182526, "X": 0.46131378412246704, "Type": "eyeLeft" },
        { "Y": 0.22791674733161926, "X": 0.5936729311943054, "Type": "eyeRight" },
        { "Y": 0.27828338742256165, "X": 0.5404868721961975, "Type": "nose" },
        { "Y": 0.3229646682739258, "X": 0.48395034670829773, "Type": "mouthLeft" },
        { "Y": 0.31654009222984314, "X": 0.5957114696502686, "Type": "mouthRight" }
      ],
      "Pose": { "Yaw": 4.216298580169678, "Roll": -4.777482509613037, "Pitch": -2.406636953353882 },
      "Quality": { "Sharpness": 70.0, "Brightness": 65.17163848876953 },
      "Confidence": 99.99468231201172
    }
  ],
  "OrientationCorrection": "ROTATE_0"
}

JSON, the cornerstone of any nutritious service. So, what do we have here? A face has been found with 99.99+% confidence. It's delimited by the BoundingBox coordinates (top left corner, face width, face height): these are fractional values with respect to the total height and width of the image. Eyes, nose and mouth have been located too (that's reassuring).
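Since the BoundingBox values are fractions of the image size, you need to scale them before drawing anything. Here's a small sketch of that conversion. The function name is mine, and the 600x900 image size is an assumption for illustration only (the real dimensions would come from the image itself, e.g. via Pillow).

```python
def box_to_pixels(box, image_width, image_height):
    """Convert a fractional Rekognition BoundingBox to pixel coordinates.

    Returns (left, top, right, bottom), the corner order that Pillow's
    ImageDraw.rectangle() accepts.
    """
    left = round(box["Left"] * image_width)
    top = round(box["Top"] * image_height)
    right = round((box["Left"] + box["Width"]) * image_width)
    bottom = round((box["Top"] + box["Height"]) * image_height)
    return left, top, right, bottom


# BoundingBox copied from the detect-faces output above.
box = {
    "Width": 0.3883333206176758,
    "Top": 0.12222222238779068,
    "Left": 0.33666667342185974,
    "Height": 0.2588889002799988,
}

# Assumed image size: 600 pixels wide, 900 pixels high.
print(box_to_pixels(box, 600, 900))
```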

Now, let's see what Rekognition can tell us about this second picture.

$ aws rekognition detect-labels --image '{"S3Object":{"Bucket":"jsimon-public","Name":"julien2.jpg"}}'

{
  "Labels": [
    { "Confidence": 99.29261779785156, "Name": "Human" },
    { "Confidence": 99.2958984375, "Name": "People" },
    { "Confidence": 99.2958984375, "Name": "Person" },
    { "Confidence": 99.2667007446289, "Name": "Book" },
    { "Confidence": 99.2667007446289, "Name": "Text" },
    { "Confidence": 71.22590637207031, "Name": "Bookcase" },
    { "Confidence": 71.22590637207031, "Name": "Furniture" },
    { "Confidence": 71.22590637207031, "Name": "Shelf" },
    { "Confidence": 52.00172805786133, "Name": "Portrait" },
    { "Confidence": 52.00172805786133, "Name": "Selfie" }
  ]
}

With a very good level of confidence, this is the picture of a human with books on a bookshelf, possibly a portrait. A pretty good summary.

Let's compare the two previous pictures. Is this truly the same person? Spoiler: yes, although I look 15 years older in the first one. Note to self: no more promo shots after 36 sleepless hours :D

$ aws rekognition compare-faces --source-image '{"S3Object":{"Bucket":"jsimon-public","Name":"julien1.jpg"}}' --target-image '{"S3Object":{"Bucket":"jsimon-public","Name":"julien2.jpg"}}'

{
  "FaceMatches": [
    {
      "Face": {
        "BoundingBox": { "Width": 0.5596370100975037, "Top": 0.1318063884973526, "Left": 0.3889369070529938, "Height": 0.5596370100975037 },
        "Confidence": 99.98912811279297
      },
      "Similarity": 98.0
    }
  ],
  "SourceImageFace": {
    "BoundingBox": { "Width": 0.3883333206176758, "Top": 0.12222222238779068, "Left": 0.33666667342185974, "Height": 0.2588889002799988 },
    "Confidence": 99.99468231201172
  }
}

Similarity is 98%. Jet lag or not, I'm always the same me.
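The same comparison can be sketched with boto3. The function names below are hypothetical, and the 80% threshold is just a value I picked for the example; compare_faces and its SourceImage, TargetImage and SimilarityThreshold parameters are the actual SDK call.

```python
def compare_s3_faces(bucket, source_name, target_name, threshold=80.0):
    """Call CompareFaces on two S3 objects (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so best_match can be used on its own

    client = boto3.client("rekognition")
    return client.compare_faces(
        SourceImage={"S3Object": {"Bucket": bucket, "Name": source_name}},
        TargetImage={"S3Object": {"Bucket": bucket, "Name": target_name}},
        SimilarityThreshold=threshold,
    )


def best_match(response):
    """Return (similarity, bounding_box) of the strongest match, or None."""
    matches = response.get("FaceMatches", [])
    if not matches:
        return None
    top = max(matches, key=lambda m: m["Similarity"])
    return top["Similarity"], top["Face"]["BoundingBox"]
```

Feeding the JSON response above to best_match would give back the 98.0 similarity along with the box of the matching face in the target image.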

See how simple this service is? I don't see how they could have made it easier. How long would it take to design, build and *train* something like this on your own? I really have no idea and I don't intend to find out!

Enough CLI, let's switch to Python and run more visual examples. For this purpose, I've written a couple of scripts (available here), using boto3 and the Pillow image processing library.

In a nutshell:
  • rekognitionDetect.py bucket_name image [copy | nocopy]: try to detect faces inside an image. If faces are found, each of them will be highlighted by a box and an updated image will be saved. The script will also report image labels and face information (gender, beard, glasses, etc.). By default, the maximum number of labels is 10 and the minimum confidence is 75%.
  • rekognitionCompare.py bucket_name sourceImage targetImage [copy | nocopy]: try to match a reference face to another image. If the face is found, it will be highlighted by a box and an updated image will be saved.
All images must be present with the same name both locally and in S3. The last parameter for both scripts allows you to skip the copy to S3 if the file is already there.
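To give you an idea of what the detection part boils down to, here's a rough sketch of the two API calls involved. This is not the actual script, just my minimal reconstruction under the defaults described above; the detect function name and its arguments are made up for the example.

```python
def detect(client, bucket, name, max_labels=10, min_confidence=75.0):
    """Return (labels, faces) for an image stored in S3.

    `client` is a boto3 Rekognition client, e.g. boto3.client("rekognition").
    """
    image = {"S3Object": {"Bucket": bucket, "Name": name}}

    # MaxLabels and MinConfidence limit what DetectLabels sends back.
    labels = client.detect_labels(
        Image=image, MaxLabels=max_labels, MinConfidence=min_confidence
    )["Labels"]

    # Attributes=["ALL"] asks for gender, beard, eyeglasses, emotions, etc.
    # Without it, only the default subset (box, landmarks, pose...) is returned.
    faces = client.detect_faces(Image=image, Attributes=["ALL"])["FaceDetails"]

    return labels, faces
```

Passing the client in as a parameter keeps the function easy to try out with a stub before pointing it at the real service.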

Hopefully, the code reads like well-written prose (hi Uncle Bob). If not, blame jet lag (yes, it's the root of all evil). Anyway, there's nothing complicated here, I'm sure you'll figure it out in no time.

Let's play some more!

$ rekognitionDetect.py jsimon-public booth1.jpg nocopy 

output file

Label Human, confidence: 99.3180236816 
Label People, confidence: 99.3190917969 
Label Person, confidence: 99.3190917969 
Label Clothing, confidence: 92.1037216187 
Label Overcoat, confidence: 92.1037216187 
Label Suit, confidence: 92.1037216187 
Label Computer, confidence: 76.0058441162 
Label Electronics, confidence: 76.0058441162 
Label LCD Screen, confidence: 76.0058441162 
Label Laptop, confidence: 76.0058441162 
*** Face 0 detected, confidence: 99.999671936 Gender: Male HAPPY 96.4477920532 CALM 8.28260231018 CONFUSED 1.53788328171 
*** Face 1 detected, confidence: 99.9654922485 Gender: Male Beard Mustache HAPPY 98.5274353027 ANGRY 5.03668212891 CONFUSED 2.61067152023 
*** Face 2 detected, confidence: 99.9955444336 Gender: Male Eyeglasses HAPPY 97.6237945557 ANGRY 1.31589770317 CALM 0.939458608627 
*** Face 3 detected, confidence: 99.9996109009 Gender: Male Eyeglasses HAPPY 98.9962310791 SAD 11.4119710922 CONFUSED 1.69576406479

Say hi to Romain, Cédric and Damian, my friendly AWS colleagues. Rekognition sees 4 males, 1 with a beard, 2 with eyeglasses, all of them very happy... and I'm the calmest of the bunch, how about that. Amazingly, Rekognition manages to catch my hardly visible laptop (left edge of the picture, on the table).
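If you're wondering how lines like these come out of the API, here's a sketch of turning one FaceDetails entry into a summary line. It assumes the face was detected with all attributes requested, and the describe_face function is my own illustration rather than the code of the script.

```python
def describe_face(index, face):
    """Summarize one FaceDetails entry: gender, facial hair, glasses, top 3 emotions."""
    parts = ["*** Face %d detected, confidence: %s" % (index, face["Confidence"])]
    parts.append("Gender: " + face["Gender"]["Value"])

    # Boolean attributes come back as {"Value": True/False, "Confidence": ...}.
    for attribute in ("Beard", "Mustache", "Eyeglasses"):
        if face.get(attribute, {}).get("Value"):
            parts.append(attribute)

    # Emotions are a list of {"Type": ..., "Confidence": ...}; keep the top three.
    emotions = sorted(
        face.get("Emotions", []), key=lambda e: e["Confidence"], reverse=True
    )
    for emotion in emotions[:3]:
        parts.append("%s %s" % (emotion["Type"], emotion["Confidence"]))

    return " ".join(parts)
```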

Here's a tougher one (Hallo to my German friends).

$ rekognitionDetect.py jsimon-public oktoberfest.jpg nocopy 

output file

Label People, confidence: 99.0898742676 
Label Person, confidence: 99.0898971558 
Label Human, confidence: 99.0639343262 
Label Alcohol, confidence: 88.8537063599 
Label Beverage, confidence: 88.8537063599 
Label Drink, confidence: 88.8537063599 
Label Crowd, confidence: 84.0972671509 
Label Female, confidence: 84.0796279907 
Label Girl, confidence: 84.0796279907 
*** Face 0 detected, confidence: 99.9854202271 Gender: Male HAPPY 60.5386123657 ANGRY 12.2481765747 DISGUSTED 2.10083723068 
*** Face 1 detected, confidence: 99.9825744629 Gender: Female HAPPY 98.0062866211 SURPRISED 10.8561573029 SAD 0.810676813126 
*** Face 2 detected, confidence: 99.9904937744 Gender: Female HAPPY 84.5134887695 SURPRISED 8.68589305878 ANGRY 1.35719180107 
*** Face 3 detected, confidence: 99.9073257446 Gender: Male Beard Mustache HAPPY 80.5190963745 SURPRISED 23.9800624847 ANGRY 1.17569565773 
*** Face 4 detected, confidence: 99.9972229004 Gender: Male Mustache HAPPY 75.2949371338 CONFUSED 10.9511556625 DISGUSTED 1.91761255264 
*** Face 5 detected, confidence: 99.9999771118 Gender: Male HAPPY 35.9886474609 SURPRISED 3.75992059708 ANGRY 2.48707532883 
*** Face 6 detected, confidence: 99.9915084839 Gender: Female HAPPY 99.4766082764 CALM 0.791561603546 ANGRY 0.620931386948 
*** Face 7 detected, confidence: 99.9998931885 Gender: Female HAPPY 99.8826293945 SAD 7.21873044968 DISGUSTED 5.48685789108 
*** Face 8 detected, confidence: 83.6580963135 Gender: Male Eyeglasses SAD 94.9213943481 SURPRISED 76.9153442383 HAPPY 8.52976131439 
*** Face 9 detected, confidence: 99.9944610596 Gender: Male HAPPY 27.327457428 DISGUSTED 26.6790218353 ANGRY 12.1302127838 
*** Face 10 detected, confidence: 99.9998855591 Gender: Male SURPRISED 99.2624435425 HAPPY 22.0922241211 SAD 6.69546127319 
*** Face 11 detected, confidence: 99.9861831665 Gender: Male SURPRISED 60.7816810608 SAD 7.07310438156 HAPPY 3.66672611237 
*** Face 12 detected, confidence: 99.9990692139 Gender: Male HAPPY 48.0631027222 SURPRISED 2.61369943619 CONFUSED 2.40399837494 
*** Face 13 detected, confidence: 87.6368408203 Gender: Male HAPPY 16.2307357788 SAD 14.2565965652 ANGRY 12.3210906982 
*** Face 14 detected, confidence: 99.9553375244 Gender: Male HAPPY 54.3005943298 DISGUSTED 5.99133396149 SURPRISED 3.63597273827 

Wow, 15 people, including partial faces. All genders are correct. Emotions are mostly ok, but we definitely need to add 'DRUNK' to the list ;) The labels are spot on: a crowd of men and women drinking alcohol.

Let's try another one. Low res, low quality.

$ rekognitionDetect.py jsimon-public maradona.jpg nocopy 

output file

Label People, confidence: 99.2043991089 
Label Person, confidence: 99.2043991089 
Label Human, confidence: 99.1917037964 
Label Football, confidence: 97.2220993042 
Label Soccer, confidence: 97.2220993042 
Label Sport, confidence: 97.2220993042 
Label American Football, confidence: 83.3328475952 
Label Athlete, confidence: 78.3234786987 
*** Face 0 detected, confidence: 99.963470459 Gender: Male Mustache SURPRISED 21.8802871704 CALM 17.4065952301 SAD 11.6566238403 
*** Face 1 detected, confidence: 99.9813308716 Gender: Male Eyeglasses HAPPY 38.6969680786 ANGRY 6.79734945297 SURPRISED 2.61010527611 
*** Face 2 detected, confidence: 99.9385604858 Gender: Male SURPRISED 36.6970825195 SAD 7.66330337524 ANGRY 6.10639476776 
*** Face 3 detected, confidence: 99.9514923096 Gender: Male SAD 32.6836242676 DISGUSTED 4.55095767975 HAPPY 4.19711828232 
*** Face 4 detected, confidence: 99.8046951294 Gender: Male Beard Mustache SAD 46.0139579773 HAPPY 4.15547084808 DISGUSTED 0.981283187866 
*** Face 5 detected, confidence: 99.2888412476 Gender: Male SAD 90.2270889282 CALM 5.9303817749 HAPPY 3.26179981232 

Labels are fine, except for 'American Football'. 83%??? Gimme a break, the training set needs more Soccer images! In addition, I don't think number 4 is wearing eyeglasses, but again this is a low res picture. Apart from this, Rekognition correctly picked up all faces and funnily enough, the expressions make sense too: "sad" and "surprised" are definitely how these guys must have felt against the legendary Diego!

A last one for the road: how about this complex abstract-ish nighttime picture of Shinjuku?

$ rekognitionDetect.py jsimon-public shinjuku.jpg nocopy 

output file

Label City, confidence: 88.4259796143 
Label Downtown, confidence: 88.4259796143 
Label Metropolis, confidence: 84.8462677002 
Label Urban, confidence: 84.8462677002 
Label Night, confidence: 69.7816467285 
Label Outdoors, confidence: 69.7816467285 
Label Shop, confidence: 68.228477478 
Label Flyer, confidence: 60.3522796631 
Label Poster, confidence: 60.3522796631 
Label Neighborhood, confidence: 55.3994293213 
*** Face 0 detected, confidence: 97.9367828369 Gender: Female SAD 46.1420478821 ANGRY 7.63346576691 HAPPY 6.28939962387 

Note that I lowered the confidence threshold from 75% to 50% to get more labels. Still, Rekognition does a good job. It also gets the girl's face and yes, she does look quite sad. The Anime face isn't detected but I guess this is the desired behavior.
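To see what the threshold changes, here's a tiny worked example on the labels listed above. Rekognition applies the minimum confidence on the service side; this sketch filters locally instead, purely to illustrate the effect, and the filter_labels helper is hypothetical.

```python
# (name, confidence) pairs copied from the Shinjuku output above.
SHINJUKU_LABELS = [
    ("City", 88.4259796143),
    ("Downtown", 88.4259796143),
    ("Metropolis", 84.8462677002),
    ("Urban", 84.8462677002),
    ("Night", 69.7816467285),
    ("Outdoors", 69.7816467285),
    ("Shop", 68.228477478),
    ("Flyer", 60.3522796631),
    ("Poster", 60.3522796631),
    ("Neighborhood", 55.3994293213),
]


def filter_labels(labels, min_confidence):
    """Keep only the label names at or above the confidence threshold."""
    return [name for name, confidence in labels if confidence >= min_confidence]


print(len(filter_labels(SHINJUKU_LABELS, 75.0)), "labels at 75%")
print(len(filter_labels(SHINJUKU_LABELS, 50.0)), "labels at 50%")
```

At 75%, only the four city-related labels would survive; 'Night', 'Shop' and 'Poster' only show up once the threshold drops.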

Alright, enough detection. Let's now try to match faces, using some of the previous pictures as well as some new ones.


$ rekognitionCompare.py jsimon-public julien1.jpg julien2.jpg nocopy 
Face match, confidence=99.9891281128, similarity=98.0 

$ rekognitionCompare.py jsimon-public julien1.jpg booth1.jpg nocopy 
Face match, confidence=99.999671936, similarity=96.0 

$ rekognitionCompare.py jsimon-public julien1.jpg booth2.jpg nocopy 
Face match, confidence=99.9991455078, similarity=84.0 

$ rekognitionCompare.py jsimon-public julien1.jpg keynote.jpg nocopy 
Face match, confidence=99.9932250977, similarity=82.0 

Quite good! The last one is particularly nice, given the distance, the angle and the poor lighting (see actual picture above).

These are just a few examples and I'm sure you can't wait to try your own. Hopefully this post has given you a visual, hands-on overview of the Rekognition service and how user-friendly it is. I didn't cover face collections, but the API is pretty much what you'd expect (create, delete, etc.).
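For the curious, here's a hedged sketch of what a face collection round trip could look like with boto3. I haven't run this as part of the post: the function name, the collection id, the 80% threshold and the limit of 5 faces are all assumptions for illustration, while create_collection, index_faces, search_faces_by_image and delete_collection are the SDK operations.

```python
def index_and_search(client, collection_id, bucket, known_images, query_image):
    """Index known faces from S3, then look for the face in query_image.

    `client` is a boto3 Rekognition client, e.g. boto3.client("rekognition").
    Returns a list of (external_image_id, similarity) tuples.
    """
    client.create_collection(CollectionId=collection_id)
    try:
        for name in known_images:
            # ExternalImageId lets us map a match back to the source file.
            client.index_faces(
                CollectionId=collection_id,
                Image={"S3Object": {"Bucket": bucket, "Name": name}},
                ExternalImageId=name,
            )
        response = client.search_faces_by_image(
            CollectionId=collection_id,
            Image={"S3Object": {"Bucket": bucket, "Name": query_image}},
            FaceMatchThreshold=80.0,  # assumed value for this example
            MaxFaces=5,               # assumed value for this example
        )
        return [
            (match["Face"].get("ExternalImageId"), match["Similarity"])
            for match in response["FaceMatches"]
        ]
    finally:
        # Clean up so the example leaves nothing behind in the account.
        client.delete_collection(CollectionId=collection_id)
```

Deleting the collection at the end only makes sense for a throwaway experiment; in a real application you'd keep it around and index faces once.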

Feel free to explore and experiment. Until we meet again, keep rockin'.

4 comments:

  1. Where can I find Javadoc for Rekognition and Polly? Also, I don't see Rekognition or Polly in the existing Java libraries

  2. Hi Michael, thanks for reading. Here is what you're looking for:
    - Javadoc: http://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/
    - SDK: https://github.com/aws/aws-sdk-java

  3. Rekognition looks like a lot of fun! Am I correct in that SDK's are currently available only for Python and Java?

  4. Hi Mark,

    there's definitely a Rekognition client available in our Node.js SDK : https://github.com/aws/aws-sdk-js. I haven't tried it, though.
