Digital (dis)content: A hands-on look at the Amazon Rekognition API

Amazon Rekognition is a Deep Learning-based image analysis service. Don't worry though: you won't have to wade through Machine Learning / Deep Learning mumbo jumbo to work with Rekognition. Quite the contrary, as Rekognition provides a very easy-to-use API.

It allows developers to:

detect thousands of objects and scenes;
analyze faces;
compare two faces to measure similarity;
build face collections and match faces against these collections.

As usual, this service can be used with the AWS CLI (as in 'aws rekognition'), or with one of our language SDKs. I'll show you some CLI examples first, and then we'll use the popular Python SDK, aka boto3.

First things first: how do we send images for processing? Two options: send the image as a byte blob or store it in S3. I suspect most of us will use the second option, so that's what I'll use. Time to play!
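Both options end up in the same `Image` parameter of the API: either an `S3Object` with `Bucket` and `Name` keys, or raw `Bytes`. Here is a minimal sketch of a helper that builds this structure in Python. The name `make_image` is hypothetical and not part of boto3.

```python
from typing import Optional


def make_image(bucket: Optional[str] = None,
               name: Optional[str] = None,
               data: Optional[bytes] = None) -> dict:
    """Build the Image argument used by Rekognition calls (hypothetical helper).

    Pass either bucket + name (image stored in S3) or data (raw image bytes).
    """
    if data is not None:
        if bucket is not None or name is not None:
            raise ValueError("pass either S3 coordinates or raw bytes, not both")
        return {"Bytes": data}
    if bucket is None or name is None:
        raise ValueError("both bucket and name are required for an S3 image")
    return {"S3Object": {"Bucket": bucket, "Name": name}}


# Same structure as the JSON passed to --image on the command line
print(make_image(bucket="jsimon-public", name="julien1.jpg"))
```

With boto3, the result is simply passed as the `Image=` keyword argument of calls such as `detect_faces` or `detect_labels`.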

$ aws rekognition detect-faces --image '{"S3Object":{"Bucket":"jsimon-public","Name":"julien1.jpg"}}'

{
    "FaceDetails": [
        {
            "BoundingBox": {
                "Width": 0.3883333206176758,
                "Top": 0.12222222238779068,
                "Left": 0.33666667342185974,
                "Height": 0.2588889002799988
            },
            "Landmarks": [
                {
                    "Y": 0.23426248133182526,
                    "X": 0.46131378412246704,
                    "Type": "eyeLeft"
                },
                {
                    "Y": 0.22791674733161926,
                    "X": 0.5936729311943054,
                    "Type": "eyeRight"
                },
                {
                    "Y": 0.27828338742256165,
                    "X": 0.5404868721961975,
                    "Type": "nose"
                },
                {
                    "Y": 0.3229646682739258,
                    "X": 0.48395034670829773,
                    "Type": "mouthLeft"
                },
                {
                    "Y": 0.31654009222984314,
                    "X": 0.5957114696502686,
                    "Type": "mouthRight"
                }
            ],
            "Pose": {
                "Yaw": 4.216298580169678,
                "Roll": -4.777482509613037,
                "Pitch": -2.406636953353882
            },
            "Quality": {
                "Sharpness": 70.0,
                "Brightness": 65.17163848876953
            },
            "Confidence": 99.99468231201172
        }
    ],
    "OrientationCorrection": "ROTATE_0"
}

JSON, the cornerstone of any nutritious service. So, what do we have here? A face has been found with 99.99+% confidence. It's delimited by the BoundingBox coordinates (top left corner, face width, face height): these are fractional values with respect to the total height and width of the image. Eyes, nose and mouth have been located too (that's reassuring).
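To draw a box around the face, these ratios have to be scaled back to pixels. A minimal sketch, assuming the BoundingBox dict has the shape shown above and that you know the image dimensions; `box_to_pixels` is a hypothetical name, and the 600 by 900 size in the example is an assumption for illustration only.

```python
def box_to_pixels(box: dict, width: int, height: int) -> tuple:
    """Convert a Rekognition BoundingBox (ratios of image size) to pixel coordinates.

    Returns (left, top, right, bottom).
    """
    left = round(box["Left"] * width)
    top = round(box["Top"] * height)
    right = round((box["Left"] + box["Width"]) * width)
    bottom = round((box["Top"] + box["Height"]) * height)
    return left, top, right, bottom


# BoundingBox returned for julien1.jpg, assuming (hypothetically) a 600x900 image
face_box = {
    "Width": 0.3883333206176758,
    "Top": 0.12222222238779068,
    "Left": 0.33666667342185974,
    "Height": 0.2588889002799988,
}
print(box_to_pixels(face_box, 600, 900))
```

With Pillow, the resulting tuple can be passed directly to `ImageDraw.Draw(img).rectangle(box, outline="red")`.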

Now, let's see what Rekognition can tell us about this second picture.

$ aws rekognition detect-labels --image '{"S3Object":{"Bucket":"jsimon-public","Name":"julien2.jpg"}}'

{
    "Labels": [
        {
            "Confidence": 99.29261779785156,
            "Name": "Human"
        },
        {
            "Confidence": 99.2958984375,
            "Name": "People"
        },
        {
            "Confidence": 99.2958984375,
            "Name": "Person"
        },
        {
            "Confidence": 99.2667007446289,
            "Name": "Book"
        },
        {
            "Confidence": 99.2667007446289,
            "Name": "Text"
        },
        {
            "Confidence": 71.22590637207031,
            "Name": "Bookcase"
        },
        {
            "Confidence": 71.22590637207031,
            "Name": "Furniture"
        },
        {
            "Confidence": 71.22590637207031,
            "Name": "Shelf"
        },
        {
            "Confidence": 52.00172805786133,
            "Name": "Portrait"
        },
        {
            "Confidence": 52.00172805786133,
            "Name": "Selfie"
        }
    ]
}

With a very good level of confidence, this is the picture of a human with books on a bookshelf, possibly a portrait. A pretty good summary.
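In boto3, the same operation is `detect_labels`, whose optional `MaxLabels` and `MinConfidence` parameters cap how many labels come back and how confident they must be. A minimal sketch, assuming a Rekognition client is passed in (in practice `boto3.client("rekognition")`); `top_labels` is a hypothetical helper, and the defaults of 10 labels and 75% are just illustrative values.

```python
def top_labels(client, image: dict, max_labels: int = 10,
               min_confidence: float = 75.0) -> list:
    """Return (name, confidence) pairs from DetectLabels, highest confidence first.

    `client` is expected to be a boto3 Rekognition client.
    """
    response = client.detect_labels(
        Image=image, MaxLabels=max_labels, MinConfidence=min_confidence
    )
    labels = [(label["Name"], label["Confidence"]) for label in response["Labels"]]
    # Sort explicitly rather than relying on the order of the response
    return sorted(labels, key=lambda pair: pair[1], reverse=True)


# Usage (requires AWS credentials):
#   import boto3
#   rekognition = boto3.client("rekognition")
#   image = {"S3Object": {"Bucket": "jsimon-public", "Name": "julien2.jpg"}}
#   for name, confidence in top_labels(rekognition, image):
#       print(f"Label {name}, confidence: {confidence}")
```

Lowering `min_confidence` returns more, but less reliable, labels.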

Let's compare the two previous pictures. Is this truly the same person? Spoiler: yes, although I look 15 years older on the first one. Note to self: no more promo shots after 36 sleepless hours :D

$ aws rekognition compare-faces --source-image '{"S3Object":{"Bucket":"jsimon-public","Name":"julien1.jpg"}}' --target-image '{"S3Object":{"Bucket":"jsimon-public","Name":"julien2.jpg"}}'

{
    "FaceMatches": [
        {
            "Face": {
                "BoundingBox": {
                    "Width": 0.5596370100975037,
                    "Top": 0.1318063884973526,
                    "Left": 0.3889369070529938,
                    "Height": 0.5596370100975037
                },
                "Confidence": 99.98912811279297
            },
            "Similarity": 98.0
        }
    ],
    "SourceImageFace": {
        "BoundingBox": {
            "Width": 0.3883333206176758,
            "Top": 0.12222222238779068,
            "Left": 0.33666667342185974,
            "Height": 0.2588889002799988
        },
        "Confidence": 99.99468231201172
    }
}

Similarity is 98%. Jet lag or not, I'm always the same me.
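The boto3 equivalent is `compare_faces`, which also accepts an optional `SimilarityThreshold` to drop weak matches. If the source image contains several faces, Rekognition compares the largest one. A minimal sketch that keeps only the best match; `best_match` is a hypothetical name, the client is injected (e.g. `boto3.client("rekognition")`), and the 80% threshold is an illustrative choice.

```python
from typing import Optional


def best_match(client, source: dict, target: dict,
               threshold: float = 80.0) -> Optional[dict]:
    """Compare the source face with the faces found in `target`.

    Returns the detection confidence, similarity and bounding box of the best
    match, or None if no face reaches `threshold`.
    """
    response = client.compare_faces(
        SourceImage=source, TargetImage=target, SimilarityThreshold=threshold
    )
    matches = response.get("FaceMatches", [])
    if not matches:
        return None
    best = max(matches, key=lambda m: m["Similarity"])
    return {
        "confidence": best["Face"]["Confidence"],
        "similarity": best["Similarity"],
        "box": best["Face"]["BoundingBox"],
    }
```

Returning `None` instead of raising keeps the "no match" case easy to handle in a script loop.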

See how simple this service is? I don't see how they could have made it easier. How long would it take to design, build and *train* something like this on your own? I really have no idea, and I don't intend to find out!

Enough CLI, let's switch to Python and run more visual examples. For this purpose, I've written a couple of scripts (available here), using boto3 and the Pillow image processing library.

In a nutshell:

rekognitionDetect.py bucket_name image [copy | nocopy]: tries to detect faces inside an image. If faces are found, each of them is highlighted by a box and an updated image is saved. The script also reports image labels and face information (gender, beard, glasses, etc.). The maximum number of labels and the minimum confidence default to 10 and 75% respectively.
rekognitionCompare.py bucket_name sourceImage targetImage [copy | nocopy]: tries to match a reference face to another image. If the face is found, it is highlighted by a box and an updated image is saved.

All images must be present under the same name both locally and in S3. The last parameter of both scripts lets you skip the copy to S3 if the file is already there.

Hopefully, the code reads like well-written prose (hi Uncle Bob). If not, blame jet lag (yes, it's the root of all evil). Anyway, there's nothing complicated here and I'm sure you'll figure it out in no time.
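For illustration, the copy/nocopy switch could boil down to an S3 upload guarded by the last argument. This is a sketch under assumptions, not the scripts' actual code: `maybe_upload` is a hypothetical helper, and it relies on boto3's standard S3 client method `upload_file(Filename, Bucket, Key)`, using the local file name as the S3 key.

```python
def maybe_upload(s3_client, bucket: str, filename: str, mode: str) -> bool:
    """Upload `filename` to `bucket` under the same key unless mode is 'nocopy'.

    Returns True if an upload was performed.
    """
    if mode not in ("copy", "nocopy"):
        raise ValueError(f"last argument must be 'copy' or 'nocopy', got {mode!r}")
    if mode == "nocopy":
        return False
    # Same name locally and in S3, as the scripts require
    s3_client.upload_file(filename, bucket, filename)
    return True
```

In practice `s3_client` would be `boto3.client("s3")`; passing it in keeps the helper easy to test without AWS access.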

Let's play some more!

$ rekognitionDetect.py jsimon-public booth1.jpg nocopy

output file

Label Human, confidence: 99.3180236816
Label People, confidence: 99.3190917969
Label Person, confidence: 99.3190917969
Label Clothing, confidence: 92.1037216187
Label Overcoat, confidence: 92.1037216187
Label Suit, confidence: 92.1037216187
Label Computer, confidence: 76.0058441162
Label Electronics, confidence: 76.0058441162
Label LCD Screen, confidence: 76.0058441162
Label Laptop, confidence: 76.0058441162

*** Face 0 detected, confidence: 99.999671936
Gender: Male
HAPPY 96.4477920532
CALM 8.28260231018
CONFUSED 1.53788328171

*** Face 1 detected, confidence: 99.9654922485
Gender: Male
Beard
Mustache
HAPPY 98.5274353027
ANGRY 5.03668212891
CONFUSED 2.61067152023

*** Face 2 detected, confidence: 99.9955444336
Gender: Male
Eyeglasses
HAPPY 97.6237945557
ANGRY 1.31589770317
CALM 0.939458608627

*** Face 3 detected, confidence: 99.9996109009
Gender: Male
Eyeglasses
HAPPY 98.9962310791
SAD 11.4119710922
CONFUSED 1.69576406479

Say hi to Romain, Cédric and Damian, my friendly AWS colleagues. Rekognition sees 4 males, 1 with a beard, 2 with eyeglasses, all of them very happy... and I'm the calmest of the bunch, how about that. Amazingly, Rekognition manages to catch my hardly visible laptop (left edge of the picture, on the table).
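Attributes such as gender, beard, eyeglasses or emotions are only returned when `detect_faces` is called with `Attributes=["ALL"]`; by default you only get the bounding box, landmarks, pose, quality and confidence seen in the CLI output earlier. A minimal sketch of a formatter producing text in the same style as the output above; it is a reconstruction for illustration, not the script's actual code, and `describe_face` is a hypothetical name.

```python
def describe_face(index: int, face: dict, top_emotions: int = 3) -> list:
    """Format one FaceDetails entry (detect_faces with Attributes=["ALL"]) as text lines."""
    lines = [f"*** Face {index} detected, confidence: {face['Confidence']}"]
    if "Gender" in face:
        lines.append(f"Gender: {face['Gender']['Value']}")
    # Boolean attributes: only print the ones Rekognition reports as present
    for attribute in ("Beard", "Mustache", "Eyeglasses", "Sunglasses"):
        if face.get(attribute, {}).get("Value"):
            lines.append(attribute)
    # Keep the most confident emotions only
    emotions = sorted(face.get("Emotions", []),
                      key=lambda e: e["Confidence"], reverse=True)
    for emotion in emotions[:top_emotions]:
        lines.append(f"{emotion['Type']} {emotion['Confidence']}")
    return lines
```

The faces would come from `client.detect_faces(Image=image, Attributes=["ALL"])["FaceDetails"]`, with `enumerate` supplying the index.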

Here's a tougher one (Hallo to my German friends).

$ rekognitionDetect.py jsimon-public oktoberfest.jpg nocopy

output file

Label People, confidence: 99.0898742676
Label Person, confidence: 99.0898971558
Label Human, confidence: 99.0639343262
Label Alcohol, confidence: 88.8537063599
Label Beverage, confidence: 88.8537063599
Label Drink, confidence: 88.8537063599
Label Crowd, confidence: 84.0972671509
Label Female, confidence: 84.0796279907
Label Girl, confidence: 84.0796279907

*** Face 0 detected, confidence: 99.9854202271
Gender: Male
HAPPY 60.5386123657
ANGRY 12.2481765747
DISGUSTED 2.10083723068

*** Face 1 detected, confidence: 99.9825744629
Gender: Female
HAPPY 98.0062866211
SURPRISED 10.8561573029
SAD 0.810676813126

*** Face 2 detected, confidence: 99.9904937744
Gender: Female
HAPPY 84.5134887695
SURPRISED 8.68589305878
ANGRY 1.35719180107

*** Face 3 detected, confidence: 99.9073257446
Gender: Male
Beard
Mustache
HAPPY 80.5190963745
SURPRISED 23.9800624847
ANGRY 1.17569565773

*** Face 4 detected, confidence: 99.9972229004
Gender: Male
Mustache
HAPPY 75.2949371338
CONFUSED 10.9511556625
DISGUSTED 1.91761255264

*** Face 5 detected, confidence: 99.9999771118
Gender: Male
HAPPY 35.9886474609
SURPRISED 3.75992059708
ANGRY 2.48707532883

*** Face 6 detected, confidence: 99.9915084839
Gender: Female
HAPPY 99.4766082764
CALM 0.791561603546
ANGRY 0.620931386948

*** Face 7 detected, confidence: 99.9998931885
Gender: Female
HAPPY 99.8826293945
SAD 7.21873044968
DISGUSTED 5.48685789108

*** Face 8 detected, confidence: 83.6580963135
Gender: Male
Eyeglasses
SAD 94.9213943481
SURPRISED 76.9153442383
HAPPY 8.52976131439

*** Face 9 detected, confidence: 99.9944610596
Gender: Male
HAPPY 27.327457428
DISGUSTED 26.6790218353
ANGRY 12.1302127838

*** Face 10 detected, confidence: 99.9998855591
Gender: Male
SURPRISED 99.2624435425
HAPPY 22.0922241211
SAD 6.69546127319

*** Face 11 detected, confidence: 99.9861831665
Gender: Male
SURPRISED 60.7816810608
SAD 7.07310438156
HAPPY 3.66672611237

*** Face 12 detected, confidence: 99.9990692139
Gender: Male
HAPPY 48.0631027222
SURPRISED 2.61369943619
CONFUSED 2.40399837494

*** Face 13 detected, confidence: 87.6368408203
Gender: Male
HAPPY 16.2307357788
SAD 14.2565965652
ANGRY 12.3210906982

*** Face 14 detected, confidence: 99.9553375244
Gender: Male
HAPPY 54.3005943298
DISGUSTED 5.99133396149
SURPRISED 3.63597273827

Wow, 15 people, including partial faces. All genders are correct. Emotions are mostly ok, but we definitely need to add 'DRUNK' to the list ;) The labels are spot on: a crowd of men and women drinking alcohol.

Let's try another one. Low res, low quality.

$ rekognitionDetect.py jsimon-public maradona.jpg nocopy

output file

Label People, confidence: 99.2043991089
Label Person, confidence: 99.2043991089
Label Human, confidence: 99.1917037964
Label Football, confidence: 97.2220993042
Label Soccer, confidence: 97.2220993042
Label Sport, confidence: 97.2220993042
Label American Football, confidence: 83.3328475952
Label Athlete, confidence: 78.3234786987

*** Face 0 detected, confidence: 99.963470459
Gender: Male
Mustache
SURPRISED 21.8802871704
CALM 17.4065952301
SAD 11.6566238403

*** Face 1 detected, confidence: 99.9813308716
Gender: Male
Eyeglasses
HAPPY 38.6969680786
ANGRY 6.79734945297
SURPRISED 2.61010527611

*** Face 2 detected, confidence: 99.9385604858
Gender: Male
SURPRISED 36.6970825195
SAD 7.66330337524
ANGRY 6.10639476776

*** Face 3 detected, confidence: 99.9514923096
Gender: Male
SAD 32.6836242676
DISGUSTED 4.55095767975
HAPPY 4.19711828232

*** Face 4 detected, confidence: 99.8046951294
Gender: Male
Beard
Mustache
SAD 46.0139579773
HAPPY 4.15547084808
DISGUSTED 0.981283187866

*** Face 5 detected, confidence: 99.2888412476
Gender: Male
SAD 90.2270889282
CALM 5.9303817749
HAPPY 3.26179981232

Labels are fine, except for 'American Football'. 83%??? Gimme a break, the training set needs more soccer images! In addition, I don't think number 4 is wearing eyeglasses, but again this is a low res picture. Apart from this, Rekognition correctly picked up all faces and, funnily enough, the expressions make sense too: "sad" and "surprised" are definitely how these guys must have felt against the legendary Diego!

A last one for the road: how about this complex abstract-ish nighttime picture of Shinjuku?

$ rekognitionDetect.py jsimon-public shinjuku.jpg nocopy

output file

Label City, confidence: 88.4259796143
Label Downtown, confidence: 88.4259796143
Label Metropolis, confidence: 84.8462677002
Label Urban, confidence: 84.8462677002
Label Night, confidence: 69.7816467285
Label Outdoors, confidence: 69.7816467285
Label Shop, confidence: 68.228477478
Label Flyer, confidence: 60.3522796631
Label Poster, confidence: 60.3522796631
Label Neighborhood, confidence: 55.3994293213

*** Face 0 detected, confidence: 97.9367828369
Gender: Female
SAD 46.1420478821
ANGRY 7.63346576691
HAPPY 6.28939962387

Note that I lowered the confidence threshold from 75% to 50% to get more labels. Still, Rekognition does a good job. It also detects the girl's face and, yes, she does look quite sad. The anime face isn't detected, but I guess this is the desired behavior.

Alright, enough detection. Let's now try to match faces, using some of the previous pictures as well as some new ones.

$ rekognitionCompare.py jsimon-public julien1.jpg julien2.jpg nocopy

Face match, confidence=99.9891281128, similarity=98.0

$ rekognitionCompare.py jsimon-public julien1.jpg booth1.jpg nocopy

Face match, confidence=99.999671936, similarity=96.0

$ rekognitionCompare.py jsimon-public julien1.jpg booth2.jpg nocopy

Face match, confidence=99.9991455078, similarity=84.0

$ rekognitionCompare.py jsimon-public julien1.jpg keynote.jpg nocopy

Face match, confidence=99.9932250977, similarity=82.0

Quite good! The last one is particularly nice, given the distance, the angle and the poor lighting (see actual picture above).

These are just a few examples and I'm sure you can't wait to try your own. Hopefully this post has given you a visual, hands-on overview of the Rekognition service and how user-friendly it is. I didn't cover face collections, but the API is pretty much what you'd expect (create, delete, etc.).
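For the curious, a collection round trip with boto3 could look like the sketch below: `create_collection`, `index_faces` to store the faces found in a reference image, `search_faces_by_image` to look up the largest face of another image, then `delete_collection`. The helper name `index_and_search` and the collection ID are hypothetical, the client is injected (e.g. `boto3.client("rekognition")`), and deleting the collection right away only makes sense for a throwaway demo.

```python
def index_and_search(client, collection_id: str, reference: dict, reference_id: str,
                     probe: dict, threshold: float = 80.0) -> list:
    """Index the faces in `reference`, then search the collection with `probe`.

    Returns (external_image_id, similarity) pairs for the matches found.
    """
    client.create_collection(CollectionId=collection_id)
    try:
        # Tag indexed faces with a name of our choosing to recognize them later
        client.index_faces(CollectionId=collection_id, Image=reference,
                           ExternalImageId=reference_id)
        response = client.search_faces_by_image(
            CollectionId=collection_id, Image=probe,
            FaceMatchThreshold=threshold, MaxFaces=5,
        )
        return [(m["Face"].get("ExternalImageId"), m["Similarity"])
                for m in response["FaceMatches"]]
    finally:
        # Demo cleanup: collections persist until explicitly deleted
        client.delete_collection(CollectionId=collection_id)
```

A real application would keep the collection around and index many faces once, then run searches against it as new images arrive.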

Feel free to explore and experiment. Until we meet again, keep rockin'.

4 comments:

Mike Slinn, Nov 30, 2016, 9:57:00 PM
Where can I find Javadoc for Rekognition and Polly? Also, I don't see Rekognition or Polly in the existing Java libraries.
Unknown, Dec 1, 2016, 3:54:00 AM
Hi Michael, thanks for reading. Here is what you're looking for:
- Javadoc: http://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/
- SDK: https://github.com/aws/aws-sdk-java
Unknown, Dec 1, 2016, 10:15:00 PM
Rekognition looks like a lot of fun! Am I correct in that SDKs are currently available only for Python and Java?
Unknown, Dec 2, 2016, 12:54:00 AM
Hi Mark,

there's definitely a Rekognition client available in our Node.js SDK: https://github.com/aws/aws-sdk-js. I haven't tried it, though.

Nov 30, 2016
