Digital (dis)content: November 2016

Amazon Rekognition is a Deep Learning based image analysis service. Don't worry though, you won't have to wade through Machine Learning / Deep Learning mumbo jumbo to work with Rekognition. Quite the contrary: Rekognition provides a very easy-to-use API.

It allows developers to:

detect thousands of objects and scenes;
analyze faces;
compare two faces to measure similarity;
build face collections and match faces against these collections.

As usual, this service can be used with the AWS CLI (as in 'aws rekognition'), or with one of our language SDKs. I'll show you some CLI examples first and then we'll use the popular Python SDK, aka boto3.

First things first: how do we send images for processing? Two options: send the image as a byte blob or put it in S3. I suspect most of us will use the second option, so that's what I'll use. Time to play!
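To make the two options concrete, here is a minimal sketch of how the image argument could be built for either case. The helper name `build_image_param` is hypothetical (it is not part of boto3); the `Bytes` and `S3Object` keys are the ones the Rekognition API expects.

```python
def build_image_param(bucket=None, key=None, path=None):
    """Return the Image argument passed to Rekognition calls (hypothetical helper)."""
    if bucket and key:
        # Option 2: point at an object that is already stored in S3.
        return {"S3Object": {"Bucket": bucket, "Name": key}}
    if path:
        # Option 1: read a local file and send its raw bytes with the request.
        with open(path, "rb") as image_file:
            return {"Bytes": image_file.read()}
    raise ValueError("provide either bucket and key, or a local path")


# The S3 form used throughout this post:
image = build_image_param(bucket="jsimon-public", key="julien1.jpg")
```

The resulting dictionary is what you would pass as `Image=...` to the boto3 Rekognition client.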

$ aws rekognition detect-faces --image "S3Object={Bucket=jsimon-public,Name=julien1.jpg}"

{
    "FaceDetails": [
        {
            "BoundingBox": {
                "Width": 0.3883333206176758,
                "Top": 0.12222222238779068,
                "Left": 0.33666667342185974,
                "Height": 0.2588889002799988
            },
            "Landmarks": [
                {
                    "Y": 0.23426248133182526,
                    "X": 0.46131378412246704,
                    "Type": "eyeLeft"
                },
                {
                    "Y": 0.22791674733161926,
                    "X": 0.5936729311943054,
                    "Type": "eyeRight"
                },
                {
                    "Y": 0.27828338742256165,
                    "X": 0.5404868721961975,
                    "Type": "nose"
                },
                {
                    "Y": 0.3229646682739258,
                    "X": 0.48395034670829773,
                    "Type": "mouthLeft"
                },
                {
                    "Y": 0.31654009222984314,
                    "X": 0.5957114696502686,
                    "Type": "mouthRight"
                }
            ],
            "Pose": {
                "Yaw": 4.216298580169678,
                "Roll": -4.777482509613037,
                "Pitch": -2.406636953353882
            },
            "Quality": {
                "Sharpness": 70.0,
                "Brightness": 65.17163848876953
            },
            "Confidence": 99.99468231201172
        }
    ],
    "OrientationCorrection": "ROTATE_0"
}

JSON, the cornerstone of any nutritious service. So, what do we have here? A face has been found with 99.99+% confidence. It's delimited by the BoundingBox coordinates (top left corner, face width, face height): these are fractional values with respect to the total height and width of the image. Eyes, nose and mouth have been located too (that's reassuring).
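Since the BoundingBox values are fractions of the image size, you need the image dimensions to draw anything. Here's a small sketch of that conversion; the function name `box_to_pixels` and the image size used below are assumptions for illustration only.

```python
def box_to_pixels(box, image_width, image_height):
    """Convert a fractional Rekognition BoundingBox to pixel corners.

    Returns (left, top, right, bottom), the form expected by most drawing
    libraries when outlining a rectangle.
    """
    left = int(round(box["Left"] * image_width))
    top = int(round(box["Top"] * image_height))
    right = int(round((box["Left"] + box["Width"]) * image_width))
    bottom = int(round((box["Top"] + box["Height"]) * image_height))
    return left, top, right, bottom


# Hypothetical box and image size, chosen so the arithmetic is easy to check.
example_box = {"Left": 0.25, "Top": 0.5, "Width": 0.5, "Height": 0.25}
print(box_to_pixels(example_box, 400, 200))  # (100, 100, 300, 150)
```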

Now, let's see what Rekognition can tell us about this second picture.

$ aws rekognition detect-labels --image '{"S3Object":{"Bucket":"jsimon-public","Name":"julien2.jpg"}}'

{
    "Labels": [
        {
            "Confidence": 99.29261779785156,
            "Name": "Human"
        },
        {
            "Confidence": 99.2958984375,
            "Name": "People"
        },
        {
            "Confidence": 99.2958984375,
            "Name": "Person"
        },
        {
            "Confidence": 99.2667007446289,
            "Name": "Book"
        },
        {
            "Confidence": 99.2667007446289,
            "Name": "Text"
        },
        {
            "Confidence": 71.22590637207031,
            "Name": "Bookcase"
        },
        {
            "Confidence": 71.22590637207031,
            "Name": "Furniture"
        },
        {
            "Confidence": 71.22590637207031,
            "Name": "Shelf"
        },
        {
            "Confidence": 52.00172805786133,
            "Name": "Portrait"
        },
        {
            "Confidence": 52.00172805786133,
            "Name": "Selfie"
        }
    ]
}

With a very good level of confidence, this is the picture of a human with books on a bookshelf, possibly a portrait. A pretty good summary.
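The same call is available in boto3 as `detect_labels`. The sketch below assumes you pass in a client created with `boto3.client("rekognition")`; the wrapper name and its default values are illustrative choices, while `MaxLabels` and `MinConfidence` are actual parameters of the API.

```python
def detect_labels(client, bucket, key, max_labels=10, min_confidence=75.0):
    """Return (name, confidence) pairs for an image stored in S3.

    `client` is expected to be a boto3 Rekognition client, for example:
        import boto3
        client = boto3.client("rekognition")
    """
    response = client.detect_labels(
        Image={"S3Object": {"Bucket": bucket, "Name": key}},
        MaxLabels=max_labels,          # cap on the number of labels returned
        MinConfidence=min_confidence,  # drop labels below this confidence (in %)
    )
    return [(label["Name"], label["Confidence"]) for label in response["Labels"]]
```

Passing the client as an argument keeps the function easy to try out with a stub object before pointing it at a real account.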

Let's compare the two previous pictures. Is this truly the same person? Spoiler: yes, although I look 15 years older on the first one. Note to self: no more promo shots after 36 sleepless hours :D

$ aws rekognition compare-faces --source-image '{"S3Object":{"Bucket":"jsimon-public","Name":"julien1.jpg"}}' --target-image '{"S3Object":{"Bucket":"jsimon-public","Name":"julien2.jpg"}}'

{
    "FaceMatches": [
        {
            "Face": {
                "BoundingBox": {
                    "Width": 0.5596370100975037,
                    "Top": 0.1318063884973526,
                    "Left": 0.3889369070529938,
                    "Height": 0.5596370100975037
                },
                "Confidence": 99.98912811279297
            },
            "Similarity": 98.0
        }
    ],
    "SourceImageFace": {
        "BoundingBox": {
            "Width": 0.3883333206176758,
            "Top": 0.12222222238779068,
            "Left": 0.33666667342185974,
            "Height": 0.2588889002799988
        },
        "Confidence": 99.99468231201172
    }
}

Similarity is 98%. Jet lag or not, I'm always the same me.
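For reference, here is roughly what the same comparison could look like with boto3. This is a sketch, not the code of my scripts: the function name `best_face_match` is made up, and the client is assumed to come from `boto3.client("rekognition")`. `SimilarityThreshold` is an API parameter that filters out weak matches.

```python
def best_face_match(client, bucket, source_key, target_key, threshold=80.0):
    """Compare the face in the source image against faces in the target image.

    Returns (similarity, bounding_box) for the strongest match, or None when
    no face in the target reaches the similarity threshold.
    """
    response = client.compare_faces(
        SourceImage={"S3Object": {"Bucket": bucket, "Name": source_key}},
        TargetImage={"S3Object": {"Bucket": bucket, "Name": target_key}},
        SimilarityThreshold=threshold,
    )
    matches = response.get("FaceMatches", [])
    if not matches:
        return None
    # Keep the match with the highest similarity score.
    best = max(matches, key=lambda match: match["Similarity"])
    return best["Similarity"], best["Face"]["BoundingBox"]
```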

See how simple this service is? I don't see how they could have made it easier. How long would it take to design, build and *train* something like this on your own? I really have no idea and I don't intend to find out!

Enough CLI, let's switch to Python and run more visual examples. For this purpose, I've written a couple of scripts (available here), using boto3 and the Pillow image processing library.

In a nutshell:

rekognitionDetect.py bucket_name image [copy | nocopy]: try to detect faces inside an image. If faces are found, each of them will be highlighted by a box and an updated image will be saved. The script will also report image labels and face information (gender, beard, glasses, etc.). By default, the maximum number of labels is 10 and the minimum confidence is 75%.
rekognitionCompare.py bucket_name sourceImage targetImage [copy | nocopy]: try to match a reference face to another image. If the face is found, it will be highlighted by a box and an updated image will be saved.

All images must be present with the same name both locally and in S3. The last parameter for both scripts allows you to skip the copy to S3 if the file is already there.
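To give an idea of where the face information comes from, here is a minimal sketch of how one entry of `FaceDetails` could be turned into a short report. It assumes `detect_faces` was called with `Attributes=["ALL"]`, which is what makes Rekognition return gender, beard, mustache, eyeglasses and emotions. The function name `describe_face` is hypothetical and this is not the actual code of the scripts.

```python
def describe_face(index, face, top_emotions=3):
    """Build report lines for one FaceDetails entry (detect_faces, Attributes=["ALL"])."""
    lines = ["*** Face %d detected, confidence: %s" % (index, face["Confidence"])]
    lines.append("Gender: " + face["Gender"]["Value"])
    # These attributes carry a boolean Value: only report the ones that are present.
    for attribute in ("Beard", "Mustache", "Eyeglasses"):
        if face.get(attribute, {}).get("Value"):
            lines.append(attribute)
    # Report the strongest emotions first.
    emotions = sorted(face.get("Emotions", []),
                      key=lambda emotion: emotion["Confidence"], reverse=True)
    for emotion in emotions[:top_emotions]:
        lines.append("%s %s" % (emotion["Type"], emotion["Confidence"]))
    return lines
```

Printing the returned lines for each detected face gives a report in the spirit of the outputs shown in this post.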

Hopefully, the code reads like well-written prose (hi Uncle Bob). If not, blame jet lag (yes, it's the root of all evil). Anyway, there's nothing complicated here, I'm sure you'll figure it out in no time.

Let's play some more!

$ rekognitionDetect.py jsimon-public booth1.jpg nocopy

output file


Label Human, confidence: 99.3180236816
Label People, confidence: 99.3190917969
Label Person, confidence: 99.3190917969
Label Clothing, confidence: 92.1037216187
Label Overcoat, confidence: 92.1037216187
Label Suit, confidence: 92.1037216187
Label Computer, confidence: 76.0058441162
Label Electronics, confidence: 76.0058441162
Label LCD Screen, confidence: 76.0058441162
Label Laptop, confidence: 76.0058441162

*** Face 0 detected, confidence: 99.999671936
Gender: Male
HAPPY 96.4477920532
CALM 8.28260231018
CONFUSED 1.53788328171

*** Face 1 detected, confidence: 99.9654922485
Gender: Male
Beard
Mustache
HAPPY 98.5274353027
ANGRY 5.03668212891
CONFUSED 2.61067152023

*** Face 2 detected, confidence: 99.9955444336
Gender: Male
Eyeglasses
HAPPY 97.6237945557
ANGRY 1.31589770317
CALM 0.939458608627

*** Face 3 detected, confidence: 99.9996109009
Gender: Male
Eyeglasses
HAPPY 98.9962310791
SAD 11.4119710922
CONFUSED 1.69576406479

Say hi to Romain, Cédric and Damian, my friendly AWS colleagues. Rekognition sees 4 males, 1 with a beard, 2 with eyeglasses, all of them very happy... and I'm the calmest of the bunch, how about that. Amazingly, Rekognition manages to catch my hardly visible laptop (left edge of the picture, on the table).

Here's a tougher one (Hallo to my German friends).

$ rekognitionDetect.py jsimon-public oktoberfest.jpg nocopy

output file

Label People, confidence: 99.0898742676
Label Person, confidence: 99.0898971558
Label Human, confidence: 99.0639343262
Label Alcohol, confidence: 88.8537063599
Label Beverage, confidence: 88.8537063599
Label Drink, confidence: 88.8537063599
Label Crowd, confidence: 84.0972671509
Label Female, confidence: 84.0796279907
Label Girl, confidence: 84.0796279907

*** Face 0 detected, confidence: 99.9854202271
Gender: Male
HAPPY 60.5386123657
ANGRY 12.2481765747
DISGUSTED 2.10083723068

*** Face 1 detected, confidence: 99.9825744629
Gender: Female
HAPPY 98.0062866211
SURPRISED 10.8561573029
SAD 0.810676813126

*** Face 2 detected, confidence: 99.9904937744
Gender: Female
HAPPY 84.5134887695
SURPRISED 8.68589305878
ANGRY 1.35719180107

*** Face 3 detected, confidence: 99.9073257446
Gender: Male
Beard
Mustache
HAPPY 80.5190963745
SURPRISED 23.9800624847
ANGRY 1.17569565773

*** Face 4 detected, confidence: 99.9972229004
Gender: Male
Mustache
HAPPY 75.2949371338
CONFUSED 10.9511556625
DISGUSTED 1.91761255264

*** Face 5 detected, confidence: 99.9999771118
Gender: Male
HAPPY 35.9886474609
SURPRISED 3.75992059708
ANGRY 2.48707532883

*** Face 6 detected, confidence: 99.9915084839
Gender: Female
HAPPY 99.4766082764
CALM 0.791561603546
ANGRY 0.620931386948

*** Face 7 detected, confidence: 99.9998931885
Gender: Female
HAPPY 99.8826293945
SAD 7.21873044968
DISGUSTED 5.48685789108

*** Face 8 detected, confidence: 83.6580963135
Gender: Male
Eyeglasses
SAD 94.9213943481
SURPRISED 76.9153442383
HAPPY 8.52976131439

*** Face 9 detected, confidence: 99.9944610596
Gender: Male
HAPPY 27.327457428
DISGUSTED 26.6790218353
ANGRY 12.1302127838

*** Face 10 detected, confidence: 99.9998855591
Gender: Male
SURPRISED 99.2624435425
HAPPY 22.0922241211
SAD 6.69546127319

*** Face 11 detected, confidence: 99.9861831665
Gender: Male
SURPRISED 60.7816810608
SAD 7.07310438156
HAPPY 3.66672611237

*** Face 12 detected, confidence: 99.9990692139
Gender: Male
HAPPY 48.0631027222
SURPRISED 2.61369943619
CONFUSED 2.40399837494

*** Face 13 detected, confidence: 87.6368408203
Gender: Male
HAPPY 16.2307357788
SAD 14.2565965652
ANGRY 12.3210906982

*** Face 14 detected, confidence: 99.9553375244
Gender: Male
HAPPY 54.3005943298
DISGUSTED 5.99133396149
SURPRISED 3.63597273827

Wow, 15 people, including partial faces. All genders are correct. Emotions are mostly ok, but we definitely need to add 'DRUNK' to the list ;) The labels are spot on: a crowd of men and women drinking alcohol.

Let's try another one. Low res, low quality.

$ rekognitionDetect.py jsimon-public maradona.jpg nocopy

output file

Label People, confidence: 99.2043991089
Label Person, confidence: 99.2043991089
Label Human, confidence: 99.1917037964
Label Football, confidence: 97.2220993042
Label Soccer, confidence: 97.2220993042
Label Sport, confidence: 97.2220993042
Label American Football, confidence: 83.3328475952
Label Athlete, confidence: 78.3234786987

*** Face 0 detected, confidence: 99.963470459
Gender: Male
Mustache
SURPRISED 21.8802871704
CALM 17.4065952301
SAD 11.6566238403

*** Face 1 detected, confidence: 99.9813308716
Gender: Male
Eyeglasses
HAPPY 38.6969680786
ANGRY 6.79734945297
SURPRISED 2.61010527611

*** Face 2 detected, confidence: 99.9385604858
Gender: Male
SURPRISED 36.6970825195
SAD 7.66330337524
ANGRY 6.10639476776

*** Face 3 detected, confidence: 99.9514923096
Gender: Male
SAD 32.6836242676
DISGUSTED 4.55095767975
HAPPY 4.19711828232

*** Face 4 detected, confidence: 99.8046951294
Gender: Male
Beard
Mustache
SAD 46.0139579773
HAPPY 4.15547084808
DISGUSTED 0.981283187866

*** Face 5 detected, confidence: 99.2888412476
Gender: Male
SAD 90.2270889282
CALM 5.9303817749
HAPPY 3.26179981232

Labels are fine, except for 'American Football'. 83%??? Gimme a break, the training set needs more Soccer images! In addition, I don't think number 4 is wearing eyeglasses, but again this is a low res picture. Apart from this, Rekognition correctly picked up all faces and funny enough, the expressions make sense too: "sad" and "surprised" are definitely how these guys must have felt against the legendary Diego!

A last one for the road: how about this complex abstract-ish nighttime picture of Shinjuku?

$ rekognitionDetect.py jsimon-public shinjuku.jpg nocopy

output file

Label City, confidence: 88.4259796143
Label Downtown, confidence: 88.4259796143
Label Metropolis, confidence: 84.8462677002
Label Urban, confidence: 84.8462677002
Label Night, confidence: 69.7816467285
Label Outdoors, confidence: 69.7816467285
Label Shop, confidence: 68.228477478
Label Flyer, confidence: 60.3522796631
Label Poster, confidence: 60.3522796631
Label Neighborhood, confidence: 55.3994293213

*** Face 0 detected, confidence: 97.9367828369
Gender: Female
SAD 46.1420478821
ANGRY 7.63346576691
HAPPY 6.28939962387

Note that I lowered the confidence threshold from 75% to 50% to get more labels. Still, Rekognition does a good job. It also gets the girl's face and yes, she does look quite sad. The Anime face isn't detected but I guess this is the desired behavior.

Alright, enough detection. Let's now try to match faces, using some of the previous pictures as well as some new ones.

$ rekognitionCompare.py jsimon-public julien1.jpg julien2.jpg nocopy

Face match, confidence=99.9891281128, similarity=98.0

$ rekognitionCompare.py jsimon-public julien1.jpg booth1.jpg nocopy

Face match, confidence=99.999671936, similarity=96.0

$ rekognitionCompare.py jsimon-public julien1.jpg booth2.jpg nocopy

Face match, confidence=99.9991455078, similarity=84.0

$ rekognitionCompare.py jsimon-public julien1.jpg keynote.jpg nocopy

Face match, confidence=99.9932250977, similarity=82.0

Quite good! The last one is particularly nice, given the distance, the angle and the poor lighting (see actual picture above).

These are just a few examples and I'm sure you can't wait to try your own. Hopefully this post has given you a visual, hands-on overview of the Rekognition service and how user-friendly it is. I didn't cover face collections, but the API is pretty much what you'd expect (create, delete, etc.).
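For the curious, here is a rough sketch of what working with a face collection might look like in boto3. The calls used (`create_collection`, `index_faces`, `search_faces_by_image`, `delete_collection`) exist in the Rekognition API, but the wrapper function, the threshold value and the idea of using the file name as `ExternalImageId` are assumptions of mine, and I haven't run this against the images above.

```python
def index_and_search(client, collection_id, bucket, reference_key, query_key):
    """Index one reference face, search for it in another image, then clean up.

    `client` is expected to be a boto3 Rekognition client.
    Returns a list of (external_image_id, similarity) pairs.
    """
    client.create_collection(CollectionId=collection_id)
    try:
        # Store the reference face, tagged with its file name.
        client.index_faces(
            CollectionId=collection_id,
            Image={"S3Object": {"Bucket": bucket, "Name": reference_key}},
            ExternalImageId=reference_key,
        )
        # Look for the largest face of the query image in the collection.
        response = client.search_faces_by_image(
            CollectionId=collection_id,
            Image={"S3Object": {"Bucket": bucket, "Name": query_key}},
            FaceMatchThreshold=80.0,
            MaxFaces=1,
        )
    finally:
        # Always delete the throwaway collection, even if a call failed.
        client.delete_collection(CollectionId=collection_id)
    return [(match["Face"].get("ExternalImageId"), match["Similarity"])
            for match in response["FaceMatches"]]
```

In a real application you would keep the collection around instead of deleting it after a single search.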

Feel free to explore and experiment. Until we meet again, keep rockin'.

Disclaimer: all opinions are my own (what did you expect?).

This is a presentation I've meant to build for a while. I guess I was just waiting for an excuse to spend enough time to do so. My talk at The Family's Lion program (http://joinlion.co) gave me this excuse and I want to thank them for the opportunity.

In a nutshell, I'm sick and tired of the current state of Software Engineering. On second thought, it's probably a key reason why I decided to step away from CTO roles and try something different. No more babysitting "engineers" who think they know it all and have no interest or respect for the vast body of knowledge that has been built over the last 60 years.

Fuck trends, hipsters and "rockstars". We need professionals. Software Engineering is Engineering. Computer Science is Science. There are Proven-Ways-To-Do-Things-Right.

Some would say that this is precisely what management is about. Teaching, training, coaxing, blah blah blah. Maybe, maybe not. What I know is that my time is my most precious asset. I will never waste it again on developers who are not willing to listen and learn from people who have been there before. Especially when top management doesn't have the guts to hire, fire or reward accordingly (and of course they'll say otherwise).

This presentation sums up some of my core technical beliefs. Feel free to read, learn, and ask questions: I'll be delighted to help. Or keep drowning in your shit trendy code, you deserve it.

Maybe one day I'll find a team of like-minded code warriors, or maybe I'll feel like building one again. I'm not holding my breath. AWS is home :)

Keep rockin', my brothers. You know who you are.

The Lost Tales of Platform Design (November 2016) from Julien SIMON

Digital (dis)content

Nov 30, 2016

A hands-on look at the Amazon Rekognition API

Nov 20, 2016

The Lost Tales of Platform Design

Nov 5, 2016

3 AWS webinars in French :)

Keynote @ DevOps D-Day, 07/10/2016, Marseille