Hi, I’m Deshana!

Notes, experiments, and write-ups on vision + ML. Coffee-fueled, book-approved.

Optimal Decision Making in Fantasy Football

Fantasy Football is a game of constrained optimisation. Each gameweek, managers must choose players under fixed rules: a set budget, limited transfers, formation restrictions, and a 4-point penalty for every extra transfer beyond the weekly allowance. The uncertainty comes from player performance — injuries, form swings, and fixture difficulty — but the core decision each week is a transfer choice. To measure the real impact of those choices, I ran a season-long counterfactual analysis. I compared two scenarios: ...
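
The 4-point penalty turns each week's transfer choice into simple arithmetic. Here is a minimal sketch, assuming one free transfer per week and made-up projected gains; the function names and numbers are illustrative and not taken from the analysis itself:

```python
def net_transfer_gain(projected_gains, free_transfers=1, penalty=4):
    """Net expected points from making every transfer in projected_gains.

    projected_gains: expected point gain of each transfer (hypothetical inputs).
    free_transfers: weekly allowance (assumed to be 1 here).
    penalty: points deducted for each transfer beyond the allowance.
    """
    extra = max(0, len(projected_gains) - free_transfers)
    return sum(projected_gains) - penalty * extra


def best_transfer_count(projected_gains, free_transfers=1, penalty=4):
    """Return (k, net) for the number of transfers k with the highest net gain."""
    ranked = sorted(projected_gains, reverse=True)  # take the best transfers first
    options = [
        (k, net_transfer_gain(ranked[:k], free_transfers, penalty))
        for k in range(len(ranked) + 1)
    ]
    # Highest net gain wins; on a tie, prefer fewer transfers.
    return max(options, key=lambda option: (option[1], -option[0]))


print(best_transfer_count([6.0, 3.0, 5.0]))  # (2, 7.0)
```

With these numbers a second transfer is worth the hit (11 points gained, 4 lost), but a third is not.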

Designing Markers for Easy Detection in Real Life Images

If you had an image and wanted to identify its contents, or if the image contained products you wished to recognize, you could label them with a sticker. What would this sticker look like? And how would you detect it? The sticker should carry an embedded design that can be reliably detected and decoded regardless of camera distance, lighting conditions, viewing angle, and other variations. Making stickers that can be detected with ease. Thick black framing of the marker: sharp corners are easier to detect than rounded edges, so squares and rectangles are ideal for framing the pattern. We can detect the corners easily and check whether the shape approximates a quadrilateral. ...

Fine-Tuning ImageNet model for Classification

What am I doing today? I have successfully installed Caffe along with the necessary libraries by following this excellent guide. My experiment’s goal is to fine-tune the VGG-16 network for a classification task. The VGG-16 model I’m using is pre-trained on ImageNet for classification, and I will leverage its caffemodel to adjust the weights for my specific application. Data Preparation: Getting the data prepared correctly finishes most of your work. ...

Fancy PCA (Data Augmentation) with Scikit-Image

Let’s start with the basics! We know that an integer variable is typically stored in 4 bytes. An integer array is a consecutive stream of many such 4-byte values. A string of text takes a number of bytes proportional to its character count, perhaps with a little padding. Storage of numbers and text is understood, but how on earth would we store an image? How do we turn an image into something that can be processed and stored in memory? ...
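
One common answer, sketched below, is to store the image as a flat run of bytes just like the integer array above. This assumes an 8-bit RGB image laid out row by row with the three channel values of each pixel stored together; the tiny image and the `get_pixel` helper are made up for illustration:

```python
# A tiny RGB image: 2 rows, 3 columns, 3 channels, one byte (0-255) per channel.
height, width, channels = 2, 3, 3
pixels = [
    [(255, 0, 0), (0, 255, 0), (0, 0, 255)],        # red, green, blue
    [(0, 0, 0), (128, 128, 128), (255, 255, 255)],  # black, grey, white
]

# Flatten row by row into one contiguous byte buffer.
buffer = bytes(value for row in pixels for pixel in row for value in pixel)


def get_pixel(buf, row, col):
    """Read one pixel back from the flat buffer using its row and column."""
    offset = (row * width + col) * channels
    return tuple(buf[offset:offset + channels])


print(len(buffer))              # 18 bytes = 2 * 3 * 3
print(get_pixel(buffer, 1, 1))  # (128, 128, 128)
```

The memory cost is simply height × width × channels bytes, and finding a pixel is plain index arithmetic.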

Fine-tuning pre-trained VGG Face convolutional neural networks model for regression with Caffe

Task: Use a pre-trained face descriptor model to output a single continuous variable predicting an outcome using Caffe’s CNN implementation. Picture Reference: https://deeplearning4j.org/linear-regression So, we’re going to take VGG-Face (a model that is pre-trained on facial images) and train this model to predict the salary of the person. This sounds daft! How can anyone predict a person’s salary simply by looking at a person’s face? You and I cannot do this. But the VGG-16 model has 138 million parameters (weights and biases) that it learns. Each of these weights corresponds to a single node, which may refer to, for example, the skin tone of a person, the shape of the jawbone, or even features that are meaningless or perhaps unpredictable for us. ...
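
The 138 million figure can be checked by counting weights and biases layer by layer. This sketch assumes the standard ImageNet VGG-16 layout: thirteen 3×3 convolutions, a 224×224 input reduced to 7×7 before the fully connected layers, and a 1000-class output. VGG-Face's final layer differs, so its total is not the same. The function name and layer lists are written out here for illustration:

```python
# Output channels of the 13 convolutional layers (assumed standard VGG-16 layout).
conv_channels = [64, 64, 128, 128, 256, 256, 256, 512, 512, 512, 512, 512, 512]
# Output sizes of the 3 fully connected layers (1000 = ImageNet classes).
fc_sizes = [4096, 4096, 1000]


def vgg16_parameter_count(in_channels=3, kernel=3, final_spatial=7):
    """Count weights and biases across all conv and fully connected layers."""
    total = 0
    prev = in_channels
    for out in conv_channels:
        total += prev * out * kernel * kernel + out  # weights + biases
        prev = out
    prev = prev * final_spatial * final_spatial  # flatten 512 x 7 x 7 = 25088
    for out in fc_sizes:
        total += prev * out + out  # weights + biases
        prev = out
    return total


print(vgg16_parameter_count())  # 138357544
```

Most of the parameters sit in the first fully connected layer (25088 × 4096 weights), not in the convolutions.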

Evaluation of Results using Mean Average Precision

There are several reasons why the evaluation of results on datasets like Pascal VOC and ILSVRC is hard. They are well described in the Pascal VOC 2009 challenge paper. Here are some of them: Images may contain instances of multiple classes, so it is not sufficient to simply ask, “Which one of the m classes does this image belong to?” and then compare the predicted result with the actual one. There are multiple images and every image contains multiple classes, with the added dismay that accuracy for every image is a non-binary value. ...
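
One way around the multi-class problem is to score each class separately as a ranking task and then average. The sketch below uses plain, non-interpolated average precision, which is a simplification: the Pascal VOC protocol uses an interpolated variant. Function names and sample data are made up for illustration:

```python
def average_precision(scores, labels):
    """Non-interpolated AP for one class.

    scores: classifier confidence per image for this class.
    labels: 1 if the image really contains the class, else 0.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precisions = []
    for rank, (_, is_positive) in enumerate(ranked, start=1):
        if is_positive:
            hits += 1
            precisions.append(hits / rank)  # precision at this positive
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(per_class):
    """Mean of the per-class APs; per_class maps class name -> (scores, labels)."""
    aps = [average_precision(scores, labels) for scores, labels in per_class.values()]
    return sum(aps) / len(aps)


results = {
    "cat": ([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0]),
    "dog": ([0.9, 0.2], [0, 1]),
}
print(mean_average_precision(results))
```

Because each class is ranked on its own, an image containing both a cat and a dog counts as a positive in both rankings.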

Playing around with RCNN: a state-of-the-art visual object detection system

I was playing around with this implementation of RCNN released in 2015 by Ross Girshick. This method is described in detail in his Faster-RCNN paper released at NIPS 2015. (I was there and this groundbreaking unfurling of CNN+RCNN was happening around me, which gives me all the more reason to be super excited!) I used the pre-trained VGG-16 net, where VGG stands for Visual Geometry Group and 16 because the network is 16-layered. The image is passed through a stack of convolutional layers that use 3×3 filters, followed by 3 fully connected layers. The convolutional stride is fixed at 1 pixel. ...