Convolutional Neural Network for Object Recognition and Detection

Computer vision has four main tasks:

1. Object Recognition/Classification

Classify the object in the image.


2. Object Detection

Are there any objects we want to detect in the image? If so, draw a bounding box around each one.


3. Object Localization

Are there any objects we want to detect in the image? If so, draw a bounding box around each one and report the coordinates of the bounding box.

[image: bounding box with corners (x1, y1) and (x2, y2)]

The (x1, y1) would be the top left corner and the (x2, y2) the bottom right.

And finally, the most recent one:

4. Object Segmentation

Using Mask R-CNN, we can get the exact pixel positions of each object. This kind of capability is very important for robotic vision.

Suppose you have a small robot, and we need to instruct it to pass between a person's legs. Using Mask R-CNN, the robot knows the exact pixel positions of the legs.

This cannot be accomplished with object localization, since a bounding box does not give the exact position of each leg.

Currently, the most suitable type of neural network for these four tasks is the convolutional neural network.

In a previous post, “Cardboard Box Detection using RetinaNet (Keras)”, I wrote about training a custom Keras RetinaNet model to localize cardboard boxes in images. RetinaNet is a convolutional neural network architecture.

Convolutional neural networks are commonly used in computer vision for object detection, object localization, object recognition, analyzing the depth of image regions, and more.

This post covers convolutional neural networks in general, including some of the math behind convnets and typical convnet architectures, and then continues with the RetinaNet architecture.

Convolutional Neural Network

“A convolutional neural network is a class of deep neural networks, most commonly applied to analyzing visual imagery.” A CNN can be seen as an improved version of the multilayer perceptron, inspired by the human visual cortex.

Basically, a CNN works by extracting matrices of features from the image and then, based on those features, predicting which class the image contains using softmax probabilities.
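As a loose illustration of that last step, here is how softmax turns a vector of class scores into probabilities (a minimal NumPy sketch; the scores are made up):

import numpy as np

scores = np.array([2.0, 1.0, 0.1])  # hypothetical scores for 3 classes

# softmax: exponentiate, then normalize so the outputs sum to 1
# (subtracting the max first is a standard numerical-stability trick)
probs = np.exp(scores - scores.max())
probs /= probs.sum()

print(probs)        # [0.659 0.242 0.099]
print(probs.sum())  # 1.0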

Convolutional Neural Network Architecture

Commonly, a convolutional neural network architecture consists of these layers :

1. Convolution Layer

The core idea behind the convolution operation is feature extraction, or filtering. The network tries to match every possible feature from the input image against the features learned for each class (a class is the name of an object we want to recognize, e.g., a car).

[image: a 3×3 kernel sliding over a 6×6 input]

To get a feel for how a convolution layer operates, have a look at the image above. We have a 6×6 px input image and a 3×3 px 2D convolution kernel. The kernel slides, one stride at a time, from the top-left pixel of the image to the bottom. The kernel is a 3×3 matrix of weights (each element of the matrix is a weight). The convolution extracts features from the image; the 2D convolution kernel is the most frequently used kind.

Suppose we have a 6×6 pixel image with RGB color channels.

[image: the 6×6 px RGB example image]

Suppose we are going to perform a convolution using a 3×3 matrix as the kernel and stride = 1 (the stride defines how many pixels the kernel steps at a time).

Here are the RGB channels as 6×6 matrices extracted from a NumPy array:

[matrices: the R, G, and B channels of the 6×6 image]


#!/usr/bin/env python3
from PIL import Image
import numpy as np

im = Image.open("6px.png")
imgarr = np.array(im)
print("R channel")
print(imgarr[:, :, 0])
print("_" * 30)
print("G channel")
print(imgarr[:, :, 1])
print("_" * 30)
print("B channel")
print(imgarr[:, :, 2])
print("_" * 30)

For this example, we are going to do a stride-1 convolution on the red channel using a 3×3 weight matrix (a Sobel kernel; judging from the computation below, its rows are [−1, 0, 1], [−2, 0, 2], [−1, 0, 1]).

As an example of the convolution, we take the top-left 3×3 block of the red-channel matrix:

 

 

Here's the arithmetic for the first kernel position:

 

#!/usr/bin/env python3
# element-wise product of the 3x3 image patch and the Sobel kernel, summed
res = (46 * -1) + (67 * 0) + (161 * 1) + (48 * -2) + (41 * 0) + (114 * 2) + (101 * -1) + (165 * 0) + (216 * 1)
print(res)  # 362

 

The next 1 pixel stride :

 

and so on; the kernel keeps striding until the last pixel.

The result is called the convolved feature map.

Since a 3×3 kernel on a 6×6 input yields only a 4×4 feature map, one pixel of zero padding on the top, bottom, left, and right would be needed to keep the output at 6×6.
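To make the whole sliding-window computation concrete, here is a minimal NumPy sketch that produces the full 4×4 feature map; only the top-left values of the 6×6 matrix are taken from the example above, the rest are made up:

import numpy as np

red = np.array([  # red channel; top-left values from the text, the rest made up
    [ 46,  67, 161, 250, 223, 200],
    [ 48,  41, 114,  65, 159, 180],
    [101, 165, 216, 231, 230, 210],
    [ 90, 120, 140, 150, 160, 170],
    [ 80, 100, 110, 120, 130, 140],
    [ 70,  80,  90, 100, 110, 120],
])

kernel = np.array([  # the 3x3 Sobel kernel used above
    [-1, 0, 1],
    [-2, 0, 2],
    [-1, 0, 1],
])

# slide the kernel with stride 1 and no padding: 6 - 3 + 1 = 4 positions per axis
out = np.zeros((4, 4), dtype=int)
for i in range(4):
    for j in range(4):
        out[i, j] = np.sum(red[i:i+3, j:j+3] * kernel)

print(out[0, 0])  # 362, the value computed by hand above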

The Linearity

Algebraically, a convolution is a linear combination, so we need to introduce non-linearity with an activation function. Right after the convolution, the ReLU activation function is used for this. If we kept everything linear, we would not need deep learning at all, since the whole network would collapse into one simple linear function.

Mathematically, ReLU can be defined as ReLU(x) = max(0, x).

After ReLU, every negative value in the convolved feature map is replaced by 0.
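In NumPy this is a one-liner; a quick sketch on made-up feature-map values:

import numpy as np

feature_map = np.array([[362, -120],
                        [ 45,   -7]])  # made-up convolved values

relu = np.maximum(0, feature_map)  # ReLU(x) = max(0, x), element-wise
print(relu)  # [[362   0] [ 45   0]]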

Why Is Non-Linearity Needed?

In math and statistics, non-linearity is commonly needed to solve complex problems, while a linear equation is simple: once we define the input, the output follows by simple algebra. Without an activation function such as ReLU, the convolution is only a linear function.

Consider a simple linear equation:

y = a·x

No matter how many such layers we stack, the composition is still linear and yields exactly the same predicted output as a single layer. In this situation we do not need deep learning with many layers; a one-layer network is enough.
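A quick numeric sketch of that collapse: two stacked linear "layers" are exactly equivalent to one.

# two linear layers y = a1*x and z = a2*y compose into z = (a2*a1)*x,
# so stacking them adds no expressive power
a1, a2 = 3.0, 0.5
x = 7.0
print(a2 * (a1 * x) == (a2 * a1) * x)  # True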

By using the ReLU activation function right after a convolution, we introduce non-linearity, so the system can learn to solve more complex problems.

Real-life image recognition is a complex problem that cannot be solved by literal pixel matching.

For example, suppose we have trained a single-layer neural network on a dataset of cars and a dataset of bat logos:

Class 1 is a Honda Civic:

 

Class 2 is a bat logo:

 

Then, if we give it an input image like this (the same resolution as the dataset images):

 

the computer will be able to predict correctly, since it is literally matching an identical image with the same pixel arrangement.

Unfortunately, when we give it this input image:

the computer will not be able to decide correctly whether it is a bat logo or a Honda Civic.

To solve this kind of complex problem (the object in the image might be slightly rotated, posed differently, or somewhat deformed), the network needs non-linearity.

With a different pose or a slightly different form, the prediction can no longer be solved by linear regression: y is no longer a·x, so we need a non-linear model.

In a convolutional neural network, we update the weights and biases of the neurons based on the error at the output. This process is known as back-propagation. Activation functions introduce non-linearity to the system and make back-propagation effective, since gradients are supplied along with the error/loss to update the weights and biases.

2. Pooling Layer

Right after the ReLU comes a pooling layer. The pooling layer reduces the spatial size of the input, which reduces the number of parameters and the computational complexity.

The most commonly used pooling method is max pooling.
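A minimal NumPy sketch of 2×2 max pooling with stride 2 (the input values are made up):

import numpy as np

fmap = np.array([
    [1, 3, 2, 4],
    [5, 6, 1, 2],
    [7, 2, 9, 1],
    [3, 4, 8, 6],
])

# take the maximum of each non-overlapping 2x2 block,
# halving each spatial dimension
pooled = fmap.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(pooled)  # [[6 4] [7 9]]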

 

3. Fully Connected Layer

The fully connected layer holds composite, aggregated information from the previous layers. Before being fed to the fully connected layers, the multi-dimensional inputs are flattened into a one-dimensional vector.

And finally, the prediction (voting) is produced by an activation function, e.g., softmax.

 

Some Examples of CNN Architectures

LeNet

LeNet-5 takes a 32×32 px input image. Here's a summary of the LeNet-5 architecture:

Instead of the tanh activation function, ReLU can also be used.

Here's an example LeNet implementation in Keras:

from keras import models, layers

model = models.Sequential()
model.add(layers.Conv2D(filters=6, kernel_size=(3, 3), activation='tanh', input_shape=(32,32,1)))
model.add(layers.AveragePooling2D())
model.add(layers.Conv2D(filters=16, kernel_size=(3, 3), activation='tanh'))
model.add(layers.AveragePooling2D())
model.add(layers.Flatten())
model.add(layers.Dense(units=120, activation='tanh'))
model.add(layers.Dense(units=84, activation='tanh'))
model.add(layers.Dense(units=10, activation = 'softmax'))
model.summary()

 

 

VGG16

VGG16 takes a 224×224 px input image. Here's a summary of the VGG16 architecture:

An example VGG16 implementation in Keras:

#!/usr/bin/env python3
from keras import models, layers

model = models.Sequential()

# padding='same' on the first conv keeps the layout consistent with VGG16
model.add(layers.Conv2D(filters=64, kernel_size=(3, 3), activation='relu', padding='same', input_shape=(224, 224, 3)))
model.add(layers.Conv2D(64, (3, 3), activation='relu', padding='same'))
model.add(layers.MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))

model.add(layers.Conv2D(128, (3, 3),activation='relu',padding='same'))
model.add(layers.Conv2D(128, (3, 3), activation='relu', padding='same'))
model.add(layers.MaxPooling2D((2, 2), strides=(2, 2)))

model.add(layers.Conv2D(256, (3, 3), activation='relu', padding='same'))
model.add(layers.Conv2D(256, (3, 3), activation='relu', padding='same'))
model.add(layers.Conv2D(256, (3, 3), activation='relu',padding='same'))
model.add(layers.MaxPooling2D((2, 2), strides=(2, 2)))

model.add(layers.Conv2D(512, (3, 3),activation='relu',padding='same'))
model.add(layers.Conv2D(512, (3, 3),activation='relu', padding='same'))
model.add(layers.Conv2D(512, (3, 3),activation='relu',padding='same'))
model.add(layers.MaxPooling2D((2, 2), strides=(2, 2)))

model.add(layers.Conv2D(512, (3, 3), activation='relu', padding='same'))
model.add(layers.Conv2D(512, (3, 3), activation='relu', padding='same'))
model.add(layers.Conv2D(512, (3, 3), activation='relu', padding='same'))
model.add(layers.MaxPooling2D((2, 2), strides=(2, 2)))

model.add(layers.Flatten())
model.add(layers.Dense(4096, activation='relu'))
model.add(layers.Dense(4096, activation='relu'))
model.add(layers.Dense(1000, activation='softmax'))

model.summary()

ResNet

The main purpose of the ResNet architecture is to let a convolutional neural network with many layers train effectively.

The problem with a deep convolutional neural network is that as we increase the depth, gradients vanish: the network's performance saturates or even starts degrading in accuracy.

ResNet organizes the network into chunks of a few layers and adds a skip connection that passes each chunk's input straight through to the chunk's output, where it is added back in, so the chunk only has to learn the residual.
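As a rough sketch of that idea in Keras (a simplified block, not the exact implementation linked below), a residual block adds the chunk's input back to its output; this assumes the input already has `filters` channels:

from keras import layers

def residual_block(x, filters):
    # two stacked 3x3 convolutions...
    shortcut = x
    y = layers.Conv2D(filters, (3, 3), padding='same', activation='relu')(x)
    y = layers.Conv2D(filters, (3, 3), padding='same')(y)
    # ...plus a skip connection that reintroduces the block's input
    y = layers.add([y, shortcut])
    return layers.Activation('relu')(y)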

A Keras implementation of ResNet:

https://github.com/raghakot/keras-resnet/blob/master/resnet.py

 

RetinaNet

The problem with single-shot detection models such as YOLO is the extreme foreground-background class imbalance in one-stage detectors.

RetinaNet introduces the “Focal Loss” to address this extreme foreground-background class imbalance in a one-stage detector.

 

RetinaNet is a single-shot detection model, just like YOLO. A commonly used backbone for RetinaNet is ResNet50; an FPN (Feature Pyramid Network) is added on top for feature extraction, and the network uses Focal Loss to handle the extreme foreground-background class imbalance.
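Following the paper's definition, FL(p_t) = −α_t · (1 − p_t)^γ · log(p_t). Here is a minimal NumPy sketch for the binary case, using the paper's default γ = 2 and α = 0.25 (an illustration, not the keras-retinanet implementation):

import numpy as np

def focal_loss(y_true, p, gamma=2.0, alpha=0.25):
    # binary focal loss: well-classified examples are down-weighted
    # by the modulating factor (1 - p_t) ** gamma
    p_t = np.where(y_true == 1, p, 1 - p)
    alpha_t = np.where(y_true == 1, alpha, 1 - alpha)
    return -alpha_t * (1 - p_t) ** gamma * np.log(p_t)

# an easy background example (left) contributes almost nothing,
# while a hard misclassified foreground example (right) dominates
print(focal_loss(np.array([0, 1]), np.array([0.1, 0.1])))  # approx [0.0008 0.4663]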

An example RetinaNet implementation in Keras can be cloned from https://github.com/fizyr/keras-retinanet

An example of custom object detection using RetinaNet:

 

 

Reference :

https://arxiv.org/abs/1708.02002

 


Convolutional Operation in Convolutional Neural Network

[image: a 3×3 kernel sliding over a 6×6 input]

To get a feel for how a convolution layer operates, have a look at the image above. We have a 6×6 px input image and a 3×3 px 2D convolution kernel. The kernel slides, one stride at a time, from the top-left pixel of the image to the bottom. The kernel is a 3×3 matrix of weights (each element of the matrix is a weight). The convolution extracts features from the image; the 2D convolution kernel is the most frequently used kind.

Suppose we have a 6×6 pixel image with RGB color channels.

[image: the 6×6 px RGB example image]

Suppose we are going to perform a convolution using a 3×3 matrix as the kernel and stride = 1 (the stride defines how many pixels the kernel steps at a time).

Here are the RGB channels as 6×6 matrices extracted from a NumPy array:

[matrices: the R, G, and B channels of the 6×6 image]


#!/usr/bin/env python3
from PIL import Image
import numpy as np

im = Image.open("6px.png")
imgarr = np.array(im)
print("R channel")
print(imgarr[:, :, 0])
print("_" * 30)
print("G channel")
print(imgarr[:, :, 1])
print("_" * 30)
print("B channel")
print(imgarr[:, :, 2])
print("_" * 30)

For this example, we are going to do a stride-1 convolution on the red channel using this 3×3 weight matrix:

 

The above matrix is just an example. Although a 3×3 matrix is a common kernel size, this particular kernel is an identity kernel (a 1 in the center, 0 elsewhere), so it does nothing: the output equals the input. Since the goal of a convolution layer is filtering (feature extraction), you would normally use a different kind of matrix as the kernel.

As an example of the convolution, we again use the red-channel matrix:

Here's the arithmetic for the first kernel position:

#!/usr/bin/env python3
# only the center element of the patch survives the identity kernel
res = (46 * 0) + (67 * 0) + (161 * 0) + (48 * 0) + (41 * 1) + (114 * 0) + (101 * 0) + (165 * 0) + (216 * 0)
print(res)  # 41

 

The next 1 pixel stride :

#!/usr/bin/env python3
res = (67 * 0) + (161 * 0) + (250 * 0) + (41 * 0) + (114 * 1) + (65 * 0) + (165 * 0) + (216 * 0) + (231 * 0)
print(res)  # 114

The next 1 pixel stride :

#!/usr/bin/env python3
res = (161 * 0) + (250 * 0) + (223 * 0) + (114 * 0) + (65 * 1) + (159 * 0) + (216 * 0) + (231 * 0) + (230 * 0)
print(res)  # 65

and so on; the kernel keeps striding until the last pixel.

The result is called the convolved feature map.

Since a 3×3 kernel on a 6×6 input yields only a 4×4 feature map, one pixel of zero padding on the top, bottom, left, and right would be needed to keep the output at 6×6.
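A quick NumPy check of the claim that this identity kernel does nothing: the resulting feature map is just the 4×4 center of the input (only the top-left values below come from the text; the rest are made up to fill the 6×6 matrix):

import numpy as np

red = np.array([
    [ 46,  67, 161, 250, 223, 200],
    [ 48,  41, 114,  65, 159, 180],
    [101, 165, 216, 231, 230, 210],
    [ 90, 120, 140, 150, 160, 170],
    [ 80, 100, 110, 120, 130, 140],
    [ 70,  80,  90, 100, 110, 120],
])

kernel = np.zeros((3, 3), dtype=int)
kernel[1, 1] = 1  # identity kernel: 1 in the center, 0 elsewhere

out = np.zeros((4, 4), dtype=int)
for i in range(4):
    for j in range(4):
        out[i, j] = np.sum(red[i:i+3, j:j+3] * kernel)

# the feature map equals the 4x4 center of the input: 41, 114, 65, ...
print((out == red[1:5, 1:5]).all())  # True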

 

Cardboard Box Detection and Localization using RetinaNet (Keras)

Keras RetinaNet is a Keras implementation of RetinaNet object detection, as described in the paper “Focal Loss for Dense Object Detection” by Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, and Piotr Dollár (https://arxiv.org/abs/1708.02002).

We are going to train a Keras RetinaNet model to detect and localize a custom object, a cardboard box, in images.

Step 1.  Dataset Preparation

In order to detect cardboard boxes in images, I have prepared several cardboard box images:

[image: a sample cardboard box image from the dataset]

The dataset can be downloaded from  :

https://screenku.com/repo/cardboard.tar.bz2

https://screenku.com/repo/cardboard2.tar.bz2

Keras RetinaNet resizes every image before the input layer: the image is scaled so that its smaller side becomes 800 px. For example, a 640×480 input is resized to 1067×800, and an 800×600 input is also resized to 1067×800.
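The resizing rule described above can be sketched as follows; this is my reconstruction of the behaviour (the shorter side is scaled to 800 px; keras-retinanet also caps the longer side, which this sketch ignores):

def resized_shape(width, height, min_side=800):
    # scale so that the shorter side becomes min_side
    scale = min_side / min(width, height)
    return round(width * scale), round(height * scale)

print(resized_shape(640, 480))  # (1067, 800)
print(resized_shape(800, 600))  # (1067, 800)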

This dataset was created using GIMP and labelImg. It contains 640×480 images. We are going to train a new model on this dataset to detect objects from a webcam at 640×480 resolution.

GIMP was used to extract only the relevant part of each image with the lasso tool:

[images: cardboard regions extracted with GIMP]

 

labelImg was used to label the specific areas within each image that contain our object. Do not leave too much margin when labelling, since the quality of the detection depends on the quality of the dataset.

[image: labelling a cardboard box in labelImg]

 

Step 2.  Converting the Dataset from Pascal VOC Format into CSV Annotation Format for Keras RetinaNet

Next, we need to convert the dataset from VOC format into CSV. We are going to use conv_pascal_to_csv, a utility that can be cloned from GitHub:

https://github.com/akchan/conv_pascal_to_csv.git

$ ruby conv_pascal_to_csv.rb --help
Usage: conv_pascal_to_csv [options]
--annotation-path PATH path to xml annotations directory
--image-path PATH path to jpeg images directory
--val-ratio float sample ratio for validation (0.0 - 1.0). default=0.1
$ ruby conv_pascal_to_csv.rb --annotation-path /home/ringlayer/Desktop/convnet/keras-retinanet/datasets/cardboard2/voclabels --image-path /home/ringlayer/Desktop/convnet/keras-retinanet/datasets/cardboard2/images

Once the conversion finishes, we get three files in the csv directory:

$ ls csv
annotations.csv classes.csv val_annotations.csv

I replaced the image paths in annotations.csv and val_annotations.csv with the absolute paths of my dataset location:

[image: annotations.csv with absolute image paths]

 

Step 3. Start Training New Model

To train the new model, we can use transfer learning, starting from previously trained weights. Here, I use ResNet50 as the backbone:

python3 keras_retinanet/bin/train.py --freeze-backbone --random-transform --weights weights/ResNet-50-model.keras.h5 --batch-size 8 --steps 500 --epochs 15 csv csv/annotations.csv csv/classes.csv

These parameters are required because we do not feed the network all the training images at once; instead, training is divided into epochs, each consisting of 500 steps:

--batch-size 8

We use a batch size of 8. The batch size defines the number of images propagated through the network at a time.

--epochs 15

One epoch normally means one full pass, forward and backward, over the entire dataset provided in annotations.csv; we train for 15 of them.

--steps 500

We use 500 steps per epoch. This is the number of batch iterations before a training epoch is considered finished, so each epoch processes 500 × 8 = 4,000 image samples.

Here, I trained the model on a GTX 1080 with 8 GB of VRAM:


This configuration takes about 7 GB of VRAM.


To give fine-tuning more headroom, a GPU with 11 GB of VRAM can be used.


The trained weights for each epoch are saved to the snapshots directory. We use resnet50_csv_15.h5 as our final model to detect cardboard (I renamed resnet50_csv_15.h5 to cb20.h5).

The source code for testing this model on a webcam and on video can be fetched from:

https://screenku.com/repo/keras-retinanet-custom-object-detection.tar.bz2

Testing our model on cardboard detection in a video and an image:

[image: detection result on a test image]

 

[video: detection result on a test video]

 

 

 

Regular Inverse & Pseudo Inverse Matrix Calculation using Singular Value Decomposition

written by Antonius Ringlayer

https://ringlayer.github.io http://www.ringlayer.com

In order to calculate inverse kinematics solutions for a robotic arm, we can use several methods. One of my favorite methods is using the pseudo inverse of a Jacobian matrix.

So what is the pseudo inverse of a matrix, and how does it differ from the regular inverse of a matrix?

Regular Inverse of a Square Matrix

Inverting a matrix means creating the reciprocal version of that matrix.

In math, the reciprocal of a number is its inverse: for a number b, the inverse of b is written b⁻¹, which comes from 1/b (the reciprocal of b). Likewise, for a matrix A, the inverse of A is written A⁻¹.

In order to be invertible, a matrix must meet two conditions:

  1. the matrix has the same number of rows and columns (square matrix)
  2. determinant of the matrix is not zero.
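Both conditions are easy to check in NumPy; a quick sketch with an arbitrary 2×2 example:

#!/usr/bin/env python3
import numpy as np

A = np.array([[2.0, 1.0],
              [3.0, 6.0]])  # arbitrary example; det = 9

assert A.shape[0] == A.shape[1]   # condition 1: square
assert np.linalg.det(A) != 0      # condition 2: non-zero determinant

print(np.linalg.inv(A))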

The Pseudo Inverse of a Matrix

The pseudo inverse of a matrix is symbolized with a dagger: A†.

Sometimes a matrix does not meet the two requirements above and thus cannot be inverted. When the system is overdetermined, we can use a method called pseudo inversion to create a pseudo inverse version of our original matrix.

We are going to use the SVD (Singular Value Decomposition) trick to decompose a non-square matrix, which has no eigenvalues and eigenvectors.

Inverting an Invertible Square Matrix

Inverting a square matrix is easy. For example, we have the matrix A:

[image: matrix A]

In order to get the inverse of A, we are going to use the adjoint method:

[image: the adjoint-method formula]

The inverse of A can be acquired through these steps:

  • Step 1. calculate the determinant of A
  • Step 2. find the cofactor matrix of A
  • Step 3. find the adjoint of A
  • Step 4. find the inverse

Based on https://ringlayer.wordpress.com/2018/07/03/matrix-determinant-laplace-and-sarrus-method, the determinant of our matrix is −6.

So we can go straight to step 2: find the cofactor matrix of A.

First of all, we need the minors of A. Each minor of A is obtained as the determinant of a 2×2 submatrix of A:

Visually, we can do the trick a lot more easily:

[images: computing each 2×2 minor by covering one row and one column]

hence we got :

|a| = 2, |b| = 1, |c| = −3, |d| = 2, |e| = −1, |f| = −3, |g| = −1, |h| = −2, |i| = −3

Thus we got the minor matrix:

[image: the minor matrix]

The cofactor matrix can be obtained easily by applying the alternating sign pattern to the minors:

[image: the sign pattern applied to the minors]

Hence we got the cofactor matrix:

[image: the cofactor matrix]

Step 3, find the adjoint of A:

The adjoint is obtained by transposing the cofactor matrix (turning its columns into rows):

[image: the adjoint matrix]

Finally, based on our formula A⁻¹ = (1/det A) · adj(A) = (1/−6) · adj(A), we get our inverse matrix:

[image: the inverse matrix A⁻¹]
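Since the matrix itself only appears in the images above, here is a sketch of the same adjoint-method steps in NumPy, run on a stand-in 3×3 matrix (the matrix values are hypothetical, not the ones from the figures):

#!/usr/bin/env python3
import numpy as np

def inverse_via_adjugate(A):
    # step 1: determinant
    det = np.linalg.det(A)
    assert det != 0, "matrix is singular"
    # step 2: cofactor matrix, C[i, j] = (-1)^(i+j) * minor(i, j)
    C = np.zeros_like(A, dtype=float)
    for i in range(3):
        for j in range(3):
            minor = np.delete(np.delete(A, i, axis=0), j, axis=1)
            C[i, j] = (-1) ** (i + j) * np.linalg.det(minor)
    # step 3: adjoint = transpose of the cofactor matrix
    adjoint = C.T
    # step 4: inverse = adjoint / determinant
    return adjoint / det

A = np.array([[1.0, 2.0, 1.0],   # stand-in example matrix
              [3.0, 3.0, 1.0],
              [2.0, 1.0, 2.0]])  # det(A) = -6
print(inverse_via_adjugate(A))
print(np.allclose(inverse_via_adjugate(A), np.linalg.inv(A)))  # True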

Pseudo Inverse Matrix using SVD

Sometimes we find a matrix that does not meet the previous requirements (it has no exact inverse); such a matrix has no eigenvalues and eigenvectors. To find its pseudo inverse, we are going to use the SVD (Singular Value Decomposition) method.

For example, the pseudo inverse of matrix A is symbolized as A⁺.

When the matrix is an invertible square matrix:

A⁺ = A⁻¹

When the matrix is overdetermined:

A⁺ = VΣ⁻¹Uᵀ

where U, Σ and V are the matrices from the SVD of A:

A = UΣVᵀ

(Σ is also commonly written as s, S, or D.)

 

Example of Pseudo Inverse Calculation using SVD Method

[figure: graphical depiction of the SVD of a matrix]

For example, we have a non-square matrix A as follows:

[image: matrix A]

 

U, s, and VT of this matrix can be acquired manually or automatically using this simple Python NumPy script:

#!/usr/bin/env python3
import numpy as np

# example taken from Video Tutorials - All in One
# https://www.youtube.com/watch?v=P5mlg91as1c
a = np.array([[1, 2, 1], [2, 1, -1]])
np.set_printoptions(suppress=True)
np.set_printoptions(precision=3)
U, s, VT = np.linalg.svd(a, full_matrices=True)
print("U:\n {}".format(U))
print("s:\n {}".format(s))
print("VT:\n {}".format(VT))

[output: U, s, and VT printed by the script]

Or we can calculate the SVD manually using the following steps:

Step 1. Finding the singular values and singular vectors for U & V

In order to find the singular values and singular vectors for U & V, we have to find the eigenvalues and eigenvectors of kU & kV, where:

kU = A · Aᵀ

kV = Aᵀ · A

[images: computing kU = A · Aᵀ]

Finding the eigenvalues and eigenvectors of kU:

|kU − λI| = 0

λ = a scalar (an eigenvalue)

I = the identity matrix

[images: kU − λI and its determinant]

So:

(6 − λ)² − 25 = 0

We get two eigenvalues:

λ1 = 11

and

λ2 = 1

Next, we are going to find the eigenvectors of kU.

For an eigenvector x of kU:

(kU − λI) · x = 0

for λ = 11 :

[image: kU − 11I]

(−5 · x1) + (5 · x2) = 0

hence x1 = x2, and we can take:

x1 = −1

x2 = −1

for λ = 1 :

[image: kU − I]

(5 · x1) + (5 · x2) = 0

hence x1 + x2 = 0, and we can take:

x1 = −1

x2 = 1

Finally, we have collected enough information for U:

λ1 = 11 → x1 = −1 and x2 = −1

λ2 = 1 → x1 = −1 and x2 = 1

Du = [image: diagonal matrix built from the eigenvalues]

u = [image: matrix of eigenvectors, one per column]

In order to get U, we need to normalize u so that its columns are orthonormal.

We have our u matrix:

[image: the u matrix]

To normalize u, each column is divided by its own length. For the first column (−1, −1), the length is √((−1)² + (−1)²) = 2^(1/2), where 2^(1/2) denotes the square root of 2, so the normalized column is (−1/√2, −1/√2).

Using the same method to normalize the other column, we get (−1/√2, 1/√2).

We got our normalized matrix, so:

U = [image: U, the matrix of normalized eigenvectors]

In order to find V, we follow the same steps as for U, and we get:

Dv = [image: diagonal matrix for kV]

V = [image: matrix V]

By transposing matrix V, we get:

Vᵀ = [image: matrix Vᵀ]

 

Step 2. Find D.

We can acquire the D matrix by multiplying Du with Dv:

D = Du · Dv = [image: matrix D]

 

Finally, we have collected enough information to calculate the pseudo inverse matrix using the SVD method:

U = [image: matrix U]

V = [image: matrix V]

D = [image: matrix D]

Based on our pseudo inverse formula:

A⁺ = VΣ⁻¹Uᵀ

or, writing D for Σ:

A⁺ = VD⁻¹Uᵀ

we can get our pseudo inverse matrix easily:

A⁺ = [image: V · D⁻¹] · Uᵀ

We got our pseudo inverse matrix (A dagger):

[image: the pseudo inverse matrix A⁺]
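As a sanity check, NumPy's built-in pinv gives the pseudo inverse directly (here using the same example matrix as the SVD script above):

#!/usr/bin/env python3
import numpy as np

a = np.array([[1, 2, 1], [2, 1, -1]])
np.set_printoptions(precision=3, suppress=True)

# np.linalg.pinv computes the pseudo inverse from the SVD internally
print(np.linalg.pinv(a))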

 

Calculus 3 – First Order Partial Derivatives

Recently, I encountered a matrix called the Jacobian matrix. A Jacobian matrix consists of the first-order partial derivatives of a vector-valued function. The pseudo inverse of a Jacobian matrix can be used to solve the inverse kinematics problem in robotics. So what is a partial derivative? A partial derivative, symbolized with ∂ ("partial dee"), is a part of calculus. Based on the literature:

“a derivative of a function of two or more variables with respect to one variable, the other(s) being treated as constant.”

example function:

f(x,y) = y³x + 4x + 5y

∂f/∂x means the partial derivative of f(x,y) with respect to x, where we treat y as a constant.

∂f/∂x = y³ + 4

steps:

  • partial derivative (∂f/∂x) of y³x: y³ is a constant coefficient of x, so we get y³
  • partial derivative (∂f/∂x) of 4x = 4
  • partial derivative (∂f/∂x) of 5y (a constant with respect to x) = 0
  • so ∂f/∂x = y³ + 4

∂f/∂y means the partial derivative of f(x,y) with respect to y, where we treat x as a constant.

f(x,y) = y³x + 4x + 5y

∂f/∂y = 3y²x + 5

steps :

  • partial derivative (∂f/∂y) of y³x = 3y²x
  • partial derivative (∂f/∂y) of the constant 4x = 0
  • partial derivative (∂f/∂y) of 5y = 5
  • so ∂f/∂y = 3y²x + 5

Another example: the partial derivatives of this explicit function:

y = 3x² − 5z² + 2x²z − 4xz² − 9

∂y/∂x = ?

steps :

  • ∂y/∂x of 3x² = 2·3x^(2−1) = 6x
  • ∂y/∂x of −5z² = 0
  • ∂y/∂x of 2x²z = 2·2x^(2−1)z = 4xz
  • ∂y/∂x of −4xz² = −4z²
  • ∂y/∂x of −9 = 0
  • So ∂y/∂x = 6x + 4xz − 4z²

What about ∂y/∂z?

steps:

  • ∂y/∂z of 3x² = 0
  • ∂y/∂z of −5z² = −2·5z^(2−1) = −10z
  • ∂y/∂z of 2x²z = 2x²
  • ∂y/∂z of −4xz² = −2·4xz^(2−1) = −8xz
  • ∂y/∂z of −9 = 0
  • So ∂y/∂z = −10z + 2x² − 8xz

What about the first-order partial derivatives of a function of three variables?

Example: f(x,y,z) = z⁴ − x⁶y² + 5

f’x = ?

steps :

  • ∂f/∂x of z⁴ = 0
  • ∂f/∂x of −x⁶y² = −6x⁵y²
  • ∂f/∂x of 5 = 0
  • thus ∂f/∂x = −6x⁵y²

f’y = ?

  • ∂f/∂y of z⁴ = 0
  • ∂f/∂y of −x⁶y² = −2x⁶y
  • ∂f/∂y of 5 = 0
  • thus ∂f/∂y = −2x⁶y

f’z = ?

  • ∂f/∂z of z⁴ = 4z^(4−1) = 4z³
  • ∂f/∂z of −x⁶y² = 0
  • ∂f/∂z of 5 = 0
  • thus ∂f/∂z = 4z³
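If SymPy is available, all three examples above can be checked mechanically; a quick sketch:

#!/usr/bin/env python3
import sympy as sp

x, y, z = sp.symbols('x y z')

f = y**3 * x + 4*x + 5*y
print(sp.diff(f, x))  # y**3 + 4
print(sp.diff(f, y))  # 3*x*y**2 + 5

g = 3*x**2 - 5*z**2 + 2*x**2*z - 4*x*z**2 - 9
print(sp.diff(g, x))  # 4*x*z + 6*x - 4*z**2
print(sp.diff(g, z))  # 2*x**2 - 8*x*z - 10*z

h = z**4 - x**6 * y**2 + 5
print(sp.diff(h, x))  # -6*x**5*y**2
print(sp.diff(h, y))  # -2*x**6*y
print(sp.diff(h, z))  # 4*z**3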

 

www.ringlayer.com

https://ringlayer.github.io

Matrix Determinant – Laplace and Sarrus Method

N.B. Some of the matrix pictures look good because they were generated with https://www.symbolab.com/solver/matrix-vector-calculator; the rest are my handwriting.

https://ringlayer.github.io

It has been a long time since I last played with matrices. The last time I used a matrix was when I was studying input-output optimization with Leontief matrices. Since it has been so long, I decided to write about an interesting matrix topic: the determinant.

What is a matrix determinant? Here is a short description from Wikipedia:

In linear algebra, the determinant is a value that can be computed from the elements of a square matrix. The determinant of a matrix A is denoted det(A), det A, or |A|.

First of all, it is only possible to find the determinant of a matrix when the matrix has the same number of columns and rows (a square matrix). There are many methods to calculate a determinant; the two I use most frequently are the Laplace method and the Sarrus method.

The simplest case is calculating the determinant of a matrix that consists of only 2 rows and 2 columns.

For example, we have this matrix U:

[image: 2×2 matrix U]

|U| = (2 · 6) − (1 · 3) = 9

Sarrus Method

Based on the Sarrus method, a 3×3 matrix determinant is the sum of the products of the three north-west to south-east diagonal lines of matrix elements, minus the sum of the products of the three south-west to north-east diagonal lines of elements. Suppose we have a 3×3 matrix A:

[image: 3×3 matrix A]

To calculate |A| using the Sarrus method:

[image: matrix A with its first two columns appended]

We append the matrix's first two columns beside the original matrix, then take the sum of the products of the three north-west to south-east diagonals:

(1 *  3 * 2 ) + (2 * 1 * 2) + (1 * 3 * 1)

Then we take the sum of the products of the three south-west to north-east diagonals:

[image: the south-west to north-east diagonals]

(2 * 3 * 1) + (1 * 1 * 1) + (2 * 3 * 2)

Based on the above information, we get:

|A| = ( (1 * 3 * 2 ) + (2 * 1 * 2) + (1 * 3 * 1) ) – ( (2 * 3 * 1) + (1 * 1 * 1) + (2 * 3 * 2) )

|A| = (6 + 4 + 3) – ( 6 + 1 + 12) =  -6

Laplace Method

Calculating the determinant of a 3×3 matrix using the Laplace method is simple.

Based on the literature, a 3×3 determinant by Laplace expansion looks like this:

[image: the Laplace expansion formula]

So the matrix is split into three small 2×2 matrices, where a, b, and c are the coefficients from the first row. The three small 2×2 matrices are permutations of these columns and rows:

[image: the columns and rows used to form the submatrices]

the 2×2 matrices:

[images: the three 2×2 submatrices]

Back again to matrix A:

[image: matrix A with its first row marked]

a = 1, b = 2, c = 1

We get the 2×2 permutation matrices:

[images: the three 2×2 permutation matrices and the expansion of |A|]

determinant of the first matrix = 5

determinant of the second matrix = 4

determinant of the third matrix = −3

So we get: (1 · 5) − (2 · 4) + (1 · (−3)) = 5 − 8 − 3 = −6
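Assuming the 2×2 matrix U above reads [[2, 1], [3, 6]] and the 3×3 matrix A reads [[1, 2, 1], [3, 3, 1], [2, 1, 2]] (my reconstructions from the products shown, since the originals are images), NumPy agrees with both hand calculations:

#!/usr/bin/env python3
import numpy as np

U = np.array([[2, 1],
              [3, 6]])
A = np.array([[1, 2, 1],
              [3, 3, 1],
              [2, 1, 2]])

print(np.linalg.det(U))  # 9.0 (up to floating-point rounding)
print(np.linalg.det(A))  # -6.0 (up to floating-point rounding)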

Forward Kinematics Calculation for Robotic Arm

written by Antonius Ringlayer

www.ringlayer.com

https://github.com/ringlayer?tab=repositories

Kinematics is a branch of mathematics, physics, and classical mechanical engineering. Two kinds of kinematics are mostly used in robotics: forward kinematics and inverse kinematics. Forward kinematics computes the position of the end effector when we know the angle of each joint, while inverse kinematics computes the angle of each joint when we know the position of the end effector.

To calculate forward kinematics, we can use simple trigonometry, Denavit-Hartenberg parameters, or screw theory. In this example we are going to use simple trigonometry to calculate 2D forward kinematics for a 1-DOF arm and 3D forward kinematics for a 3-DOF robotic arm.

Calculating 2D Forward Kinematics for 1 DOF robot arm

[image: 1-DOF robotic arm]

For example, here we have a 1-DOF robotic arm with link length 10 cm and θ = 45°. The position of the end effector in our Cartesian coordinates (x, y) can be calculated easily using simple trigonometry.

[images: adjacent/opposite/hypotenuse triangle and the trigonometric ratios]

cos θ = x / l

x = cos θ * l  = cos 45° * 10 cm

sin θ = y / l

y = sin θ * l = sin 45°  * 10 cm

Here is a Python 2D 1-DOF forward kinematics solver:

#!/usr/bin/env python3
'''
robotic forward kinematics solver
coded by Antonius Ringlayer
'''
import math

def solve_1dof_fk(_link, _degree):
    # project the link onto the x and y axes
    x = math.cos(math.radians(_degree)) * _link
    y = math.sin(math.radians(_degree)) * _link
    return x, y

x, y = solve_1dof_fk(10, 45)
print("x=" + str(x) + " y=" + str(y))

Result:

$ ./2d_1dof_forward_kinematic_solver.py
x=7.0710678118654755 y=7.071067811865475

Position of end effector = p(7.07 cm, 7.07 cm)

Calculating 3D Forward Kinematics for 3 DOF robot arm

Calculating the position of the end effector in 3-dimensional space using trigonometry is not so hard either.

For example, here we have a 3-DOF robot arm.

Side view of robot arm :

[image: side view of the robot arm]

Here, d2 is the height of the second DOF above the floor, z is the extra dimension we add to our Cartesian geometry (the height of the end effector above the floor), l1 is the length of link 1, l2 is the length of link 2, θ2 is the d2 joint value, and θ3 is the d3 joint value.

Top View of Robot Arm :

[image: top view of the robot arm]

The Cartesian coordinates as seen from the top view of our robotic arm; θ1 is the d1 joint value.

We are going to calculate the position of the end effector (E) in 3-dimensional space (x, y, z).

Known variables:

l1 = l2 = 10 cm

d2 = 7 cm

θ1 =  60°

θ2 = 30°

θ3 = 140°

We mark some additional angles and lengths:

[image: side view with the additional angles and lengths]

 

Steps:

Since z = d2 + d3 − d6, the first step is finding d3.

Step 1. Finding d3:

sin θ2 = d3 / l1

sin 30° = d3 / 10

d3 = sin 30° · 10 = 5 cm

The next step is finding d6. In order to get d6, we need more information.

Step 2. Finding θ4:

180° = θ2 + 90° + θ4

θ4 = 180° − (30° + 90°) = 60°

Step 3. Finding θ5:

θ3 = θ4 + θ5

140° = 60° + θ5, hence θ5 = 80°

Based on all the information we have obtained, we can redraw our picture as follows:

[image: side view redrawn with θ4 and θ5]

Step 4. Finding d6 and z:

Since we have θ5, finding d6 is child's play.

cos θ5 = d6 / l2

cos 80° = d6 / 10

d6 = cos 80° * 10 = 1.7 cm

So we have the information below:

z = 7 cm + 5 cm − 1.7 cm = 10.3 cm

Next we need to find x and y.

[image: side view showing d1 = d4 + d5]

 

We see that x and y can be obtained once we have the hypotenuse length d1, and from the side view, d1 = d4 + d5.

Step 5. Finding d4:

cos θ2 = d4 / l1

d4 = cos 30° * 10 = 8.66 cm

Step 6. Finding d5 and d1:

since we have θ5 = 80°

sin 80° = d5 / 10, so d5 = sin 80° · 10 = 9.85 cm.

d1 = d4 + d5 = 8.66 +  9.85 = 18.51 cm

Back at our top view, we have now collected enough information to find x and y.

[image: top view with d1]

 

cos  θ1 = x / d1

cos 60° = x / 18.51

x = cos 60° * 18.51  = 9.25 cm

sin 60° = y / 18.51, so y = sin 60° · 18.51 = 16.03 cm

Finally, we find that p(x, y, z) = p(9.25, 16.03, 10.3).
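For completeness, the whole 3-DOF calculation can be wrapped into a solver in the same style as the 1-DOF script above (a sketch that follows exactly the steps in this post):

#!/usr/bin/env python3
import math

def solve_3dof_fk(l1, l2, d2, theta1, theta2, theta3):
    # 3-DOF forward kinematics following the geometry above (degrees, cm)
    theta4 = 90 - theta2                       # from 180 = theta2 + 90 + theta4
    theta5 = theta3 - theta4                   # from theta3 = theta4 + theta5
    d3 = math.sin(math.radians(theta2)) * l1
    d6 = math.cos(math.radians(theta5)) * l2
    z = d2 + d3 - d6
    d4 = math.cos(math.radians(theta2)) * l1
    d5 = math.sin(math.radians(theta5)) * l2
    d1 = d4 + d5                               # horizontal reach seen from the top
    x = math.cos(math.radians(theta1)) * d1
    y = math.sin(math.radians(theta1)) * d1
    return x, y, z

x, y, z = solve_3dof_fk(10, 10, 7, 60, 30, 140)
print("x=%.2f y=%.2f z=%.2f" % (x, y, z))  # x=9.25 y=16.03 z=10.26 (10.3 above, with rounding)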