Introduction
Having reduced (a part of) the research problem I was working on to finding a gradient, I was just about to be relieved when I realised I had no clue how to find “gradients” in the space of Hermitian matrices I was working in. And the function I wanted to find the gradient for wasn’t too simple looking either.
I turned to math.stackexchange in despair, and it turns out the gradient of my function was super simple and elegant!
Now I owed it to complex matrix differentiation to go study that shit. While the excitement lasts, at least.
The answerer of the question suggested the book ‘Complex-Valued Matrix Differentiation’, which I started reading, hoping to fill the holes in my complex differentiation and linear algebra along the way.
So, first things first. The word for “differentiable” for complex valued functions of complex numbers is analytic. Before we get into finding derivatives, we need to check whether the function is analytic in the domain we’re interested in.
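Taking our function $f(z) = u(x, y) + i\,v(x, y)$, where $z = x + iy$, and ensuring that the derivative is the same when a point is approached from all directions gives us the Cauchy-Riemann equations:
$$\frac{\partial u}{\partial x} = \frac{\partial v}{\partial y}, \qquad \frac{\partial u}{\partial y} = -\frac{\partial v}{\partial x}$$
which are a necessary condition for a function to be analytic at a given point, and a sufficient one too when these partial derivatives are continuous around that point.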
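A quick numerical sanity check helps me here. This is only a sketch: the helper names, the finite-difference step and the tolerance are my own assumptions, and it takes $f = u + iv$ with $z = x + iy$. It estimates the four partial derivatives and tests both Cauchy-Riemann equations at a single point.

```python
def partials(f, x, y, h=1e-6):
    """Central-difference estimates of (u_x, u_y, v_x, v_y) for f(x + iy) = u + iv."""
    df_dx = (f(complex(x + h, y)) - f(complex(x - h, y))) / (2 * h)
    df_dy = (f(complex(x, y + h)) - f(complex(x, y - h))) / (2 * h)
    return df_dx.real, df_dy.real, df_dx.imag, df_dy.imag


def satisfies_cauchy_riemann(f, x, y, tol=1e-5):
    """True if u_x = v_y and u_y = -v_x hold at (x, y), up to tol."""
    u_x, u_y, v_x, v_y = partials(f, x, y)
    return abs(u_x - v_y) < tol and abs(u_y + v_x) < tol


def square(z):
    return z * z  # analytic everywhere


def abs_squared(z):
    return complex(abs(z) ** 2)  # real valued: the imaginary part v is always 0


print(satisfies_cauchy_riemann(square, 1.0, 2.0))       # True
print(satisfies_cauchy_riemann(abs_squared, 1.0, 2.0))  # False
```

Passing at one point does not prove a function is analytic, of course. It only catches functions that fail the equations there.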
Aside: Changing independent variables changes derivatives - and we get to choose these!
Consider the following functions:
We can now write and in terms of and :
Now, consider :
orrrr,
So, which is the right partial derivative?
It turns out both are correct. In the first case, and are our independent variables, so turns out to be dependent on . In the second case, and are considered independent variables, so the partial derivative is 0.
So, these really depend on our choice of variables. Math, unlike CS, sadly attaches no meaning to whether something sits on the left or the right of the equals sign, so we have to state our independent variables explicitly whenever we work in a system.
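To make the aside concrete, here is one worked instance. The pair $z = x + iy$, $z^* = x - iy$ is my own illustrative choice and may not match the symbols used above. The same partial derivative, $\partial z^* / \partial z$, takes three different values depending on which variable is held fixed alongside $z$:

```latex
\begin{align*}
&z = x + iy, \qquad z^* = x - iy \\
&(z, y) \text{ independent:} \quad z^* = z - 2iy
  \;\Rightarrow\; \left.\frac{\partial z^*}{\partial z}\right|_{y} = 1 \\
&(z, x) \text{ independent:} \quad z^* = 2x - z
  \;\Rightarrow\; \left.\frac{\partial z^*}{\partial z}\right|_{x} = -1 \\
&(z, z^*) \text{ independent:} \quad
  \left.\frac{\partial z^*}{\partial z}\right|_{z^*} = 0
\end{align*}
```

None of the three is wrong. Each is the right answer for its own choice of independent variables.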
Real valued complex functions aren’t differentiable :(
When we do matrix calculus, we want to be able to differentiate everything. Scalar functions of vectors, matrix functions of scalars, scalar functions of matrices – everything.
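However, just take the simple case of a real valued complex function $f(z) = u(x, y) + i\,v(x, y)$. This means $v$ is always 0, and the first Cauchy-Riemann equation, $\partial u / \partial x = \partial v / \partial y$, is itself not satisfied unless $u$ is constant with respect to $x$ (the second one likewise needs $u$ to be constant with respect to $y$). So, $\frac{df}{dz}$ can’t be found for an arbitrary real valued function.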
Perfect. The first thing we learn is we can’t find derivatives of real valued functions, let alone the other grandiose plans involving matrices and vectors.
So, what do we do?
This is where we cheat. If we change the independent variables from $x$ and $y$ to $z$ and $z^*$, where now,
$$x = \frac{z + z^*}{2}, \qquad y = \frac{z - z^*}{2i}$$
then we can take the partial derivative with respect to $z$!
For instance, for the real valued function $f(z) = |z|^2 = z z^*$, $\frac{\partial f}{\partial z}$ is just $z^*$.
The partial derivatives with respect to $z$ and $z^*$ are called formal derivatives – which we have to satisfy ourselves with.
TL;DR: we couldn’t find the derivative with respect to $z$, so we changed the system such that $z^*$ became an independent variable, and made do with taking the partial derivative with respect to $z$.
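The formal derivatives can also be checked numerically. The sketch below uses the standard identities $\frac{\partial f}{\partial z} = \frac{1}{2}\left(\frac{\partial f}{\partial x} - i\frac{\partial f}{\partial y}\right)$ and $\frac{\partial f}{\partial z^*} = \frac{1}{2}\left(\frac{\partial f}{\partial x} + i\frac{\partial f}{\partial y}\right)$. The function names, the step size and the test function $|z|^2$ are my own assumptions, picked only for illustration.

```python
def formal_derivatives(f, z, h=1e-6):
    """Estimate (df/dz, df/dz*) from central differences along x and y:
    df/dz  = (df/dx - i df/dy) / 2
    df/dz* = (df/dx + i df/dy) / 2
    """
    df_dx = (f(z + h) - f(z - h)) / (2 * h)
    df_dy = (f(z + 1j * h) - f(z - 1j * h)) / (2 * h)
    return (df_dx - 1j * df_dy) / 2, (df_dx + 1j * df_dy) / 2


def abs_squared(z):
    return (z * z.conjugate()).real  # |z|^2 = z z*, a real valued function


z0 = 1 + 2j
d_dz, d_dzconj = formal_derivatives(abs_squared, z0)
print(d_dz)      # close to z0.conjugate(), i.e. 1 - 2j
print(d_dzconj)  # close to z0, i.e. 1 + 2j
```

For an analytic function such as $z^2$, the same helper should return roughly $2z$ for the first value and roughly 0 for the second, which is another way of stating the Cauchy-Riemann equations.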
Aside: Derivatives for multivariable real valued functions?
One might ask, aren’t real valued complex functions analogous to having a real 2-variable function? What exactly happens there?
Firstly, there is no one-big-derivative defined for such functions. We can take the partial derivative with respect to each variable, and put them together in a gradient vector to find the direction in which the function slopes the most, but there is no one ‘number’ that is the derivative.
And the gradient exists if all the partial derivatives exist.
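That is the fundamental difference between complex differentiation and the above: the atomic unit we’re dealing with. With two real variables we differentiate with respect to one real number at a time, whereas a complex derivative treats the whole complex number as a single variable and demands the same limit from every direction.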
(You could still argue that the gradient is actually one complex number, I guess - still need to think about that!)
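A small sketch of that last thought, under my own assumptions: the example function `g` and the helper name are hypothetical. It computes the gradient of a real function of two real variables numerically, then packs the two partial derivatives into a single complex number.

```python
def gradient(g, x, y, h=1e-6):
    """Central-difference gradient (dg/dx, dg/dy) of a real function g(x, y)."""
    dg_dx = (g(x + h, y) - g(x - h, y)) / (2 * h)
    dg_dy = (g(x, y + h) - g(x, y - h)) / (2 * h)
    return dg_dx, dg_dy


def g(x, y):
    return x * x + 3 * y  # hypothetical example function


dg_dx, dg_dy = gradient(g, 1.0, 2.0)
as_complex = complex(dg_dx, dg_dy)  # the gradient packed as dg/dx + i dg/dy
print(dg_dx, dg_dy)  # close to 2.0 and 3.0
print(as_complex)
```

Since $\frac{\partial f}{\partial z^*} = \frac{1}{2}\left(\frac{\partial f}{\partial x} + i\frac{\partial f}{\partial y}\right)$, the packed number is exactly twice the formal derivative with respect to $z^*$, which suggests the "gradient as one complex number" idea is not far off.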
Thanks to Siddharth Bhat for the intuition on changing independent variables and Jayitha for the reasoning about real valued functions.