Derivative of inner product

Suppose the inner product of some vector $\mathbf{x}$ with itself can be expressed as

$$\langle \mathbf{x}, \mathbf{x}\rangle_G = \mathbf{x}^T G\mathbf{x}$$

where $G$ is some symmetric matrix. If I want the derivative of this inner product with respect to $\mathbf{x}$, I should get a vector as a result, since this is the derivative of a scalar function by a vector (https://en.wikipedia.org/wiki/Matrix_calculus#Scalar-by-vector).

Nevertheless, this formula tells me that I should get a row vector, and not an ordinary (column) vector:

$$\frac{\mathrm{d}}{\mathrm{d} \mathbf{x}} (\mathbf{x}^TG\mathbf{x}) = 2\mathbf{x}^T G$$

(http://www.cs.huji.ac.il/~csip/tirgul3_derivatives.pdf), which is a row vector.

Why do I get this contradiction?

asked Nov 30 at 9:15

The Bosco

linear-algebra derivatives vectors inner-product-space

4 Answers

For a smooth $f:\mathbb{R}^n\to\mathbb{R}^m$, you have $df:\mathbb{R}^n\to\mathcal{L}(\mathbb{R}^n,\mathbb{R}^m)$.

Being differentiable is equivalent to:
$$
f(x+h)=f(x)+df(x)\cdot h+o(\|h\|)
$$

In your case, $f(x)=\langle x,x \rangle_G$ and $m=1$, hence the differential at $x$, $df(x)$, is in $\mathcal{L}(\mathbb{R}^n,\mathbb{R})$. It is a linear form.

Let's be more explicit (using the symmetry of $G$ for the cross term):
\begin{align*}
f(x+h)=& \langle x+h,x+h \rangle_G \\
=& \underbrace{\langle x,x \rangle_G}_{f(x)} + \underbrace{2\langle x,h \rangle_G }_{df(x)\cdot h}+ \underbrace{\langle h,h \rangle_G}_{\in o(\|h\|)}\\
\end{align*}

Hence your differential is defined by
$$
df(x)\cdot h = 2\langle x,h \rangle_G = (2x^tG)h
$$
where $2x^tG=\left(\partial_{x_1} f,\dots,\partial_{x_n} f\right)$ is your "row" vector.

Note that, because $m=1$, you can also use a vector $\nabla f(x)$ to represent $df(x)$ using the canonical scalar product. This vector is by definition the gradient of $f$:

$$
df(x)\cdot h = \langle \nabla f(x),h \rangle = \langle 2Gx,h \rangle
$$
where $\nabla f(x)=2Gx=\left(\begin{array}{c}\partial_{x_1} f \\ \vdots \\ \partial_{x_n} f\end{array}\right)$. This is your "column" vector.
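
As a quick numerical sanity check of the expansion above, here is a minimal sketch in plain Python. The matrix `G`, the point `x`, the step `h`, and the helper names (`matvec`, `dot`, `f`) are illustrative assumptions and not part of the answer. It checks that the row form $(2x^tG)h$ and the column form $\langle 2Gx,h\rangle$ give the same number, and that the remainder $f(x+h)-f(x)-df(x)\cdot h$ equals $\langle h,h\rangle_G$.

```python
# Illustrative values only: G must be symmetric for 2 x^T G to be the derivative.
G = [[2.0, 1.0],
     [1.0, 3.0]]
x = [1.0, -2.0]
h = [1e-3, 2e-3]


def matvec(A, v):
    """Matrix-vector product A v, for A given as a list of rows."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def dot(u, v):
    """Canonical scalar product <u, v>."""
    return sum(a * b for a, b in zip(u, v))


def f(v):
    """f(v) = <v, v>_G = v^T G v."""
    return dot(v, matvec(G, v))


# Column representation: the gradient 2 G x.
grad = [2.0 * c for c in matvec(G, x)]

# Row representation: 2 x^T G, computed as the transpose of 2 G^T x.
G_T = [list(col) for col in zip(*G)]
row = [2.0 * c for c in matvec(G_T, x)]

x_plus_h = [a + b for a, b in zip(x, h)]
df_h = dot(row, h)                      # (2 x^T G) h
remainder = f(x_plus_h) - f(x) - df_h   # should match <h, h>_G up to rounding

print(row, grad)
print(df_h, dot(grad, h))
print(remainder, f(h))
```

In a plain list representation the row and the column hold the same entries; the distinction only shows up in how they are combined with `h`.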

edited Nov 30 at 10:43

answered Nov 30 at 10:09

Picaud Vincent

The difference lies in how the author of the second reference prefers to arrange the components of the gradient. In the first paragraph they state:

Let $x\in \mathbb{R}^n$ (a column vector) and let $f : \mathbb{R}^n \to \mathbb{R}$. The derivative of $f$ with respect to $x$ is a row vector:
$$
\frac{\partial f}{\partial x} = \left(\frac{\partial f}{\partial x_1}, \cdots , \frac{\partial f}{\partial x_n} \right)
$$

You can argue this is a better option than the first one (e.g. this answer), but at the end of the day it is just a matter of notation. Pick the one you prefer and stick with it to avoid problems down the line.
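
To make the two layouts concrete, here is a small worked example; the particular $2\times 2$ matrix is an arbitrary illustrative choice, not taken from either reference. With
$$
G=\begin{pmatrix}2&1\\1&3\end{pmatrix},\qquad f(x)=x^TGx=2x_1^2+2x_1x_2+3x_2^2,
$$
the partial derivatives are $\partial f/\partial x_1=4x_1+2x_2$ and $\partial f/\partial x_2=2x_1+6x_2$. Arranged as a row they give $2x^TG$; arranged as a column they give $2Gx$:
$$
2x^TG=\left(4x_1+2x_2,\ 2x_1+6x_2\right),\qquad 2Gx=\begin{pmatrix}4x_1+2x_2\\2x_1+6x_2\end{pmatrix}.
$$
The entries are identical; only the arrangement differs.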

answered Nov 30 at 9:29

caverac

More generally, suppose we differentiate any scalar-valued function $f$ of a vector $\mathbf{x}$ with respect to $\mathbf{x}$. By the chain rule, $$df=\sum_i\frac{\partial f}{\partial x_i}dx_i=\boldsymbol{\nabla}f\cdot d\mathbf{x}=\boldsymbol{\nabla}f^T d\mathbf{x}.$$ (Technically, I should write $df=(\boldsymbol{\nabla}f^T d\mathbf{x})_{11}$ to take the unique entry of a $1\times 1$ matrix.)

If you want to define the derivative of $f$ with respect to $\mathbf{x}$ as the $d\mathbf{x}$ coefficient in $df$, you use the last expression, obtaining the row vector $\boldsymbol{\nabla}f^T$. Defining it instead as the left-hand argument of the dot product, giving the column vector $\boldsymbol{\nabla}f$, is an alternative convention.
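
The component formula $df=\sum_i \partial_{x_i} f\,dx_i$ can also be illustrated numerically. The following is a rough sketch in plain Python; the use of central finite differences, the helper name `partials`, the example function `quadratic_form`, and the chosen point are all assumptions made for illustration rather than anything the answer prescribes.

```python
def partials(f, x, eps=1e-6):
    """Approximate each partial derivative of f at x by a central difference."""
    result = []
    for i in range(len(x)):
        forward = list(x)
        backward = list(x)
        forward[i] += eps
        backward[i] -= eps
        result.append((f(forward) - f(backward)) / (2.0 * eps))
    return result


def quadratic_form(v):
    """Example scalar function: v^T G v with G = [[2, 1], [1, 3]]."""
    return 2.0 * v[0] ** 2 + 2.0 * v[0] * v[1] + 3.0 * v[1] ** 2


x = [1.0, -2.0]
dx = [1e-4, -3e-4]

components = partials(quadratic_form, x)          # approximations of df/dx_i
df = sum(p * d for p, d in zip(components, dx))   # sum_i (df/dx_i) dx_i
shifted = [a + b for a, b in zip(x, dx)]
actual_change = quadratic_form(shifted) - quadratic_form(x)

print(components)
print(df, actual_change)
```

The list `components` is neither a row nor a column by itself; whether it is written as $\boldsymbol{\nabla}f^T$ or $\boldsymbol{\nabla}f$ is exactly the convention discussed above.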

answered Nov 30 at 9:42

J.G.

Why not use the Leibniz rule? We have, where $\langle .,.\rangle$ denotes the standard inner product,
$$D_p(\langle x,x\rangle_G)=2\langle p,x\rangle_G=2p^TGx=2\langle p,Gx\rangle.$$

Note that the derivative of $f\colon\mathbb R^n\to\mathbb R$ is not a vector, but a linear form instead. The gradient $\nabla^{\langle .,.\rangle_G}f$ with respect to the inner product $\langle .,.\rangle_G$ is the unique vector which represents this linear form in the presence of the specified inner product. In our case we have
$$\nabla^{\langle .,.\rangle_G}f(x)=2x,\quad\text{that is}\quad
D_p(\langle x,x\rangle_G)=\langle p,2x\rangle_G$$
whereas
$$\nabla^{\langle .,.\rangle}f(x)=2Gx,\quad\text{and that is}\quad
D_p(\langle x,x\rangle_G)=\langle p,2Gx\rangle$$
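
A small numerical sketch of this last point, in plain Python. The matrix `G`, the vectors `x` and `p`, and the helper names are illustrative assumptions. It checks that the same directional derivative $D_p f$ is represented by $2x$ under $\langle .,.\rangle_G$ and by $2Gx$ under the standard inner product.

```python
G = [[2.0, 1.0],
     [1.0, 3.0]]   # symmetric positive definite, so <u, v>_G is an inner product
x = [1.0, -2.0]
p = [0.5, 4.0]


def matvec(A, v):
    """Matrix-vector product A v, for A given as a list of rows."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def dot(u, v):
    """Standard inner product <u, v>."""
    return sum(a * b for a, b in zip(u, v))


def dot_G(u, v):
    """Weighted inner product <u, v>_G = u^T G v."""
    return dot(u, matvec(G, v))


def f(v):
    """f(v) = <v, v>_G."""
    return dot_G(v, v)


def directional_derivative(func, point, direction, eps=1e-6):
    """Central-difference approximation of D_p func at the given point."""
    plus = [a + eps * d for a, d in zip(point, direction)]
    minus = [a - eps * d for a, d in zip(point, direction)]
    return (func(plus) - func(minus)) / (2.0 * eps)


grad_G = [2.0 * c for c in x]                # gradient w.r.t. <.,.>_G
grad_std = [2.0 * c for c in matvec(G, x)]   # gradient w.r.t. <.,.>

numeric = directional_derivative(f, x, p)
via_G = dot_G(p, grad_G)
via_std = dot(p, grad_std)

print(numeric, via_G, via_std)
```

The two gradient vectors differ, yet they encode the same linear form once each is paired with its own inner product.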

edited Nov 30 at 16:21

answered Nov 30 at 16:14

Michael Hoppe
