# Help me make sense of matrix multiplication

September 23, 2010, 10:15 pm

I know how to multiply matrices. I can teach kids how to multiply matrices.

I would like help, both for myself, and for my students, understanding WHY we multiply them as we do.

Anyone out there?

you’re still pissed at yourself for this?

aw.

No. I am annoyed that I flubbed things in one class today.

But this question came from a student, I said I did not know, and that I would ask. And so I did.

Many years ago, I remember my linear algebra prof. explaining why matrices were multiplied like they are. For the life of me, I can’t remember what he said. Sorry I can’t help.

It’s an extension of the idea of linear combinations. If you go back to the idea of using matrices for systems, like

2x+y=3

x-y=0

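It’s really like asking for a combination of the columns

$$\begin{pmatrix} 2 \\ 1 \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} 1 \\ -1 \end{pmatrix} \quad\text{to make}\quad \begin{pmatrix} 3 \\ 0 \end{pmatrix}.$$

So you multiply the entire first column by x (the first row of the combination vector [x y]) and the second column by y (the second row).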
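A minimal sketch of that idea in plain Python, using the 2x + y = 3, x - y = 0 system from this comment. The helper names (`column`, `combine_columns`, `mat_vec`) are made up for illustration. It checks that "x times the first column plus y times the second column" gives the same result as the usual row-by-row matrix-times-vector rule.

```python
# Coefficient matrix and right-hand side of the system
#   2x + y = 3
#    x - y = 0
A = [[2, 1],
     [1, -1]]
b = [3, 0]

def column(matrix, j):
    """Return column j of a matrix stored as a list of rows."""
    return [row[j] for row in matrix]

def combine_columns(matrix, weights):
    """Add up the columns of `matrix`, each scaled by its weight."""
    result = [0] * len(matrix)
    for j, w in enumerate(weights):
        col = column(matrix, j)
        result = [r + w * c for r, c in zip(result, col)]
    return result

def mat_vec(matrix, vector):
    """Usual row-by-row matrix-times-vector product."""
    return [sum(a * v for a, v in zip(row, vector)) for row in matrix]

solution = [1, 1]  # x = 1, y = 1 solves the system
print(combine_columns(A, solution))  # [3, 0]
print(mat_vec(A, solution))          # [3, 0]
```

Both calls return the right-hand side, so solving the system amounts to finding the weights that combine the columns into (3, 0).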

I have an activity I do in Lin Alg adapted from a SIMMS lesson if you want it. goldenj at gvsu dot edu.

… And the order is a convention, like the right-hand rule, I assume.

Incidentally, I desperately want time to watch the MIT linear algebra lectures. I had an awful teacher for that, and the first MIT lecture is worth the price of admission.

MIT math 18.06, Gilbert Strang

That’s the piece I was going to fall back on – thanks for going there first.

I am not delighted to lack a good way to breathe some meaning into multiplication of 2×2 matrices for my kiddies, but they are not complaining (as long as the algorithm is clear).

Jonathon,

I lean towards John Golden’s explanation as an approach for teaching HS kids and explaining the “why”. It is just a way of encoding lots of information without re-listing variables repeatedly. It is probably more deeply related to the idea of dot products and the development of vectors in the late 19th century, but that is a long way round for students. The Gauss-Jordan elimination approach is just a mechanized version of the “elimination” we first taught in systems of equations.

So I can sell it like that (really, I already have). But the explanation of why we multiply, for example, 2x2s the way we do still feels out of reach for my students.

You need this guy’s book. In a nutshell, the basic thing you’re missing is just what a linear transformation is.

Always remember that the henscratches, and the algorithms governing their use have *meaning*. If you don’t teach the kids the meaning, you have taught them nothing.

http://linear.axler.net/

Hm – less elliptically, think about what the matrix of the *composition* of linear transformations looks like.

Googling the question also works perfectly well for this:

http://answers.yahoo.com/question/index?qid=20081012135509AA1xtKz

As an aid to your understanding, you should carry out the calculation yourself for the general (finite dimensional) case.

Thanks

When I taught it a LONG time ago, I found this introduction useful. I’ll put it on box.net with the link:

http://www.box.net/shared/ldcscu47v8

I think the added information helped the kids.

Ms. Cookie

This is good for hs students. Very applicable.

I teach it like this:

A matrix encodes a linear transformation (pause while I tell them what that means, and show them how it works by matrix x vector).

Obviously the composition of linear transformations is linear (pause for discussion).

So A(Bv) is a linear transformation on the vector v.

We want (AB)v to be the same linear transformation, and we define matrix multiplication to Make It So.

This will work as a good demonstration: talk about 2 by 2 matrices as transformations of the plane (particularly emphasize that the matrix is a description of where to send the basis vectors (1,0) and (0,1)). Then talk about composition of these transformations. Show them that the matrix of the composition of A and B is exactly what you get when you multiply A and B.
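Here is a rough sketch of that demonstration in plain Python. The two matrices (a quarter turn and a shear) and the helper names are picked purely for illustration. It builds the matrix of "do B, then A" by tracking where (1,0) and (0,1) end up, and compares it with the ordinary product AB.

```python
def mat_vec(m, v):
    """Apply a 2x2 matrix (list of rows) to a vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]

def mat_mul(a, b):
    """Usual row-times-column product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def matrix_of(transform):
    """Matrix whose columns are the images of (1,0) and (0,1)."""
    e1 = transform([1, 0])
    e2 = transform([0, 1])
    return [[e1[0], e2[0]],
            [e1[1], e2[1]]]

A = [[0, -1], [1, 0]]  # quarter turn counterclockwise
B = [[1, 1], [0, 1]]   # horizontal shear

def compose(v):
    """Apply B first, then A."""
    return mat_vec(A, mat_vec(B, v))

print(matrix_of(compose))  # [[0, -1], [1, 1]]
print(mat_mul(A, B))       # [[0, -1], [1, 1]]
```

Swapping the order (shear after rotation) gives a different matrix, which is one way to show students that AB and BA need not agree.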

Wikipedia is often a good source for things like this. For the deeper “why” type questions, I find the Princeton Companion to Mathematics often clarifies and extends my knowledge.

I like how AX=B turns out to mean

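$$\begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} b_1 \\ b_2 \end{pmatrix}$$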

and that means

a11x +a12y = b1

a21x + a22y = b2

It just feels elegant to me.

I don’t think I’ve added anything important to the discussion, but now I’ll know where to find some good resources when I want to go deeper. (And I can subscribe to the comments.)

Of course wordpress sucked out the spaces and what I wrote is illegible. I’ll send it by email…

Nothing substantive to add to what Dr. Rick and sherriffruitfly said in regards to the actual question, but I’ll add that this is something that I didn’t understand until I spent a long time studying linear algebra on my own in recent years, i.e. they didn’t teach me it in linear algebra class.

The story of me & matrices:

1992-3 I learn how to multiply matrices and calculate determinants in Precalc. We don’t use it to solve systems, we just do it. I can’t see the point at all, of either operation.

Fall 1994 I take linear algebra. The class is application-heavy and not theoretically serious. The teacher has us do all this Gauss elimination and application of matrices to problems in linear programming and voting theory, but doesn’t define an abstract vector space until like the third-to-last class. I don’t see the point of the definition, especially for vector spaces that aren’t defined in the usual geometric way. However I get a whiff of the utility of matrix multiplication because we use it to solve systems, so I get to see what Sue was talking about; and I learn that det = 0 implies the system has nontrivial solutions, although I don’t understand why.

Spring 1995 I take multivariable calculus. Again, the class is not theoretically serious. (I don’t know this at the time.) We take partial derivatives, differentiate some vector-valued functions, and compute a lot of multiple integrals, but there’s no vector-valued function of a vector-valued variable, so no talk of the derivative as a linear transformation. So I can’t understand why this course has to come after linear algebra. Again, though, I get a whiff of something, albeit one I can’t understand at all: the change-of-variables formula for a double integral involves a determinant (the Jacobian). I haven’t learned any geometric interpretation of the determinant yet so I can’t make any sense of this.

1995-6 I’m not taking math this year. I spend a good deal of time trying to figure out why that determinant showed up in the change of variables formula. I have a distinct memory of getting a vivid glimmer of the reason one day, but not in a way I could write down, so I lose it.

1996-2007 I spend a decade being vaguely unsatisfied regarding the usefulness of matrices and determinants.

2007-9 On my own I study Michael Artin’s *Algebra*, and my understanding of and appreciation for linear algebra blossom. I can’t believe nobody ever told me any of this. I learn that linear transformations of vector spaces are actually the heart of linear algebra. I learn that the point of the definition of vector space is so that we can *use* our geometric understanding of 3-space etc. to gain insight into *any* situation that fits the algebraic definition (e.g. the space of polynomials of degree ≤ n). I learn that the determinant of an nxn matrix gives you the factor by which the measures of objects in n-space are scaled by the corresponding linear transformation, and I get convinced that this is true. I kind of already knew that matrix multiplication described the composition of linear transformations, but I finally fully own this knowledge as I use it over and over again to do things I never knew how to do before: solve systems of homogeneous linear differential equations; decompose any distance-and-orientation-preserving transformation of n-space into a sequence of rotations in orthogonal planes; find closed forms for recursive sequences with linear recursions; prove that all degree 2 real algebraic curves in the plane are conics and prove a similar classification for degree 2 surfaces in 3-space; etc. My 180-degree turn on matrices is complete. I remember how inane I thought they were in 1992 and I can hardly believe it.

2009-2010 I’m taking graduate classes in abstract algebra, topology and complex analysis, and I’m finding my new linear algebra knowledge useful in every class. Especially topology. In Sept. 2010 I start studying for the GRE, and therefore filling gaps in my multivariable calculus and diff. eq. knowledge, and now I’m finding the linear algebra knowledge *indispensably* useful.

This is all just to say that everything in linear algebra is deeply motivated, but I feel like this is a well-kept secret.

Heh. 2007-9 was a very good year for you.

“I learn that linear transformations of vector spaces are actually the heart of linear algebra.”

Yup.

I just watched some of the MIT linalg lectures (someone above mentioned them) – they suck balls. Those lectures will enable an industrious student to perform all the sexy calculations, but they won’t help a student understand *what’s actually going on* in the least. Unless the person already knows (like the prof there lol!).

Linalg teaching should really be split into 3 courses:

1) What’s really going on: start with linear transformations, go from there.

2) Basic (but detailed) calculation crap: Matrices, determinants, shit like that.

3) Numerical linear algebra: Factoring matrices, why in real life nobody actually inverts a matrix to solve Ax=b, etc.

It’s kind of funny. The reason why we multiply matrices as we do is simple: because applying 1 lin transformation after another requires it. But since most people have no idea what a linear transformation is, no answer is available to them. A huge failing in how linear algebra is taught – the MIT lecture series is a prime example of that.

Good post – I’m afraid your experience was all too typical.

I was lucky. I had a good lecturer on that topic, and he was Russian-educated. That helped a little. But I didn’t really GET what was happening until I read Artin.

JD: I’ve told you how I teach it – and I am completely convinced it’s the right way to teach it. You don’t need to teach the transformations first (though I do, and my syllabus pays lip-service to it), but you should *describe* a linear transformation (“it maps straight lines to straight lines, and it leaves the origin alone! That’s pretty much it, y’know. We’re interested in enlargements, rotations, reflections, scaling and shearing, really.”), and point out without proof that this is what being a linear transformation means (and define it; f(ax + by) = af(x) + bf(y) doesn’t take long). Then the composition of linear transformations is linear. Now, linear transformations can be written down as matrices (just tell ‘em, don’t go proving it for now :)). Now it’s enough to tell them “we’re going to define matrix multiplication like THIS because that will make composition work right”.

But what should you do to improve your understanding? Ben nailed that. Read Artin.

Ben, your story makes me want to get that book!

I’ve taught the community college version of linear algebra in the past, and I remember liking how there’s a bunch of different things that all end up amounting to the same thing. (det=0 iff dim<n iff AX=0 has non-trivial solutions iff something about column space iff …)

We didn't do enough with linear transformations, probably, but we did some.

Do. Artin is one of the most outlook-changing, “oh THAT’s what it’s about” books I know of – and I had a *good* algebra instructor!

Jonathan, if you want to study Artin together, I’m game. (Like we have time!) Anyone else interested? Used copies of international edition are under $25 (I find them through bookfinder.com). I just bought one.

You’ll do better with the Axler book I suggested.

one insightful exercise is to verify that matrix multiplication satisfies the property that the composition of two rotations by angle A is a rotation by angle 2A (I find that it helps to think of small angles).

$$\begin{pmatrix} \cos A & -\sin A \\ \sin A & \cos A \end{pmatrix}^2 = \begin{pmatrix} \cos 2A & -\sin 2A \\ \sin 2A & \cos 2A \end{pmatrix}$$
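For anyone who wants a numerical sanity check before doing the trig by hand, here is a small sketch in plain Python. The function names and the sample angle 0.3 are arbitrary choices for illustration. It squares the rotation matrix for angle A and compares the result with the rotation matrix for 2A, allowing for floating-point rounding.

```python
import math

def rotation(angle):
    """2x2 matrix for a counterclockwise rotation by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s],
            [s, c]]

def mat_mul(a, b):
    """Usual row-times-column product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def close(a, b, tol=1e-12):
    """True if two 2x2 matrices agree entry by entry up to `tol`."""
    return all(abs(a[i][j] - b[i][j]) <= tol
               for i in range(2) for j in range(2))

A = 0.3  # arbitrary sample angle in radians
squared = mat_mul(rotation(A), rotation(A))
print(close(squared, rotation(2 * A)))  # True
```

Writing out the entries of the squared matrix by hand gives cos²A − sin²A and 2 sinA cosA, which are exactly the double-angle formulas.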

Does this make the invertible rotation matrices into a Lie group?

There is an answer, it isn’t long, but it isn’t a one-liner either.

I address only the case of 2×2 integer matrices, but one has to crawl before one walks (I’m talking about everybody here, including me).

Define a ‘Simple Lattice’ modulo m as a subgroup of the integer coordinate domain generated by integer basis vector pairs {(r, 1), (m, 0)} and {(1, s), (0, m)}, gcd(r, m) = gcd(s, m) = 1, 0 < r, s < m.

Such lattices lie in a one-to-one correspondence with all Simple Continued Fraction expansions, and hence with all real numbers. Note that for irrationals and transcendentals the period m must approach infinity. If this appears ridiculous then I agree, but that means the concept of real numbers must be ridiculous also.

A simple lattice in 2D may be viewed as a solution set to a linear congruence ax + by = 0 (mod m) or as a mapping of Z^2 by a 2×2 matrix, to name two ideas. The concept of MATRIX MULTIPLICATION corresponds to a simple lattice A mapping itself over another simple lattice B and treating B as if it were the Z^2 domain.

Now a unique set of coordinate transformations of the form (a, b) = k(c, d) + (e, f) apply over a given lattice L. These transformations correspond to the "partial quotients" in Euclid's Algorithm in BOTH the forward and backward sense. In fact the term "partial quotient" may now be discarded because the quotient numbers are coefficients of coordinate pairs rather than scalars as in a SCF expansion.

In applying a GEOMETRICAL INTERPRETATION of 2×2 matrix multiplication one must pay attention to the CENTRAL EUCLIDEAN VECTORS of the lattices involved.

You will not find ANYTHING concerning this geometrical interpretation (and hence real meaning given to) matrix multiplication because mathematicians know next to nothing about simple lattices. The next question is: Why don't mathematicians know about these lattices? The answer lies in the manner in which school children are indoctrinated into the mathematical paradigm (much like learning matrix multiplication according to a set of rules and not according to real thought).

I suggest you CHALLENGE mathematicians to present a geometrical interpretation of 2×2 integer matrix multiplication. Don't accept their statements like: the product of the determinants of the matrices determines the area of the fundamental parallelogram of the new lattice generated by the new matrix product AB. That simply is not good enough. They must explain to you what happens to the Euclidean vectors of the lattices involved and how these E-vectors relate to the E-vectors of the product lattice.

Of course you won't get an answer. What you'll get is a lot of GOBBLEDEGOOK which is what these people put out when they simply don't know what they're talking about.

Cheers, John C

None of the above has helped the teacher to explain why, in monosyllabic words, we multiply matrices; or, come to think of it, why we don’t divide matrices in the conventional sense of division.

Actually, the linear transforms bits were quite helpful. I am hoping, btw, that “monosyllabic” was just a poor choice of words.

To answer Jon Oke in the terms he wants to hear: matrices started out as just an organised way of performing a set of related operations. The rectangular arrangement of numbers is the “guts” of a set of procedures with all the language and symbolism stripped away.

Do an internet search and you will learn about group theory which matrices fit into. To answer your question about normal division, in group theory these things have different meaning to what you learn about in school.

The bottom line however is that mathematicians have to keep on creating new ‘fields’ of inquiry that, like matrices, don’t obey the usual Ordered Field Laws of mathematics because their new objects of inquiry, like matrices, don’t commute in terms of multiplication or don’t have a unique multiplicative inverse, etc.

So you see mathematics, which claims to be about anything relating to number, is really just about changing the rules to suit the players – the mathematicians.

The so-called ‘real numbers’ are placed on a high altar by mathematicians and “sold” to unsuspecting high school kids via the ‘magic’ of calculus. It’s only later in life or in most cases never in life that these kids will finally realise that the rosy picture of mathematics painted for them in high school is just that – a rosy picture.

Mathematics is of course a farce and it more than anything else is leading this planet to ruin. Scientists won’t get off their backsides and think for themselves. They’ve had mathematical paradigms drummed into them at school and uni the same as you and me and yet they think they are independent thinkers. Wrong. If they use mathematics (they all do) then they’re the dogs being wagged by the tail. Mathematics isn’t simply a tool of the sciences, it DICTATES how these supposed scientists think about the world (the universe).

Go back to my first comments. Take things seriously or don’t do them. There aren’t any shortcuts.

JC

John, do you blog? I don’t know whether I’d agree with you, but I’d like to read more of your thoughts on this.

Hi Sue

Sorry about the long delay in replying. If you’d like an intro to simple lattices I could email this to you. I think you’d find it interesting.

John C

i seem to have missed this until now.

if a student were to ask me later today

why two-by-two matrices multiply as they do,

i’d start out with the idea of transforming

*variables* linearly.

thus. let capitals denote constants

and lower-case letters, variables.

(this is a very handy convention and

i wish it were in widespread use in

elementary textbooks.)

suppose x and y depend (linearly)

on u and v; suppose further that

u and v themselves depend linearly

on a and b.

for example, suppose

x = 2u + 3v

y = 4u – 5v.

suppose further that

u = a – 6b

v = 7a + 8b.

any “algebra 101” student should now

be able to figure out how

x and y depend on a and b

(“directly”… i.e., without

mentioning u and v in the “answer”):

x = 2(a-6b) + 3(7a+8b)

y = 4(a-6b) -5(7a+8b).

simplify and hand in:

x = 23a + 12b

y = -31a - 64b.

now suppose we need to do

a gazillion of ‘em and hate

performing calculations

but don’t mind typing mindless code

into a piece-of-crap interface: the TI.

why then, all we have to do is

“multiply matrices” as follows.

[[2,3][4,-5]][[1,-6][7,8]]

(“enter”).

out pops what but

[[23 12]

[-31 -64]];

write out appropriate equations

and hand *those* in.

do the whole thing over on paper

with A,B,C,D,E,F,G,H

replacing 2,3,4,-5,1,-6,7,8.

voila: 2-by-2 matrix multiplication.
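For readers without a TI handy, here is a minimal sketch of the same check in plain Python. The names `XY_FROM_UV`, `UV_FROM_AB`, `mat_mul` and `apply` are invented for illustration. It multiplies the two coefficient matrices from the example and confirms that substituting step by step agrees with using the product directly. The lower-left entry is 4·1 + (−5)·7 = −31.

```python
def mat_mul(p, q):
    """Usual row-times-column product of two 2x2 matrices."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def apply(m, v):
    """Evaluate the pair of linear expressions encoded by matrix m at v."""
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]

XY_FROM_UV = [[2, 3], [4, -5]]  # x = 2u + 3v,  y = 4u - 5v
UV_FROM_AB = [[1, -6], [7, 8]]  # u = a - 6b,   v = 7a + 8b

product = mat_mul(XY_FROM_UV, UV_FROM_AB)
print(product)  # [[23, 12], [-31, -64]]

# Substituting in two steps matches using the product in one step.
a, b = 5, -2  # arbitrary sample values
u, v = apply(UV_FROM_AB, [a, b])
assert apply(XY_FROM_UV, [u, v]) == apply(product, [a, b])
```

Reading the rows of the product back as equations gives x = 23a + 12b and y = -31a - 64b.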

now go on and figure out what

AD-BC has to do with it.

yours in the struggle, v.

PS

“mathematics is of course a farce”…

for those who think; a tragedy for those who feel.

This is the meaning/purpose of matrix multiplication.

If you buy 5 bottles of water one day, then 6 bottles of water the next day, then 8 bottles of water a third day. Along with this you buy a small bag of chips the first day, 3 small bags of chips the 2nd day, and 6 small bags of chips the third day. The bottles of water are $1.25 and the small bag of chips is $2. You can set up a matrix like this

water and chips: Cost:

(5 6 8) ($1.25)

(1 3 6) ($2.00) Multiply these matrices to get the total.

Simple example but it is what is understandable at first.

Actually as I look at that example the matrix would have to look like this:

(5 1) ($1.25)

(6 3) ($2.00)

(8 6)

hope this helps.
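A quick sketch of the corrected 3×2 layout in plain Python, with made-up variable names. Each row of the quantity matrix is one day (water, chips), and multiplying by the price column gives that day's bill.

```python
# Rows are days; columns are (bottles of water, bags of chips).
quantities = [[5, 1],
              [6, 3],
              [8, 6]]
prices = [1.25, 2.00]  # dollars per bottle, dollars per bag

# Matrix times column vector: one dot product per day.
daily_cost = [sum(q * p for q, p in zip(day, prices)) for day in quantities]
print(daily_cost)       # [8.25, 13.5, 22.0]
print(sum(daily_cost))  # 43.75
```

The product is a 3×1 column of daily totals rather than a single number; adding its entries gives the total for all three days.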