John - the timbre terms are 12 basis vectors derived from the PCA decomposition. Tristan describes the vectors in this thread (http://developer.echonest.com/forums/thread/794) as:
they are automatically derived from data, there's no actual term, but only my mere interpretation: by looking at the shape of the basis vector, you can sort of tell what they describe.
First one (loudness) was actually imposed as a way to control the average dB.
So technically, "timbre" as conventionally defined, starts at the next dimension that I interpreted as "brightness", simply because it emphasizes the weight or ratio of high frequencies versus low frequencies, and which is typically the "physical" measure that is correlated to the "perceptual" quality of brightness.
The next one is harder to describe, but it has something to do with flatness and "narrowness" of the sound (when lowest and highest frequencies are attenuated).
The following is clearly about the emphasis of the attack, or sharpness.
This is nice because several perceptual studies have shown that timbre was well discriminated by attack, and brightness qualities, which is pretty well confirmed by the data-driven representation here.
Now, as you move higher with dimensionality, it becomes harder to "label" the basis vector with an actual term or quality, but you'll notice that the complexity of the temporal/spectral shape increases. We've stopped at 12 dimensions as that described "most" of the underlying signal, but we could keep going like this much further.
It is up to you to interpret the "meaning" of each of these dimensions, but I believe that would be irrelevant given that most of those dimensions (including loudness really) are still "perceptually" inter-dependent, even when mathematically orthogonal.
Basically you can't really isolate them from each other, hence the complexity of describing or even talking about timbre, which is basically what's left when you're done talking about its pitch, and loudness.
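
(Back to me, not Tristan.) If it helps to see what "basis vectors derived from a PCA decomposition" looks like in practice, here is a rough sketch of my own. It is not the Echo Nest's actual code: the input data is random numbers standing in for spectral frames, the 40-band frame size is made up, and the function names `pca_basis` and `project` are just for illustration. It also ignores the fact that the first (loudness) dimension was imposed rather than learned. It only shows the general idea: learn 12 orthogonal vectors from data, then describe each frame by 12 coefficients.

```python
import numpy as np

def pca_basis(frames, n_components=12):
    """Learn the top principal directions from the rows of `frames`."""
    mean = frames.mean(axis=0)
    centered = frames - mean
    # SVD of the centered data: the rows of vt are orthonormal directions,
    # ordered from most to least variance explained.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    variance = s ** 2
    explained = variance[:n_components].sum() / variance.sum()
    return mean, vt[:n_components], explained

def project(frames, mean, basis):
    """Describe each frame by its coefficients on the learned basis."""
    return (frames - mean) @ basis.T

# Hypothetical stand-in for real audio: 500 "segments", each a 40-band
# spectral frame with correlated bands (NOT real Echo Nest data).
rng = np.random.default_rng(0)
frames = rng.normal(size=(500, 40)) @ rng.normal(size=(40, 40))

mean, basis, explained = pca_basis(frames, n_components=12)
coeffs = project(frames, mean, basis)

print(basis.shape)   # 12 basis vectors, each 40 bands long
print(coeffs.shape)  # 12 "timbre-like" coefficients per segment
print(f"fraction of variance kept by 12 dimensions: {explained:.2f}")
```

The basis vectors come out orthogonal and the coefficients are uncorrelated over the data set, which is the "mathematically orthogonal" part. Nothing in the math says the dimensions sound independent to a listener, which is Tristan's point.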