The byte order varies a bit depending on the software/device that generates it. Those numbers only refer to how the original data is sampled.
The wikipedia is your friend: http://en2.wikipedia.org/wiki/YUV
However, typically 4:2:2 is ordered u1y1v1y2u3y3v3y4u5y5v5y6 …
Note that the wikipedia has this wrong. The way the wikipedia describes it here, it would appear as if the U & V samples align with different luma samples. That is generally NOT the case.
4:2:0 is usually represented in memory in a planar format, one buffer of Y, one of U, one of V, with the U & V buffers being 1/4 the size of the Y buffer.
These formats also don’t specify the number of bits per sample. Typically the sampling is 8-bit for each component, but there may be times when more bits are used to avoid banding, or loss during processing.
4:4:4 would be the YUV equivalent of RGB, One byte for each component per pixel.
4:2:2 has one sample of Y per pixel, and U & V are sampled at half the horizontal bandwidth, so there are half the samples.
4:1:1 is like 4:2:2, but the U & V are sampled at one quarter of the horizontal bandwidth.
4:2:0 has the same amount of samples as 4:1:1, but the U & V are samples at half the bandwidth in both the horizontal and vertical directions, resulting in the same total of 1/4 the numebr of samples as the luma.
There was a really nice diagram on the net that showed all of this, I thought it was in the wikipedia, but I can’t find it now.