00:00:01
Let's continue talking about Visualization
Analysis and Design by diving into color.
00:00:07
So, when we think about ways of visually encoding
so far, we've been pretty focused on spatial
00:00:15
arrangements of attributes. Now let's switch
over and go beyond just spatial arrangement,
00:00:21
and think about mapping to some of the
other visual channels, particularly color.
00:00:28
So what is going on with color? When I was
talking about marks and channels before,
00:00:32
there were these three different things that
had color in them. Let's think about that.
00:00:38
So, we need to decompose color, because the
first rule of color is: don't talk about color.
00:00:44
Specifically don't just talk about
color. It's extremely confusing
00:00:48
if we treat that as monolithic. What we
really need to do is decompose this into
00:00:54
three channels that have
different characteristics.
00:00:57
So two of these are magnitude channels
that are good for ordered attributes. The
00:01:03
luminance channel is how bright something is
- think of that as grayscale black and white.
00:01:08
And then the saturation channel
is how colorful something is. So,
00:01:13
how much pink is there in between gray
and light pink and fully saturated pink.
00:01:19
And then the third channel is an identity
channel. And that's good for showing categorical
00:01:24
attributes. So color, specifically hue,
is what we usually think of as color.
00:01:30
And so we can ask questions like what color
is that? Is it red? Is it blue? Is it green?
00:01:35
So, we've got that identity versus magnitude,
and that is what helps us think about color.
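As a rough sketch of that decomposition, assuming plain sRGB triples in the range 0 to 1: the function names here are illustrative and not from the lecture, HSV saturation stands in for the saturation channel, and relative luminance with the Rec. 709 weights stands in for perceived luminance.

```python
import colorsys

def srgb_to_linear(c):
    """Undo the sRGB transfer curve for one channel value in [0, 1]."""
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

def relative_luminance(rgb):
    """Relative luminance of an sRGB color, using the Rec. 709 weights."""
    r, g, b = (srgb_to_linear(c) for c in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def decompose(rgb):
    """Split an sRGB triple into hue (degrees), saturation, and luminance."""
    h, s, _v = colorsys.rgb_to_hsv(*rgb)
    return {
        "hue_deg": h * 360.0,                    # identity channel
        "saturation": s,                         # magnitude channel
        "luminance": relative_luminance(rgb),    # magnitude channel
    }

# The gray, light pink, fully saturated pink progression (values are assumptions).
gray = decompose((0.5, 0.5, 0.5))
light_pink = decompose((1.0, 0.75, 0.8))
saturated_pink = decompose((1.0, 0.0, 0.5))
```

Gray has zero saturation, and the two pinks sit at nearly the same hue while differing in saturation, which is the sense in which these are separate channels.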
00:01:42
These channels have different properties. So what
do they convey directly to the perceptual system?
00:01:47
How much information can they
convey in terms of how many
00:01:51
discriminable bins can we use?
And so we'll dive into that.
00:01:56
So, let's talk about color
channels in visualization.
00:02:01
So, the question of whether we
think of color as categorical
00:02:05
or ordered is going to depend.
We can have it either way. Right,
00:02:09
in our upper left corner we're really focusing on
the fact that there are four different years here,
00:02:14
and those different changes of hue for 2010 versus
2013 versus 2011 distinguish them as categories.
00:02:22
And in contrast, in the lower left, we're
much more emphasizing an order of these years,
00:02:28
as we go from light green, to darker green, to the
darkest green there is from 2010 through to 2013.
00:02:34
And we're doing something similar on the
upper right, where we've got a choropleth
00:02:39
map, where we're using color saturation
within geographic regions to color code,
00:02:44
and then we're seeing the relationship
between that and this bar chart.
00:02:48
And we obviously have redundantly encoded
00:02:51
the length of the bar, and the amount
of green in that bar, the saturation.
00:02:57
So we can see examples that color
really can be either one of these.
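To make the two treatments concrete, here is a minimal sketch that colors the same four years either as categories (different hues) or as an ordered attribute (one green hue, light to dark). The helper names and the specific lightness and saturation values are assumptions for illustration, not values taken from the figures.

```python
import colorsys

def to_hex(rgb):
    """Format an RGB triple in [0, 1] as a #rrggbb string."""
    return "#{:02x}{:02x}{:02x}".format(*(round(c * 255) for c in rgb))

def categorical_palette(n, lightness=0.5, saturation=0.65):
    """n colors with evenly spaced hues and equal lightness and saturation."""
    return [to_hex(colorsys.hls_to_rgb(i / n, lightness, saturation))
            for i in range(n)]

def ordered_palette(n, hue=1 / 3, saturation=0.6, lightest=0.85, darkest=0.25):
    """n colors of a single hue (green by default), running from light to dark."""
    step = (lightest - darkest) / max(n - 1, 1)
    return [to_hex(colorsys.hls_to_rgb(hue, lightest - i * step, saturation))
            for i in range(n)]

years = [2010, 2011, 2012, 2013]
as_categories = dict(zip(years, categorical_palette(len(years))))
as_ordered = dict(zip(years, ordered_palette(len(years))))
```

The data is identical in both cases; only the choice of which channel varies changes whether the years read as categories or as a sequence.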
00:03:04
Now when we're thinking about categorical color,
the thing that a lot of people unfortunately get
00:03:09
wrong is, they want to encode more
levels of a categorical attribute
00:03:14
than there are available discriminable
bins for categorical color.
00:03:20
So remember that we talked a lot when
we were introducing marks and channels,
00:03:23
about the idea that human perception is
completely built on relative comparisons.
00:03:29
So that's fantastic if there's colored
blocks that are next to each other.
00:03:33
If they're contiguous, we have very
high precision ability to discriminate.
00:03:38
Here we've got 21 chromosomes
on a mouse and we're color coding.
00:03:43
And even really subtle distinctions between
greens, like in chromosomes six through
00:03:48
nine, we're able to tell the difference
when they're right next to each other.
00:03:52
But, and here's something that's gone
wrong, when they now try to map those
00:03:59
into small and scattered regions - right
here what they've done is they've said okay
00:04:06
for a mouse chromosome where does it fall
in the human chromosome - let's see how
00:04:10
we're doing when we have these absolute
comparisons of small scattered regions.
00:04:14
Well, we've got tan, and green, brown, white, red,
blue, now we're at the top of chromosome 2. Okay,
00:04:22
light blue, purple, wait is that yellow,
is that the same tan we saw before ...
00:04:28
see a green, is that the same green
as before, maybe there's another one.
00:04:34
So, I'm starting to, I really haven't
even run out of fingers but I'm starting
00:04:38
to lose the ability to discriminate these subtle
differences in color because they're separated.
00:04:45
And so it turns out that if you've got
non-contiguous small regions of color,
00:04:50
it's almost always fewer bins than people
wish that they had available to them.
00:04:54
So the rule of thumb is only between
6 and 12 bins can you really count on.
00:04:59
Remembering that you need a color for the
background, maybe you need a default color
00:05:03
for something, you might need a highlight color.
So in general, it's a lot fewer bins than people
00:05:10
actually would like to use.
So what can you do?
00:05:14
Well one thing you could remember is, you don't
only have color. You have other visual channels.
00:05:20
If you do want to encode with categorical
color, deliberately yourself, bin things
00:05:27
and then map those into color.
Right now what's happening is, chromosomes
00:05:31
say 5 through 11, are all getting roughly mapped
into the same bin by our eyes. It might be better
00:05:38
if we could do a more semantic meaningful binning
rather than just having the luck of the draw in
00:05:43
terms of which things get binned involuntarily
by us not being able to notice the difference.
00:05:52
Just so I don't only pick on the
genomics folks, here's another example
00:05:57
of people using more discriminable bins than we
can really see. Notice how that light blue for
00:06:03
cancer is pretty hard to tell apart from
the light blue for ear, nose, and throat.
00:06:09
So we really do have a limit as to the number
of categories. Think carefully yourself
00:06:15
about transforming your data, deriving a new
00:06:20
categorical attribute with fewer bins if
you insist on using categorical color.
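One way to derive that coarser attribute yourself is an explicit lookup from fine-grained categories to a handful of groups. This is only a sketch: the function name, the specialty names, and the grouping are all hypothetical, and a real grouping would come from the domain.

```python
def derive_coarser_attribute(values, group_of, other="Other"):
    """Map each fine-grained category to a coarser, hand-chosen group."""
    return [group_of.get(v, other) for v in values]

# Hypothetical grouping, chosen deliberately by the designer instead of
# being left to whichever colors the eye happens to confuse.
group_of = {
    "cardiology": "Internal medicine",
    "oncology": "Internal medicine",
    "neurology": "Internal medicine",
    "orthopedics": "Surgery",
    "ear, nose, and throat": "Surgery",
    "pediatrics": "Primary care",
    "family medicine": "Primary care",
}

specialties = ["oncology", "ear, nose, and throat", "pediatrics",
               "dermatology", "cardiology"]
coarse = derive_coarser_attribute(specialties, group_of)
```

The derived attribute has at most four levels here (three groups plus "Other"), so it fits comfortably within the number of hues that stay distinguishable in small scattered regions.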
00:06:26
This is also an issue for ordered color. How many
bins can we discriminate? Often not quite as many
00:06:32
as people might like. This is a nice example from
Gregor Aisch, showing how if we use the same basic
00:06:38
idea of a choropleth map, of color coding onto
geographic regions, well, what do we notice?
00:06:44
When we just look at those legends across
the top it's always quite clear what the
00:06:48
differences are between neighboring bins.
But then, when we look at the scattering of
00:06:52
U.S. states - well great for two classes; we can
definitely tell those apart. For three classes,
00:06:58
no problem at all. For four
classes, still pretty good.
00:07:02
Now once we even start getting to five classes,
it might be a little bit tricky to distinguish.
00:07:08
Is Maine the same color as Utah? Probably?
But now once we start getting to six, seven,
00:07:15
and eight classes, it gets very difficult to tell
apart when things are separate from each other.
00:07:22
Understanding if West Virginia and Arizona
are the same color with the eight class one;
00:07:26
it's very difficult.
So, remember that we have a
00:07:30
limited number of discriminable bins if things are
separated as opposed to right next to each other.
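For ordered data, the same idea means classing the values into a small number of bins before assigning colors. Below is a minimal equal-interval sketch; the function name and the sample values are assumptions, and other schemes such as quantiles would be equally reasonable.

```python
def classify(values, n_classes):
    """Assign each value an equal-interval class index in 0 .. n_classes - 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_classes
    # The maximum value would land in class n_classes, so clamp it.
    return [min(int((v - lo) / width), n_classes - 1) for v in values]

# Hypothetical per-region values, classed into four bins.
region_values = [0, 10, 20, 30, 40]
classes = classify(region_values, 4)
```

Each class index would then be mapped to one step of a sequential ramp, keeping the count low when the regions are scattered.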
00:07:38
The other thing we have to
remember with ordered data is,
00:07:41
everyone wants to use rainbows,
but they're really a poor default.
00:07:45
Now why is that? People get a little confused, I
think, about the physics of reality, which is yes,
00:07:51
sunlight going through a prism will scatter
into a rainbow. That's a physical fact.
00:07:56
But the biology of our eyes is
not actually necessarily using
00:08:01
that information in the way people hope.
It's not intrinsic to have the ordering
00:08:06
of the rainbow in terms of our perception.
If I lock two people into separate rooms with
00:08:10
no way to communicate, and I give them four
color chips of purple, and red, and orange,
00:08:16
and blue, are they going to put them in the
same order if I ask them to do that? Probably
00:08:21
not. I sure wouldn't bet a thousand dollars.
But if I had people in rooms and I said, "Okay,
00:08:27
what's the order in terms of if I got like a light
gray, and a white, and a black, and a dark gray."
00:08:32
Would they put them in the same order? Surely yes.
So remember we want this perceptual intrinsic
00:08:38
ordering, as opposed to anything that you
have to learn. People can learn, but we want
00:08:43
with the perceptual stuff to make sure that it's
something intrinsic to the human visual system.
00:08:48
But the lack of intrinsic ordering is not the
only problem; there's also a non-linearity.
00:08:54
I'm going to say here's two regions of that
spectrum and notice how within the one on
00:09:00
the left I can go from red to orange to yellow. I
can clearly see at least three different colors.
00:09:06
And in a region of the same size I
see green, green, green, on the right.
00:09:10
Because of the way that the human eye responds to
the visual spectrum it is not a linear situation.
00:09:19
But, what are some of the benefits of
rainbows? It's not that they're all bad;
00:09:23
it's that they can actually make fine-grained
structure visible and nameable. And in
00:09:28
this picture we can talk about the red parts
versus the yellow parts versus the green parts.
00:09:32
That makes us focus on the fine grain.
Now in contrast with the one on the bottom,
00:09:39
it can actually be difficult to tell what's going
on. We'll come back to that one in a minute.
00:09:45
So let's compare a different color
map for that same data set on the top.
00:09:51
And when we just have two hues
going from purple through gray
00:09:55
into yellow we're seeing a really
different sense of this dataset.
00:10:00
We are much more able to focus on large
scale structure. So, it's harder to focus
00:10:05
on fine grain but it's easier to focus on large
scale. That is then a choice of the designer.
00:10:12
Now here we see an example
where the mystery is cleared up;
00:10:15
it's actually the coastline of Florida.
And there's something else going on here
00:10:19
there's a very carefully chosen change of
hue at that zero point, to really distinguish
00:10:25
the blues getting darker as we go down into the
depths, and then the heights of the mountains.
00:10:31
Something careful was done here. The luminance is
increasing, so we do have multiple colors - but
00:10:38
they're ordered by the luminance. So
we're going from dark up to bright.
00:10:43
So that's actually a general principle of
color map design. There's some nice ones.
00:10:48
Viridis and magma are great for sequential color,
carefully designed and now deployed in many tools.
00:10:54
And in these, the luminance is monotonically
increasing. The hues are ordered by
00:11:01
luminance. They have other nice properties.
They're perceptually uniform. They do have
00:11:05
multiple colors in them. And as we'll
talk about later they are colorblind safe.
00:11:12
So, these are both some useful
color maps for sequential data.
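The luminance-ordering property can be checked mechanically. This sketch uses relative luminance with the Rec. 709 weights as an approximation of perceived luminance, and compares a six-step fully saturated hue sweep against a gray ramp; the function names are illustrative, and checking viridis or magma themselves would need a plotting library, which is left out here.

```python
import colorsys

def srgb_to_linear(c):
    """Undo the sRGB transfer curve for one channel value in [0, 1]."""
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

def relative_luminance(rgb):
    """Relative luminance of an sRGB color, using the Rec. 709 weights."""
    r, g, b = (srgb_to_linear(c) for c in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def is_luminance_monotonic(colors):
    """True if luminance strictly increases from the first color to the last."""
    lum = [relative_luminance(c) for c in colors]
    return all(a < b for a, b in zip(lum, lum[1:]))

# Fully saturated hue sweep: red, yellow, green, cyan, blue, magenta.
rainbow = [colorsys.hsv_to_rgb(i / 6, 1.0, 1.0) for i in range(6)]
# Plain gray ramp from black to white.
gray_ramp = [(t, t, t) for t in (0.0, 0.25, 0.5, 0.75, 1.0)]

rainbow_ok = is_luminance_monotonic(rainbow)
gray_ok = is_luminance_monotonic(gray_ramp)
```

The hue sweep fails the check because yellow is much brighter than the red before it and the green after it, while the gray ramp passes.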
00:11:17
Let me mention one thing: rainbows are not
always bad. If you have categorical attributes,
00:11:24
then you do want very, very bright colors
00:11:27
because the ability to perceive
color is dependent on the size of the region.
00:11:31
So if you have little bits of color scattered
around, we want saturated color. And of course the
00:11:37
most saturated colors we can get are those rainbow
colors. So if we have a segmented color map,
00:11:43
where we're doing something categorical rather
than a continuous one, then in fact rainbow colors
00:11:49
can be very good choices.
So, it all depends on matching
00:11:53
the characteristics of your visual encoding to
the characteristics of your data set, as always.
00:12:02
So I alluded to this idea that maybe there's
some interaction between channels. We really
00:12:07
have to be careful about this with color.
Color does not have separable characteristics.
00:12:13
In particular as I mentioned, size
heavily affects the salience of color.
00:12:19
And in particular, if you have
small regions you need highly saturated
00:12:25
colors to be able to even notice them.
That's why we have highly saturated red marks
00:12:31
on top of these larger background swaths of pale,
more unsaturated, colors in the map on the left.
00:12:38
Now if you have large regions you typically
would want those to be low saturation.
00:12:42
Notice how these big, big areas of highly
saturated color really jump out at you and
00:12:47
they're quite jarring so we want to be aware of
the size as we think about the saturation to use.
00:12:54
That can be tricky in a context where
you don't know ahead of time how large
00:12:58
a region will be if you've got, of course, a
data-driven situation. So there's nuance there.
00:13:06
Some key things to know about saturation
and luminance. The most crucial is, they are
00:13:10
not separable from each other. If you encode with
one, you cannot usefully encode with the other.
00:13:17
And moreover, they're not separable
from transparency either. So these
00:13:21
three channels of saturation,
and luminance, and transparency,
00:13:24
are all linked: you have to use only one of them.
They very much are not separable.
00:13:31
Here's just an example with the same tool. And
this tool, by the way, is called ColorBrewer. It's
00:13:36
a great tool for generating some color palettes
that's quite informed by visualization design
00:13:43
principles. So notice how what I've done on the
left here, is I've got fully saturated colors
00:13:53
that I've specified in the middle there, but
00:13:57
the transparency is reasonably high.
But because I've got some transparency
00:14:04
now, these look unsaturated.
And in contrast, when I have the
00:14:08
unsaturated colors on the right, and
then I have the transparency all the way,
00:14:13
notice how I can't tell the difference.
So I'm unable to tell the difference
00:14:17
of transparency versus saturation. So, key
principle; definitely keep that in mind.
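The arithmetic behind that confusion is simple alpha compositing. As a sketch, assuming a white background and an alpha value where 1 is fully opaque (so transparency is 1 minus alpha), a half-transparent saturated red lands on exactly the same screen color as an opaque unsaturated pink. The function name is illustrative.

```python
import colorsys

def over_white(rgb, alpha):
    """Composite a color with the given opacity over a white background."""
    return tuple(alpha * c + (1.0 - alpha) * 1.0 for c in rgb)

# Fully saturated red drawn at 50% opacity ...
half_transparent_red = over_white((1.0, 0.0, 0.0), alpha=0.5)
# ... versus an unsaturated pink drawn fully opaque.
opaque_pink = (1.0, 0.5, 0.5)

# Saturation of what actually reaches the screen.
seen_saturation = colorsys.rgb_to_hsv(*half_transparent_red)[1]
```

Because the two end up as identical pixels, a viewer has no way to tell which channel was used to encode the value.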
00:14:23
We're typically going to use transparency
often in the design of making visual layers
00:14:29
as we will talk about later, but we
don't usually try to explicitly encode
00:14:35
with transparency, because it is so hard to
separate out from luminance and saturation.
00:14:44
In general, if you're going to have small
separated regions, the safe thing to do is
00:14:47
to have only two bins. And, of course, remember to
only use one of either saturation or luminance or
00:14:53
transparency. Absolute max for things like
saturation coding for separated regions,
00:14:59
would be something like three to four bins.
So for contiguous regions, you can have many
00:15:06
bins if you're going to be making distinctions
of things right next to each other. But,
00:15:10
again, be careful not to try to use
these three channels simultaneously.
00:15:18
Let's talk about color palettes. Now we've
already started down this, with thinking
00:15:23
about things like categorical colors.
These are all color. We're talking
00:15:29
right now about palettes for a single
attribute: univariate color palettes.
00:15:33
So we really want maximum distinguishability. That
sometimes means we go for as saturated as we can,
00:15:39
although you could also have less saturated ones.
Sometimes these are called qualitative or nominal,
00:15:45
particularly in the geographic literature
if you're using tools built after that.
00:15:51
So if you have ordered data then we really
distinguish between sequential and diverging.
00:15:57
So sequential is when you're going from min
to max, and diverging is when there's some
00:16:01
semantically meaningful midpoint. Typically
we're going to use a neutral color for that,
00:16:05
like white or yellow or gray, and then we'll pick
a saturated color for each of these end points.
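A diverging palette of that shape can be sketched by interpolating from one saturated end point to the neutral midpoint and then on to the other end point. The function names and the purple and orange end points are assumptions, and straight RGB interpolation is used only for brevity; a carefully designed palette would interpolate in a perceptual color space.

```python
def lerp(a, b, t):
    """Linear interpolation between two RGB triples, t in [0, 1]."""
    return tuple(x + (y - x) * t for x, y in zip(a, b))

def diverging_palette(low, mid, high, n):
    """n colors (n odd, at least 3) running low -> mid -> high."""
    if n < 3 or n % 2 == 0:
        raise ValueError("n must be an odd number of at least 3")
    half = n // 2
    left = [lerp(low, mid, i / half) for i in range(half)]
    right = [lerp(mid, high, i / half) for i in range(1, half + 1)]
    return left + [mid] + right

purple = (0.5, 0.0, 0.5)   # saturated end point for the minimum
white = (1.0, 1.0, 1.0)    # neutral color for the meaningful midpoint
orange = (1.0, 0.5, 0.0)   # saturated end point for the maximum
palette = diverging_palette(purple, white, orange, 5)
```

Requiring an odd count keeps the neutral color exactly on the midpoint, so values on either side of it get visibly different hues.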
00:16:11
And we see a bunch of examples of
these diverging palettes - some
00:16:14
with yellow, some with white midpoints.
Now if you're doing something sequential,
00:16:19
then we've got this ramp going
from a minimum to a maximum value,
00:16:23
where typically it's unsaturated on
one side and then fully saturated or
00:16:28
fully bright versus dark, depending on
whether we're thinking about this as
00:16:32
luminance or saturation. But those are quite hard
to tell apart, so we typically want to pick just one.
00:16:38
So here's a few examples of where we're doing
sequential color maps, and we have multiple hues,
00:16:45
and so we are being careful to order
them by luminance as we talked about.
00:16:51
Finally there's one more property we
can think about. Remember that you can
00:16:55
have ordered data that's cyclic, and you can
emphasize the cyclic nature of that by having
00:17:00
many hues. So multi-hue maps are useful for
trying to show the cyclic nature of the data.
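For cyclic data, a hue wheel wraps around so the end of the range gets the same color as the start. This is a minimal sketch assuming the attribute is an angle in degrees; the function name and the saturation and value defaults are illustrative, and a plain HSV hue sweep is not perceptually uniform.

```python
import colorsys

def cyclic_color(angle_deg, saturation=0.8, value=0.9):
    """Map an angle in degrees to a color on a hue wheel that wraps at 360."""
    hue = (angle_deg % 360.0) / 360.0
    return colorsys.hsv_to_rgb(hue, saturation, value)

start = cyclic_color(0)
opposite = cyclic_color(180)
wrapped = cyclic_color(360)
```

Here 0 and 360 degrees map to the same color, while 180 degrees maps to a clearly different hue.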
00:17:09
So the design considerations for a color palette
for a single attribute are, first of all,
00:17:15
are we looking at segmented discrete little
boxes, or something that's continuous?
00:17:21
Are we looking at diverging or sequential or
cyclic as the properties of the data set we want
00:17:26
to emphasize? Do we have one hue or two hues
or multiple hues? Is it perceptually linear?
00:17:33
And have we ordered the hues by luminance? And
as we'll get to later, is it colorblind safe?
00:17:42
Well, we had been talking about color for
one attribute. What if we want to encode two
00:17:46
different attributes with color? These
are what's called bivariate color maps.
00:17:52
Now there is a straightforward case where
00:17:55
we have two different attributes, but one of
those attributes is binary. It's just on or off.
00:18:01
So here's an example of that where essentially
we've got a variation in hue, and then we have the
00:18:09
low saturation and the high saturation version of
that. So what's getting encoded with saturation
00:18:14
is just binary on or off. People are
reasonably able to deal with those.
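That straightforward case can be sketched as hue for the categorical attribute and two saturation levels for the binary one. The function name, parameter names, and saturation levels are assumptions chosen for illustration.

```python
import colorsys

def bivariate_color(category_index, n_categories, is_on,
                    low_saturation=0.25, high_saturation=0.9, value=0.9):
    """Hue encodes the category; saturation encodes a binary on/off flag."""
    hue = category_index / n_categories
    saturation = high_saturation if is_on else low_saturation
    return colorsys.hsv_to_rgb(hue, saturation, value)

# Same category, flag on versus flag off.
on_color = bivariate_color(1, 4, is_on=True)
off_color = bivariate_color(1, 4, is_on=False)
```

Only two saturation levels are used, which keeps the second attribute within what separated regions can reliably show.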
00:18:21
But in the cases where you actually have multiple
levels in each of two different attributes,
00:18:28
it gets very tricky. People do use these sometimes,
but there's a fair amount of empirical evidence
00:18:34
that people find them difficult to interpret.
Not impossible, but certainly difficult.
00:18:41
So be aware that it is a much more tricky design
problem to have bivariate color maps with multiple
00:18:49
levels in each of the directions as opposed
to just binary in one of those directions.