General Category => Tips and Tutorials => Topic started by: Ron D. on January 01, 2015, 03:47:08 AM

Title: Strategies for dealing with overplotting
Post by: Ron D. on January 01, 2015, 03:47:08 AM
Overplotting occurs when a parallel coordinates (PC) plot contains so many polylines that they merge into a mass in which individual lines cannot be distinguished. This is quite common and quite off-putting. As Alfred Inselberg, who invented the use of parallel coordinates for data visualization, wrote, "Do not let the picture intimidate you!" *

The very first thing I do is pick the variable whose relationships I most want to see, select that axis (Alt-click on the name), and apply an automatic range brush to that axis with the maximum number of colors (eight), as determined in the settings. This usually transforms an incomprehensible plot, immediately, into something that really shows the overall behavior of the data. It is an astonishing transformation, really. I repeat this for other axes if needed, and then I start color brushing smaller groups of interrelated lines. The automatic "gap brushing" option applies up to 20 different color brushes to groups of lines separated by the largest gaps on an axis, and can also help sort out data that exhibits important gaps.
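For anyone curious what these two brushes amount to, here is a rough Python sketch of the idea. It is not Sliver's code: the function names are made up for illustration, and I am assuming the range brush uses equal-width ranges along the axis and that gap brushing simply cuts at the largest gaps between sorted values.

```python
def range_brush(values, n_colors=8):
    """Give each value a color index 0..n_colors-1 from equal-width ranges.

    Assumption: equal-width ranges; the real tool may split differently.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_colors
    # min() keeps the axis maximum in the last range instead of overflowing
    return [min(int((v - lo) / width), n_colors - 1) for v in values]


def gap_brush(values, max_colors=20):
    """Group values by cutting the axis at its largest gaps.

    Returns one color index per input value, in the original order.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    # gap between each pair of neighbors in sorted order, tagged with position
    gaps = [(values[order[k + 1]] - values[order[k]], k)
            for k in range(len(order) - 1)]
    # keep the (max_colors - 1) largest nonzero gaps as cut points
    largest = sorted(gaps, reverse=True)[:max_colors - 1]
    cut_after = {k for gap, k in largest if gap > 0}
    colors = [0] * len(values)
    color = 0
    for k, i in enumerate(order):
        colors[i] = color
        if k in cut_after:
            color += 1  # next value lies on the far side of a large gap
    return colors
```

With values 9.0, 1.0, 5.0, 1.1, 5.2 and three colors, gap_brush puts 1.0 and 1.1 in one group, 5.0 and 5.2 in a second, and 9.0 in a third.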

You can start pruning away uninteresting polylines by zooming, panning, showing values at the axes on mouse-overs, and color-brushing and hiding lines of different colors. Selected (swiped) lines can then be saved to a new data file, and when this new file is opened the ranges of the axes are automatically adjusted to the new ranges of the variables, which spreads out the lines even more.
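To see why saving a selection to a new file spreads the lines out, here is a small sketch assuming each axis is scaled from the minimum to the maximum of whatever data is loaded (min-max scaling). The helper names and the sample rows are invented for illustration.

```python
def axis_ranges(rows):
    """Return (min, max) for each column of a list of equal-length rows."""
    return [(min(col), max(col)) for col in zip(*rows)]


def normalize(rows):
    """Map every value to 0..1 on its own axis, using the data's own range."""
    ranges = axis_ranges(rows)
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.5
         for v, (lo, hi) in zip(row, ranges)]
        for row in rows
    ]


# Three similar lines plus one outlier that stretches both axes
rows = [[0.0, 10.0], [1.0, 12.0], [2.0, 11.0], [100.0, 50.0]]

full = normalize(rows)        # outlier included: the other lines are squashed
subset = normalize(rows[:3])  # selection saved alone: axes rescale to fit it
```

In the full data the second line sits at about 0.01 and 0.05 on the two axes, crowded against the bottom. In the subset the same line sits at 0.5 and 1.0, so the three remaining lines now use the whole height of each axis.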

Often the interesting aspects of PC plots are not the lines but gaps between the lines! Look for outliers and small negative spaces that can differentiate groups of polylines for selection and color brushing.

Move axes into different positions, or better yet create the PC plot matrix, to view the polyline structure between different pairs of neighboring axes. Uncorrelated variables on neighboring axes create a cluttered, random mass of line segments between them. Correlated variables on neighboring axes produce strong direct, reciprocal or enveloped line segments. By arranging axes to show correlations you will find that clutter in the plot gives way to interesting and revealing structures.
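If you want a starting guess for the axis order before moving axes by hand, one simple approach is a greedy ordering on the absolute Pearson correlation, so that both direct and reciprocal relationships count. This is only a sketch of that idea, not something Sliver does for you as far as this post goes; the function names and the sample columns are hypothetical, and it needs at least two columns.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy) if sxx and syy else 0.0


def order_axes(columns):
    """Greedy axis order: columns is a dict of name -> list of values."""
    names = list(columns)

    def strength(a, b):
        # absolute value so reciprocal (negative) correlations also count
        return abs(pearson(columns[a], columns[b]))

    # start from the most strongly correlated pair of axes
    pairs = [(a, b) for i, a in enumerate(names) for b in names[i + 1:]]
    first, second = max(pairs, key=lambda p: strength(*p))
    order = [first, second]
    remaining = [n for n in names if n not in order]
    # keep appending the axis most correlated with the current last axis
    while remaining:
        nxt = max(remaining, key=lambda n: strength(order[-1], n))
        order.append(nxt)
        remaining.remove(nxt)
    return order


columns = {
    "a": [1, 2, 3, 4, 5],
    "noise": [2, 5, 1, 4, 3],
    "b": [2, 4, 6, 8, 10],
}
axis_order = order_axes(columns)
```

Here "a" and "b" end up next to each other and "noise" is pushed to the end. A greedy chain like this will not always find the best arrangement, so treat it as a first pass and keep rearranging by eye.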

In addition, Sliver offers two ways of applying transparency (or alpha-blending) to plots, first by invoking a separate PC plot window with adjustable transparency and linewidths, and second by exporting the plots to PDF with a transparency assigned in the settings. The plots are rendered in vector form in the PDF, so zooming while viewing the PDF is very effective in seeing the inner structure of the PC plot.
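As a rule of thumb for picking a transparency value, it helps to know how quickly overlapping lines build up to a solid color. The sketch below assumes ordinary "source-over" alpha blending of same-colored lines, which may not match exactly how Sliver or a given PDF viewer renders them; the function name is made up.

```python
def stacked_opacity(alpha, n_lines):
    """Combined opacity where n_lines lines of opacity alpha overlap.

    Each extra line covers the fraction alpha of what still shows through,
    so the background that remains visible is (1 - alpha) ** n_lines.
    """
    return 1.0 - (1.0 - alpha) ** n_lines


# Two half-transparent lines crossing give 75% opacity at the crossing
two_lines = stacked_opacity(0.5, 2)
```

With a low alpha such as 0.05, a single stray line is barely visible while a bundle of 50 overlapping lines is already above 90% opacity, which is why dense structure stands out once transparency is turned on.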

* Inselberg, Alfred. Multidimensional Detective, 1997. Download from http://courses.ischool.berkeley.edu/i247/f00/readings/inselberg-ieee97.pdf

Ron