Generating a Histogram of the Data


A histogram of a data variable within a data group can be generated with the hist command. Using the Sample Data, let's explore how this command works. hist takes the form

hist [-n numbins] [-l] [-c] [-t] datavar [minval] [maxval]
where
-n numbins sets the number of bins for the histogram (default = 11),
-l sets the intervals to be logarithmically-spaced,
-c counts only the particles displayed within a clipbox,
-t counts only particles in a subset defined by thresh or only,
datavar is the data variable to return the distribution for,
minval is an optional minimum value for the distribution range, and
maxval is an optional maximum value for the distribution range.

To see how this can be useful, let's generate some histograms of the Sample Data on a data variable we're familiar with, the color index coloridx. Recall that these values range from 1 to 10 and that their mean is around 5, which is consistent with a fairly even distribution. Let's see how many particles fall in each color with

hist -n 10 coloridx
This should report:
hist -n 10 0(coloridx) 1 10 =>
Total 10420, 0 < min, 0 > max, 0 undefined, 0 clipped, 0 threshed
0 < 1
1007 >= 1
1026 >= 2
1070 >= 3
1060 >= 4
1059 >= 5
1024 >= 6
1071 >= 7
1009 >= 8
1083 >= 9
1011 >= 10
0 > 10
which makes sense for an even distribution between 1 and 10. Now, we could halve the resolution of the histogram by specifying only 5 bins
hist -n 5 coloridx
which creates wider bins that provide less resolution on the data. We can also specify logarithmically spaced bins with the -l option, as in
hist -n 5 -l coloridx
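
To make the difference between evenly spaced and logarithmically spaced bins concrete, here is a minimal Python sketch. It is a generic illustration, not Partiview's code: the helper names linear_edges and log_edges are hypothetical, and the exact bin boundaries Partiview chooses may differ.

```python
def linear_edges(lo, hi, nbins):
    """Return nbins + 1 evenly spaced bin edges from lo to hi."""
    step = (hi - lo) / nbins
    return [lo + i * step for i in range(nbins + 1)]


def log_edges(lo, hi, nbins):
    """Return nbins + 1 edges with a constant ratio between neighbours.

    Requires lo > 0, since a logarithmic scale cannot include zero.
    """
    if lo <= 0 or hi <= lo:
        raise ValueError("need 0 < lo < hi")
    ratio = (hi / lo) ** (1.0 / nbins)
    return [lo * ratio ** i for i in range(nbins + 1)]


if __name__ == "__main__":
    # coloridx in the Sample Data spans 1 to 10.
    print([round(e, 2) for e in linear_edges(1, 10, 5)])
    print([round(e, 2) for e in log_edges(1, 10, 5)])
```

With logarithmic spacing the low-valued bins are narrow and the high-valued bins are wide, so the same data fall into the bins very differently than with evenly spaced edges.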

If we thresh these data according to luminosity, as in

thresh lumin 50 100
we can now get a histogram of just that subset of the data by including the -t option
hist -n 10 -t coloridx
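
Conceptually, thresh followed by hist with the -t option is a filter followed by a count. The Python sketch below illustrates that idea on a handful of made-up particles. The (lumin, coloridx) values are invented for illustration, the function name threshed_histogram is hypothetical, and treating the thresh limits as inclusive is an assumption rather than documented Partiview behavior.

```python
from collections import Counter

# Invented (lumin, coloridx) pairs; not taken from the Sample Data.
particles = [(12.0, 1), (55.0, 2), (73.5, 2), (99.0, 7), (140.0, 7), (60.0, 10)]


def threshed_histogram(data, lo, hi):
    """Count coloridx values for particles with lo <= lumin <= hi.

    Returns (counts, n_threshed), where n_threshed is the number of
    particles excluded by the threshold.
    """
    kept = [color for lumin, color in data if lo <= lumin <= hi]
    return Counter(kept), len(data) - len(kept)


counts, n_threshed = threshed_histogram(particles, 50, 100)
for color in sorted(counts):
    print(color, counts[color])
print("threshed:", n_threshed)
```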

Note that once data have been threshed and the -t option has been used, hist does not seem to forget the thresh: even when all of the data are displayed again, subsequent histograms may still report threshed data. This is a bug.

If we have defined a clip box, we can use the -c option

hist -n 5 -c coloridx
to generate a histogram based on the remaining unclipped data, whether the clip box is on or off.

Providing a minval sets the value of the data variable at which the histogram begins, while maxval sets the upper limit of the histogram. Try entering

hist -n 10 coloridx 2

This will report that there are 1007 particles with coloridx less than two and 1026 particles with coloridx equal to two. If you run

hist -n 10 coloridx 3
you are setting the lower limit of the distribution, so the histogram will now report 1007 + 1026, or 2033, particles less than 3, the base value. Similarly, a maximum value can be specified either together with a minimum value or with a ‘-’ in place of minval, which causes Partiview to substitute the minimum from the data.
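
The arithmetic above can be checked against the counts reported earlier. The sketch below copies those ten counts into Python and assumes, as the text implies, that coloridx takes only the integer values 1 through 10. The helper below_min is a hypothetical name used only for this illustration.

```python
# Counts reported by "hist -n 10 coloridx" for coloridx = 1 .. 10.
counts = [1007, 1026, 1070, 1060, 1059, 1024, 1071, 1009, 1083, 1011]
by_value = dict(zip(range(1, 11), counts))


def below_min(minval):
    """Number of particles whose coloridx is less than minval."""
    return sum(n for value, n in by_value.items() if value < minval)


total = sum(counts)
mean = sum(value * n for value, n in by_value.items()) / total

print("total:", total)          # should match the "Total" line of the report
print("mean:", round(mean, 2))
print("below 2:", below_min(2))
print("below 3:", below_min(3))
```

The ten bins sum to the reported total of 10420, and the counts below 2 and below 3 come out to 1007 and 2033, in agreement with the text.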

© 2002-2005 American Museum of Natural History
Last Modified: 2006-04-28 by Brian Abbott