Statistics: Charts

Intuitively describing data patterns.

When conducting statistical studies it is as important to present the results in an intuitive manner so users may understand and act on it as it is to conduct the study itself.

In this article I will briefly describe the recommended uses for each method.

Purpose Chart Type
Displaying Trends Over Time Line
Comparing Correlations XY Scatter Plot
Comparing Correlations and Variances Bubble
Comparing Distributions Column, Clustered Column, Bar, Pie
Comparing Distributions and Ranges Stacked Column, Area

Line Chart: Displaying Trends Over Time

Often in statistical studies the organization requires the ability to discern patterns within trends which spans dates.
The Line Chart represents one of the best methods to display trends of time.

In this article I will demonstrate how to display the effect a student’s study habits have on their resulting grades.

First, I’ll begin with an Excel table representing the number of hours a student spent studying every night along with the resulting grade of the next morning’s quiz.


As you can see, a correlation exists between the student slowly increasing their hours of study and their resulting grades.
However, the line chart presents a more intuitive method to convey the same results.

I’ll highlight both columns of data, then from the main menu choose Insert, then the Line icon.


Excel automatically creates a line chart showing a correlation between the hours which a student increases their study time and their grades.


As the student begins, you see their grade falls slightly below the number of hours they studied the night prior.
However, at 1.5 hours through the end of the study the student’s grades consistently increase with the most notable jump after 3 hours of night study.
It seems once the student has established consistent study habits, their grades will continue to improve without needing any more hours of study.


Column Chart: Comparing Distributions

Within most statistical studies categories of items exists which comparing against each other using a value list yields rich insight.

In this article I will describe how to use a column chart to compare data from a cancer study.
I’ll begin with a dataset which describes age groups and frequency of cancer diagnoses.


This dataset reveals a spike of cancer rates among the youngest in the sample, which subside as the age groups increase, then spike again during the older years.

First, I’ll highlight both columns of data and from the Insert menu, click the Column icon, then the column chart I want.



This chart utilizes columns to reveal more intuitively what I stated earlier about the dataset – that a trend exists between higher cancer rates and the youngest and oldest age ranges.


Clustered Column Chart: Comparing Categories & Sub Categories

Similar to column charts previous described, this chart type provides an intuitive way to describe categories with the addition of sub categories as well.

In this example, I will use this chart type to visually-describe a multiyear medical study of cancer treatment diagnoses between different age groups. The age groups represent the categories and years the sub categories.

First, I’ll begin with my dataset arranged so that each age group is listed along with its diagnoses under each year.


Next, I’ll highlight the data and use the Insert Menu to select the Clustered Column chart.


After updating my title, I have my dataset visually representing each age group’s cancer diagnoses for each year.


Generally speaking, it appears for most age groups the number of diagnoses increases from year-to-year with the exception of the youngest and oldest age group showing a steep decline.


Bar Chart: Comparing Distributions

First, I’ll begin with a set of test data reflecting sales revenue according to regions.

To create a chart, first I’ll select the range of values to be used by Axis. Then using the Ribbon bar, I’ll select to Insert a new chart.

As you can see, we now have a chart with the regions on the Y-axis and the values on the X-axis.

However, with all bars the same color and the generic “Revenue” labels, it’s not as intuitive as it should be.

First, I will right-click the chart and choose “Select Data.”
Then, I’ll click “Switch Row/Column” to the Region names are on the right. Also, it automatically color-coded the regions and matching legend.
I still don’t like the name, so I’ll rename it to “Sales by Region” by renaming the source column header.
Now the only problem is the scrunched sales figures so I’ll select the chart and using the side-handles, I’ll drag it wider so that all values have room without overlapping.

To format the legend displaying Region names, I right-click it and select Format Legend where I will apply a fill and border colors.

Now, I’ll do the same to the chart.


Stacked Column Chart: Comparing Distributions and Their Ranges

Similar to the Column Chart (above), this chart allows you to compare categories, but also ranges of values within each category.

In this example, I’ll begin with a dataset consisting of adult-student grade averages listed by age group and US regions.

Next, I’ll highlight the data and use the Insert menu to add a Stacked Column chart.

As you can see, the US region which reports the highest grades across all age groups is the Northeast with the Southwest reporting the lowest.
The same region reports the highest grades from the highest age group (51-80) with the Northwest reporting the lowest grades for the same age group.
Also, it appears the lowest two age groups report similar grades across all regions with the age groups of 36 – 50 and 51 – 80 showing the greatest diversity in grades.

Pie: Comparing Distributions of Data

Similar to using bar charts (above), pie charts provide a mechanism to compare distributions of data within a circle with the different categories touching each other.

In this example, I’ll begin with a dataset describing four medicines and their effectiveness.

To create a bar chart, I’ll highlight the dataset, use the Insert Menu and click the pie icon, then the 3-D Pie chart.

As you can see, the most effective medication is Med D and the least effective, Med B.
The margins between Med A and Med C are too close to see the difference so I’ll add data values to the graph.

Now I can see the difference between Med A and Med C – 10%.


Area Chart: Comparing Distribution Changes Between Data

When comparing data distributions, there are times when it’s important to compare differences between categories at certain data points.

In this example, I’ll begin with a dataset comparing differences between school subject fees accumulated between two fiscal years.

To create an Area Chart, I’ll highlight the dataset, then use the Insert Menu and click the Area chart icon.

As you can see by the areas illustrated, overall between the subjects a drop in fees was experienced between the first year and the next.
The least difference in fees between fiscal years was from Math and History the greatest.
Science and English experienced similar amount of drop in fees.

XY Scatter Plot: Comparing Data Correlations

Unlike other charts where data points form a contiguous shape whether in bars (bar chart) or slices (pie chart), scatter plot charts display distributions with individual data points, reflecting correlations between values. The name of this type of type begins with “XY” because the data points reflect positions on both axes.

In this example, I’ll begin with a dataset containing the number of hours a student spends studying a night along with the resulting grade.

Next, I’ll highlight the data and use the Insert menu to add an XY Scatter Plot chart.

Once the chart appears I’ll customize it by adding axes titles to distinguish the purpose of each set of axis values.
As you can see, the dots clearly indicate the more time a student spends studying each night, the greater their increase in grades, with 3 hours representing the greatest increase in scores from 2 hours per night.

Bubble: Comparing Data Correlations and Variances

Similar to XY Scatter Plot charts (above) where you compare correlations between data, you may elect to use a bubble chart to additionally view data variances.

Beginning with the same dataset for XY Scatter Plot charts example, I added some data to indicate significantly more data points within the 3 hours of study resulting in a grade average of 3.0 or greater.

Using the same steps as before to highlight the dataset and select a chart, I elect to use the Bubble chart instead.

Once the chart appears, I customized the bubble size so clusters of data points were more clear.
As you can see, the greatest concentration of results appear between the vector of 3 hours of study and grade average of 3.0. Also the greatest variance of values appears between those two results meaning more students reported those hours of study with similar results when compare to 1 or 2 hours of study and lower grade averages.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s