
Create Bubble Charts in Scilab

Abstract

On this page, we present a function to create bubble charts in Scilab.

Introduction

According to Wikipedia [1], “A bubble chart is a type of chart that displays three dimensions of data. Each entity with its triplet (v1, v2, v3) of associated data is plotted as a disk that expresses two of the vi values through the disk's xy location and the third through its size.”

The striking feature of bubble charts is that they can display three-dimensional information in a two-dimensional picture, as we are going to see.

This feature is available through the "bubblechart" and "bubblematrix" functions in Stixbox v2.2. To install Stixbox, use

atomsInstall('stixbox')

and restart Scilab.

The script is also available as an attachment to this page.

This script defines a function with the following calling sequences:

    bubblechart(data)
    bubblechart(data,legen)
    bubblechart(data,legen,fill)
    bubblechart(data,legen,fill,maxR)
    bubblechart(data,legen,fill,maxR,scale)

Here, data is an n-by-3 matrix of doubles, where data(:,1) contains the x values, data(:,2) the y values and data(:,3) the z values.

For each row i of data, a colored bubble centered at x=data(i,1), y=data(i,2) is plotted. The radius of the i-th bubble is computed from z=data(i,3), so that the area of the bubble is proportional to z.

The radii are scaled so that they are proportional to maxR. Reduce this parameter if the bubbles are too large and hide other bubbles.
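To make the area rule concrete, the following sketch computes the radii from the z values. It is a minimal illustration, assuming that the radius is proportional to sqrt(z) and normalized so that the largest bubble has radius exactly maxR. The helper bubbleRadii is hypothetical, not part of Stixbox, and the actual bubblechart implementation may scale differently. The sketch also assumes that all z values are nonnegative and not all zero.

```scilab
// Hypothetical helper (not the Stixbox API): compute bubble radii so that
// the area of each bubble is proportional to z, and the largest bubble
// has radius maxR.
function r = bubbleRadii(z, maxR)
    // Area proportional to z  <=>  radius proportional to sqrt(z)
    r = sqrt(z);
    // Normalize so that the largest radius equals maxR
    r = maxR * r / max(r);
endfunction

// Example: z values in ratio 1:4:16 give radii in ratio 1:2:4
z = [1; 4; 16];
r = bubbleRadii(z, 2)
```

Taking the square root is what keeps the visual impression honest: if the radius were proportional to z, a value four times larger would produce a bubble with sixteen times the area.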

A demo

In the following script, we define sample data and a legend, and then call the bubblechart function [4].

// "Life Expectancy", "Fertility Rate", "Population"
data=[ 
 80.66, 1.67, 33739900
 79.84, 1.36, 81902307
 78.6,  1.84, 5523095
 72.73, 2.78, 79716203
 80.05, 2,    61801570
 72.49, 1.7,  73137148
 68.09, 4.77, 31090763
 81.55, 2.96, 7485600
 68.6,  1.54, 141850000
 78.09, 2.05, 307007000
];
legen=[
'CAN'
'DEU'
'DNK'
'EGY'
'GBR'
'IRN'
'IRQ'
'ISR'
'RUS'
'USA'
];
//
// Just the data
h=scf();
xtitle("","Life Expectancy", "Fertility Rate")
bubblechart(data)

The previous script produces the following output.

life.png

The function has a special feature that makes the bubbles appear as circular as possible. This is difficult in the general case where the X and Y scales differ, e.g. when X ranges from 0 to 1 and Y from 0 to 100. In that case, a circle drawn in data coordinates appears as an extremely flattened ellipse.

To solve this issue, the function actually draws ellipses whose semi-axes are computed so that each ellipse looks as round as possible. The semi-axes depend on the minimum and maximum X and Y values. This is not a completely satisfactory solution, however, because the user may resize the graphics window, which makes the almost circular bubbles look flat again.
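The idea can be sketched as follows, under stated assumptions: the radius r is expressed as a fraction of the data range, and each semi-axis is that fraction of the range of the corresponding coordinate, so that the ellipse looks round when the plot area is roughly square. The helper bubbleArcs is hypothetical and may not match what bubblechart does internally. It returns the 6-by-n matrix expected by the Scilab functions xfarcs and xarcs, where each column holds the upper-left corner of the bounding rectangle, its width and height, and the start and end angles in units of 1/64 degree.

```scilab
// Hypothetical helper (not the Stixbox API): convert bubble centers and
// normalized radii into the 6-by-n "arcs" matrix used by xfarcs/xarcs.
// Assumption: each semi-axis is r times the range of its coordinate.
function arcs = bubbleArcs(x, y, r)
    dx = max(x) - min(x);   // width of the X range
    dy = max(y) - min(y);   // height of the Y range
    a = r * dx;             // semi-axis along X, in data units
    b = r * dy;             // semi-axis along Y, in data units
    n = size(x, "*");
    arcs = zeros(6, n);
    arcs(1,:) = (x(:) - a(:))';   // upper-left corner, x
    arcs(2,:) = (y(:) + b(:))';   // upper-left corner, y
    arcs(3,:) = 2 * a(:)';        // width of the bounding rectangle
    arcs(4,:) = 2 * b(:)';        // height of the bounding rectangle
    arcs(5,:) = 0;                // start angle (1/64 degree)
    arcs(6,:) = 360 * 64;         // full ellipse (1/64 degree)
endfunction

// X in [0,1] and Y in [0,100], normalized radius 0.1
x = [0; 1];
y = [0; 100];
r = [0.1; 0.1];
arcs = bubbleArcs(x, y, r)
```

With these ranges, a normalized radius of 0.1 gives semi-axes of 0.1 along X and 10 along Y. The ellipses could then be drawn with xfarcs(arcs, fillColors) and their contours with xarcs(arcs), where fillColors is a vector of colormap indices. If all points share the same x (or y) value, the corresponding range is zero and this sketch produces degenerate ellipses, so a real implementation would need a fallback for that case.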

To avoid the resizing problem, we can disable the scaling by setting the fifth argument (scale) to %f, and set the isoview property of the axes to "on", as in the following script.

h=scf();
xtitle("","Life Expectancy", "Fertility Rate")
bubblechart(data,[],[],[],%f)
h.children.isoview="on";

This produces the following figure.

life2.png

The previous picture is nice, but it is quite large, because of the "isoview" option and the difference in scales between X and Y. Moreover, the figure has no legend, which makes it difficult to identify the bubbles.

In the following script, we pass the country codes in legen, which are displayed as datatips on the bubbles.

h=scf();
xtitle("","Life Expectancy", "Fertility Rate")
bubblechart(data,legen,2)

This creates the following figure.

life3.png

This looks even nicer with the rainbow colormap, using one color per bubble.

h=scf();
h.color_map = rainbowcolormap(10);
xtitle("","Life Expectancy", "Fertility Rate")
bubblechart(data,legen,1:10)

life4.png

It is also possible to create a separate legend.

h=scf();
h.color_map = rainbowcolormap(10);
xtitle("","Life Expectancy", "Fertility Rate")
bubblechart(data)
legend(legen,"in_upper_right");

life5.png

Although the rainbow colormap is nice, its colors do not convey any information. Instead, we can select the color depending on the world zone. This displays four-dimensional information in a 2D figure, which is quite interesting in itself.
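Rather than typing the fill vector by hand, one could derive it from a description of the zones. The following sketch assumes the zone assignment given in the comments of the next script (1 = North America, 2 = Europe, 3 = Arab/Persian/Hebrew); the variable zoneMembers is hypothetical. Countries that belong to no zone keep the value 0.

```scilab
// Country codes, in the same order as the rows of data
legen = ["CAN";"DEU";"DNK";"EGY";"GBR";"IRN";"IRQ";"ISR";"RUS";"USA"];
// Hypothetical zone description: the k-th entry lists the countries of zone k
zoneMembers = list(["CAN" "USA"], ..
                   ["DEU" "DNK" "GBR" "RUS"], ..
                   ["EGY" "IRN" "IRQ" "ISR"]);
// fill(i) is the zone index (colormap entry) of country i
fill = zeros(size(legen, "*"), 1);
for k = 1:length(zoneMembers)
    members = zoneMembers(k);
    // Loop over the columns of the row vector of country codes
    for c = members
        fill(find(legen == c)) = k;
    end
end
disp(fill')
```

The resulting vector can be passed as the third argument of bubblechart, together with a colormap that has one color per zone, such as rainbowcolormap(3).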

h=scf();
xtitle("","Life Expectancy", "Fertility Rate")
// 1 = CAN, USA (North America)
// 2 = Europe (DEU, DNK, GBR, RUS)
// 3 = Arab/Persian/Hebrew (EGY,IRN,IRQ,ISR)
h.color_map = rainbowcolormap(3);
fill=[
1
2
2
3
2
3
3
3
2
1
];
bubblechart(data,legen,fill)

life6.png

We can also pass a negative fill argument, which prevents the contours from being drawn.

h=scf();
xtitle("","Life Expectancy", "Fertility Rate")
h.color_map = rainbowcolormap(3);
bubblechart(data,legen,-fill)

life7.png

It is also possible to create a matrix-oriented bubble chart. Here, the area of each bubble is proportional to a random number drawn uniformly in [0,1].

n=5;
R=rand(n,n);
data=zeros(n^2,3);
[X,Y]=meshgrid(1:n,1:n);
data(:,1)=X(:);
data(:,2)=Y(:);
data(:,3)=R(:);
h=scf();
xtitle("Random numbers [0,1]")
h.color_map = rainbowcolormap(n^2);
bubblechart(data,[],[],0.5)

random.png

The following script is a typical example, adapted from [5].

data =[
14 12200 15
20 60000 23
18 24400 10
];
h=scf();
xtitle("Industry Market Share Study","Number of products",...
"Sales");
h.color_map = rainbowcolormap(3);
legen=string(1:3);
bubblechart(data,legen)

industry.png

The following example is adapted from [6].

data =[
10 10 100
5 5 75
8 5 65
3 2 60
5 3 50
1 2 35
];
legen=[
"Smith"
"West"
"Miller"
"Carlson"
"Redmond"
"Dillar"
];
h=scf();
xtitle("Salary Study","Years with Firm",...
"% Salary Increase");
h.color_map = rainbowcolormap(6);
bubblechart(data,legen)

salary.png

The following figure was adapted from [2], with data from [3].

students.png

With the same principle, we can create a categorical bubble chart. Consider the dataset "Weights of 1996 US Olympic Rowing Team" [7]. The first column gives the name of the rower, the second gives his event, and the third gives his weight. There are 8 different event categories, and the weights are numeric data. The "dsearch" function in Scilab makes it easy to compute a matrix containing, for each event, the number of athletes in each weight category (see the Appendix). From there, we can create a bubble chart representing the number of athletes in each weight category and for each event.
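The counting step relies on the interval convention of dsearch, which is worth checking on a small example. The weights below are hypothetical values chosen for illustration, not data from [7]; the category bounds are those used in the Appendix. In its default continuous mode, dsearch uses the intervals [0,150], (150,175], (175,200] and (200,1000], so a weight of exactly 150 is counted in the first category, labeled "<150".

```scilab
// Hypothetical weights, for illustration only
w = [140 150 160 180 210];
// Bounds of the four weight categories, as in the Appendix
weightCategories = [0 150 175 200 1000];
// ind(i) is the index of the category containing w(i),
// occ(k) is the number of values falling in category k
[ind, occ] = dsearch(w, weightCategories);
disp(ind);  // expected: 1 1 2 3 4
disp(occ);  // expected: 2 1 1 1
```

If the boundary values matter, the bounds (or the labels) can be adjusted so that they describe the intervals that dsearch actually uses.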

To create the categorical X and Y axes, we hide the original axes and draw new ones with the "drawaxis" function.

weightLabels=["<150" "150-175" "175-200" ">200"]
eventLabels=[
"LW_double_sculls"
"LW_four"
"coxswain"
"eight"
"four"
"pair"
"quad"
"single_sculls"
];
R=[
  0     2        0        0
  0     4        0        0
  1     0        0        0
  0     0        4        4
  0     0        1        3
  0     0        1        1
  0     0        0        4
  0     0        0        1
];
// Print the table
disp([["";eventLabels],[weightLabels;string(R)]])
// Create a bubble chart for the data
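// One bubble per (weight category, event) pair
m=size(R,"c"); // number of weight categories
n=size(R,"r"); // number of events
D=zeros(m*n,3);
[X,Y]=meshgrid(1:m,1:n);
D(:,1)=X(:); // x = weight category index
D(:,2)=Y(:); // y = event index
// Replace zero counts by %eps so that every z value is positive
D(:,3)=max(R(:),%eps);
h=scf();
xtitle("Area = Number of Athletes")
bubblechart(D,[],2,0.2)
// Hide the numeric axes and draw the categorical axes instead
h.children.axes_visible=["off","off","off"];
drawaxis(x=0.8,y=1:n,dir="l",tics="v",val=eventLabels)
drawaxis(x=1:m,y=0,dir="u",tics="v",val=weightLabels)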

weight1996.png

Epilogue

The function is now provided in Stixbox.

To install, just type:

atomsInstall("stixbox")

Appendix

The following script can be used to create the R matrix used in the 1996 US Olympic Rowing Team example above.

The data is available here:

The following script uses the "dsearch" function in order to compute the number of athletes in each weight and event category.

filename="Weights-1996-US-Olympic-Rowing-Team.csv"
data=csvRead(filename,";",[],"string")
// Get the number of events, the events
events=data(:,2)
eventLabels=unique(events)
nEventsCategories=size(eventLabels,"*")
// Get the weights
weights=csvTextScan(data(:,3))
// Compute the categories, the number of cat.
weightLabels=["<150" "150-175" "175-200" ">200"]
weightCategories=[0 150 175 200 1000]
nWeightCategories=size(weightLabels,"*")
// For each event, count the number of athletes 
// falling into a weight cat.
R=zeros(nEventsCategories,nWeightCategories)
for i=1:nEventsCategories
    // The athletes in this event category
    j=find(events==eventLabels(i));
    [ind, occ] = dsearch(weights(j),weightCategories);
    R(i,:)=occ;
end

We can then print the data that we have computed in the form of a table.

-->disp([["";eventLabels],[weightLabels;string(R)]])
 
!                  <150  150-175  175-200  >200  !
!                                                !
!LW_double_sculls  0     2        0        0     !
!                                                !
!LW_four           0     4        0        0     !
!                                                !
!coxswain          1     0        0        0     !
!                                                !
!eight             0     0        4        4     !
!                                                !
!four              0     0        1        3     !
!                                                !
!pair              0     0        1        1     !
!                                                !
!quad              0     0        0        4     !
!                                                !
!single_sculls     0     0        0        1     !

References



Author: Michaël Baudin, 2013


2022-09-08 09:27