Create Bubble Charts in Scilab
Abstract
On this page, we present a function to create bubble charts in Scilab.
Introduction
According to Wikipedia [1] “A bubble chart is a type of chart that displays three dimensions of data. Each entity with its triplet (v1, v2, v3) of associated data is plotted as a disk that expresses two of the vi values through the disk's xy location and the third through its size.”
The striking fact about bubble charts is that they can display three-dimensional information in a two-dimensional picture, as we are going to see.
This feature is provided by the "bubblechart" and "bubblematrix" functions of Stixbox v2.2. To install Stixbox, use
atomsInstall('stixbox')
and restart Scilab.
The script is also available as an attachment.
This script defines a function with the following calling sequences:
bubblechart(data)
bubblechart(data,legen)
bubblechart(data,legen,fill)
bubblechart(data,legen,fill,maxR)
bubblechart(data,legen,fill,maxR,scale)
Here, data is an n-by-3 matrix of doubles, where data(:,1) contains the x values, data(:,2) the y values and data(:,3) the z values.
For each row i of data, a colored bubble centered at x=data(i,1), y=data(i,2) is plotted. The radius of the i-th bubble is computed from z=data(i,3), so that the area of the bubble is proportional to z.
The radii are scaled so that they are proportional to maxR. Reduce this parameter if the bubbles are too large and hide other bubbles.
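Because the area of a disk is %pi*r^2, making the area proportional to z means making the radius proportional to sqrt(z). The following is a minimal sketch of that rule. The helper bubbleRadii is hypothetical and not part of Stixbox. It also assumes that the largest bubble gets radius exactly maxR, which may differ from the scaling that bubblechart really applies.

```scilab
// Hypothetical helper, not the Stixbox implementation: compute bubble
// radii so that the area of each bubble is proportional to z.
// Assumption: the largest bubble gets radius maxR.
function r = bubbleRadii(z, maxR)
    // Area = %pi*r^2 is proportional to z, hence r is proportional to sqrt(z)
    r = maxR * sqrt(z / max(z));
endfunction

z = [1; 4; 9; 16];
r = bubbleRadii(z, 2)
// r is [0.5; 1; 1.5; 2], and the areas %pi*r.^2 are in the ratio 1:4:9:16
```

Doubling z multiplies the radius by sqrt(2), not by 2. This is why large z values do not overwhelm the picture as quickly as they would if the radius were proportional to z.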
A demo
In the following script, we define sample data and a legend, and call the bubblechart function. The data is adapted from [4].
// "Life Expectancy", "Fertility Rate", "Population"
data=[
80.66, 1.67, 33739900
79.84, 1.36, 81902307
78.6, 1.84, 5523095
72.73, 2.78, 79716203
80.05, 2, 61801570
72.49, 1.7, 73137148
68.09, 4.77, 31090763
81.55, 2.96, 7485600
68.6, 1.54, 141850000
78.09, 2.05, 307007000
];
legen=[
'CAN'
'DEU'
'DNK'
'EGY'
'GBR'
'IRN'
'IRQ'
'ISR'
'RUS'
'USA'
];
//
// Just the data
h=scf();
xtitle("","Life Expectancy", "Fertility Rate")
bubblechart(data)
The previous script produces the following output.
The function has a special feature that makes the bubbles appear as circular as possible. This is difficult in the general case where the X and Y scales differ, e.g. when X ranges from 0 to 1 and Y from 0 to 100. In this case, a circle plotted in data coordinates appears as an extremely flattened ellipse.
To solve this issue, the function actually draws ellipses whose semi-axes are computed so that each ellipse looks as round as possible, depending on the minimum and maximum X and Y values. This is not a completely satisfactory solution, however, since the user may resize the graphics window, which makes the almost circular bubbles look flat again.
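The exact formula used by bubblechart is not given here, so the following is only a sketch of the idea under stated assumptions. It assumes that the plot area is roughly square and that the axes span the data ranges. Under those assumptions, a bubble looks round if each semi-axis is scaled by the range of its own axis. The helper bubbleSemiAxes and the convention that r is a fraction of the axes size are hypothetical. The drawing uses the standard xfarcs function.

```scilab
// Hypothetical sketch, not the Stixbox code: draw round-looking bubbles
// as ellipses when the X and Y ranges differ.
// Assumptions: the plot area is roughly square, the axes span the data
// ranges, and r is a radius expressed as a fraction of the axes size.
function [rx, ry] = bubbleSemiAxes(x, y, r)
    rx = r * (max(x) - min(x));  // semi-axis along X, in X data units
    ry = r * (max(y) - min(y));  // semi-axis along Y, in Y data units
endfunction

x = [0; 0.5; 1];      // X ranges over [0,1]
y = [0; 50; 100];     // Y ranges over [0,100]
r = [0.05; 0.1; 0.05];
[rx, ry] = bubbleSemiAxes(x, y, r);

// xfarcs expects one column [xleft; yup; width; height; a1; a2] per ellipse:
// (xleft,yup) is the upper-left corner of the bounding box, and the angles
// are in units of 1/64 degree (0 to 360*64 draws a full ellipse).
arcs = [x' - rx'; y' + ry'; 2*rx'; 2*ry'; zeros(1,3); 360*64*ones(1,3)];
h = scf();
a = gca();
a.data_bounds = [-0.2, -20; 1.2, 120];
xfarcs(arcs, 1:3);
```

In this example, each ry is 100 times the corresponding rx, which compensates for the Y range being 100 times the X range. If the window is resized to a non-square shape, this compensation no longer holds, which is the limitation described above.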
To solve this, we can disable the scaling and set the isoview mode to "on", as in the following script.
h=scf();
xtitle("","Life Expectancy", "Fertility Rate")
bubblechart(data,[],[],[],%f)
h.children.isoview="on";
This produces the following figure.
The previous picture is nice, but it is quite large because of the "isoview" option and the difference in scales between X and Y. Moreover, the figure has no legend, which makes it difficult to identify the bubbles.
In the following script, we use datatips.
h=scf();
xtitle("","Life Expectancy", "Fertility Rate")
bubblechart(data,legen,2)
This creates the following figure.
This looks even nicer with the rainbow colormap.
h=scf();
h.color_map = rainbowcolormap(10);
xtitle("","Life Expectancy", "Fertility Rate")
bubblechart(data,legen,1:10)
It is also possible to create a separate legend.
h=scf();
h.color_map = rainbowcolormap(10);
xtitle("","Life Expectancy", "Fertility Rate")
bubblechart(data)
legend(legen,"in_upper_right");
Although the rainbow colormap is nice, its colors do not convey any information. Instead, we can select the color depending on the world zone. This displays four-dimensional information in a 2D figure, which is quite interesting in itself.
h=scf();
xtitle("","Life Expectancy", "Fertility Rate")
// 1 = CAN, USA (North America)
// 2 = Europe (DEU, DNK, GBR, RUS)
// 3 = Arab/Persian/Hebrew (EGY,IRN,IRQ,ISR)
h.color_map = rainbowcolormap(3);
// One zone index per country, in the order CAN DEU DNK EGY GBR IRN IRQ ISR RUS USA
fill=[1;2;2;3;2;3;3;3;2;1];
bubblechart(data,legen,fill)
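With more countries, typing the fill vector by hand becomes error-prone. The following is a minimal sketch that derives it from zone membership instead. The zones list is a hypothetical variable that restates the grouping given in the comments of the previous script. The color index of each country is the position of its zone in that list.

```scilab
// Hypothetical sketch: build the fill vector from zone membership.
// zones(k) lists the country codes drawn with color index k.
legen = ["CAN" "DEU" "DNK" "EGY" "GBR" "IRN" "IRQ" "ISR" "RUS" "USA"]';
zones = list(["CAN" "USA"], ["DEU" "DNK" "GBR" "RUS"], ["EGY" "IRN" "IRQ" "ISR"]);
fill = zeros(size(legen, "*"), 1);
for k = 1:size(zones)
    for i = 1:size(legen, "*")
        // legen(i) == zones(k) is a boolean vector; or() is true on any match
        if or(legen(i) == zones(k)) then
            fill(i) = k;  // color index k of the colormap
        end
    end
end
disp(fill')
// fill' is [1 2 2 3 2 3 3 3 2 1], the same vector as in the script above
```

A country that belongs to no zone keeps the value 0 in this sketch, so it is easy to spot missing entries before calling bubblechart.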
We can also use a negative fill argument, which prevents the contours from being drawn.
h=scf();
xtitle("","Life Expectancy", "Fertility Rate")
h.color_map = rainbowcolormap(3);
bubblechart(data,legen,-fill)
It is also possible to create a matrix-oriented bubble chart. Here, the area of each bubble depends on a random number, chosen uniformly in [0,1].
n=5;
R=rand(n,n);
data=zeros(n^2,3);
[X,Y]=meshgrid(1:n,1:n);
data(:,1)=X(:);
data(:,2)=Y(:);
data(:,3)=R(:);
h=scf();
xtitle("Random numbers [0,1]")
h.color_map = rainbowcolormap(n^2);
bubblechart(data,[],[],0.5)
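It may not be obvious which entry of R each bubble represents. The following sketch repeats the same meshgrid construction on a small 2-by-2 matrix, so that the resulting data rows can be checked by hand.

```scilab
// Sketch: which entry of R does each bubble represent?
R = [1 2
     3 4];
n = 2;
data = zeros(n^2, 3);
[X, Y] = meshgrid(1:n, 1:n);  // X(i,j) = j and Y(i,j) = i
data(:,1) = X(:);  // column index j gives the x position
data(:,2) = Y(:);  // row index i gives the y position
data(:,3) = R(:);  // R(:) stacks the columns of R
disp(data)
// Each row of data is [j, i, R(i,j)]:
// [1 1 1; 1 2 3; 2 1 2; 2 2 4]
```

The bubble at x=j, y=i therefore shows R(i,j). Since the Y axis points upward, the first row of R is drawn at the bottom of the chart.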
The following script is a typical example, adapted from [5].
data =[
14 12200 15
20 60000 23
18 24400 10
];
h=scf();
xtitle("Industry Market Share Study","Number of products",...
"Sales");
h.color_map = rainbowcolormap(3);
legen=string(1:3);
bubblechart(data,legen)
The following example is adapted from [6].
data =[
10 10 100
5 5 75
8 5 65
3 2 60
5 3 50
1 2 35
];
legen=[
"Smith"
"West"
"Miller"
"Carlson"
"Redmond"
"Dillar"
];
h=scf();
xtitle("Salary Study","Years with Firm",...
"% Salary Increase");
h.color_map = rainbowcolormap(6);
bubblechart(data,legen)
The following figure was adapted from [2], with data from [3].
With the same principle, we can create a categorical bubble chart. Consider the dataset [7], "Weights of 1996 US Olympic Rowing Team." The first column gives the name of the rower, the second gives his event, and the third gives his weight. There are 8 different event categories, and the weight is given as numeric data. The "dsearch" function in Scilab makes it easy to build a matrix containing the number of athletes in each weight category for each event (see the Appendix). From this matrix, we create a bubble chart representing the number of athletes in each weight category and for each event.
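The counting step can be illustrated on a few made-up weights. These values are not the rowing team data, and they only show how dsearch distributes values into the weight categories used in the Appendix.

```scilab
// Sketch of the counting step with dsearch, on made-up weights
// (these are NOT the rowing team data).
weights = [140 160 170 185 210 220];
weightCategories = [0 150 175 200 1000];  // same edges as in the Appendix
// With the default "c" option, the intervals are [0,150], (150,175],
// (175,200] and (200,1000]; occ(k) counts the weights in interval k.
[ind, occ] = dsearch(weights, weightCategories);
disp(ind)  // interval index of each weight: 1 2 2 3 4 4
disp(occ)  // number of weights per interval: 1 2 1 2
```

Note that the boundaries are closed on the right. A weight of exactly 150 is therefore counted in the first interval, labeled "<150" in the example.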
To create the categorical X and Y axes, we hide the original axes and draw new ones with the "drawaxis" function.
weightLabels=["<150" "150-175" "175-200" ">200"]
eventLabels=[
"LW_double_sculls"
"LW_four"
"coxswain"
"eight"
"four"
"pair"
"quad"
"single_sculls"
];
R=[
0 2 0 0
0 4 0 0
1 0 0 0
0 0 4 4
0 0 1 3
0 0 1 1
0 0 0 4
0 0 0 1
];
// Print the table
disp([["";eventLabels],[weightLabels;string(R)]])
// Create a bubble chart for the data
m=size(R,"c")
n=size(R,"r")
D=zeros(m*n,3);
[X,Y]=meshgrid(1:m,1:n);
D(:,1)=X(:);
D(:,2)=Y(:);
// Zero counts are replaced by %eps so that all z values are positive
D(:,3)=max(R(:),%eps);
h=scf();
xtitle("Area=Number of Athletes")//,"Weight Cat.","Event")
bubblechart(D,[],2,0.2)
// Hide the default axes and draw categorical ones instead
h.children.axes_visible=["off","off","off"];
drawaxis(x=0.8,y=1:n,dir="l",tics="v",val=eventLabels)
drawaxis(x=1:m,y=0,dir="u",tics="v",val=weightLabels)
Epilogue
The function is now provided in Stixbox.
To install, just type:
atomsInstall("stixbox")
Appendix
The following script can be used to create the R matrix used in the 1996 US Olympic Rowing Team example above.
The data is available here:
The following script uses the "dsearch" function in order to compute the number of athletes in each weight and event category.
filename="Weights-1996-US-Olympic-Rowing-Team.csv"
data=csvRead(filename,";",[],"string")
// Get the number of events, the events
events=data(:,2)
eventLabels=unique(events)
nEventsCategories=size(eventLabels,"*")
// Get the weights
weights=csvTextScan(data(:,3))
// Compute the categories, the number of cat.
weightLabels=["<150" "150-175" "175-200" ">200"]
weightCategories=[0 150 175 200 1000]
nWeightCategories=size(weightLabels,"*")
// For each event, count the number of athletes
// falling into a weight cat.
R=zeros(nEventsCategories,nWeightCategories)
for i=1:nEventsCategories
  // The athletes in this event category
  j=find(events==eventLabels(i));
  [ind, occ] = dsearch(weights(j),weightCategories);
  R(i,:)=occ;
end
We can then print the data that we have computed in the form of a table.
-->disp([["";eventLabels],[weightLabels;string(R)]])
!                  <150  150-175  175-200  >200  !
!                                                !
!LW_double_sculls  0     2        0        0     !
!                                                !
!LW_four           0     4        0        0     !
!                                                !
!coxswain          1     0        0        0     !
!                                                !
!eight             0     0        4        4     !
!                                                !
!four              0     0        1        3     !
!                                                !
!pair              0     0        1        1     !
!                                                !
!quad              0     0        0        4     !
!                                                !
!single_sculls     0     0        0        1     !
References
[2] http://www.bubblechartpro.com/bubble-chart-of-americas-10-richest-colleges/
[3] http://finance.yahoo.com/news/the-10-richest-colleges-in-america.html
[4] https://developers.google.com/chart/interactive/docs/gallery/bubblechart
[5] http://office.microsoft.com/en-us/excel-help/creating-a-bubble-chart-HA001117076.aspx
[6] http://www.techrepublic.com/blog/msoffice/add-data-labels-to-your-excel-bubble-charts/513
Author: Michaël Baudin, 2013