Building a scatter-plot with d3.js

Sabahat Iqbal
May 29, 2020

Recently, the Government of Pakistan published a handy dataset of poverty and development indicators ready to download as an Excel file. I used it to practice my d3 skills and built a simple, interactive scatter-plot.

This blog outlines the steps to build a static version of this data visualization. It assumes a basic knowledge of HTML, CSS, JS and the ability to run a simple local server. I use Python’s http.server but feel free to use whatever you are most comfortable with. In a future blog, I will explain how to make this interactive.

Feel free to download the original dataset and clean it for use or use the already cleaned up dataset from my Github repo. Either way, make sure the final output is a CSV file.

In your favorite IDE (I use VS Code), create a new folder for the project and the following folder sub-structure:

Project Folder
--> data folder
--> js folder
--> index.html file

Add the d3 library in one of three ways:

  1. After downloading the library, add the d3.min.js file to the js folder, then add the following script tag to the .html file before any script tag that references your own js files: <script src="js/d3.min.js"></script>. The order matters because your code can only use d3 once the library has loaded.
  2. Use npm: run npm i d3 in the project folder. This downloads and installs the library into a node_modules folder inside the project.
  3. Don’t download anything; instead, load the library over the internet. Insert this within the <head></head> tags in the .html file: <script src="https://d3js.org/d3.v5.min.js"></script>.

Now, we want to load in the dataset and set up the canvas on which the chart will be “drawn”.

There are three steps to setting up the canvas:

  • Add <div id="my_dataviz"></div> to your .html file.
  • Define the dimensions (height, width) of the chart itself, which will be smaller than the canvas (defined below) in which it sits. The difference between the two is the margin, which leaves room for the axes and labels. Pay attention to that difference, as it can save hours of debugging why a chart is falling off the “edge” of your canvas.
// set the dimensions and margins of the graph
let margin = { top: 10, right: 30, bottom: 50, left: 60 },
  width = 460 - margin.left - margin.right,
  height = 400 - margin.top - margin.bottom;
  • The foundational building block of all d3 visualizations is the SVG element, also called the canvas. So we want to “grab” the <div> we created above, add an svg element to it, and set its width and height. We then append a g element, shift it by the left and top margins, and assign the resulting selection to a new variable, somewhat confusingly named svg, so we can pass it around in later code. Because the chain ends with .append("g"), that variable actually refers to the g element rather than the svg element itself:
let svg = d3.select("#my_dataviz")
  .append("svg")
  .attr("width", width + margin.left + margin.right)
  .attr("height", height + margin.top + margin.bottom)
  .append("g")
  .attr("transform", "translate(" + margin.left + "," + margin.top + ")");

The g element groups everything drawn inside the svg so that it can be moved around as one unit.

All the code does up to this point is create basic HTML tags, so if you stopped here and inspected the code in the browser under Elements, you would see:

<body>
  <div id="my_dataviz">
    <svg width="460" height="400">
      <g transform="translate(60,10)">
      </g>
    </svg>
  </div>
</body>

Within those <g></g> tags is where the chart will be added in.
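To make the difference between the canvas size and the chart size concrete, here is a small sketch of the arithmetic in plain JavaScript. The names canvasWidth and canvasHeight are my own labels for the 460 and 400 used above; they are not part of d3.

```javascript
// Margin arithmetic from the snippets above, with no d3 involved.
const margin = { top: 10, right: 30, bottom: 50, left: 60 };

// canvasWidth / canvasHeight are illustrative names for the outer SVG size.
const canvasWidth = 460;
const canvasHeight = 400;

// The drawable chart area is the canvas minus the margins on each side.
const width = canvasWidth - margin.left - margin.right;   // 370
const height = canvasHeight - margin.top - margin.bottom; // 340

// The g element is shifted so that its (0, 0) sits inside the margins.
const transform = "translate(" + margin.left + "," + margin.top + ")";

console.log(width, height, transform); // 370 340 translate(60,10)
```

Anything drawn between 0 and width horizontally, and between 0 and height vertically, stays inside the chart area and leaves the margins free for axes and labels.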

As elements are added to the HTML, we will set attributes for them. These attributes define the color, position, size, etc. of elements such as circles and rectangles. This is how we control the relationship between the dataset and the elements. To give a crude example, if a data point is 10, then we could set the length of a rectangle to 10, and we have the beginning of a bar chart.
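As a rough illustration of that idea, the sketch below turns a few made-up numbers into rect markup by hand. d3 does this work for you through selections; the values and the barMarkup helper here are hypothetical and only meant to show how a data value ends up as an attribute.

```javascript
// Made-up data points, one bar per value.
const values = [10, 25, 40];

// Hypothetical helper: each value becomes the width of a rect,
// and each bar is pushed 12 pixels further down than the previous one.
function barMarkup(value, index) {
  return '<rect x="0" y="' + index * 12 + '" width="' + value + '" height="10"></rect>';
}

const bars = values.map(barMarkup);
console.log(bars[0]); // <rect x="0" y="0" width="10" height="10"></rect>
```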

Also, it is important to note that the default position of any new element is the top-left corner of the SVG element. In the SVG’s x-y plane, the origin (0, 0) is that top-left corner, x increases to the right, and y increases downward.

So if we want an element to be positioned lower down or to the right, we have to use attributes to position it correctly.

Next, save the CSV file(s) to the data folder. Then load the data using d3.csv() and format the columns appropriately. This is also where we will invoke the function (created below) that takes the formatted dataset as an argument and draws the chart. Apart from the method that loads the dataset, the code to format the data is plain JS, so I will skip over it, but my code can be viewed here.
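For readers who want a starting point, here is a minimal sketch of what the formatting step might look like. It is not the code from my repo: the formatRow helper, the file name in the comment, and the choice to wrap the rows in an array are all assumptions for illustration. The one thing to remember is that d3.csv() reads every field as a string, so numeric columns have to be converted explicitly.

```javascript
// Column names as they appear in the chart code later in this post.
const POVERTY = "Poverty Rate (%)";
const LITERACY = "Adult literacy, 25 or more years old (% of population aged 25 or more)";

// Hypothetical row formatter: copy the row and turn the two
// numeric columns from strings into numbers.
function formatRow(row) {
  return Object.assign({}, row, {
    [POVERTY]: +row[POVERTY],
    [LITERACY]: +row[LITERACY],
  });
}

// Made-up sample row, shaped the way d3.csv() would deliver it.
const sample = formatRow({ District: "Sample District", [POVERTY]: "12.5", [LITERACY]: "40" });
console.log(typeof sample[POVERTY], sample[POVERTY]); // number 12.5

// In the browser, with d3 v5 loaded (the file name is a placeholder):
// d3.csv("data/your-file.csv", formatRow).then(function (rows) {
//   updateChart([rows]);
// });
```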

Finally, we get into the guts of d3. We build a function, updateChart, that takes in a dataset and uses it to build all the parts of the scatter-plot: the axes and the points on the graph. This function is invoked with the dataset we loaded in above. Taking this step-by-step:

  • First, I had to manipulate the data because I wanted each record in my dataset to use Year as the identifier. So if you looked at the data coming into this function, it would look like this:

But after further manipulation, it would look like this, with a key for each Year so I can select data based on Year more easily:

The d3 method responsible for this is d3.nest(), and the code (and the beginning of the function) looks like this:

function updateChart(someData) {
  // Group the records by Year: one { key, values } entry per year
  let dataAdultLit = d3.nest()
    .key(function (d) { return d["Year"]; })
    .entries(someData[0]);
  // more code
}

For now, I am focusing on the Adult Literacy and Poverty indicators.
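If the output of d3.nest() feels opaque, the following plain JavaScript sketch produces the same shape of result, an array of { key, values } objects, for a handful of made-up rows. The nestByKey helper is hypothetical and only mimics the single-key case used here. Note also that d3.nest() ships with the v5 build used in this post; as far as I know, later major versions dropped it in favor of d3.group().

```javascript
// Hypothetical stand-in for d3.nest().key(keyFn).entries(rows).
function nestByKey(rows, keyFn) {
  const groups = new Map();
  for (const row of rows) {
    const key = String(keyFn(row)); // keys end up as strings
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(row);
  }
  // One { key, values } entry per distinct key, in first-seen order.
  return Array.from(groups, ([key, values]) => ({ key, values }));
}

// Made-up rows, just to show the grouping.
const rows = [
  { Year: 2012, District: "A" },
  { Year: 2014, District: "B" },
  { Year: 2012, District: "C" },
];

const nested = nestByKey(rows, (d) => d["Year"]);
console.log(nested.map((g) => g.key + ":" + g.values.length).join(", ")); // 2012:2, 2014:1
```

With the data in this shape, picking out one year is a matter of finding the entry whose key matches that year and reading its values array.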

  • Next, we build the axes starting with the x-axis. The most important part of building the axes is determining how to scale the real data points (from your dataset) to pixels on the screen. Obviously, if the data points range from 0 to 100,000, we cannot simply plot points from 0 pixels to 100,000 pixels. So first, we build a scale:
// Add X axis
let x = d3.scaleLinear().domain([0, 100]).range([0, width]);
  • In this case, a linear scale, d3.scaleLinear(), takes the minimum and maximum values of our data, .domain([0, 100]), and maps them to pixel values on the screen, .range([0, width]).
  • Next, we attach an axis to the chart. To do this, we first append a g element to the svg variable created earlier, move that g element to the right spot in the chart (0 pixels across and height pixels down), and then call the axis we want, d3.axisBottom, with the scale we just created as an argument. d3.axisBottom simply means that the tick marks sit underneath the axis line. Moving g ensures that the axis lands in the “traditional” x-axis location. If we left out the transform/translate line, d3 would add the axis in the default position for every new element, i.e. starting at the top-left corner, so the axis would run along the top of the chart. Comment out that line and see where the x-axis ends up.
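To see what a linear scale does under the hood, here is a stripped-down version in plain JavaScript. It is only a sketch of the idea, not d3's implementation (the real d3.scaleLinear() also handles ticks, clamping, and more), and the linearScale helper and the width of 370 are assumptions taken from the dimensions earlier in this post.

```javascript
// Hypothetical minimal linear scale: map a value from the domain
// onto the range by keeping its relative position the same.
function linearScale(domain, range) {
  const [d0, d1] = domain;
  const [r0, r1] = range;
  return function (value) {
    const fraction = (value - d0) / (d1 - d0); // 0 at d0, 1 at d1
    return r0 + fraction * (r1 - r0);
  };
}

// Same domain and range as the x scale above, with width = 370.
const xSketch = linearScale([0, 100], [0, 370]);

console.log(xSketch(0), xSketch(50), xSketch(100)); // 0 185 370
```

So a literacy rate of 50% lands exactly halfway across the chart, at 185 pixels.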
svg
  .append("g")
  .attr("transform", "translate(0," + height + ")")
  .call(d3.axisBottom(x));
  • Next, the x-axis needs a label. This is a simple but tedious process: add a text element to the svg variable, move it to the right position (this might take some tweaking), set the alignment, and give the text a value to display on the page. The text-anchor attribute sets the alignment of the text.
// Add x-axis label:
svg
  .append("text")
  .attr("transform", "translate(" + width / 2 + " ," + (height + margin.top + 30) + ")")
  .style("text-anchor", "middle")
  .text("Adult Literacy");
  • The same process is followed for the y-axis, except that we don’t have to move it. A left-hand axis belongs at the origin of the g element, which is where new elements are placed by default:
// Add Y axis
let y = d3.scaleLinear().domain([0, 100]).range([height, 0]);
svg.append("g").call(d3.axisLeft(y));
// Add y-axis label:
svg
  .append("text")
  .attr("transform", "rotate(-90)")
  .attr("y", -40)
  .attr("x", 0 - height / 2)
  .style("text-anchor", "middle")
  .text("Poverty Rate");
  • Be careful when setting the range for the y-axis. It looks as if it has been set backwards, but remember that the starting point for d3 is the top-left corner. If we mapped data values from 0 to 100 to .range([0, height]), larger values would be plotted further down the screen, starting from the top-left, when what we want is for them to climb upward from the bottom-left. Go ahead and switch the values within range and see what happens to the y-axis.
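A quick numeric check of the flipped y range, in plain JavaScript. The two helper functions are hypothetical and hard-code the 0 to 100 domain and a chart height of 340 (400 minus the top and bottom margins); they stand in for the two possible range orders.

```javascript
const chartHeight = 340; // 400 - margin.top - margin.bottom

// Equivalent to .domain([0, 100]).range([height, 0]): bigger values sit higher up.
function yFlipped(value) {
  return chartHeight - (value / 100) * chartHeight;
}

// Equivalent to .domain([0, 100]).range([0, height]): bigger values sit lower down.
function yUnflipped(value) {
  return (value / 100) * chartHeight;
}

console.log(yFlipped(0), yFlipped(25), yFlipped(100));       // 340 255 0
console.log(yUnflipped(0), yUnflipped(25), yUnflipped(100)); // 0 85 340
```

With the flipped range, a poverty rate of 0 sits on the x-axis line (340 pixels down) and a rate of 100 sits at the very top of the chart, which is what we expect from a conventional y-axis.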
  • Another important point: once the y-axis label has been rotated by -90 degrees, the x and y attributes are effectively swapped. So if you want to move the label left or right on the screen, you update the y attribute instead of the x attribute, and if you want to move it up or down, you update the x attribute.
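The swap of x and y after rotating the label can be checked with a little arithmetic. The sketch below applies the standard rotation formula for -90 degrees to the label's x and y attributes from the code above; the rotateMinus90 helper is hypothetical, and 340 is the chart height assumed from the earlier dimensions.

```javascript
// Under rotate(-90), a point written as (x, y) in the rotated
// coordinate system ends up on screen at (y, -x).
function rotateMinus90(x, y) {
  return { screenX: y, screenY: -x };
}

const chartHeight = 340;

// The label uses x = 0 - height / 2 and y = -40.
const label = rotateMinus90(0 - chartHeight / 2, -40);

console.log(label.screenX, label.screenY); // -40 170
```

So the y attribute of -40 places the label 40 pixels to the left of the axis, and the x attribute of -170 places it 170 pixels down, halfway along the axis.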
  • The last scale we have to build lets us distinguish between categories within the data. Our data has indicators for four provinces plus the Federal Capital Territory, and it would be great to know which data points belong to which of them. So we create a scale that takes in the Province value and returns a color. I picked the colors I liked from https://coolors.co/.
// Color scale: give me a province name, I return a color
let color = d3
  .scaleOrdinal()
  .domain([
    "Balochistan",
    "Federal Capital Territory",
    "Khyber Pakhtunkhwa",
    "Punjab",
    "Sindh",
  ])
  .range(["#440154ff", "#21908dff", "#fde725ff", "#129490", "#CE1483"]);
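Conceptually, an ordinal scale is just a lookup from the n-th domain value to the n-th range value. The plain JavaScript sketch below shows that pairing for the same provinces and colors. The ordinalScale helper is hypothetical and simpler than d3.scaleOrdinal(); for instance, it returns undefined for a name that is not in the domain, whereas d3 would assign it the next color.

```javascript
// Hypothetical minimal ordinal scale: pair domain[i] with range[i],
// cycling through the range if the domain is longer.
function ordinalScale(domain, range) {
  const lookup = new Map(
    domain.map((name, i) => [name, range[i % range.length]])
  );
  return function (name) {
    return lookup.get(name); // undefined for names outside the domain
  };
}

const colorSketch = ordinalScale(
  ["Balochistan", "Federal Capital Territory", "Khyber Pakhtunkhwa", "Punjab", "Sindh"],
  ["#440154ff", "#21908dff", "#fde725ff", "#129490", "#CE1483"]
);

console.log(colorSketch("Punjab")); // #129490
```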
  • Now we can join the dataset to SVG elements, in this case, circle.
// JOIN data to elements.
let circles = svg
  .selectAll("circle")
  .data(filteredData, function (d) {
    return d["District"];
  });
  • The join returns a selection that we capture in a variable called circles. Because no circle elements exist on the page yet, this selection starts out empty, and every data point is instead waiting in what d3 calls the enter selection, which we reach by calling .enter(). It’s a bit difficult to understand, but for now think of one “imaginary” placeholder circle for each value passed to the data() method. So for each District in the dataset there is one placeholder, and together they make up the enter selection. Now, we will do something with all those circles.
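A data join sorts things into three groups: data with no element yet (enter), data that matches an existing element (update), and elements whose data has gone away (exit). The plain JavaScript sketch below mimics that bookkeeping with made-up district names. The joinByKey helper is hypothetical and is not how d3 implements the join; it only shows why every row lands in the enter group on the first render.

```javascript
// Hypothetical helper: compare the keys already on the page
// with the keys in the new data.
function joinByKey(existingKeys, data, keyFn) {
  const existing = new Set(existingKeys);
  const incoming = new Set(data.map(keyFn));
  return {
    enter: data.filter((d) => !existing.has(keyFn(d))),  // data with no element yet
    update: data.filter((d) => existing.has(keyFn(d))),  // data matching an element
    exit: existingKeys.filter((k) => !incoming.has(k)),  // elements with no data left
  };
}

// Made-up rows keyed by District, like the key function passed to data().
const rows = [{ District: "A" }, { District: "B" }, { District: "C" }];
const byDistrict = (d) => d["District"];

// First render: nothing is on the page, so every row is in enter.
const first = joinByKey([], rows, byDistrict);
console.log(first.enter.length, first.update.length, first.exit.length); // 3 0 0

// Later render: "A" and "B" are already drawn, "Z" is no longer in the data.
const later = joinByKey(["A", "B", "Z"], rows, byDistrict);
console.log(later.enter.length, later.update.length, later.exit.length); // 1 2 1
```

For the static chart in this post only the first case matters, but the update and exit groups become important once the chart is made interactive.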
// ENTER new elements present in new data.
circles
  .enter()
  .append("circle")
  .attr("fill", function (d) {
    return color(d["Province"]);
  })
  .attr("cy", function (d) {
    return y(d["Poverty Rate (%)"]);
  })
  .attr("cx", function (d) {
    return x(d["Adult literacy, 25 or more years old (% of population aged 25 or more)"]);
  })
  .attr("r", 5);
  • In the code above, we take the selection we created earlier, call .enter(), and make each imaginary circle real by appending a circle element; we then set a bunch of attributes on each circle. The color is set with the fill attribute, which uses a callback function that takes the row’s Province name and returns a color. The position of each circle is set by the cx and cy attributes, which also use callback functions: each takes a value from the dataset (Poverty Rate for the y-axis position, Adult Literacy for the x-axis position), runs it through the scales we created previously, and returns a pixel value within the chart. Finally, the radius of each circle is set to a constant 5 pixels.

That is it for a static scatter-plot. I hope you found it useful. In a future blog, I will explain how to upgrade this into an interactive scatter-plot like the one I created here.

See parts 2 and 3 of this blog series.
