Nils Pedersen › 2020

Blog Archives

Archives for September 2020

September 12, 2020

Politics, Elections and Geography

19^th February 2020

Coursework 1: Politics, Elections and Geography

This report describes an investigation into how political election results are communicated visually. It focuses on how incorporating geographical data can distort the presentation of election outcomes. These misrepresentations may be either unintended or deliberate. For simplicity and brevity, the investigation is limited to the UK and US electoral systems.

The UK and US electoral systems generally follow the first past the post system, more accurately described as single-member plurality (Gallagher and Mitchell 18). In a first past the post system, a voter selects one of the candidates listed on the ballot. The candidate with a plurality of the votes wins, and all other votes are disregarded (Govt of NZ). Both the UK Parliament and the US House of Representatives follow a similar system. Each nation is divided into geographical areas known as parliamentary constituencies or congressional districts. The UK has 650 constituencies (‘UK Constituencies’), and the US has 435 districts (‘US Districts’). Both the UK and the US have two-party systems and, as Duverger postulated, first past the post electoral systems tend to encourage a two-party system (Grofman et al. 4). The main parties in the UK are the Labour and Conservative parties and, in the US, the Republican and Democratic parties. However, regional differences in the UK also tend to produce additional party representation (Blumenau and Hix).

One important difference is the election of the head of government. The UK Prime Minister is typically the leader of the party in government (‘PM of the UK’). The US President is chosen by the members of the Electoral College. Each State chooses a number of Electors equal to the State’s two Senators plus the Representatives allotted to that State (National Archives). There are therefore 538 Electors, including three allocated to the District of Columbia. Typically, the Electors are chosen through a state-wide first past the post vote on Presidential election day. Maine and Nebraska are the only two exceptions, as both states allocate some Electors by congressional district (‘US Electoral College’). Thus, voters who select a Presidential candidate on the ballot paper are in fact voting for Electors, who in turn vote for the candidate who won their State (except in Maine and Nebraska).

So, conceptually, the UK and US electoral systems have similar data spaces: around 500 representative choices with 2-8 possible results, and one overall outcome. The next section will discuss examples of visualisations that have been used to illustrate that data space.
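
The elector arithmetic described above can be checked with a short sketch. Python is used purely for illustration; the constant and function names are assumptions introduced for this example, and the House figure is the number of voting seats.

```python
# Elector arithmetic: each State receives two Electors for its Senators plus
# one per Representative, and the District of Columbia is allocated three.
SENATORS_PER_STATE = 2
NUM_STATES = 50
HOUSE_SEATS = 435   # voting seats in the US House of Representatives
DC_ELECTORS = 3


def electors_for_state(representatives: int) -> int:
    """Electors for one State: its two Senators plus its Representatives."""
    return SENATORS_PER_STATE + representatives


def total_electors() -> int:
    """Electors across all States plus the District of Columbia."""
    return NUM_STATES * SENATORS_PER_STATE + HOUSE_SEATS + DC_ELECTORS


print(electors_for_state(1))  # a State with one Representative has 3 Electors
print(total_electors())       # 538
```

The smallest possible allocation for a State is therefore three Electors, regardless of its population or land area.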

Fig 1: US 2012 Election Results

As a starting point, a Google image search was conducted of the last two elections for the UK and US. The string for the search was as depicted in the figure titles. The first ~15 results for each search are shown in Fig 1 through Fig 4.

Fig 2: US 2016 Election Results

Fig 3: UK 2017 Election Results

Fig 4: UK 2019 Election Results

Most of the US images are choropleths^[1], in which each state is coloured according to whether the Democratic or Republican candidate prevailed. The issue with this representation is that the area of a state is not proportionate to the number of electoral college votes allocated to it. For example, Montana has roughly 2,000 times the area of the District of Columbia, but both have the same number of electoral votes. Only one image, a cartogram^[2] in Fig 1, attempted to show graphically the relative number of electoral college votes that each candidate received. Choropleths seem optimised for displaying thematic information (Rittschof and Kulhavy 37). Perhaps the main purpose of such maps is to show which way an individual state or area voted. That said, a reader viewing the choropleths in Fig 1 could reasonably misinterpret the outcome of the 2012 election and believe that Romney won rather than Obama.
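
The Montana and District of Columbia comparison can be made concrete with a small calculation. This is only a sketch: the area figures are rounded approximations supplied for illustration rather than values taken from the cited sources, and the dictionary layout and function name are assumptions.

```python
# Approximate total areas in square miles (illustrative assumptions, not
# taken from the report's sources) and electoral votes at the 2012 election.
REGIONS = {
    "Montana": {"area_sq_mi": 147_000, "electoral_votes": 3},
    "District of Columbia": {"area_sq_mi": 68, "electoral_votes": 3},
}


def ratio(field: str, numerator: str, denominator: str) -> float:
    """Ratio of one region's value to another's for the given field."""
    return REGIONS[numerator][field] / REGIONS[denominator][field]


area_ratio = ratio("area_sq_mi", "Montana", "District of Columbia")
vote_ratio = ratio("electoral_votes", "Montana", "District of Columbia")

# On a choropleth, Montana's colour covers thousands of times more of the map
# than DC's, yet both carry the same weight in the Electoral College.
print(f"area ratio: about {area_ratio:,.0f} to 1")
print(f"vote ratio: {vote_ratio:.0f} to 1")
```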

Interestingly, the UK election results shown in Fig 3 and Fig 4 are more likely to adopt a cartogram approach. Perhaps this is because UK constituencies are more numerous, and it would be hard to pick out one of 650 on a geographic map? Although the approach seems to reflect the overall parliamentary distribution better, it is still hard to determine the governmental outcome without additional information. The issue of effectively conveying election results has been discussed in several articles. The next section will discuss a few remedies that have been proposed to address the issue.

One critique of the choropleth maps is that “land doesn’t vote, people do”. On October 1^st 2019, Trump tweeted the following:

Fig 5: Trump October 2019 Tweet

The implication was that most of the country voted for Trump in 2016. This was incorrect (CNN). Other representations have been proposed that reflect voter participation more accurately (Knudsen); see Fig 6. This map shows a circle for each county, sized by county population and coloured by candidate choice. However, this presentation also misrepresents the electoral outcome, because electoral college votes are apportioned by population (one per district) as well as by state (two per state). Perhaps a better representation can be found?

Fig 6: 2016 Voter by County
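
A map of this kind depends on how circle size is derived from population. The sketch below assumes the common convention of scaling circle area, rather than radius, in proportion to the value; the source does not state how the circles in Fig 6 were sized, and the function name and default maximum radius are hypothetical.

```python
import math


def symbol_radius(value: float, max_value: float, max_radius: float = 20.0) -> float:
    """Radius for a proportional symbol whose AREA scales with value.

    Area grows with the square of the radius, so the radius must grow with
    the square root of the value; scaling the radius directly would
    exaggerate large values.
    """
    if value < 0 or max_value <= 0:
        raise ValueError("value must be >= 0 and max_value must be > 0")
    return max_radius * math.sqrt(value / max_value)


# A county with a quarter of the largest county's population gets half the
# radius, and therefore a quarter of the area.
print(symbol_radius(25, 100))   # 10.0
print(symbol_radius(100, 100))  # 20.0
```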

The Financial Times has undertaken the “search for a better US election map”. They argued that although the choropleth has become the default representation of US elections, it is a deeply flawed map (Pearson et al.). Their final compromise was a proportional symbol map, in which each State’s allocated electoral college votes were coloured by candidate, as shown below.

Fig 7: 2012 Electoral Vote by State

It seems to facilitate the task of determining which candidate prevailed in a State as well as giving a better overview of the nationwide electoral college vote.
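
The quantity such a map needs to convey is the winner-take-all tally, sketched below. The states, candidates, and vote counts are entirely hypothetical, and the sketch ignores the district-based allocation used by Maine and Nebraska.

```python
from collections import Counter


def plurality_winner(votes: dict) -> str:
    """Candidate with the most votes in one State (first past the post)."""
    return max(votes, key=votes.get)


def tally_electoral_votes(states: list) -> dict:
    """Award each State's electoral votes to its plurality winner."""
    totals = Counter()
    for state in states:
        totals[plurality_winner(state["votes"])] += state["electoral_votes"]
    return dict(totals)


# Hypothetical data: candidate X wins more votes overall (645 to 455),
# but candidate Y wins the State that carries more electoral votes.
states = [
    {"name": "State A", "electoral_votes": 3, "votes": {"X": 600, "Y": 400}},
    {"name": "State B", "electoral_votes": 10, "votes": {"X": 45, "Y": 55}},
]

print(tally_electoral_votes(states))  # {'X': 3, 'Y': 10}
```

The example shows why neither land area nor raw vote counts alone determine the outcome that the map is meant to show.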

In conclusion, in order to visually describe an electoral outcome more effectively, it is important to understand the electoral system and how the winner is determined. It is not necessarily just about one person, one vote. In the US, the Electoral College votes for the President, and its members are allocated based on population and state boundaries. In terms of visual presentation, while the UK seems to be moving towards cartogram-based presentation of electoral results, there seems to be no such trend in the US, which seems to favour the choropleth.

Fig 8: Financial Times 2016 Results combined with Trump Tweet

Citations

Blumenau, Jack, and Simon Hix. ‘Britain’s Evolving Multi-Party System(s)’. British Politics and Policy at LSE, 31 Mar. 2015, https://blogs.lse.ac.uk/politicsandpolicy/britains-evolving-multi-party-systems/.

CNN. Trump’s Impeachment Tweet of a 2016 Election Map Is Inaccurate. Here’s Why - CNN Video. edition.cnn.com, https://www.cnn.com/videos/politics/2019/10/02/trump-impeachment-2016-election-map-tweet-inaccurate-philip-bump-ctn-vpx.cnn. Accessed 18 Feb. 2020.

Gallagher, Michael, and Paul Mitchell. The Politics of Electoral Systems. OUP Oxford, 2005.

Govt of NZ, The Department of Internal. More about FPP. https://www.dia.govt.nz/diawebsite.nsf/wpg_URL/Resource-material-STV-Information-More-about-FPP?OpenDocument. Accessed 18 Feb. 2020.

Grofman, Bernard, et al. Electoral Laws and Their Political Consequences. Algora Publishing, 1986.

Knudsen, Nick. ‘Land Doesn’t Vote, People Do: This Electoral Map Tells the Real Story’. DemCast, 11 Nov. 2019, https://demcastusa.com/2019/11/11/land-doesnt-vote-people-do-this-electoral-map-tells-the-real-story/.

National Archives. ‘Legal Provisions Relevant to the Electoral College Process’. National Archives, 5 Sept. 2019, https://www.archives.gov/electoral-college/provisions.

Pearson, Tom, et al. The Search for a Better US Election Map. 8 Nov. 2016, https://www.ft.com/content/3685bf9e-a4cc-11e6-8b69-02899e8bd9d1.

‘PM of the UK’. Wikipedia, 15 Feb. 2020. Wikipedia, https://en.wikipedia.org/w/index.php?title=Prime_Minister_of_the_United_Kingdom&oldid=940935444.

Rittschof, Kent A., and Raymond W. Kulhavy. ‘Learning and Remembering from Thematic Maps of Familiar Regions’. Educational Technology Research and Development, vol. 46, no. 1, Mar. 1998, pp. 19–38. DOI.org (Crossref), doi:10.1007/BF02299827.

‘UK Constituencies’. Wikipedia, 16 Feb. 2020. Wikipedia, https://en.wikipedia.org/w/index.php?title=United_Kingdom_Parliament_constituencies&oldid=941136034.

‘US Districts’. Wikipedia, 10 Dec. 2019. Wikipedia, https://en.wikipedia.org/w/index.php?title=List_of_United_States_congressional_districts&oldid=930070735.

‘US Electoral College’. Wikipedia, 18 Feb. 2020. Wikipedia, https://en.wikipedia.org/w/index.php?title=United_States_Electoral_College&oldid=941391754.

[1] Choropleth maps are based on data properties applied to a defined area, for example, a US State.
[2] A cartogram is a map in which the data property, such as the number of electoral college votes, is substituted for a defined area or distance.

September 12, 2020

COVID-19 Cases and Air Quality

March 26, 2020

During the initial influx of information about the Coronavirus outbreak, I became interested in the hypothesis that the COVID-19 pandemic, while detrimental to health, was indirectly improving other health outcomes due to the reduction in air pollution (Kimbrough, 2020).

Figure 1

Pollutant Drops in Wuhan, from https://edition.cnn.com/2020/03/16/asia/china-pollution-coronavirus-hnk-intl/index.html

Figure 1 shows the difference in nitrogen dioxide readings in Wuhan, China, between 2019 and 2020. The lockdown in Wuhan due to the coronavirus disease 2019^[1] (COVID-19) outbreak occurred in January 2020 (BBC, 2020). As people were required to remain in their homes, the lockdown had the effect of reducing human economic activity. Pollution generated by that activity would also be reduced (ESA, 2020). Others hypothesised that there might be a corresponding reduction in deaths due to the decrease in pollution (Burke, 2020). It was heartening to read that although some people were suffering from COVID-19, others may be spared because of the reduction in air pollution. However, as of March 2020^[2], most visualisations seemed to compare 2020 air pollution data to previous years’ data. There did not seem to be a direct visualisation of the relationship between air pollution and the progression of the COVID-19 outbreak. This paper aims to present such a visualisation.

The Week 9 lab session, Creating Visualisations with Software, will be used as a framework for designing and developing the visualisation. Before the visualisation can be designed, the data must be obtained and converted into a format the software accepts.

Data

The initial inspiration for this paper’s visualisation was provided by Burke, as shown in Figure 2 (Burke, 2020). The first design idea was to overlay the local COVID-19 case data on the air quality data, to see whether air pollution fell as COVID-19 cases increased. The air quality data was obtained from the AirNow website (AirNow, n.d.). Chengdu was chosen as it was the closest city to Wuhan, China, available on the AirNow website. The dataset’s readings were hourly, and there was one CSV file for each year. These were combined into one data frame, and the hourly readings were averaged to daily ones (a process frowned upon by some (aqicn.org, n.d.)). The final data frame was exported as a CSV file, “Chengdu.csv”.
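
The combining and averaging steps can be sketched as follows. This is a minimal illustration in Python, not the report's own code (which is in an appendix not reproduced here). The column names, timestamp format, and the treatment of negative readings as missing-data placeholders are all assumptions, and the export step is omitted.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime
from itertools import chain
from statistics import mean

# Hypothetical column names and timestamp format; the real files may differ.
TIME_FIELD = "timestamp"
VALUE_FIELD = "aqi"
TIME_FORMAT = "%Y-%m-%d %H:%M"


def read_rows(*csv_files):
    """Chain the rows of several yearly CSV files into one stream."""
    return chain.from_iterable(csv.DictReader(f) for f in csv_files)


def daily_means(rows):
    """Average hourly readings into one value per calendar day."""
    by_day = defaultdict(list)
    for row in rows:
        try:
            value = float(row[VALUE_FIELD])
        except (KeyError, TypeError, ValueError):
            continue  # skip blank or malformed readings
        if value < 0:
            continue  # assumption: negative values mark missing data
        day = datetime.strptime(row[TIME_FIELD], TIME_FORMAT).date()
        by_day[day.isoformat()].append(value)
    return {day: mean(values) for day, values in sorted(by_day.items())}


# Two tiny in-memory "yearly files" stand in for the real downloads.
file_2019 = io.StringIO("timestamp,aqi\n2019-12-31 22:00,150\n2019-12-31 23:00,170\n")
file_2020 = io.StringIO("timestamp,aqi\n2020-01-01 00:00,120\n2020-01-01 01:00,-999\n")

daily = daily_means(read_rows(file_2019, file_2020))
print(daily)  # {'2019-12-31': 160.0, '2020-01-01': 120.0}
```

A plain arithmetic mean is the simplest choice and matches the description above, but it is exactly the kind of averaging that the aqicn.org guide cautions against.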

Figure 2

PM_2.5^[3] concentrations in Chengdu in Jan-Feb 2016-2019 (red lines) vs the same period in 2020 (blue lines) (Burke, 2020).

Figure 3

AQI in Chengdu in Jan-Feb 2016-2019 (red lines) vs the same period in 2020 (blue lines) vs COVID-19 cases (black line).

The COVID-19 data was obtained from GitHub. The GitHub COVID-19 data was at the provincial level for China, whereas the AirNow air quality data was at the city level. For the purpose of providing a dataset for this coursework, as Chengdu is the capital of Sichuan, it seemed reasonable to extract the Sichuan COVID-19 data and combine it with the Chengdu air quality data. Figure 3 shows a test ggplot2 (Wickham, 2016) chart made prior to exporting the data set for visualisation in third-party tools.
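
Combining the two sources amounts to a join on the date. The sketch below assumes both series have already been reduced to mappings keyed by ISO date strings; the function name and all of the values are hypothetical and are not real readings or case counts.

```python
def join_on_date(aqi_by_day: dict, cases_by_day: dict) -> list:
    """Inner join of two {ISO date: value} mappings, ordered by date.

    Only dates present in BOTH series are kept, so the city-level and
    province-level data line up row by row.
    """
    shared_days = sorted(aqi_by_day.keys() & cases_by_day.keys())
    return [
        {"date": day, "aqi": aqi_by_day[day], "cases": cases_by_day[day]}
        for day in shared_days
    ]


# Hypothetical values, for illustration only.
chengdu_aqi = {"2020-01-22": 155.0, "2020-01-23": 140.0, "2020-01-24": 162.0}
sichuan_cases = {"2020-01-23": 8, "2020-01-24": 15, "2020-01-25": 28}

combined = join_on_date(chengdu_aqi, sichuan_cases)
for row in combined:
    print(row)
```

An inner join silently drops days that appear in only one source; keeping every air quality day and treating missing case counts as zero would be an equally defensible choice.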

Visualisation

The data was exported to Datawrapper. It was realised that creating a graph with two different Y axes, as in Figure 3, was not supported in Datawrapper. In fact, there was a blog post arguing against the practice (Rost, 2018). Figure 4 shows the chart of the data created with Datawrapper.

Figure 4

Datawrapper chart

Another tool investigated in the week 9 lab was RAWGraphs. After the data importation step, the next page displayed the different charting options available. Since none seemed to be a good match for the visualisation requirements (dual Y-axis line chart), this approach was also abandoned.

Since neither of the tools met the requirements, it was decided to return to ggplot2. It was determined that ggplot2 did support separate Y axes (Holtz, 2018). The bbplot package (BBC, 2019) seemed to be an effective way of developing a modern look for the line chart. The colour palette chosen was Snorkel from Pantone (Pantone, 2020). Two charts were designed — one with dual Y axes, the other with two separate plots. Appendix 1 shows the data manipulation and the code used to generate the charts.
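
In ggplot2 a secondary axis is defined as a one-to-one transformation of the primary axis, so a dual-axis chart rests on a simple rescaling of the second series. The arithmetic is sketched below in Python as an illustration only; the function name and data values are hypothetical, and this is not the code from Appendix 1.

```python
def secondary_axis_factor(primary_values, secondary_values) -> float:
    """Factor that maps the secondary series onto the primary axis range."""
    top = max(secondary_values)
    if top == 0:
        raise ValueError("secondary series has no non-zero values")
    return max(primary_values) / top


# Hypothetical readings: AQI on the primary axis, cumulative cases on the
# secondary axis.
aqi = [100.0, 250.0, 175.0]
cases = [0, 125, 500]

factor = secondary_axis_factor(aqi, cases)

# The cases line is drawn against the AQI axis after rescaling ...
cases_on_primary_axis = [c * factor for c in cases]

# ... and the right-hand axis labels are recovered with the inverse mapping.
primary_ticks = [0, 125, 250]
secondary_tick_labels = [tick / factor for tick in primary_ticks]

print(factor)                 # 0.5
print(cases_on_primary_axis)  # [0.0, 62.5, 250.0]
print(secondary_tick_labels)  # [0.0, 250.0, 500.0]
```

Because the factor is arbitrary, the apparent crossing points and relative slopes of the two lines are artefacts of the scaling, which is the core of the argument against dual axes (Rost, 2018).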

Figure 5

Option 1 – ggplot2 with dual Y axes

Figure 6

Option 2 - ggplot2 in two separate charts

Unfortunately, the author has been unable to thoroughly evaluate the two different options with prospective consumers of the charts. However, in option 1, the Y-axis label colours were changed to reduce ambiguity. In option 2, the AQI Y-axis limit was reduced to enable the line chart to display more centrally and to highlight the perceived slope relative to the lockdown date.

In terms of communicating the data, both visualisations, shown in Figure 5 and Figure 6, show a clear relationship between the date of the lockdown in Chengdu and its effect on cumulative COVID-19 cases. Surprisingly, it was not apparent that there was a similar effect on the AQI. Perhaps AQI is a lagging indicator, and more recent datasets might show an effect? Maybe the model is too simplistic? There are additional determinants of air quality, such as weather conditions (aqicn.org, n.d.).

Conclusions

When using any tool to facilitate data visualisation, the user is constrained by the functionality provided by that tool, and the user must manipulate the data for consumption by that tool. Typically, higher-level tools seem to prioritise a specific set of use cases. If the visualisation use case is not supported, then an alternative tool or design must be chosen. As pointed out in the IM921 lectures, even low-level tools like ggplot2 have constraints. For example, in Figure 5, it may have been clearer to display the month labels between the axis ticks. However, that feature is not specifically supported in ggplot2 (Add option for range ticks (tick labels between tick marks) · Issue #1966 · tidyverse/ggplot2, 2017).

Surprisingly, the charts showed that air quality in Chengdu did not seem to be influenced by the COVID-19 lockdown. This finding seemed to contradict other visualisations (Figure 1) that displayed a dramatic reduction in pollution due to COVID-19, until one realises that the Wuhan lockdown did not occur until January 23^rd (Wikipedia, 2020). Further investigation would be needed to determine what proportion of the change in air quality in Chengdu was due to the 2020 COVID-19 outbreak as compared to other determinants.

References

Add option for range ticks (tick labels between tick marks) · Issue #1966 · tidyverse/ggplot2. (2017). GitHub. https://github.com/tidyverse/ggplot2/issues/1966

AirNow. (n.d.). Retrieved April 8, 2020, from https://airnow.gov/index.cfm?action=airnow.global_summary#China$Chengdu

aqicn.org. (n.d.). Beginner’s Guide to Air Quality Instant-Cast and Now-Cast. The World Air Quality Index. Retrieved April 8, 2020, from https://aqicn.org/search/vn/

BBC. (2019, February 1). How the BBC Visual and Data Journalism team works with graphics in R. Medium. https://medium.com/bbc-visual-and-data-journalism/how-the-bbc-visual-and-data-journalism-team-works-with-graphics-in-r-ed0b35693535

BBC. (2020, January 23). Lockdowns rise as China tries to control virus. BBC News. https://www.bbc.com/news/world-asia-china-51217455

Burke, M. (2020). COVID-19 reduces economic activity, which reduces pollution, which saves lives. http://www.g-feed.com/2020/03/covid-19-reduces-economic-activity.html

ESA. (2020). Coronavirus lockdown leading to drop in pollution across Europe. https://www.esa.int/Applications/Observing_the_Earth/Copernicus/Sentinel-5P/Coronavirus_lockdown_leading_to_drop_in_pollution_across_Europe

Holtz, Y. (2018). Dual Y axis with R and ggplot2. https://www.r-graph-gallery.com/line-chart-dual-Y-axis-ggplot2.html

Kimbrough, L. (2020). Response to one pandemic, COVID-19, has helped ease another: Air pollution. https://news.mongabay.com/2020/03/response-to-one-pandemic-covid-19-has-helped-ease-another-air-pollution/

Pantone. (2020). Pantone Color of the Year 2020 Palette Exploration | PANTONE 19-4052 Classic Blue | Pantone UK. https://store.pantone.com/uk/en/color-of-the-year-2020-palette-exploration

Rost, L. (2018). Why not to use two axes, and what to use instead | Chartable. https://blog.datawrapper.de/dualaxis/

Wickham, H. (2016). Create Elegant Data Visualisations Using the Grammar of Graphics. https://ggplot2.tidyverse.org/

Wikipedia. (2020). Hubei lockdowns. In Wikipedia. https://en.wikipedia.org/w/index.php?title=2020_Hubei_lockdowns&oldid=949385366

[1] COVID-19 is an infectious disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2).
[2] As this is a fast-developing topic, coupled with the 2-week submission delay, no material newer than the document date will be referenced.
[3] Particulate matter (PM) is a term used to describe the mixture of solid particles and liquid droplets in the air. PM_2.5 is particulate matter with a diameter of less than 2.5 micrometres (µm).

September 12, 2020

Australia Fires

IM921 Visualisation

4^th March 2020

Coursework 2: Graphical Report

Fig 1. Australia Fires December 2019 ^[1]

This report will describe how to visually communicate recent Australian wildfire data made available by the NASA Fire Information for Resource Management System (FIRMS).

One of the goals of the project was to provide a more temporal, or chronological, presentation of the data. Since a dynamic presentation or animation was not permissible, another method of presenting chronological information needed to be developed. A spatial presentation seemed to offer many benefits, as the data comprised spatial information. One of the issues, however, was that there were over 90,000 observations. What would be the best method to present that data? Small multiples (Park and Quealy) seemed the obvious solution, as they would enable the chronological presentation of the hotspot data. Perhaps some trends would become apparent? An algorithm (see Appendix 1) was developed to plot the daily hotspot data for the month of December. Once the initial prototype had been completed, two issues were apparent: there was still so much data being plotted that it overwhelmed the map, and the map was being cropped somehow (Tasmania was not being displayed).

It was hypothesised that the bitmap nature of the map was impairing the ability of the rendering engine to display the map correctly. A method (Logan) was found to convert the bitmap image to a vector. Once this was accomplished, the map displayed correctly. To further reduce the amount of data, it was decided to filter the data using a 95% confidence value, as provided in the original demo file. This reduced the number of data points to about 30,000, so, on average, there would be approximately 1,000 data points per map. Even with the data reduced to about 1,000 points per map, there were still many duplicates given the size of the maps. The ideal solution would have been to conduct a spatial point pattern analysis (Gimond). An example is shown in Fig 2. Various methods were explored using the spatstat package (Baddeley 171), but the author was unsuccessful in integrating a map of Australia as the ‘observation window’, which would have enabled data to be plotted over the map. As a workaround, it was decided to investigate changing the shape of the plot point and its transparency. Various options were explored, and the compromise solution is shown in Fig 1.
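
The filtering step and the split into one panel per day can be sketched as follows. This is an illustration in Python rather than the algorithm from Appendix 1. The column names are assumed to resemble those in a hotspot CSV export, the sample rows are hypothetical, and treating the threshold as "95 or above" is also an assumption.

```python
from collections import defaultdict

CONFIDENCE_THRESHOLD = 95  # assumption: keep detections rated 95 or above


def filter_by_confidence(hotspots, threshold=CONFIDENCE_THRESHOLD):
    """Keep only hotspots whose confidence meets the threshold."""
    return [h for h in hotspots if float(h["confidence"]) >= threshold]


def group_by_day(hotspots):
    """Split hotspots into one list of (longitude, latitude) points per day,
    one list for each small-multiple map."""
    panels = defaultdict(list)
    for h in hotspots:
        point = (float(h["longitude"]), float(h["latitude"]))
        panels[h["acq_date"]].append(point)
    return dict(sorted(panels.items()))


# Hypothetical rows, shaped like records read with csv.DictReader.
hotspots = [
    {"acq_date": "2019-12-01", "latitude": "-33.1", "longitude": "150.2", "confidence": "97"},
    {"acq_date": "2019-12-01", "latitude": "-33.2", "longitude": "150.3", "confidence": "60"},
    {"acq_date": "2019-12-02", "latitude": "-37.5", "longitude": "148.9", "confidence": "100"},
]

panels = group_by_day(filter_by_confidence(hotspots))
for day, points in panels.items():
    print(day, points)
```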

Fig 2. Spatial Point Analysis, from: https://rspatial.org/raster/analysis/8-pointpat.html

It was decided to add a bar chart at the bottom to augment the ability to perceive changes in the number or density of hotspots. The bar chart x-axis was labelled with dates corresponding to those on the individual maps. A label over the maximum hotspot days was preferred over a y-axis scale due to space and alignment issues.
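
The data behind such a bar chart is a count of hotspots per day, plus the peak day that receives the label. A minimal sketch, with hypothetical dates and function names, might look like this:

```python
from collections import Counter


def daily_counts(dates):
    """Number of hotspots per day, ordered by date (ISO date strings)."""
    return dict(sorted(Counter(dates).items()))


def peak_day(counts):
    """The (date, count) pair with the most hotspots, used for the label."""
    return max(counts.items(), key=lambda item: item[1])


# Hypothetical acquisition dates, one entry per detected hotspot.
dates = ["2019-12-29", "2019-12-30", "2019-12-30", "2019-12-31", "2019-12-30"]

counts = daily_counts(dates)
label_day, label_value = peak_day(counts)

print(counts)                  # {'2019-12-29': 1, '2019-12-30': 3, '2019-12-31': 1}
print(label_day, label_value)  # 2019-12-30 3
```

Labelling only the tallest bar gives readers a reference value for the bar heights without the space a full y-axis scale would need.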

In conclusion, two methods of presenting the data were investigated. Both succeeded in separating the data into more manageable chunks which enabled better comprehension. The map view enabled viewers to see where the hot spots were occurring, and the bar chart enabled viewers to see when and how many. For example, a viewer could clearly see that there were two hotspot flare-ups towards the end of December which would not be visible in a single map view. It would have been interesting to investigate the Spatial Point Analysis further. The author believes that using such a technique would have resulted in a better overall presentation, given the size of the 31 maps.

Citations

Baddeley, Adrian. Analysing Spatial Point Patterns in R. 2008, p. 171.

FIRMS. https://earthdata.nasa.gov/earth-observation-data/near-real-time/firms/. Accessed 2 Mar. 2020.

Gimond, Manuel. Point Pattern Analysis in R | Intro to GIS and Spatial Analysis. mgimond.github.io, https://mgimond.github.io/Spatial/point-pattern-analysis-in-r.html. Accessed 3 Mar. 2020.

Logan, Murray. Tutorial 5.4 - Mapping and Spatial Analyses in R. https://www.flutterbys.com.au/stats/tut/tut5.4.html. Accessed 3 Mar. 2020.

Park, Haeyoun, and Kevin Quealy. Drought’s Footprint. https://archive.nytimes.com/www.nytimes.com/interactive/2012/07/20/us/drought-footprint.html?_r=0&smid=pl-share. Accessed 3 Mar. 2020.

[1] The author acknowledges the use of data and imagery from LANCE FIRMS operated by NASA's Earth Science Data and Information System (ESDIS) with funding provided by NASA Headquarters.