Batman By-Election 2018

It’s on in Batman. And the result might well depend on what happens north of the Hipster-proof Fence, a term coined (by my wife) to help describe the voting patterns that flipped in the vicinity of Bell St.

With David Feeney resigning from Federal Parliament due to unresolved issues regarding his citizenship, a by-election for the federal seat of Batman will be held. Batman was an interesting race in 2016, with the ALP narrowly beating the Greens. But with the Greens winning a recent state by-election in Northcote, which covers the southern half of the Batman electorate (south of the Hipster-proof Fence), the 2018 by-election promises to be even more interesting.

One feature of the 2016 federal election was the north-south gradient in votes, both in terms of the two-candidate-preferred vote, and the swing from the 2013 election. In both cases, the ALP did much better north of the Hipster-proof Fence. Indeed, the ALP had swings toward it in some of the northern-most booths. If the ALP had suffered the same swings north of Bell St as they did further south, the Greens would have won comfortably in 2016.

The result of the Northcote 2017 state by-election closely matched the outcome of the 2016 federal election if one examines the outcomes at individual booths. The consistent swing to the Greens in 2017 simply mirrored what had occurred a year before. This is seen in both the two-candidate-preferred vote, and the swing from the previous election.


Two-candidate preferred vote to the ALP in each booth for the 2016 federal election in the seat of Batman and the 2017 Northcote by-election as a function of latitude. The 2017 result matches that seen in 2016.


Swing to the ALP in each booth for the 2016 federal election in the seat of Batman and the 2017 Northcote by-election as a function of latitude. The 2017 result matches that seen in 2016, with a solid win to the Greens in the south.

While the swings in 2017 and 2016 were quite similar for corresponding booths (above), you might notice that the three northern-most booths in the 2017 state by-election had larger swings away from the ALP than the same booths in the 2016 federal election. That will make the ALP nervous, and the Greens hopeful.

While both parties will aim to sway voters in the south, the outcome of the 2018 federal by-election most likely hinges on voting patterns north of Bell St. If the Greens can win back northern voters who apparently turned away from them in 2016 while retaining voters in the south, the Greens might be one of the few winners out of the citizenship saga that has engulfed federal parliament.

Interesting times!

Posted in Communication, Uncategorized

When does research help environmental management?

Think of the case where a manager needs to decide which action to take to stop a species declining, or to eradicate a pest, or to increase sustainable harvest levels. It is rare in environmental management to know, with certainty, which action to take.

In response to such uncertainty, a scientist might recommend that the manager should trial different management actions, and use the results of that trial to decide on the best course of action. Such trials can certainly improve subsequent management.

But research costs money – money that might have been better put toward management. Further, even trialling two options means that, almost inevitably, one of the trialled actions will be inferior to the other. So opportunity costs are likely to exist in almost any trial, even if the research itself were cheap.

The trade-off between learning and doing lies at the heart of adaptive management. My recent paper led by Alana Moore addresses this trade-off, using the simplest formulation of the problem that we could muster. In that case we only considered resolving a choice between two management options. Our hope was to gain greater insight into the question of the circumstances in which research assists environmental management.

The answers surprised us in several instances. One surprise was the threshold behaviour that existed in many parameters. For example, as the expected difference in performance of the two management options increases, the optimal effort to spend on experimentation increases, but only up to a point. Once the threshold difference in performance is sufficiently large, the optimal level of experimentation declines to zero.

This threshold makes some intuitive sense; once we are relatively sure of the difference in performance, then we shouldn’t bother with an experiment to evaluate that. However, prior to reaching that threshold, the optimal effort to spend on the experiment increases with the expected difference in performance; that is somewhat counter-intuitive. Other thresholds also exist.

Another surprise is that circumstances in which the investment in management trials is greatest do not necessarily correspond to the circumstances in which the benefit of trials is the greatest. Cases exist when relatively modest investment in trials can lead to large expected management gains. And counter-cases exist in which a large investment in trialling options is the best thing to do, but the benefits of those trials are quite small.

I love this sort of modelling – simple models leading to somewhat counter-intuitive insights. You can read about them more in the paper, or in a previous blog post I wrote regarding a talk I did on this topic.

The paper is:

Moore, A. L., Walker, L., Runge, M. C., McDonald-Madden, E. and McCarthy, M. A. (2017). Two-step Adaptive Management for choosing between two management actions. Ecological Applications. doi:10.1002/eap.1515

Posted in Ecological models, Uncategorized

Swinging on the Hipster-proof Fence

The “Hipster-proof Fence” is an evocative name for Bell St, which tended to divide booths in Batman that were won by the ALP in the 2016 federal election from those won by the Greens. A similar pattern was seen in the neighbouring electorate of Wills.

While I like the name “Hipster-proof Fence”, “The Tofu Curtain” is probably more accurate because Bell St is not a sharp barrier to the voting trend; the two-candidate-preferred vote trends across the entire north-south gradient. Bell St just happens to be where the vote approximately flips from one party to the other.

However, the geographic gradient in the swings is quite different between Wills and Batman. In Batman, Bhathal actually had some swings against her in the northern booths. In Batman, Bell St approximates the location where the swings change.


Swings to Feeney (ALP) for each booth in Batman. Negative swings represent swings to Bhathal (Greens). Booths are colour coded by the party that won the booth. The black line shows the latitude of Bell St at Merri Creek.

Bhathal had consistently strong swings in booths south of Bell St. Swings were much more variable north of Bell St, with strong swings to her at some booths, and strong swings away at others.

In contrast, Ratnam had consistently strong swings across the entire electorate of Wills. Her smallest swing occurred in the far north of the electorate, but so did her largest swing.


Swings to Khalil (ALP) for each booth in Wills. Negative values represent swings to Ratnam (Greens). Booths are colour-coded by the party that won the booth. The black line shows the latitude of Bell St at Merri Creek.

If Bhathal had extended her SoBe (South of Bell St) swing to NoBe, she would have won Batman. In contrast, Ratnam achieved large and consistent swings throughout Wills, but was simply coming from too far behind to win.


Posted in Communication

Simple Adaptive Management

This post gives some details of my speed talk at the SCB Oceania conference, which is in room P9 on Thursday 7 July at 11:50 as part of a session on conservation planning and adaptive management. We have submitted this work to Ecological Applications – a copy is available here, so please add to the peer review by giving us comments.

Every natural resource management agency seems to do (or at least claims to do) adaptive management, which seeks to use management and monitoring to learn about the system being managed, thereby improving future management. It is sometimes referred to as “learning by doing”.

Active adaptive management seeks to explicitly design management like an experiment, and entails extra costs. The experimental design and monitoring requires more resources than simply just managing the system. Further, if two management strategies are implemented at the same time, then inevitably one of them will be inferior.

The benefits of improved management in future can be weighed against these extra costs. And the optimal balance between these costs and benefits can be determined, thereby optimizing the design of adaptive management programs to maximize performance.

In my opinion, attempts to optimize adaptive management programs have been overwhelmingly disappointing. Firstly, the optimizations seem to only work on relatively small problems (but see Nicol and Chadès 2012). Secondly, each published optimization is different in fundamental ways from others, making it difficult to derive generalities across studies. And perhaps most depressingly, the benefits of optimizing adaptive management seem small – the optimizations typically only increase expected performance by a few percent at most.

The apparently minor benefits of adaptive management make me worry that science might be impotent; we go to all that effort to optimize the design, yet only get tiny improvements. Surely science is better than that!

So with Moore and a few other colleagues, we decided to examine optimal adaptive management to ask a few fundamental questions:

When is adaptive management most useful?
What drives optimal experimentation?
How much should be invested in experimentation?
How big are the benefits of adaptive management?

To answer these questions, we set up the simplest possible adaptive management problem. We consider two possible management options and two time steps. The first time step allows for possible experimentation, and the apparently best option is applied exclusively in the second time step.

Each option has an expected level of performance (which was uncertain), and we need to determine how much effort to expend on each option in the first time step. Each unit of management effort in the first time step is monitored so that its performance can be assessed. Each option has a per unit cost of implementation and a per unit cost of monitoring.

The monitoring data will be uncertain, so the more effort we invest in each option in the first time period, the more certain we will be about its performance. However, increasing the level of effort allocated to each option in the first time period will decrease the resources available to spend in the second time step. Thus, when we invest more in the first time period, we can more reliably choose between options in the second time period, but we will have fewer resources to spend on the apparently best option. We face a trade-off!
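The trade-off can be illustrated with a toy Monte Carlo sketch in R. This is not the model in the paper: it assumes normal priors and normal monitoring noise, ignores implementation and monitoring costs, and splits first-step effort equally between the two options. The function name expected_benefit and all parameter values are made up for illustration.

```r
# Toy two-step problem: spend n units of effort on each option in step one,
# then spend the rest of the budget on the option that looks best.
expected_benefit <- function(n, budget = 100, mu = c(1.0, 1.2),
                             sd_prior = 0.5, sd_obs = 2, nsim = 20000) {
  # true (unknown) per-unit performance of each option, drawn from the priors
  m1 <- rnorm(nsim, mu[1], sd_prior)
  m2 <- rnorm(nsim, mu[2], sd_prior)
  if (n > 0) {
    # mean of n noisy monitoring observations for each option
    xbar1 <- rnorm(nsim, m1, sd_obs / sqrt(n))
    xbar2 <- rnorm(nsim, m2, sd_obs / sqrt(n))
    # normal-normal posterior means: weighted average of data and prior
    w <- (n / sd_obs^2) / (n / sd_obs^2 + 1 / sd_prior^2)
    post1 <- w * xbar1 + (1 - w) * mu[1]
    post2 <- w * xbar2 + (1 - w) * mu[2]
  } else {
    # no experiment: we can only rely on the prior means
    post1 <- rep(mu[1], nsim)
    post2 <- rep(mu[2], nsim)
  }
  # performance of the option chosen for the second step
  chosen <- ifelse(post1 > post2, m1, m2)
  # benefit from step one plus benefit from the remaining budget in step two
  mean(n * (m1 + m2) + (budget - 2 * n) * chosen)
}

set.seed(1)
effort <- c(0, 2, 5, 10, 20, 40)
print(data.frame(effort, benefit = sapply(effort, expected_benefit)))
```

Plotting the expected benefit against first-step effort for different prior means is one way to explore how the best level of experimentation shifts, at least within this simplified setting.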

While this formulation of adaptive management is as simple as we could devise, it is still somewhat complex. In total the model has 11 parameters plus the two control variables (the control variables were how much to allocate to each option in the first time period).

We show how the trade-off between learning and saving resources for acting later can be optimized, and the results have some interesting features. Firstly, various thresholds exist. For example, as the expected difference in performance of the two options increases, the optimal effort to spend on experimentation increases, but only up to a point.

Once the threshold difference in performance is sufficiently large, the optimal level of experimentation declines to zero. This threshold makes some intuitive sense; once we are relatively sure of the difference in performance, then we shouldn’t bother with an experiment to evaluate that. However, prior to reaching that threshold, the optimal effort to spend on the experiment increases with the expected difference in performance; that is somewhat counter-intuitive.


Optimal level of experimentation for a particular set of parameter values. The optimal level of experimentation increases with the difference in the expected benefit of each option, but only up to a point after which it is best not to experiment at all.

Similar thresholds exist for the budget and the prior level of uncertainty in performance. However, while the optimal level of experimentation increases with the expected difference in performance, the biggest benefits of experimentation are realized when the expected performance of the two strategies is the same. Thus, the greatest benefits of experimentation are realized under conditions that differ from when the optimal level of experimentation is greatest.

This work helps to illustrate some fundamental features of adaptive management. We also tie the results explicitly to the notion of expected value of sample information. And we derive analytical solutions for the optimal level of experimentation for some special cases of parameter values.

While the paper is quite mathematically involved, the concept itself is quite straightforward, and the results are very interpretable. I think it is a very interesting study – see what you think. Please send us comments to help us improve the paper while it is being peer reviewed.

Moore AL, Walker L, Runge MC, McDonald-Madden E & McCarthy MA (in review) Two-step adaptive management for choosing between two management actions.

Posted in CEED, Communication, Ecological models, Probability and Bayesian analysis

Preference flows in #IndiVotes 2016

Update (6 July 2016, 8:00 a.m.)

Kevin Bonham pointed me to this AEC table that indicates the flow of preferences for Nationals voters in three-cornered contests in 2013. It was 75.45% to the Liberals and 24.55% to the ALP. So the Liberals should not be surprised with the preference flow of 3/4 from Nationals to Mirabella in Indi.

The flow of preferences from Liberals voters to Nationals was 90.79% in 2013 – higher than the flow the other way, but even if the flow of Nationals preferences to Mirabella had been that high, McGowan would still be leading in Indi.

Original post

Liberal federal electorate chairman for Indi, Tony Schneider, reportedly said that many Nationals voters preferenced Cathy McGowan ahead of Sophie Mirabella in 2016. He said “The reason we have a Coalition is to put a Coalition member in Parliament, whether that’s Sophie or Marty, but now we don’t have either. A lot of National Party people need to have a good look at that.”

Well, we have a preferential voting system so voters can preference whomever they like. But that aside, we can look at the booth data to estimate the proportion of Nationals voters who preferenced McGowan, an independent, ahead of Mirabella, the Liberal candidate.

Using the method I described last week, I estimate the following flow of preferences to Mirabella from each of the other candidates:

0% of 1498 votes from LAPPIN, Alan James (Independent)
0% of 2724 votes from O’CONNOR, Jenny (The Greens)
64.5% of 697 votes from QUILTY, Tim (Liberal Democrats)
28.1% of 7404 votes from KERR, Eric (Australian Labor Party)
41.2% of 376 votes from DYER, Ray (Independent)
73.6% of 13822 votes from CORBOY, Marty (The Nationals)
0% of 1538 votes from FIDGE, Julian (Australian Country Party)
92.1% of 937 votes from FERRANDO, Vincent (Rise Up Australia Party)

This model provides a good fit to the data:


Observed versus fitted number of preferences flowing to Mirabella in each of Indi’s booths.

Based on that analysis, I estimate that three quarters of Nationals voters preferenced Mirabella ahead of McGowan – and a total of approximately 3650 Nationals voters preferred McGowan. Counting is still continuing, but McGowan leads with 41,548 votes to 34,489. If all the Nationals voters had preferenced Mirabella, we would now have another neck-and-neck race rather than a safe win to McGowan.
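The arithmetic behind that claim can be checked in a few lines of R. The figures are those quoted in this post; the variable names are just for illustration, and the hypothetical scenario assumes every Nationals ballot that favoured McGowan is moved to Mirabella.

```r
# Figures quoted in the post
nationals_votes   <- 13822   # first preferences for Corboy (Nationals)
flow_to_mirabella <- 0.736   # estimated share preferencing Mirabella
mcgowan   <- 41548           # two-candidate-preferred count at the time of writing
mirabella <- 34489

# Nationals voters estimated to have preferenced McGowan ahead of Mirabella
to_mcgowan <- round(nationals_votes * (1 - flow_to_mirabella))

# Hypothetical: every one of those ballots goes to Mirabella instead
mcgowan_alt   <- mcgowan - to_mcgowan
mirabella_alt <- mirabella + to_mcgowan

cat("Nationals voters preferring McGowan:", to_mcgowan, "\n")
cat("Actual margin to McGowan:", mcgowan - mirabella, "\n")
cat("Hypothetical margin to McGowan:", mcgowan_alt - mirabella_alt, "\n")
```

Moving about 3650 ballots shifts the margin by twice that number, which is why a lead of roughly 7000 votes would shrink to a couple of hundred either way.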

I’m sure the Liberals will be disappointed they couldn’t inspire the Nationals voters to preference their candidate. But I’m not sure that castigating them will help win them over for the next election.

Posted in Communication

#Batman on a knife edge

Update (9 July, 1:30 p.m.)

The postal votes have drifted a little back toward Bhathal, but Feeney still has 56.62% of them. Some absent votes have now been counted, which are currently favouring Bhathal (55.97%), but she is trailing by almost 3000 votes.

And for a bit more on the Hipster-proof Fence, it is worth noting that the swings in Wills were very consistent across the electorate, but Bhathal’s swings were much weaker north of Bell St.

Update (6 July, 11:30 p.m.)

I should probably change the title of this post to “#Batman all but decided”. With almost 2000 postal votes counted, Feeney is winning >60% of them on a 2CP basis, and now leads Bhathal by 2761 votes.

My map of “The Hipster-Proof Fence” (formerly known as Bell St) has been reported in The Age.

For the record, I like “The Tofu Curtain” too.

Update (5 July, 10:30 p.m.)

Postal votes are not swinging away from Feeney at this stage. He is winning 62.86% on 2CP after a little over 1000 counted. That is actually better than the 60.77% he won in 2013 for postals. An unlikely result for Bhathal just became much more unlikely.

Update (4 July, 9:00 p.m.)

Tim Wardrop mapped the ALP:Greens booth outcomes in Wills. You can see it here on Twitter.

And if you are wondering what is happening in other seats, I suggest keeping an eye on Kevin Bonham’s blog. (Also, I have an analysis of preference flows in Indi, which indicates why the Liberals seem cranky at some of the Nationals voters who didn’t prefer their candidate over Cathy McGowan.)

Update (4 July, 5:30 p.m.)

As a point of comparison to the geographic gradient of 2CP in Batman, here are the equivalent data for Wills:


The two-candidate-preferred vote to Khalil (ALP) versus Ratnam (Greens) in each of Wills’ polling booths. It is the same basic geographic pattern as seen in Batman.

And in terms of counting of Batman, the small hospital booths have been processed, and they favoured Feeney as in previous years. His lead is now >2300 votes.

Update (3 July 2016, 9:40 p.m.)

The geographic gradient in the Batman vote is perhaps even more dramatic if Feeney’s 2CP vote is plotted versus latitude of the polling booth (i.e., how far north the polling booth is):


The two-candidate-preferred vote won by Feeney (ALP) versus Bhathal (Greens) in each of Batman’s polling booths. The ALP retains a very strong vote in the north of the electorate, while the Greens dominate the south.

Update (3 July 2016, 3:40 p.m.)

The count hasn’t updated, but I have mapped the two-candidate-preferred outcome by polling place. There is a major north-south gradient in the preference for ALP versus Greens. So the outcome of the early/postal/pre-polls will to some extent reflect where those voters live. Given the geographic difference, those votes could be quite different from the ordinary votes if there is a geographic bias in the location of the voters.


Map of polling booths in Batman colour-coded by the winner of the two-candidate-preferred vote. Feeney (ALP) won the red booths, and Bhathal (Greens) won the green booths (click to get a larger version).

The swings were also geographically structured, although not quite so dramatically. Nevertheless, the three booths with a swing toward Feeney were all in the far north east of the electorate, and all booths with a swing toward Bhathal of less than 6% were in the northern half. The majority of booths with a swing toward Bhathal of >12% were in the south.

A quick look suggests a similar north-south gradient in voting occurred for the neighbouring seat of Wills.



David Feeney and Alexandra Bhathal, the two leading candidates in the 2016 election for the federal seat of Batman (photos from their Twitter profiles).

Update (3 July 2016, 8:10 a.m.)

The AEC website updated overnight, and the new data are more hopeful for Feeney and less hopeful for Bhathal. The percentage of the vote counted is the same as last night, but Bhathal is now reported to have only 48.55% of the vote, trailing by 2074 votes.

The swing by booths (only for those booths used in 2013 as well as 2016) suggests the average swing is 9.3% away from Feeney. For Bhathal to win, she needs a swing of around 11.5% in the non-ordinary votes, which would net her around 55.5% of those votes. That seems unlikely to me.

Here are the swing data for each booth versus the number of formal votes in those booths. The pre-poll voting centres (PPVC) are highlighted.


Original post:

With just over 70% of the vote counted (11:15 p.m. 2 July 2016), the election for the seat of Batman is close. The AEC website, disappointingly, does not provide results for individual polling booths, unlike in 2013. Despite that, we can examine the swing as best we can.

David Feeney (ALP) currently has 50.3% of the two-candidate-preferred (2CP) vote against Alexandra Bhathal (Greens) on 49.7%. That translates to a difference of around 650 votes.

Most of the ordinary votes have been counted for this election in Batman. In 2013, Bhathal won 38.08% of the ordinary votes, so the swing to her is approximately 11.6%. She needs a swing of less than that on the remaining votes to win the election.

The remaining votes are mainly early, postal and absent votes. In 2013, Bhathal won 43.9% of these non-ordinary votes. We might expect about 20,000 of these votes (the approximate number in 2013 – there might be more in 2016). A swing of 8% towards Bhathal in these non-ordinary votes would be sufficient for her to recover her current deficit of 650 votes.

It is entirely conceivable that Bhathal will win 52% of the non-ordinary votes, and win the seat. Batman is on a knife edge.
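As a rough check on those numbers, here is the break-even calculation in R. It assumes exactly 20,000 non-ordinary votes (the approximate 2013 figure used above) and treats the 650-vote deficit as fixed; the variable names are illustrative only.

```r
deficit    <- 650     # Bhathal's current two-candidate-preferred deficit (votes)
remaining  <- 20000   # assumed number of non-ordinary votes, as in 2013
share_2013 <- 0.439   # Bhathal's share of non-ordinary 2CP votes in 2013

# With share s of the remaining votes, Bhathal gains remaining * (2s - 1)
# votes on Feeney. Setting that gain equal to the deficit gives the
# break-even share.
share_needed <- 0.5 + deficit / (2 * remaining)
swing_needed <- share_needed - share_2013

cat(sprintf("Share needed: %.2f%%\n", 100 * share_needed))
cat(sprintf("Swing needed: %.2f percentage points\n", 100 * swing_needed))
```

Under these assumptions the break-even share is a little under 52%, so a swing of about 8 percentage points on the 2013 non-ordinary result would be just enough.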

Addition (11:50 p.m.):

2013 Results:
Ordinary 2CP to Bhathal 38.08%
Non-ordinary 2CP to Bhathal 43.93%

2010 Results (vs Martin Ferguson):
Ordinary 2CP to Bhathal 42.06%
Non-ordinary 2CP to Bhathal 42.53%

Based on that history, anything is possible in terms of a difference between ordinary and non-ordinary votes.

Details for 2013 here.

Details for 2010 here.


Posted in Communication

Election fever hits again

In the 2013 election, I took some interest in the election result in Indi, a seat located in the north-east of Victoria. My interest was spurred by the chance that Sophie Mirabella, who was flagged to be the next Science Minister if the Liberal-National coalition won government, might be usurped by Cathy McGowan, an independent candidate. Also, I have some relatives in that part of the world, so I was interested to know who would be their local representative in what turned out to be a very close election.

I enjoyed trying to predict the outcome of the election in Indi, as counting continued over a matter of days. You can see an account of my efforts here. (As an aside, this is the most-read post on my blog – I have an alternative career option should I give up ecology!).

In predicting the winner of the election, there are two main unknowns that need to be determined – how the preferences are flowing to the two leading candidates, and whether the swing in votes is sufficient to usurp the sitting member.

Australia uses a preferential voting system. Voters select their preferred candidate in the seat for the House of Representatives, then their second preference, third preference, etc, until the voter has indicated their preferences for all candidates in the seat.

The initial counting of votes tallies these first preferences for each candidate. Then, the ballot papers of the candidate with the fewest votes are distributed to the other candidates based on the second preferences on those ballot papers. So if we had five candidates initially, the possible winners are narrowed down to four, and the ballot papers of the fifth candidate are then allocated to the remaining four candidates based on the second preferences.

Then, the ballot papers of the candidate with the fewest votes are distributed among the other three. This process continues until we have only two candidates remaining, at which point we have the two-candidate-preferred vote. After this point, the candidate with the most votes wins.
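The counting process described above can be sketched in R. This is a simplified illustration rather than the AEC's actual procedure: it assumes every ballot ranks every candidate, passes each ballot to its next preference among the candidates still in the count, and breaks ties by order of appearance. The function count_to_two and the ballots are invented for the example.

```r
# Exclude the lowest-placed candidate repeatedly until two remain,
# returning the two-candidate-preferred tally.
count_to_two <- function(ballots) {
  remaining <- unique(unlist(ballots))
  repeat {
    # each ballot counts for its highest-ranked candidate still in the count
    top <- vapply(ballots, function(b) b[b %in% remaining][1], character(1))
    tally <- table(factor(top, levels = remaining))
    if (length(remaining) <= 2) return(tally)
    # exclude the candidate with the fewest votes (first one listed if tied)
    loser <- names(tally)[which.min(tally)]
    remaining <- setdiff(remaining, loser)
  }
}

# 100 made-up ballots, each a full preference ordering of candidates A to D
ballots <- c(rep(list(c("A", "C", "B", "D")), 40),
             rep(list(c("B", "C", "A", "D")), 35),
             rep(list(c("C", "B", "A", "D")), 15),
             rep(list(c("D", "C", "B", "A")), 10))

print(count_to_two(ballots))
```

In this example A leads on first preferences (40 to 35), but B wins the two-candidate-preferred count 60 to 40 once the ballots of D and then C are distributed.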

In trying to predict the winner of an election, a key part is predicting how the preferences will flow to the two leading candidates. The Australian Electoral Commission provides updates on first preference counts initially, and then two-candidate-preferred counts as they are completed. Because the two-candidate-preferred counts lag behind the first preference counts, it would be useful to predict preference flows. If preferences have been counted for a sample of booths, it is possible to model the flow of preferences – here is one way to do that.

Let’s look at the first preference counts and two-candidate-preferred counts for a few booths in the seat of Indi from the 2013 election:

Candidate                      Alexandra  Baddaginnie  Barra-    Barnawartha  Beechworth
R. Dudley (Rise Up Aust)       13         6            6         8            12
C. McGowan (Independent)       216        39           381       248          785
R. Leeworthy (Family First)    30         5            19        11           11
S. Mirabella (Liberal)         715        52           366       230          420
H. Aschenbrenner (Sex Party)   23         3            11        5            12
W. Hayes (Bullet Train)        6          0            5         7            1
R. Walsh (ALP)                 251        7            76        52           145
J. O’Connor (The Greens)       63         2            23        8            105
P. Rourke (Katter)             2          2            3         2            8
R. Murphy (PUP)                54         4            31        14           16
J. Podesta                     6          0            8         5            13
2CP McGowan (Independent)      567        54           519       339          1,069
2CP Mirabella (Liberal)        812        66           410       251          459
Total preference flows         448        29           182       112          323
Fraction to McGowan            0.783482   0.517241     0.758242  0.8125       0.879257

We can see that in the Alexandra booth, Cathy McGowan only won 216 first preference votes, compared to Sophie Mirabella’s 715. But the 448 votes of remaining candidates flowed distinctly towards McGowan – on more than 78% of those ballot papers, McGowan was preferenced ahead of Mirabella, so she collected those preferences.
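The last two rows of the table can be reproduced from the first-preference and two-candidate-preferred counts with a few lines of R. The numbers are copied from the table; the third booth name is truncated in the source and is left as it appears there.

```r
booths <- c("Alexandra", "Baddaginnie", "Barra-", "Barnawartha", "Beechworth")
first_mcgowan   <- c(216, 39, 381, 248, 785)
first_mirabella <- c(715, 52, 366, 230, 420)
twocp_mcgowan   <- c(567, 54, 519, 339, 1069)
twocp_mirabella <- c(812, 66, 410, 251, 459)

# ballots whose first preference went to neither leading candidate
distributed <- (twocp_mcgowan + twocp_mirabella) -
  (first_mcgowan + first_mirabella)

# share of those ballots that ranked McGowan ahead of Mirabella
fraction_mcgowan <- (twocp_mcgowan - first_mcgowan) / distributed

print(data.frame(booth = booths, distributed,
                 fraction_mcgowan = round(fraction_mcgowan, 6)))
```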

The flow of preferences was even stronger in Beechworth, where McGowan won almost 88% of the distributed preferences, but she got less than 52% of the preferences in Baddaginnie. You might notice a big difference between Beechworth and Baddaginnie in the first preferences. For example, the Greens won almost 7% of first preferences in Beechworth but less than 2% of first preferences in Baddaginnie.

We can model this flow of preferences as a function of the first preferences to predict the two-candidate-preferred vote from first preferences. Here, we are essentially aiming to predict the fraction of votes that flow from the first preferences of the other candidates to the two leading candidates.

We can build this model using linear regression, but we would like to constrain the model coefficients such that they are between zero and one; the coefficients estimate the proportion of voters who preferenced one of the 2CP candidates ahead of the other.
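One way to fit such a constrained regression in base R is box-constrained least squares with optim. The sketch below uses simulated booths with three minor candidates and made-up true flows, so it illustrates the general idea rather than reproducing the exact fitting routine behind the results in this post.

```r
set.seed(42)
n_booths  <- 60
true_flow <- c(0.9, 0.6, 0.1)   # hypothetical flows for three minor candidates

# simulated first-preference counts for the minor candidates in each booth
first_prefs <- cbind(round(runif(n_booths, 20, 400)),
                     round(runif(n_booths, 10, 200)),
                     round(runif(n_booths, 5, 100)))

# simulated preferences received by one leading candidate, with binomial noise
received <- rbinom(n_booths, first_prefs[, 1], true_flow[1]) +
  rbinom(n_booths, first_prefs[, 2], true_flow[2]) +
  rbinom(n_booths, first_prefs[, 3], true_flow[3])

# sum of squared errors and its gradient; there is no intercept because
# preferences can only come from the minor candidates' ballots
sse <- function(b) sum((received - first_prefs %*% b)^2)
gr  <- function(b) as.vector(-2 * crossprod(first_prefs,
                                            received - first_prefs %*% b))

# least squares with each coefficient boxed into [0, 1]
fit <- optim(par = rep(0.5, 3), fn = sse, gr = gr,
             method = "L-BFGS-B", lower = 0, upper = 1)

print(round(fit$par, 3))
```

The bounds matter most for candidates with few votes, whose unconstrained estimates can easily stray below zero or above one.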

If we take the data from all of Indi’s 103 booths (and also the postal, early, provisional, and absentee votes), then our model results look like this:

Observed preference flows to Cathy McGowan versus fitted preference flows for each of the booths (and the non-ordinary votes) based on the 2013 federal election results.

Let’s look at the model coefficients:

0.623 : DUDLEY, Robert ( Rise Up Australia Party ), 985 votes
0.662 : LEEWORTHY, Rick ( Family First Party ), 1330 votes
0.383 : ASCHENBRENNER, Helma ( Sex Party ), 1402 votes
0.000 : HAYES, William ( Bullet Train For Australia ), 489 votes
0.992 : WALSH, Robyn ( Australian Labor Party ), 10375 votes
0.700 : O’CONNOR, Jenny ( The Greens ), 3041 votes
0.008 : ROURKE, Phil ( Katter’s Australian Party ), 615 votes
0.680 : MURPHY, Robert Denis ( Palmer United Party ), 2417 votes
1.000 : PODESTA, Jennifer ( Independent ), 841 votes

These coefficients estimate that McGowan was preferenced behind Mirabella by almost all voters who put Hayes first (the estimated coefficient is zero), but she was placed ahead of Mirabella by almost everyone who put Walsh first (the estimated coefficient is 0.992).

The graph shows that this pattern of preference flows as a function of first preferences is very consistent, at least in Indi. In some other electorates, it is not so consistent. Here are the model results for the seat of Batman, which will be hotly contested in 2016 between the Greens and the ALP:


Observed preference flows to Alexandra Bhathal (a Greens candidate) versus fitted preference flows for each of the booths in the seat of Batman (and the non-ordinary votes) based on the 2013 federal election results.

The model for Batman doesn’t work quite as well, largely because Bhathal received a greater flow of preferences from the non-ordinary votes (orange symbols in the figure) than from the ordinary votes. These non-ordinary votes are the postal votes (Bhathal received almost 1700 of the flowing preferences), absent votes (Bhathal received over 1000 of the flowing preferences), early votes (Bhathal received just under 1000 of the flowing preferences), and provisional votes (there were very few of these).

Interestingly, a similar pattern occurred in Wills, which is another inner Melbourne seat with a Greens candidate – it seems the Greens garnered strong preference flows from the non-ordinary votes in 2013. Whether that will be borne out in 2016 remains to be seen, but strong preference flows will be needed by the Greens if they are to prevail in Batman.

If you’d like to look at preference flows for yourself for different seats in the 2013 election, then you are welcome to use my R code that I wrote – it scrapes the data from the AEC website, runs the model and prints out the result.

The code is best run using the source command in R so that you are prompted to select the seat from the list of lower house seats (or you can just specify the seat number directly from within the code). And please excuse my R coding – I know it is clumsy in places, I am learning, and am yet to figure out R’s data structures properly to do vectorized operations (among other things I don’t understand).

Also, I haven’t checked that this works on all seats – there might be some anomalies that I haven’t accounted for.


library(XML)  # provides readHTMLTable and getHTMLLinks

seatsite="" # seats listed here
# seatsite="" # 2010 seat

seat.table = readHTMLTable(seatsite, header=F, which=1, stringsAsFactors=F)  # extract the seats

# arranged in 3 columns, with various white spaces, so trim
x <- gsub("\t", "", seat.table[6,]$V1, fixed = TRUE)
x <- gsub("\r", "", x, fixed = TRUE)
x <- gsub("\n\n\n\n\n", "\n", x, fixed = TRUE)
V1 <- strsplit(x, "\n")

x <- gsub("\t", "", seat.table[6,]$V2, fixed = TRUE)
x <- gsub("\r", "", x, fixed = TRUE)
x <- gsub("\n\n\n\n\n", "\n", x, fixed = TRUE)
V2 <- strsplit(x, "\n")

x <- gsub("\t", "", seat.table[6,]$V3, fixed = TRUE)
x <- gsub("\r", "", x, fixed = TRUE)
x <- gsub("\n\n\n\n\n", "\n", x, fixed = TRUE)
V3 <- strsplit(x, "\n")

seat.names <- c(V1[[1]], V2[[1]], V3[[1]])  # combine the three columns into one

seatlinks <- getHTMLLinks(seatsite, relative=FALSE)  # get the links to the sites with seat specific data
prefix <- ""  
# prefix <- "" #2010 results

seatlinks <- paste(prefix, seatlinks[12:161], sep="")  # paste on the rest of the website
seatlinks <- gsub("FirstPrefs", "PollingPlaces", seatlinks)  # we will need polling booth data, so change names a little

nseats <- length(seat.names)

print(seat.names) # print out the seat names, and prompt user to select one

prompt <- paste("Enter number of seat (between 1 and ", nseats, "): ", sep="")
chosen <- as.numeric(readline(prompt))

cat("Selected seat is ", seat.names[chosen], "\n")

places = seatlinks[chosen]  # this is the link for the chosen seat

places.table = readHTMLTable(places, header=F, which=1, stringsAsFactors=F, skip.rows=c(1,2,3,4,5,6))

places.names <- places.table$V1  # gets the list of booths

placeslinks <- getHTMLLinks(places, relative=FALSE) # get links to booth data
placeslinks <- placeslinks[grep("HousePollingPlaceFirstPrefs", placeslinks)]  # trim off redundant info

nplaces <- length(placeslinks)
places.names <- places.names[1:nplaces]

placeslinks <- paste(prefix, placeslinks, sep="")  # paste on the prefix to booth links

skippedrows <- 1:8  # Need 9 for 2010, header=F; 8 for 2013, header=T

# get info for first booth
firstpref.table = readHTMLTable(placeslinks[1], header=T, which=1, stringsAsFactors=F, skip.rows=skippedrows)

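# find number of candidates: rows before the "......" separator,
# falling back to the "FORMAL" row if no separator is found
ncandidates <- pmatch("......", firstpref.table$V1)-1
if(is.na(ncandidates)) {
  ncandidates <- pmatch("FORMAL", firstpref.table$V1)-1
}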

# get candidate names and parties
candidate.names <- firstpref.table$V1[1:ncandidates]
candidate.parties <- firstpref.table$V2[1:ncandidates]

# get two candidate preferred names
twopp.names <- firstpref.table$V1[(nrow(firstpref.table)-2):(nrow(firstpref.table)-1)]

# get arrays ready for data scraping
firstpref <- array(-999, dim=c(nplaces+4, ncandidates))
twopp <- array(-999, dim=c(nplaces+4, 2))
boothtotal <- array(dim=nplaces+4)  # total first preferences at each booth

for(i in 1:nplaces) {  # for each booth
  firstpref.table = readHTMLTable(placeslinks[i], header=T, which=1, stringsAsFactors=F, skip.rows=skippedrows)
  for(j in 1:ncandidates) {  # get first preference count for each candidate
    firstpref[i, j] <- as.numeric(gsub(",","", firstpref.table[j, 3]))  # gsub removes commas from string
  }
  boothtotal[i] <- sum(firstpref[i, 1:ncandidates])  # get total first prefs for each booth

  twopp[i, 1] <- as.numeric(gsub(",","", firstpref.table[(nrow(firstpref.table)-2), 3]))  # get 2CP data
  twopp[i, 2] <- as.numeric(gsub(",","", firstpref.table[(nrow(firstpref.table)-1), 3]))
}

# Now get non-ordinary votes
othervotesite <- gsub("HouseDivisionPollingPlaces", "HouseDivisionFirstPrefsByVoteType", places)
othervotes.table = readHTMLTable(othervotesite, header=T, which=1, stringsAsFactors=F, skip.rows=c(1,2,3,4,5,6))
absent <- as.numeric(gsub(",","", othervotes.table$V5[1:ncandidates]))
provisional <- as.numeric(gsub(",","", othervotes.table$V7[1:ncandidates]))
early <- as.numeric(gsub(",","", othervotes.table$V9[1:ncandidates]))
postal <- as.numeric(gsub(",","", othervotes.table$V11[1:ncandidates]))

firstpref[nplaces+1, ] <- absent
firstpref[nplaces+2, ] <- provisional 
firstpref[nplaces+3, ] <- early 
firstpref[nplaces+4, ] <- postal 

twopp[nplaces+1, 1] <- as.numeric(gsub(",","", othervotes.table[(nrow(othervotes.table)-2), 5]))  # absent
twopp[nplaces+2, 1] <- as.numeric(gsub(",","", othervotes.table[(nrow(othervotes.table)-2), 7]))  # provisional 
twopp[nplaces+3, 1] <- as.numeric(gsub(",","", othervotes.table[(nrow(othervotes.table)-2), 9]))  # early 
twopp[nplaces+4, 1] <- as.numeric(gsub(",","", othervotes.table[(nrow(othervotes.table)-2), 11])) # postal 

twopp[nplaces+1, 2] <- as.numeric(gsub(",","", othervotes.table[(nrow(othervotes.table)-1), 5]))
twopp[nplaces+2, 2] <- as.numeric(gsub(",","", othervotes.table[(nrow(othervotes.table)-1), 7]))
twopp[nplaces+3, 2] <- as.numeric(gsub(",","", othervotes.table[(nrow(othervotes.table)-1), 9])) 
twopp[nplaces+4, 2] <- as.numeric(gsub(",","", othervotes.table[(nrow(othervotes.table)-1), 11])) 

totalfirstprefs <- array(-999, dim=ncandidates)
for(j in 1:ncandidates) {
  totalfirstprefs[j] <- sum(firstpref[, j]) # count total first prefs for each candidate across all booths
}

twopp.ids <- pmatch(twopp.names, candidate.names)  # get ids of 2CP people
twopp.parties <- candidate.parties[twopp.ids] # and their parties

nflowed <- twopp[,1] - firstpref[,twopp.ids[1]] # number of preferences flowing to 2pp candidate number 1

# get first pref votes for candidates other than 2pp candidates (we know where they go)
otherfirst <- firstpref[,-twopp.ids, drop=FALSE]  # drop=FALSE keeps a matrix even with one other candidate
othernames <- candidate.names[-twopp.ids]
otherparties <- candidate.parties[-twopp.ids]
othertotalfirstprefs <- totalfirstprefs[-twopp.ids]

# set up model specification for flow of preferences 
starter <- structure(rep(0.5,(ncandidates-2)), names=letters[1:(ncandidates-2)])
lowers <- structure(rep(0,(ncandidates-2)), names=letters[1:(ncandidates-2)])
uppers <- structure(rep(1,(ncandidates-2)), names=letters[1:(ncandidates-2)])

formula <- "nflowed ~ a*otherfirst[, 1]"

if(ncandidates > 3) {  # only add further terms when there is more than one other candidate
  for(i in 2:(ncandidates-2))
    formula <- paste(formula, " + ", letters[i], "*otherfirst[, ", i, "]", sep="")
}

model <- nls(formula, algorithm="port", start=starter, lower=lowers, upper=uppers)
modelsum <- summary(model)

cat("Estimated flow to: ", twopp.names[1], "(", twopp.parties[1], ")\n")
for(i in 1:(ncandidates-2))
  cat(modelsum$parameters[i,1], " of ", othertotalfirstprefs[i], "votes from ", othernames[i], "(", otherparties[i], ")\n")

flows <- modelsum$parameters[,1] * othertotalfirstprefs
totflow <- sum(flows)
totflow2 <- sum(othertotalfirstprefs) - totflow

cat("\nEstimated votes to: ", twopp.names[1], "(", twopp.parties[1], "): ", totalfirstprefs[twopp.ids[1]]+totflow, "\n")
cat("\nEstimated votes to: ", twopp.names[2], "(", twopp.parties[2], "): ", totalfirstprefs[twopp.ids[2]]+totflow2, "\n")

RSS <- sum(residuals(model)^2)  # residual sum of squares
TSS <- sum((nflowed - mean(nflowed ))^2)  # total sum of squares
RSq <- 1 - (RSS/TSS)  # R-squared measure

fitted <- fitted(model)
maxi <- max(nflowed, max(fitted))

plot(fitted[1:nplaces], nflowed[1:nplaces], col="blue", xlab="Fitted preference flow", ylab="Observed preference flow", xlim=c(0,maxi), ylim=c(0,maxi))
points(fitted[(nplaces+1):(nplaces+4)], nflowed[(nplaces+1):(nplaces+4)], col="orange", pch=8)
abline(a=0, b=1)
rsqtxt <- paste("Flow to ", twopp.names[1], "\n", seat.names[chosen], "\nR-squared = ", round(RSq, digits = 4), "\nBlue are ordinary votes for each booth\nOrange are non-ordinary votes", sep="")
text(x=0, y=0.9*(max(nflowed)), labels=rsqtxt, pos=4)
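
If you would rather check the model itself before pointing it at the AEC website, here is a minimal sketch that fits the same kind of bounded regression to made-up data. Everything in it is an assumption for illustration: the number of booths, the simulated vote counts, the flow rates in true_flow and the helper names are invented, and none of it is AEC data. Only nflowed and otherfirst play the same roles as in the code above.

```r
# Synthetic check of the preference-flow model. All numbers are invented.
set.seed(1)

nbooths <- 100
# Assumed "true" share of each minor candidate's votes that flows
# to 2CP candidate number 1
true_flow <- c(a = 0.8, b = 0.3, c = 0.55)

# Simulated first-preference counts for three minor candidates at each booth
otherfirst <- cbind(round(runif(nbooths, 50, 600)),
                    round(runif(nbooths, 20, 400)),
                    round(runif(nbooths, 10, 200)))

# Simulated preferences flowing to 2CP candidate 1: each minor-candidate
# voter independently sends their preference there with the assumed rate
nflowed <- rbinom(nbooths, otherfirst[, 1], true_flow[1]) +
           rbinom(nbooths, otherfirst[, 2], true_flow[2]) +
           rbinom(nbooths, otherfirst[, 3], true_flow[3])

# Build the formula "nflowed ~ a*otherfirst[, 1] + b*otherfirst[, 2] + ..."
k <- ncol(otherfirst)
terms <- paste0(letters[1:k], "*otherfirst[, ", 1:k, "]")
flow_formula <- as.formula(paste("nflowed ~", paste(terms, collapse = " + ")))

# Fit with each flow rate constrained to lie between 0 and 1
fit <- nls(flow_formula, algorithm = "port",
           start = setNames(rep(0.5, k), letters[1:k]),
           lower = setNames(rep(0, k), letters[1:k]),
           upper = setNames(rep(1, k), letters[1:k]))

estimates <- coef(fit)

# Same R-squared measure as in the main code
rsq <- 1 - sum(residuals(fit)^2) / sum((nflowed - mean(nflowed))^2)

print(round(rbind(true = true_flow, estimated = estimates), 3))
cat("R-squared =", round(rsq, 4), "\n")
```

With booths that differ a lot in size and party mix, the estimated rates should land close to the assumed ones. Real booths within a single seat tend to look more alike, so expect the estimates there to be less certain.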

Well, I hope you found that interesting. We’ll see what happens in the 2016 election… I might do something about swings in a second post if I have time.
