Import, preprocess and aggregate election data in a single call for a given election and aggregation level. This function also lets you remove parties below a given vote share threshold.

Usage

summary_election_data(
  type_elec,
  year = NULL,
  date = NULL,
  level = "all",
  by_parties = TRUE,
  method = NULL,
  threshold = 0.03,
  short_version = TRUE,
  verbose = TRUE,
  filter_porc_ballots = NA,
  filter_candidacies = NA,
  prec_round = 3,
  CERA_remove = FALSE,
  col_abbrev_candidacies = "abbrev_candidacies",
  col_id_elec = "id_elec",
  col_id_poll_station = "id_INE_poll_station",
  col_id_mun = "id_INE_mun",
  cols_mun_var = c("pop_res_mun", "census_counting_mun"),
  col_id_candidacies = c(id_prov = "id_candidacies", id_nat = "id_candidacies_nat")
)

Arguments

type_elec

Type of election for which data are available. It should be one of the following values: "referendum", "congress", "senate", "local", "cabildo" (Canarian council) or "EU".

year

A vector or single value representing the years of the elections to be considered. Please check in dates_elections_spain that elections of the specified type are available for the provided year.

date

A vector or single value representing the dates of the elections to be considered. If date is provided, it should be in %Y-%m-%d format (e.g., '2000-01-01'). Defaults to NULL. If no date is provided, year should be provided as a numeric value. Please check in dates_elections_spain that elections of the specified type are available.

level

A string providing the level of aggregation at which the data is to be provided. The allowed values are the following: 'all', 'ccaa', 'prov', 'mun', 'mun_district', 'sec' or 'poll_station'. Defaults to "all".

by_parties

A flag indicating whether the user wants a summary by candidacies/parties or just global results at the given level. Defaults to TRUE.

method

A string vector providing the methods of apportionment to be used. The allowed values are the following: "D'Hondt" (or "Hondt" or "hondt"), "Hamilton" (or "hamilton" or "Vinton" or "vinton"), "Webster" (or "webster" or "Sainte-Lague" or "sainte-lague"), "Hill" (or "hill" or "Huntington-Hill" or "huntington-hill"), "Dean" (or "dean"), "Adams" (or "adams"), "Hagenbach-Bischoff" (or "hagenbach" or "bischoff") or "First Past the Post" (or "first" or "fptp"). Defaults to NULL.

threshold

A numerical value (between 0 and 1) indicating the minimum share of votes needed to obtain representation in a given electoral district. Defaults to 0.03.

short_version

Flag to indicate whether a short version of the data (just key variables) should be returned. Defaults to TRUE.

verbose

Flag to indicate whether detailed messages should be printed during execution. Defaults to TRUE.

filter_porc_ballots

A numerical value representing the vote percentage threshold (out of 100) used to filter the parties (only applies when by_parties = TRUE). Defaults to NA.

filter_candidacies

A character string (or vector of them) containing the party abbreviations whose ballots will be filtered (only applies when by_parties = TRUE). Defaults to NA.

prec_round

Rounding accuracy. Defaults to prec_round = 3.

CERA_remove

Flag to indicate whether the ballots related to CERA constituencies should be removed. Defaults to FALSE.

col_abbrev_candidacies

Column name to uniquely identify the party abbreviations. Defaults to "abbrev_candidacies".

col_id_elec, col_id_poll_station, col_id_mun, col_id_candidacies, cols_mun_var

(Optional) Column names for the election id, polling station id, municipality id and candidacy id, plus the names of the variables that are only available at mun level or above.

candidacies_data

A database containing the information of candidacies. Database should contain col_abbrev_candidacies and col_id_candidacies columns. Defaults to NULL.
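
To illustrate how method and threshold interact, the following base-R sketch applies a D'Hondt allocation to a single district after discarding the parties below the threshold. It is not the package's implementation: the helper dhondt_sketch(), the ballot counts and the denominator used for the vote share (the sum of the ballots passed in) are assumptions made for this example only.

```r
# Hypothetical helper, not part of the package: D'Hondt allocation
# for one district after applying a vote-share threshold.
dhondt_sketch <- function(ballots, n_seats, threshold = 0.03) {
  # Share of each party over the ballots supplied (assumption: the
  # package may use a different denominator, e.g. valid ballots).
  share <- ballots / sum(ballots)
  eligible <- ballots[share >= threshold]

  # Quotient table: one row per divisor 1..n_seats, one column per party.
  quotients <- outer(seq_len(n_seats), eligible, function(d, v) v / d)

  # The n_seats largest quotients each win a seat (ties are not handled).
  top <- order(quotients, decreasing = TRUE)[seq_len(n_seats)]
  winners <- col(quotients)[top]

  # Parties below the threshold keep zero seats.
  seats <- integer(length(ballots))
  names(seats) <- names(ballots)
  seats[names(eligible)] <- tabulate(winners, nbins = length(eligible))
  seats
}

# Made-up ballots for four hypothetical parties competing for 7 seats.
ballots <- c(A = 340000, B = 280000, C = 160000, D = 20000)
seats <- dhondt_sketch(ballots, n_seats = 7)
print(seats)
# Party D (2.5% of the ballots) falls below the 3% threshold;
# the result is A = 3, B = 3, C = 1, D = 0.
```

Setting threshold = 0 in this sketch keeps every party in the quotient table, which is a quick way to see how much the threshold alone changes an allocation.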

Value

A tibble with rows corresponding to the level of aggregation for each election, including the following variables:

id_elec

election's id constructed from the election code cod_elec and date date_elec.

id_INE_xxx

id for the xxx constituency provided in level: id_INE_ccaa, id_INE_prov, etc. Only provided in the long version.

xxx

names for the xxx constituency provided in level: ccaa, prov, etc.

ballots_1, ballots_2

total number of ballots and turnout percentage in the first and second round (if applicable). Only provided in the long version.

blank_ballots, invalid_ballots

blank and invalid ballots.

party_ballots, valid_ballots, total_ballots

ballots to candidacies/parties, valid ballots (sum of blank_ballots and party_ballots) and total ballots (sum of valid_ballots and invalid_ballots).

porc_candidacies_parties, porc_candidacies_valid, porc_candidacies_census

percentage (%) of ballots for each candidacy relative to party_ballots, valid_ballots and census_counting_xxx, respectively.

n_poll_stations

number of polling stations. Only provided in the long version.

pop_res_xxx

population census of residents (CER + CERA) at xxx level. Only provided in the long version.

census_counting_xxx

population eligible to vote after claims at xxx level. Only provided in the long version.

id_candidacies

id for candidacies: national ids when level = "all" and province ids otherwise.

abbrev_candidacies, name_candidacies

acronym and full name of the candidacies.

ballots

number of ballots obtained by each candidacy at the given aggregation level.

seats

number of seats
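
The relationships between the ballot variables described above can be checked with a few lines of base R. The figures below are made up for one hypothetical territory, and the use of round() with prec_round on a 0 to 100 scale is an assumption about how the percentages are reported, not a description of the package internals.

```r
# Made-up figures for one hypothetical territory (not real results).
blank_ballots   <- 1200
invalid_ballots <- 800
census_counting <- 100000
ballots <- c(A = 30000, B = 18000)  # ballots per candidacy

# Identities stated in the variable descriptions.
party_ballots <- sum(ballots)
valid_ballots <- blank_ballots + party_ballots
total_ballots <- valid_ballots + invalid_ballots

# Percentages of each candidacy over the three denominators
# (assumption: 0-100 scale, rounded to prec_round decimals).
prec_round <- 3
porc_candidacies_parties <- round(100 * ballots / party_ballots, prec_round)
porc_candidacies_valid   <- round(100 * ballots / valid_ballots, prec_round)
porc_candidacies_census  <- round(100 * ballots / census_counting, prec_round)

print(rbind(porc_candidacies_parties,
            porc_candidacies_valid,
            porc_candidacies_census))
```

Because the three denominators grow from party_ballots to valid_ballots to census_counting, the three percentages for a given candidacy can only stay equal or decrease in that order.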

Details

This function chains two lower-level helpers: get_election_data(), which imports and cleans polling-station and candidacy ballots, and aggregate_election_data(), which rolls those data up to the requested territorial level. It then performs a final round of post-processing so that the user obtains, in a single call, a tidy table for the chosen date and aggregation level that is ready for analysis or visualisation.

Author

Javier Alvarez-Liebana and David Pereiro-Pol.

Examples


## Correct examples

# Summary 2023 election data at prov level,
# aggregating the candidacies ballots, in a short version
summary_prov <-
  summary_election_data(type_elec = "congress", year = 2023,
                        level = "prov")
#> Summary election data
#>    [x] Checking if parameters are allowed...
#> 
#> Get and join election data
#>    [x] Checking if parameters are allowed...
#>    [x] Importing the following poll station data ...
#>        - congress elections on 2023-07-24
#> 
#>    [x] Importing candidacies and ballots data (at poll station level) ...
#> ... Please be patient, volume of data downloaded and internet connection may take a few seconds
#> Aggregate election data
#>    [x] Checking if parameters are allowed...
#>    [x] Aggregating data at prov level ...
#> 
#> [x] Join information sources and last summaries ...
#> [x] Including candidacies info and summaries ...

if (FALSE) { # \dontrun{

# Summary 2023 and November 2019 election data at mun_district level,
# aggregating the candidacies ballots, in a long version
summary_mun_district <-
  summary_election_data(type_elec = "congress",
                        year = 2023,
                        date = "2019-11-10",
                        level = "mun_district",
                        short_version = FALSE)

# Summary 2023 election data at prov level,
# aggregating the candidacies ballots, in a long version, and
# removing the CERA votes
summary_prov <-
  summary_election_data(type_elec = "congress",
                        year = 2023,
                        level = "prov",
                        short_version = FALSE,
                        CERA_remove = TRUE)

# Summary 2023 and June 2016 election data at prov level, aggregating
# the candidacies ballots, in a long version, calculating the number
# of seats for each party in each province and filtering ballots
# above 45% (percentage between 0 and 100)

summary_prov <-
  summary_election_data(type_elec = "congress", year = 2023,
                        date = "2016-06-26", level = "prov",
                        short_version = FALSE,
                        method = "d'hondt",
                        filter_porc_ballots = 45)

# Summary 2023 and June 2016 election data at mun level, aggregating
# the candidacies ballots, in a long version, and filtering ballots
# above 45% (percentage between 0 and 100) and just PP and PSOE
# parties
summary_mun <-
  summary_election_data(type_elec = "congress", year = 2023,
                        date = "2016-06-26", level = "mun",
                        short_version = FALSE,
                        filter_candidacies = c("PSOE", "PP"),
                        filter_porc_ballots = 45)

# ----
# Incorrect examples
# ----

# Invalid election type: "national" is not a valid election type
summary_election_data(type_elec = "national", year = 2019)

# Invalid date format: date should be in %Y-%m-%d format
summary_election_data(type_elec = "congress", date = "26-06-2016")

# Invalid short version flag: short_version should be a
# logical variable
summary_election_data(type_elec = "congress", year = 2019,
                      short_version = "yes")

# Invalid aggregation level
summary_election_data("congress", 2019, level = "district")

# Invalid method
summary_election_data("congress", 2019, method = "don")

# threshold falls outside the valid range of 0 to 1
summary_election_data("congress", 2019, method = "dhondt",
                      threshold = 1.3)

# filter_porc_ballots falls outside the valid range of 0 to 100
summary_election_data("congress", 2019,
                      filter_porc_ballots = 150)

# filter_porc_ballots supplied while by_parties = FALSE
summary_election_data("congress", 2019,
                      by_parties = FALSE,
                      filter_porc_ballots = 5)

# filter_candidacies supplied while by_parties = FALSE
summary_election_data("congress", 2019,
                      by_parties = FALSE,
                      filter_candidacies = c("PP", "PSOE"))

} # }