Summaries of the electoral and candidacy ballot data for a given aggregation level (ccaa, prov, etc.)

Import, preprocess and aggregate election data in a single call for a given election and aggregation level. This function also lets you remove parties below a given vote-share threshold.

Usage

summary_election_data(
  type_elec,
  year = NULL,
  date = NULL,
  level = "all",
  by_parties = TRUE,
  method = NULL,
  threshold = 0.03,
  short_version = TRUE,
  verbose = TRUE,
  filter_porc_ballots = NA,
  filter_candidacies = NA,
  prec_round = 3,
  CERA_remove = FALSE,
  col_abbrev_candidacies = "abbrev_candidacies",
  col_id_elec = "id_elec",
  col_id_poll_station = "id_INE_poll_station",
  col_id_mun = "id_INE_mun",
  cols_mun_var = c("pop_res_mun", "census_counting_mun"),
  col_id_candidacies = c(id_prov = "id_candidacies", id_nat = "id_candidacies_nat")
)

Arguments

type_elec: Type of election for which data is available. It should be one of the following values: "referendum", "congress", "senate", "local", "cabildo" (Canarian council) or "EU".
year: A vector or single value representing the years of the elections to be considered. Please check in dates_elections_spain that elections of the specified type are available for the provided year.
date: A vector or single value representing the dates of the elections to be considered. If date is provided, it must be in %Y-%m-%d format (e.g., '2000-01-01'). Defaults to NULL. If no date is provided, year must be provided as a numeric value. Please check in dates_elections_spain that elections of the specified type are available.
level: A string providing the level of aggregation at which the data is to be provided. The allowed values are the following: 'all', 'ccaa', 'prov', 'mun', 'mun_district', 'sec' or 'poll_station'. Defaults to "all".
by_parties: A flag indicating whether the user wants a summary by candidacies/parties or just global results at the given level. Defaults to TRUE.
method: A character vector giving the apportionment methods to be used. The allowed values are the following: "D'Hondt" (or "Hondt" or "hondt"), "Hamilton" (or "hamilton", "Vinton" or "vinton"), "Webster" (or "webster", "Sainte-Lague" or "sainte-lague"), "Hill" (or "hill", "Huntington-Hill" or "huntington-hill"), "Dean" (or "dean"), "Adams" (or "adams"), "Hagenbach-Bischoff" (or "hagenbach" or "bischoff") or "First Past the Post" (or "first" or "fptp"). Defaults to "Hondt".
threshold: A numerical value (between 0 and 1) indicating the minimum share of votes needed to obtain representation in a given electoral district. Defaults to 0.03.
short_version: Flag indicating whether a short version of the data (just key variables) should be returned. Defaults to TRUE.
verbose: Flag to indicate whether detailed messages should be printed during execution. Defaults to TRUE.
filter_porc_ballots: A numerical value representing the vote percentage threshold (out of 100) used to filter the parties (only applies when by_parties = TRUE). Defaults to NA.
filter_candidacies: A character string (or vector of strings) containing the abbreviations of the parties whose ballots will be filtered (only applies when by_parties = TRUE). Defaults to NA.
prec_round: Rounding accuracy. Defaults to 3.
CERA_remove: Flag indicating whether the ballots related to CERA constituencies should be removed. Defaults to FALSE.
col_abbrev_candidacies: Column name to uniquely identify the party abbreviations. Defaults to "abbrev_candidacies".
col_id_elec, col_id_poll_station, col_id_mun, col_id_candidacies, cols_mun_var: (Optional) Column names for the election id, polling station id, municipality id and candidacy ids, and column names for the variables that are only available at mun level or above.
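
To make the interplay between method and threshold more concrete, the following is a minimal base R sketch of D'Hondt apportionment with an electoral threshold. It is not the package's implementation: dhondt_seats() and the ballot figures are hypothetical, ties are broken by party order, and the threshold is applied here to the sum of the ballots passed in, which may differ from the base the package actually uses.

```r
# Hypothetical helper, not part of the package: D'Hondt apportionment
# with an electoral threshold expressed as a proportion (0 to 1).
dhondt_seats <- function(ballots, n_seats, threshold = 0.03) {
  # Parties whose vote share is below the threshold get no seats
  share <- ballots / sum(ballots)
  eligible <- share >= threshold

  seats <- setNames(integer(length(ballots)), names(ballots))
  for (i in seq_len(n_seats)) {
    # D'Hondt quotient: ballots / (seats already won + 1)
    quotients <- ifelse(eligible, ballots / (seats + 1), -Inf)
    winner <- which.max(quotients) # first party wins a tie
    seats[winner] <- seats[winner] + 1L
  }
  seats
}

# Made-up ballots for one electoral district with 7 seats
ballots <- c(A = 340000, B = 280000, C = 160000, D = 60000, E = 15000)
dhondt_seats(ballots, n_seats = 7)
# A = 3, B = 3, C = 1, D = 0, E = 0 (E is below the 3% threshold)
```

In this toy district, party E holds under 3% of the ballots and is excluded before any quotient is computed, while D passes the threshold but never reaches a winning quotient.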

Value

A tibble with rows corresponding to the level of aggregation for each election, including the following variables:

id_elec: election's id constructed from the election code cod_elec and date date_elec.
id_INE_xxx: id for the xxx constituency provided in level: id_INE_ccaa, id_INE_prov, etc. Only provided in the long version.
xxx: names for the xxx constituency provided in level: ccaa, prov, etc.
ballots_1, ballots_2: number of total ballots and turnout percentage in the first and second round (if applicable). Only provided in the long version.
blank_ballots, invalid_ballots: blank and invalid ballots.
party_ballots, valid_ballots, total_ballots: ballots to candidacies/parties, valid ballots (sum of blank_ballots and party_ballots) and total ballots (sum of valid_ballots and invalid_ballots).
porc_candidacies_parties, porc_candidacies_valid, porc_candidacies_census: percentage (%) of ballots for each candidacy relative to party_ballots, valid_ballots and census_counting_xxx, respectively.
n_poll_stations: number of polling stations. Only provided in the long version.
pop_res_xxx: population census of residents (CER + CERA) at xxx level. Only provided in the long version.
census_counting_xxx: population eligible to vote after claims at xxx level. Only provided in the long version.
id_candidacies: id for candidacies: national ids when level = "all" and province ids otherwise.
abbrev_candidacies, name_candidacies: acronym and full name of the candidacies.
ballots: number of ballots obtained by each candidacy at the chosen aggregation level.
seats: number of seats for each candidacy.
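
The relationships among the ballot columns described above can be sketched with a small made-up data frame. This is only an illustration under stated assumptions: the figures and the toy object are invented, and treating prec_round as the number of decimal places passed to round() is an assumption rather than documented behaviour.

```r
# Toy figures for a single province and three candidacies (all made up)
toy <- data.frame(
  abbrev_candidacies = c("AAA", "BBB", "CCC"),
  ballots = c(5000, 3000, 2000),
  blank_ballots = 100,
  invalid_ballots = 50,
  census_counting_prov = 15000
)

# Totals as described in the Value section
toy$party_ballots <- sum(toy$ballots)
toy$valid_ballots <- toy$blank_ballots + toy$party_ballots
toy$total_ballots <- toy$valid_ballots + toy$invalid_ballots

# Percentages of each candidacy relative to three different bases;
# prec_round is assumed to be the number of decimal places
prec_round <- 3
toy$porc_candidacies_parties <-
  round(100 * toy$ballots / toy$party_ballots, prec_round)
toy$porc_candidacies_valid <-
  round(100 * toy$ballots / toy$valid_ballots, prec_round)
toy$porc_candidacies_census <-
  round(100 * toy$ballots / toy$census_counting_prov, prec_round)

toy
```

The three percentage columns differ only in their denominator, so porc_candidacies_parties always sums to 100 across candidacies, whereas the other two fall short by the blank ballots and by the voters who abstained or cast an invalid ballot, respectively.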

Details

This function chains two lower-level helpers: get_election_data(), which imports and cleans polling-station and candidacy ballots, and aggregate_election_data(), which rolls those data up to the requested territorial level. It then performs a final round of post-processing so that the user obtains, in a single call, a tidy table for the chosen elections and aggregation level that is ready for analysis or visualisation.

Author

Javier Alvarez-Liebana and David Pereiro-Pol.

Examples


## Correct examples

# Summary 2023 election data at prov level,
# aggregating the candidacies ballots, in a short version
summary_prov <-
  summary_election_data(type_elec = "congress", year = 2023,
                        level = "prov")
#> Summary election data
#>    [x] Checking if parameters are allowed...
#> 
#> Get and join election data
#>    [x] Checking if parameters are allowed...
#>    [x] Importing the following poll station data ...
#>        - congress elections on 2023-07-24
#> 
#>    [x] Importing candidacies and ballots data (at poll station level) ...
#> ... Please be patient, volume of data downloaded and internet connection may take a few seconds
#> Aggregate election data
#>    [x] Checking if parameters are allowed...
#>    [x] Aggregating data at prov level ...
#> 
#> [x] Join information sources and last summaries ...
#> [x] Including candidacies info and summaries ...

if (FALSE) { # \dontrun{

# Summary 2023 and November 2019 election data at mun_district level,
# aggregating the candidacies ballots, in a long version
summary_mun_district <-
  summary_election_data(type_elec = "congress",
                        year = 2023,
                        date = "2019-11-10",
                        level = "mun_district",
                        short_version = FALSE)

# Summary 2023 election data at prov level,
# aggregating the candidacies ballots, in a long version, and
# removing the CERA votes
summary_prov <-
  summary_election_data(type_elec = "congress",
                        year = 2023,
                        level = "prov",
                        short_version = FALSE,
                        CERA_remove = TRUE)

# Summary 2023 and June 2016 election data at prov level, aggregating
# the candidacies ballots, in a long version, calculating the number
# of seats for each party in each province and filtering ballots
# above 45% (percentage between 0 and 100)
summary_prov <-
  summary_election_data(type_elec = "congress", year = 2023,
                        date = "2016-06-26", level = "prov",
                        short_version = FALSE,
                        method = "D'Hondt",
                        filter_porc_ballots = 45)

# Summary 2023 and June 2016 election data at mun level, aggregating
# the candidacies ballots, in a long version, and filtering ballots
# above 45% (percentage between 0 and 100) and just the PP and PSOE
# parties
summary_mun <-
  summary_election_data(type_elec = "congress", year = 2023,
                        date = "2016-06-26", level = "mun",
                        short_version = FALSE,
                        filter_candidacies = c("PSOE", "PP"),
                        filter_porc_ballots = 45)

# ----
# Incorrect examples
# ----

# Invalid election type: "national" is not a valid election type
summary_election_data(type_elec = "national", year = 2019)

# Invalid date format: date should be in %Y-%m-%d format
summary_election_data(type_elec = "congress", date = "26-06-2016")

# Invalid short version flag: short_version should be a
# logical variable
summary_election_data(type_elec = "congress", year = 2019,
                  short_version = "yes")

# Invalid aggregation level
summary_election_data("congress", 2019, level = "district")

# Invalid method
summary_election_data("congress", 2019, method = "don")

# threshold falls outside the valid range of 0 to 1
summary_election_data("congress", 2019, method = "hondt",
                      threshold = 1.3)

# filter_porc_ballots falls outside the valid range of 0 to 100
summary_election_data("congress", 2019,
                      filter_porc_ballots = 150)

# filter_porc_ballots supplied while by_parties = FALSE
summary_election_data("congress", 2019,
                      by_parties = FALSE,
                      filter_porc_ballots = 5)

# filter_candidacies supplied while by_parties = FALSE
summary_election_data("congress", 2019,
                      by_parties = FALSE,
                      filter_candidacies = c("PP", "PSOE"))

} # }