Aggregate election data at a given level (ccaa, prov, etc.) — aggregate_election

Aggregate polling station election results to any chosen territorial level, providing party-level ballots, total ballots, the number of polling stations and contextual sums.

Usage

aggregate_election_data(
  election_data,
  level = "all",
  by_parties = TRUE,
  prec_round = 3,
  verbose = TRUE,
  short_version = TRUE,
  col_id_elec = "id_elec",
  col_id_poll_station = "id_INE_poll_station",
  col_id_mun = "id_INE_mun",
  cols_mun_var = c("pop_res_mun", "census_counting_mun"),
  col_id_candidacies = c(id_prov = "id_candidacies", id_nat = "id_candidacies_nat")
)

Arguments

election_data: A database containing general election data, either returned by other functions or supplied by the user. It should contain the columns named in col_id_elec, col_id_poll_station, cols_mun_var and col_id_candidacies. Required: the usage above shows no default value.
level: A string providing the level of aggregation at which the data is to be provided. The allowed values are the following: 'all', 'ccaa', 'prov', 'mun', 'mun_district', 'sec' or 'poll_station'. Defaults to "all".
by_parties: A flag indicating whether the user wants a summary by candidacies/parties or just global results at the given level. Defaults to TRUE.
prec_round: Rounding accuracy. Defaults to 3.
verbose: Flag to indicate whether detailed messages should be printed during execution. Defaults to TRUE.
short_version: Flag to indicate whether a short version of the data (just key variables) should be returned. Defaults to TRUE.
col_id_elec, col_id_poll_station, col_id_mun, col_id_candidacies, cols_mun_var: (Optional) Column names for the election id, polling station id, municipality id and candidacy ids, plus the names of the variables that are only available at the municipality level or above.
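
The constraints on level and by_parties described above can be pictured as a small validation step. The following is a minimal sketch only: check_args() is a hypothetical helper written for illustration, not a function of the package, and the error messages are invented.

```r
# Allowed values for `level`, as listed in the Arguments section
allowed_levels <- c("all", "ccaa", "prov", "mun",
                    "mun_district", "sec", "poll_station")

# Hypothetical helper (NOT part of the package): stops with an error
# when `level` or `by_parties` does not meet the documented constraints
check_args <- function(level = "all", by_parties = TRUE) {
  if (!is.character(level) || length(level) != 1 ||
      !(level %in% allowed_levels)) {
    stop("`level` must be one of: ",
         paste(allowed_levels, collapse = ", "))
  }
  if (!is.logical(by_parties) || length(by_parties) != 1 ||
      is.na(by_parties)) {
    stop("`by_parties` must be a single TRUE/FALSE flag")
  }
  invisible(TRUE)
}

check_args(level = "prov", by_parties = TRUE)   # passes silently
```

Under this sketch, check_args(level = "district") and check_args(level = "prov", by_parties = "yes") would both stop with an error, mirroring the incorrect calls listed in the Examples section.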

Value

A tibble with rows corresponding to the level of aggregation for each election, including the following variables:

id_elec: election's id constructed from the election code cod_elec and date date_elec.
cod_elec: code representing the type of election: "01" (referendum), "02" (congress), "03" (senate), "04" (local elections), "06" (cabildo - Canarian council - elections), "07" (European Parliament elections).
id_INE_xxx: id for the xxx constituency provided in level: id_INE_ccaa, id_INE_prov, etc.
xxx: names for the xxx constituency provided in level: ccaa, prov, etc.
blank_ballots, invalid_ballots: blank and invalid ballots.
party_ballots, valid_ballots, total_ballots: ballots to candidacies/parties, valid ballots (sum of blank_ballots and party_ballots) and total ballots (sum of valid_ballots and invalid_ballots).
n_poll_stations: number of polling stations.
id_candidacies: id for candidacies (at province level).
id_candidacies_nat: id for candidacies at national level.
ballots: number of ballots obtained by each candidacy at the chosen aggregation level.
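
The relations between the ballot variables listed above can be checked by hand. The snippet below is a minimal sketch on made-up numbers (the province ids and counts are toy values, not real results); it only reproduces the two sums stated in the variable descriptions.

```r
# Toy aggregated results for two provinces (invented figures)
res <- data.frame(
  id_INE_prov     = c("01", "02"),
  blank_ballots   = c(10, 5),
  invalid_ballots = c(3, 2),
  party_ballots   = c(987, 493)
)

# valid_ballots = blank_ballots + party_ballots
res$valid_ballots <- res$blank_ballots + res$party_ballots

# total_ballots = valid_ballots + invalid_ballots
res$total_ballots <- res$valid_ballots + res$invalid_ballots

res$valid_ballots   # 997 498
res$total_ballots   # 1000 500
```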

Details

This is a helper function: given an electoral data file with a specific structure, it aggregates the information to the level specified in level. Variables that are only available at the provincial or municipal level are handled differently when the requested aggregation level is below those levels. For example, CERA data (the census of voters residing abroad) cannot be aggregated below the province level, in which case 52 special constituencies are added. The function is not intended as a final-use tool for basic users, but as an intermediate step for summary_election_data().
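
To make the idea of aggregating polling station data to a higher level concrete, here is a minimal base R sketch on a toy data set. It is not the package's implementation: the data, the "toy_elec" id and the object names are assumptions made for illustration, and the special handling of municipal-level variables and CERA data is left out.

```r
# Toy polling station data: three stations, two provinces, two candidacies
poll_data <- data.frame(
  id_elec             = "toy_elec",
  id_INE_poll_station = c("ps1", "ps1", "ps2", "ps2", "ps3", "ps3"),
  id_INE_prov         = c("01", "01", "01", "01", "02", "02"),
  id_candidacies      = c("A", "B", "A", "B", "A", "B"),
  ballots             = c(100, 50, 80, 70, 30, 60)
)

# Sum ballots by election, province and candidacy
party_agg <- aggregate(
  x   = poll_data["ballots"],
  by  = poll_data[c("id_elec", "id_INE_prov", "id_candidacies")],
  FUN = sum
)

# Count distinct polling stations by election and province
n_stations <- aggregate(
  x   = poll_data["id_INE_poll_station"],
  by  = poll_data[c("id_elec", "id_INE_prov")],
  FUN = function(x) length(unique(x))
)
names(n_stations)[names(n_stations) == "id_INE_poll_station"] <- "n_poll_stations"

# Join both summaries and sort for readability
prov_agg <- merge(party_agg, n_stations, by = c("id_elec", "id_INE_prov"))
prov_agg <- prov_agg[order(prov_agg$id_INE_prov, prov_agg$id_candidacies), ]
prov_agg
```

In this toy case province "01" ends up with 180 ballots for candidacy A and 120 for B across 2 polling stations, while province "02" has 30 and 60 ballots from a single station.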

Author

Javier Alvarez-Liebana and David Pereiro-Pol.

Examples

## Correct examples

# Election data from 2023 and 1989
election_data <-
   get_election_data(type_elec = "congress", year = 2023,
                     date = "1989-10-29")
#> Get and join election data
#>    [x] Checking if parameters are allowed...
#>    [x] Importing the following poll station data ...
#>        - congress elections on 2023-07-24
#>        - congress elections on 1989-10-29
#> 
#>    [x] Importing candidacies and ballots data (at poll station level) ...
#> ... Please be patient, volume of data downloaded and internet connection may take a few seconds
#> Be careful! Some poll stations does not match individual ballots with summaries provided by MIR. The discrepancies were resolved by using votes by candidacies.
#> ! A short version was asked (if you want all variables, run with `short_version = FALSE`)

# National level results (without parties)
nat_agg <-
   election_data |>
   aggregate_election_data(level = "all", by_parties = FALSE)
#> Aggregate election data
#>    [x] Checking if parameters are allowed...
#>    [x] Aggregating data at national level ...

# Province level results (with parties)
prov_agg <-
   election_data |>
   aggregate_election_data(level = "prov")
#> Aggregate election data
#>    [x] Checking if parameters are allowed...
#>    [x] Aggregating data at prov level ...

if (FALSE) { # \dontrun{

# ----
# Incorrect examples
# ----

# Invalid 'level' argument: "district" is not allowed
aggregate_election_data(election_data, level = "district")

# Invalid 'by_parties' flag: it must be logical, not character
aggregate_election_data(election_data, level = "prov",
                        by_parties = "yes")

# Invalid parameters: col_id_candidacies must match column names
# that exist in election_data
aggregate_election_data(election_data, level = "ccaa",
                        col_id_candidacies = "wrong_id")

} # }