Import, preprocess and summarise survey data for given election types and dates. This function supports both single values and vector inputs, so data for multiple elections can be fetched and combined at once. Surveys for the forthcoming elections can also be requested. Several filtering arguments (polling firm, days until the election, candidacies, sample size, etc.) are included to help design a proper query.

Usage

summary_survey_data(
  type_elec = "congress",
  year = NULL,
  date = NULL,
  verbose = TRUE,
  short_version = TRUE,
  format = "long",
  forthcoming = FALSE,
  rm_exit_polls = TRUE,
  rm_unpublish_polls = FALSE,
  filter_polling_firm = NULL,
  filter_media = NULL,
  filter_n_field_days = NULL,
  filter_days_until_elec = NULL,
  lower_sample_size = NULL,
  upper_sample_size = NULL,
  lower_fieldwork_date = NULL,
  upper_fieldwork_date = NULL,
  filter_abbrev_candidacies = NULL
)

Arguments

type_elec

Type of elections for which data is available. It should be one of the following values: "referendum", "congress", "senate", "local", "cabildo" (Canarian council) or "EU". Defaults to "congress".

year

A vector or single value representing the years of the elections to be considered. Defaults to NULL. Please check in dates_elections_spain that elections of the specified type are available for the provided year.

date

A vector or single value representing the dates of the elections to be considered. If date is provided, it should be in %Y-%m-%d format (e.g., '2000-01-01'). Defaults to NULL. If no date is provided, year should be provided as a numeric value. Please check in dates_elections_spain that elections of the specified type are available.
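
As a rough illustration of the %Y-%m-%d requirement, the following base R sketch flags which strings parse as dates in that format. The helper name `is_valid_elec_date` is hypothetical and not part of the package; the package's own validation may behave differently.

```r
# Hypothetical helper (not part of the package): flags which elements
# of `x` parse as calendar dates in %Y-%m-%d format.
is_valid_elec_date <- function(x) {
  parsed <- as.Date(x, format = "%Y-%m-%d")
  !is.na(parsed)
}

dates <- c("2019-04-28", "28/04/2019", "2019-13-01")
valid <- is_valid_elec_date(dates)

# The election year can be recovered from the dates that parsed correctly
years <- as.integer(format(as.Date(dates[valid], format = "%Y-%m-%d"), "%Y"))
```

Only the first string is in the expected format; the second uses a different layout and the third has an impossible month.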

verbose

Flag to indicate whether detailed messages should be printed during execution. Defaults to TRUE.

short_version

Flag to indicate whether a short version of the data (just key variables) should be returned. Defaults to TRUE.

format

Do you want the output in "long" or "wide" format? Defaults to "long".
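
To make the long/wide distinction concrete, here is a generic base R sketch on toy data that borrows the column names documented under Value. The numbers are invented, and the actual column layout produced by format = "wide" is an assumption here and may differ from what the package returns.

```r
# Toy long-format data: one row per survey and party (values are invented)
long <- data.frame(
  id_survey = c("s1", "s1", "s2", "s2"),
  abbrev_candidacies = c("PP", "PSOE", "PP", "PSOE"),
  estimated_porc_ballots = c(30, 28, 31, 27),
  stringsAsFactors = FALSE
)

# Generic long-to-wide reshape: one row per survey, one column per party
wide <- reshape(
  long,
  idvar = "id_survey",
  timevar = "abbrev_candidacies",
  direction = "wide",
  sep = "_"
)
```

In the long table each survey spans several rows; in the wide table each survey is a single row with one estimate column per party.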

forthcoming

Flag to indicate whether surveys for the forthcoming elections should be included. Defaults to FALSE. If TRUE, neither date nor year is required (in that case, only surveys for the next elections are provided).

rm_exit_polls

Flag to indicate whether exit polls should be removed or not. Defaults to TRUE.

rm_unpublish_polls

Flag to indicate whether unpublished polls before elections should be removed or not. In Spain, the Electoral Law (LOREG) establishes that it is forbidden to publish, disseminate, or reproduce electoral polls during the five days prior to election day. Defaults to FALSE.

filter_polling_firm, filter_media, filter_abbrev_candidacies

Do you want to filter surveys by polling firm, media outlet or abbreviation of candidacies? A character vector should be provided. Defaults to NULL.

filter_n_field_days, filter_days_until_elec

Do you want to filter surveys by number of fieldwork days or days until the election? A single numeric value should be provided. Defaults to NULL.

lower_sample_size, upper_sample_size

Do you want to filter surveys by sample size? A single numeric value should be provided for each one. Defaults to NULL in both cases.

lower_fieldwork_date, upper_fieldwork_date

Do you want to filter surveys by fieldwork date? A single date value should be provided for each one. Defaults to NULL in both cases.
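
The filtering arguments can be thought of as row filters on the survey table. The base R sketch below mimics that idea on invented toy data. It assumes inclusive bounds and a comparison against fieldwork_start; neither is confirmed by this page, so treat it as an illustration rather than the package's actual logic.

```r
# Toy survey table (values are invented for illustration)
polls <- data.frame(
  polling_firm = c("GAD3", "40DB", "GAD3", "Other"),
  sample_size = c(1000, 400, 1500, 800),
  fieldwork_start = as.Date(c("2020-05-04", "2020-06-01",
                              "2021-01-10", "2020-07-06")),
  stringsAsFactors = FALSE
)

# Assumed equivalents of filter_polling_firm, lower_sample_size,
# lower_fieldwork_date and upper_fieldwork_date (inclusive bounds assumed)
keep <- polls$polling_firm %in% c("GAD3", "40DB") &
  polls$sample_size >= 500 &
  polls$fieldwork_start >= as.Date("2020-03-01") &
  polls$fieldwork_start <= as.Date("2020-08-01")

filtered <- polls[keep, ]
```

Only the first toy row passes all four conditions: the second fails the sample size bound, the third falls outside the date window, and the fourth comes from a firm that was not requested.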

Value

A tibble in which each row is the estimated result for one party in one election according to a particular survey, including the following variables:

id_survey

survey's id constructed from the polling firm and the dates for the start and end of the fieldwork.

id_elec

election's id constructed from the election code and date.

polling_firm

organisation conducting the poll.

media

commissioning organisation / media outlet. Variable not available for short version.

n_polls_by_elec

Number of polls done by each pollster for a particular election. Variable not available for short version.

fieldwork_start, fieldwork_end, n_field_days

fieldwork period and its length (days).

days_until_elec

number of days until the next election, counting from the start of the fieldwork. Variable not available for short version.

sample_size

sample size of the survey.

abbrev_candidacies, name_candidacies_nat

acronym and full name of the candidacies at national level. The latter is not available for short version.

id_candidacies_nat

id for candidacies at national level.

estimated_porc_ballots

estimated percentage of ballots for each party.

Details

This function uses the helper function import_survey_data(), which imports and cleans survey data and adds other relevant variables for analysis. It then returns a tidy table of the survey data, ready for analysis or visualization. Note that the dates and years provided must correspond to actual election dates and years.
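
As an example of the kind of analysis a tidy table allows, the sketch below averages the estimates per party with base R. The data frame is an invented stand-in that reuses the column names documented under Value; it is not real output of the function.

```r
# Toy stand-in for the returned table (values are invented)
surveys <- data.frame(
  polling_firm = c("A", "A", "B", "B"),
  abbrev_candidacies = c("PP", "PSOE", "PP", "PSOE"),
  estimated_porc_ballots = c(30, 28, 32, 26),
  stringsAsFactors = FALSE
)

# Mean estimated share of ballots per party across all surveys
avg_by_party <- aggregate(
  estimated_porc_ballots ~ abbrev_candidacies,
  data = surveys,
  FUN = mean
)
```

Because each row holds one party in one survey, the grouping needs no reshaping first.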

Author

Javier Alvarez-Liebana and David Pereiro-Pol.

Examples


## Correct examples

# Summary of surveys for 2016 and 2023, and 2019-04-28 elections.
summary_survey_data(year = c(2016, 2023), date = "2019-04-28")
#> Summary survey data
#>    [x] Checking if parameters are allowed...
#>    [x] Computing some statistics...
#> ! A short version was asked (if you want all variables, run with `short_version = FALSE`)
#> # A tibble: 11,940 × 11
#>    id_survey       id_elec polling_firm n_field_days sample_size fieldwork_start
#>    <chr>           <chr>   <chr>               <dbl>       <dbl> <date>         
#>  1 ElectoPanel-20… 02-202… ElectoPanel             4        1250 2019-12-25     
#>  2 ElectoPanel-20… 02-202… ElectoPanel             4        1250 2019-12-25     
#>  3 ElectoPanel-20… 02-202… ElectoPanel             4        1250 2019-12-25     
#>  4 ElectoPanel-20… 02-202… ElectoPanel             4        1250 2019-12-25     
#>  5 ElectoPanel-20… 02-202… ElectoPanel             4        1250 2019-12-25     
#>  6 ElectoPanel-20… 02-202… ElectoPanel             4        1250 2019-12-25     
#>  7 ElectoPanel-20… 02-202… ElectoPanel             4        1250 2019-12-25     
#>  8 ElectoPanel-20… 02-202… ElectoPanel             4        1250 2019-12-25     
#>  9 ElectoPanel-20… 02-202… ElectoPanel             4        1250 2019-12-25     
#> 10 ElectoPanel-20… 02-202… ElectoPanel             4        1250 2019-12-25     
#> # ℹ 11,930 more rows
#> # ℹ 5 more variables: fieldwork_end <date>, abbrev_candidacies <chr>,
#> #   id_candidacies_nat <chr>, name_candidacies_nat <chr>,
#> #   estimated_porc_ballots <dbl>

# Summary for all 2019-2023 surveys in a full version, filtering
# just 40DB and GAD3 polling firms, the last 30 days before elec,
# and with a sample size of at least 500 people
summary_survey_data(year = c(2019, 2023), short_version = FALSE,
                    filter_polling_firm = c("GAD3", "40DB"),
                    filter_days_until_elec = 30,
                    lower_sample_size =  500)
#> Summary survey data
#>    [x] Checking if parameters are allowed...
#>    [x] Computing some statistics...
#>    [x] Filtering surveys...
#> # A tibble: 388 × 14
#>    id_survey          id_elec polling_firm media n_polls_by_elec fieldwork_start
#>    <chr>              <chr>   <chr>        <chr>           <int> <date>         
#>  1 GAD3-2023-07-10-2… 02-202… GAD3         Medi…              47 2023-07-10     
#>  2 GAD3-2023-07-10-2… 02-202… GAD3         Medi…              47 2023-07-10     
#>  3 GAD3-2023-07-10-2… 02-202… GAD3         Medi…              47 2023-07-10     
#>  4 GAD3-2023-07-10-2… 02-202… GAD3         Medi…              47 2023-07-10     
#>  5 GAD3-2023-07-10-2… 02-202… GAD3         Medi…              47 2023-07-10     
#>  6 GAD3-2023-07-10-2… 02-202… GAD3         Medi…              47 2023-07-10     
#>  7 GAD3-2023-07-10-2… 02-202… GAD3         Medi…              47 2023-07-10     
#>  8 GAD3-2023-07-10-2… 02-202… GAD3         Medi…              47 2023-07-10     
#>  9 GAD3-2023-07-10-2… 02-202… GAD3         Medi…              47 2023-07-10     
#> 10 GAD3-2023-07-10-2… 02-202… GAD3         Medi…              47 2023-07-10     
#> # ℹ 378 more rows
#> # ℹ 8 more variables: fieldwork_end <date>, n_field_days <dbl>,
#> #   sample_size <dbl>, abbrev_candidacies <chr>, id_candidacies_nat <chr>,
#> #   name_candidacies_nat <chr>, estimated_porc_ballots <dbl>,
#> #   days_until_elec <dbl>

# Summary for surveys for the forthcoming elections, filtering
# by parties ("PP" and "PSOE") and with a sample size of at least
# 500 people
summary_survey_data(forthcoming = TRUE,
                    filter_abbrev_candidacies = c("PP", "PSOE"),
                    lower_sample_size =  500)
#> Summary survey data
#>    [x] Checking if parameters are allowed...
#> Be careful! No date neither year was provided: just surveys for the forthcoming elections were provided
#>    [x] Computing some statistics...
#>    [x] Filtering surveys...
#> Be careful! No data was found for the `lower_sample_size` and `upper_sample_size` provided. No filtering was performed
#> Be careful! No data was found for the `filter_abbrev_candidacies` provided. No filtering was performed
#> ! A short version was asked (if you want all variables, run with `short_version = FALSE`)
#> # A tibble: 0 × 11
#> # ℹ 11 variables: id_survey <chr>, id_elec <glue>, polling_firm <chr>,
#> #   n_field_days <dbl>, sample_size <dbl>, fieldwork_start <date>,
#> #   fieldwork_end <date>, abbrev_candidacies <chr>, id_candidacies_nat <chr>,
#> #   name_candidacies_nat <chr>, estimated_porc_ballots <dbl>

# Summary for 2023 surveys in a full version, filtering
# just 40DB and GAD3 polling firms, and surveys with fieldwork
# between the first of March and the first of August of 2020
summary_survey_data(year = 2023, short_version = FALSE,
                    filter_polling_firm = c("GAD3", "40DB"),
                    lower_fieldwork_date = "2020-03-01",
                    upper_fieldwork_date = "2020-08-01")
#> Summary survey data
#>    [x] Checking if parameters are allowed...
#>    [x] Computing some statistics...
#>    [x] Filtering surveys...
#> # A tibble: 12 × 14
#>    id_survey          id_elec polling_firm media n_polls_by_elec fieldwork_start
#>    <chr>              <chr>   <chr>        <chr>           <int> <date>         
#>  1 GAD3-2020-07-06-2… 02-202… GAD3         ABC                47 2020-07-06     
#>  2 GAD3-2020-07-06-2… 02-202… GAD3         ABC                47 2020-07-06     
#>  3 GAD3-2020-07-06-2… 02-202… GAD3         ABC                47 2020-07-06     
#>  4 GAD3-2020-07-06-2… 02-202… GAD3         ABC                47 2020-07-06     
#>  5 GAD3-2020-05-18-2… 02-202… GAD3         ABC                47 2020-05-18     
#>  6 GAD3-2020-05-18-2… 02-202… GAD3         ABC                47 2020-05-18     
#>  7 GAD3-2020-05-18-2… 02-202… GAD3         ABC                47 2020-05-18     
#>  8 GAD3-2020-05-18-2… 02-202… GAD3         ABC                47 2020-05-18     
#>  9 GAD3-2020-05-04-2… 02-202… GAD3         ABC                47 2020-05-04     
#> 10 GAD3-2020-05-04-2… 02-202… GAD3         ABC                47 2020-05-04     
#> 11 GAD3-2020-05-04-2… 02-202… GAD3         ABC                47 2020-05-04     
#> 12 GAD3-2020-05-04-2… 02-202… GAD3         ABC                47 2020-05-04     
#> # ℹ 8 more variables: fieldwork_end <date>, n_field_days <dbl>,
#> #   sample_size <dbl>, abbrev_candidacies <chr>, id_candidacies_nat <chr>,
#> #   name_candidacies_nat <chr>, estimated_porc_ballots <dbl>,
#> #   days_until_elec <dbl>
if (FALSE) { # \dontrun{
# ----
# Incorrect examples
# ----

# Invalid year
summary_survey_data(year = 2018, short_version = FALSE)

# Invalid short version flag: short_version should be a
# logical variable
summary_survey_data(type_elec = "congress", year = 2019,
                   short_version = "yes")

# upper_fieldwork_date prior to lower_fieldwork_date
summary_survey_data(year = 2023,
                    lower_fieldwork_date = "2020-08-01",
                    upper_fieldwork_date = "2020-03-01")

# upper_sample_size smaller than lower_sample_size
summary_survey_data(year = 2023, lower_sample_size = 500,
                    upper_sample_size = 200)

# Invalid class for argument filter_polling_firm
summary_survey_data(year = 2023, filter_polling_firm = 700)

# Invalid class for argument filter_media
summary_survey_data(year = 2023, filter_media = 700)

# Invalid filter_n_field_days
summary_survey_data(year = 2023, filter_n_field_days = -1)

} # }