Summaries of the poll data for a given election. — summary_survey

Import, preprocess and summarise survey data.

Usage

summary_survey_data(
  type_elec = "congress",
  year = NULL,
  date = NULL,
  verbose = TRUE,
  short_version = TRUE,
  format = "long",
  forthcoming = FALSE,
  filter_polling_firm = NULL,
  filter_media = NULL,
  filter_n_field_days = NULL,
  filter_days_until_elec = NULL,
  lower_sample_size = NULL,
  upper_sample_size = NULL,
  lower_fieldwork_date = NULL,
  upper_fieldwork_date = NULL,
  filter_abbrev_candidacies = NULL
)

Arguments

type_elec: Type of election for which data is available. It should be one of the following values: "referendum", "congress", "senate", "local", "cabildo" (Canarian council) or "EU".
year: A vector or single value representing the years of the elections to be considered. Please check in dates_elections_spain that elections of the specified type are available for the provided year.
date: A vector or single value representing the dates of the elections to be considered. If provided, dates should be in %Y-%m-%d format (e.g., '2000-01-01'). Defaults to NULL. If no date is provided, year should be provided as a numeric value. Please check in dates_elections_spain that elections of the specified type are available.
verbose: Flag to indicate whether detailed messages should be printed during execution. Defaults to TRUE.
short_version: Flag to indicate whether a short version of the data (just key variables) should be returned. Defaults to TRUE.
format: Output format, either "long" or "wide". Defaults to "long".
forthcoming: Flag to indicate whether surveys for the forthcoming elections should be included. Defaults to FALSE. If TRUE, neither date nor year is required (in that case, only surveys for the next elections are returned).
filter_polling_firm, filter_media, filter_abbrev_candidacies: Optional filters by polling firm, media outlet or abbreviation of candidacies. Each should be a character vector. Defaults to NULL.
filter_n_field_days, filter_days_until_elec: Optional filters by number of fieldwork days or days until the election. Each should be a single numeric value. Defaults to NULL.
lower_sample_size, upper_sample_size: Optional lower and upper bounds for the sample size. Each should be a single numeric value. Defaults to NULL in both cases.
lower_fieldwork_date, upper_fieldwork_date: Optional lower and upper bounds for the fieldwork date. Each should be a single date value. Defaults to NULL in both cases.
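
To make the intent of the filter arguments concrete, the following is a minimal sketch on an invented toy tibble, not the package's actual implementation. It assumes that the firm filter acts as a membership test and that the sample-size and fieldwork-date bounds are inclusive; the object names toy_polls and filtered_polls and all values are hypothetical.

```r
library(dplyr)

# Toy data reusing column names from the documented output (values invented)
toy_polls <- tibble::tibble(
  polling_firm    = c("GAD3", "40DB", "ElectoPanel", "GAD3"),
  sample_size     = c(1200, 450, 1250, 800),
  fieldwork_start = as.Date(c("2023-07-10", "2023-07-01",
                              "2023-06-15", "2023-05-02"))
)

# Assumed semantics: keep the listed firms, with inclusive lower bounds
# on sample size and on the start of the fieldwork
filtered_polls <- toy_polls %>%
  filter(
    polling_firm %in% c("GAD3", "40DB"),
    sample_size >= 500,
    fieldwork_start >= as.Date("2023-06-01")
  )
```

Under these assumptions only the first GAD3 survey remains: the 40DB survey fails the sample-size bound and the second GAD3 survey fails the date bound.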

Value

A tibble with rows corresponding to the estimated results for each party for each election given by a particular pollster, including the following variables:

id_survey: survey's id constructed from the polling firm and the dates for the start and end of the fieldwork.
id_elec: election's id constructed from the election code and date.
polling_firm: organisation conducting the poll.
media: commissioning organisation / media outlet. Variable not available for short version.
n_polls_by_elec: Number of polls done by each pollster for a particular election. Variable not available for short version.
fieldwork_start, fieldwork_end, n_field_days: fieldwork period and its length (days).
days_until_elec: number of days until the next election, counting from the start of the fieldwork. Variable not available for short version.
sample_size: sample size of the survey.
abbrev_candidacies, name_candidacies_nat: acronym and full name of the candidacies at national level. The latter is not available for short version.
id_candidacies_nat: id for candidacies at national level. Variable not available for short version.
estimated_porc_ballots: estimated percentage of ballots for each party.
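
The long layout described above has one row per survey and party. As an illustration only, a table of that shape could be reshaped to one row per survey with tidyr::pivot_wider(). This is a sketch on invented data; the exact columns returned by format = "wide" are not documented here, so the resulting layout is an assumption, and toy_long and toy_wide are hypothetical names.

```r
library(tidyr)

# Toy long-format data: one row per survey and party (values invented)
toy_long <- tibble::tibble(
  id_survey              = c("A", "A", "B", "B"),
  polling_firm           = c("FirmA", "FirmA", "FirmB", "FirmB"),
  abbrev_candidacies     = c("PP", "PSOE", "PP", "PSOE"),
  estimated_porc_ballots = c(33.1, 30.2, 34.0, 29.5)
)

# One row per survey, one column per party
toy_wide <- pivot_wider(
  toy_long,
  names_from  = abbrev_candidacies,
  values_from = estimated_porc_ballots
)
```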

Details

This function uses the helper function import_survey_data(), which imports and cleans survey data, and adds other relevant variables for analysis. This function gives a tidy table of the survey data that is ready for analysis or visualisation.
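
As an example of the kind of analysis a tidy table like this supports, the sketch below computes a sample-size-weighted poll average per party with dplyr. It runs on an invented toy tibble that only borrows the documented column names; weighting by sample size is an illustrative choice, not something the package prescribes, and toy_estimates and poll_average are hypothetical names.

```r
library(dplyr)

# Toy data: two surveys, two parties (values invented)
toy_estimates <- tibble::tibble(
  abbrev_candidacies     = c("PP", "PP", "PSOE", "PSOE"),
  sample_size            = c(1000, 3000, 1000, 3000),
  estimated_porc_ballots = c(30, 34, 32, 28)
)

# Average estimate per party, weighting each survey by its sample size
poll_average <- toy_estimates %>%
  group_by(abbrev_candidacies) %>%
  summarise(
    weighted_porc = weighted.mean(estimated_porc_ballots, w = sample_size),
    .groups = "drop"
  )
```

With these toy values the weighted averages are 33 for PP and 29 for PSOE.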

Author

Javier Alvarez-Liebana and David Pereiro-Pol.

Examples


## Correct examples

# Summary for all 2019-2023 surveys
summary_survey_data(year = c(2019, 2023))
#> Summary survey data
#>    [x] Checking if parameters are allowed...
#>    [x] Computing some statistics...
#>    [x] Filtering surveys...
#> ! A short version was asked (if you want all variables, run with `short_version = FALSE`)
#> # A tibble: 8,515 × 9
#>    id_survey       id_elec polling_firm n_field_days sample_size fieldwork_start
#>    <chr>           <chr>   <chr>               <dbl>       <dbl> <date>         
#>  1 ElectoPanel-20… 02-202… ElectoPanel             4        1250 2019-12-25     
#>  2 ElectoPanel-20… 02-202… ElectoPanel             4        1250 2019-12-25     
#>  3 ElectoPanel-20… 02-202… ElectoPanel             4        1250 2019-12-25     
#>  4 ElectoPanel-20… 02-202… ElectoPanel             4        1250 2019-12-25     
#>  5 ElectoPanel-20… 02-202… ElectoPanel             4        1250 2019-12-25     
#>  6 ElectoPanel-20… 02-202… ElectoPanel             4        1250 2019-12-25     
#>  7 ElectoPanel-20… 02-202… ElectoPanel             4        1250 2019-12-25     
#>  8 ElectoPanel-20… 02-202… ElectoPanel             4        1250 2019-12-25     
#>  9 ElectoPanel-20… 02-202… ElectoPanel             4        1250 2019-12-25     
#> 10 ElectoPanel-20… 02-202… ElectoPanel             4        1250 2019-12-25     
#> # ℹ 8,505 more rows
#> # ℹ 3 more variables: fieldwork_end <date>, abbrev_candidacies <chr>,
#> #   estimated_porc_ballots <dbl>

# Summary for 2019-2023 surveys in a full version, keeping
# just the 40DB and GAD3 polling firms, the last 30 days before
# the election, and a sample size of at least 500 people
summary_survey_data(year = c(2019, 2023), short_version = FALSE,
                    filter_polling_firm = c("GAD3", "40DB"),
                    filter_days_until_elec = 30,
                    lower_sample_size =  500)
#> Summary survey data
#>    [x] Checking if parameters are allowed...
#>    [x] Computing some statistics...
#>    [x] Filtering surveys...
#> # A tibble: 299 × 14
#>    id_survey          id_elec polling_firm media n_polls_by_elec fieldwork_start
#>    <chr>              <chr>   <chr>        <chr>           <int> <date>         
#>  1 GAD3-2023-07-10-2… 02-202… GAD3         Medi…              47 2023-07-10     
#>  2 GAD3-2023-07-10-2… 02-202… GAD3         Medi…              47 2023-07-10     
#>  3 GAD3-2023-07-10-2… 02-202… GAD3         Medi…              47 2023-07-10     
#>  4 GAD3-2023-07-10-2… 02-202… GAD3         Medi…              47 2023-07-10     
#>  5 GAD3-2023-07-10-2… 02-202… GAD3         Medi…              47 2023-07-10     
#>  6 GAD3-2023-07-10-2… 02-202… GAD3         Medi…              47 2023-07-10     
#>  7 GAD3-2023-07-10-2… 02-202… GAD3         Medi…              47 2023-07-10     
#>  8 GAD3-2023-07-10-2… 02-202… GAD3         Medi…              47 2023-07-10     
#>  9 GAD3-2023-07-10-2… 02-202… GAD3         Medi…              47 2023-07-10     
#> 10 GAD3-2023-07-10-2… 02-202… GAD3         Medi…              47 2023-07-10     
#> # ℹ 289 more rows
#> # ℹ 8 more variables: fieldwork_end <date>, n_field_days <dbl>,
#> #   days_until_elec <dbl>, sample_size <dbl>, abbrev_candidacies <chr>,
#> #   id_candidacies_nat <chr>, name_candidacies_nat <chr>,
#> #   estimated_porc_ballots <dbl>

# Summary for surveys for the forthcoming elections, filtering
# by parties ("PP" and "PSOE") and with a sample size of at least
# 500 people
summary_survey_data(forthcoming = TRUE,
                    filter_abbrev_candidacies = c("PP", "PSOE"),
                    lower_sample_size =  500)
#> Summary survey data
#>    [x] Checking if parameters are allowed...
#> Be careful! No date neither year was provided: just surveys for the forthcoming elections were provided
#>    [x] Computing some statistics...
#>    [x] Filtering surveys...
#> ! A short version was asked (if you want all variables, run with `short_version = FALSE`)
#> # A tibble: 607 × 9
#>    id_survey       id_elec polling_firm n_field_days sample_size fieldwork_start
#>    <chr>           <glue>  <chr>               <dbl>       <dbl> <date>         
#>  1 SocioMetrica-2… 02-202… SocioMetrica            7        2309 2023-12-25     
#>  2 SocioMetrica-2… 02-202… SocioMetrica            7        2309 2023-12-25     
#>  3 ElectoPanel-20… 02-202… ElectoPanel             7        1007 2023-12-23     
#>  4 ElectoPanel-20… 02-202… ElectoPanel             7        1007 2023-12-23     
#>  5 Celeste-Tel-20… 02-202… Celeste-Tel             7        1100 2023-12-21     
#>  6 Celeste-Tel-20… 02-202… Celeste-Tel             7        1100 2023-12-21     
#>  7 Sigma Dos-2023… 02-202… Sigma Dos              12        2992 2023-12-15     
#>  8 Sigma Dos-2023… 02-202… Sigma Dos              12        2992 2023-12-15     
#>  9 ElectoPanel-20… 02-202… ElectoPanel             7        1093 2023-12-16     
#> 10 ElectoPanel-20… 02-202… ElectoPanel             7        1093 2023-12-16     
#> # ℹ 597 more rows
#> # ℹ 3 more variables: fieldwork_end <date>, abbrev_candidacies <chr>,
#> #   estimated_porc_ballots <dbl>

# Summary for 2023 surveys in a full version, keeping just the
# 40DB and GAD3 polling firms and surveys with fieldwork between
# 1 March 2020 and 1 August 2020
summary_survey_data(year = 2023, short_version = FALSE,
                    filter_polling_firm = c("GAD3", "40DB"),
                    lower_fieldwork_date = "2020-03-01",
                    upper_fieldwork_date = "2020-08-01")
#> Summary survey data
#>    [x] Checking if parameters are allowed...
#>    [x] Computing some statistics...
#>    [x] Filtering surveys...
#> # A tibble: 12 × 14
#>    id_survey          id_elec polling_firm media n_polls_by_elec fieldwork_start
#>    <chr>              <chr>   <chr>        <chr>           <int> <date>         
#>  1 GAD3-2020-07-06-2… 02-202… GAD3         ABC                47 2020-07-06     
#>  2 GAD3-2020-07-06-2… 02-202… GAD3         ABC                47 2020-07-06     
#>  3 GAD3-2020-07-06-2… 02-202… GAD3         ABC                47 2020-07-06     
#>  4 GAD3-2020-07-06-2… 02-202… GAD3         ABC                47 2020-07-06     
#>  5 GAD3-2020-05-18-2… 02-202… GAD3         ABC                47 2020-05-18     
#>  6 GAD3-2020-05-18-2… 02-202… GAD3         ABC                47 2020-05-18     
#>  7 GAD3-2020-05-18-2… 02-202… GAD3         ABC                47 2020-05-18     
#>  8 GAD3-2020-05-18-2… 02-202… GAD3         ABC                47 2020-05-18     
#>  9 GAD3-2020-05-04-2… 02-202… GAD3         ABC                47 2020-05-04     
#> 10 GAD3-2020-05-04-2… 02-202… GAD3         ABC                47 2020-05-04     
#> 11 GAD3-2020-05-04-2… 02-202… GAD3         ABC                47 2020-05-04     
#> 12 GAD3-2020-05-04-2… 02-202… GAD3         ABC                47 2020-05-04     
#> # ℹ 8 more variables: fieldwork_end <date>, n_field_days <dbl>,
#> #   days_until_elec <dbl>, sample_size <dbl>, abbrev_candidacies <chr>,
#> #   id_candidacies_nat <chr>, name_candidacies_nat <chr>,
#> #   estimated_porc_ballots <dbl>
if (FALSE) { # \dontrun{
# ----
# Incorrect examples
# ----

# Invalid year
summary_survey_data(year = 2018, short_version = FALSE)

# Invalid short version flag: short_version should be a
# logical variable
summary_survey_data(type_elec = "congress", year = 2019,
                    short_version = "yes")

# upper_fieldwork_date prior to lower_fieldwork_date
summary_survey_data(year = 2023,
                    lower_fieldwork_date = "2020-08-01",
                    upper_fieldwork_date = "2020-03-01")

# upper_sample_size smaller than lower_sample_size
summary_survey_data(year = 2023,
                    lower_sample_size = 500,
                    upper_sample_size = 200)

# Invalid class for argument filter_polling_firm
summary_survey_data(year = 2023,
                    filter_polling_firm = 700)

# Invalid class for argument filter_media
summary_survey_data(year = 2023,
                    filter_media = 700)


} # }