Import, preprocess and summarise survey data.

Usage

summary_survey_data(
  type_elec = "congress",
  year = NULL,
  date = NULL,
  verbose = TRUE,
  short_version = TRUE,
  format = "long",
  forthcoming = FALSE,
  filter_polling_firm = NULL,
  filter_media = NULL,
  filter_n_field_days = NULL,
  filter_days_until_elec = NULL,
  lower_sample_size = NULL,
  upper_sample_size = NULL,
  lower_fieldwork_date = NULL,
  upper_fieldwork_date = NULL,
  filter_abbrev_candidacies = NULL
)

Arguments

type_elec

Type of election for which data is available. It should be one of the following values: "referendum", "congress", "senate", "local", "cabildo" (Canarian council) or "EU".

year

A vector or single value representing the years of the elections to be considered. Check dates_elections_spain to confirm that elections of the specified type are available for the provided year.

date

A vector or single value representing the dates of the elections to be considered. If provided, dates should be in %Y-%m-%d format (e.g., '2000-01-01'). Defaults to NULL. If no date is provided, year should be supplied as a numeric value. Check dates_elections_spain to confirm that elections of the specified type are available.

verbose

Flag to indicate whether detailed messages should be printed during execution. Defaults to TRUE.

short_version

Flag to indicate whether a short version of the data (just key variables) should be returned. Defaults to TRUE.

format

Format of the output, either "long" or "wide". Defaults to "long".

forthcoming

Flag to indicate whether surveys for the forthcoming elections should be included. Defaults to FALSE. If TRUE, neither date nor year is required (in that case, only surveys for the next elections are returned).

filter_polling_firm, filter_media, filter_abbrev_candidacies

Character vectors used to filter surveys by polling firm, media outlet or abbreviation of candidacies. Defaults to NULL.

filter_n_field_days, filter_days_until_elec

Single numeric values used to filter surveys by number of fieldwork days or by days until the election. Defaults to NULL.

lower_sample_size, upper_sample_size

Single numeric values giving the lower and upper bounds used to filter surveys by sample size. Both default to NULL.

lower_fieldwork_date, upper_fieldwork_date

Single date values giving the lower and upper bounds used to filter surveys by fieldwork date. Both default to NULL.
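
As a rough illustration of how the filter arguments combine, the following base R sketch applies comparable conditions to a toy data frame. The data, the firm names and the exact comparison rules are assumptions made for illustration only: this page does not state whether the bounds are inclusive or which fieldwork date each bound is compared with, so the sketch assumes inclusive bounds applied to the whole fieldwork period.

```r
# Toy data standing in for survey rows (all values invented for illustration).
surveys <- data.frame(
  polling_firm    = c("FirmA", "FirmB", "FirmA", "FirmC"),
  sample_size     = c(400, 1200, 800, 2500),
  fieldwork_start = as.Date(c("2020-02-10", "2020-03-15", "2020-06-01", "2020-09-20")),
  fieldwork_end   = as.Date(c("2020-02-14", "2020-03-18", "2020-06-05", "2020-09-24")),
  stringsAsFactors = FALSE
)

# Local stand-ins named after the function arguments (hypothetical values).
filter_polling_firm  <- c("FirmA", "FirmB")
lower_sample_size    <- 500
lower_fieldwork_date <- as.Date("2020-03-01")
upper_fieldwork_date <- as.Date("2020-08-01")

# Assumption: every supplied filter must hold at once, with inclusive bounds.
keep <- surveys$polling_firm %in% filter_polling_firm &
  surveys$sample_size >= lower_sample_size &
  surveys$fieldwork_start >= lower_fieldwork_date &
  surveys$fieldwork_end <= upper_fieldwork_date

filtered <- surveys[keep, ]
print(filtered)
```

Under these assumptions only the surveys that satisfy all conditions remain; the first row is dropped for its sample size and the last one for its polling firm.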

Value

A tibble with rows corresponding to the estimated results for each party for each election given by a particular pollster, including the following variables:

id_survey

survey's id constructed from the polling firm and the dates for the start and end of the fieldwork.

id_elec

election's id constructed from the election code and date.

polling_firm

organisation conducting the poll.

media

commissioning organisation / media outlet. Variable not available for short version.

n_polls_by_elec

Number of polls done by each pollster for a particular election. Variable not available for short version.

fieldwork_start, fieldwork_end, n_field_days

fieldwork period and its length (days).

days_until_elec

number of days until the next election, counting from the start of the fieldwork. Variable not available for short version.

sample_size

sample size of the survey.

abbrev_candidacies, name_candidacies_nat

acronym and full name of the candidacies at national level. The latter is not available for short version.

id_candidacies_nat

id for candidacies at national level.

estimated_porc_ballots

estimated percentage of ballots for each party.
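
To make the difference between the long and wide layouts concrete, here is a minimal base R sketch on invented data. It only illustrates the general idea of one row per survey and party (long) versus one row per survey with one column per party (wide); the column names that the function itself produces with format = "wide" are not documented on this page, so the ones generated below are an assumption.

```r
# Long layout: one row per survey and candidacy (values invented for illustration).
long <- data.frame(
  id_survey              = c("survey_1", "survey_1", "survey_2", "survey_2"),
  polling_firm           = c("FirmA", "FirmA", "FirmB", "FirmB"),
  abbrev_candidacies     = c("PP", "PSOE", "PP", "PSOE"),
  estimated_porc_ballots = c(33.1, 28.4, 34.0, 29.2),
  stringsAsFactors = FALSE
)

# Wide layout: one row per survey, one estimate column per candidacy.
# stats::reshape() names the new columns "<v.names>.<timevar value>".
wide <- reshape(
  long,
  idvar     = "id_survey",
  timevar   = "abbrev_candidacies",
  v.names   = "estimated_porc_ballots",
  direction = "wide"
)

print(wide)
```

The long layout is usually more convenient for grouped summaries and plotting, while the wide layout makes it easier to compare parties side by side within a survey.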

Details

This function uses the helper function import_survey_data(), which imports and cleans survey data, and adds other relevant variables for analysis. This function gives a tidy table of the survey data that is ready for analysis or visualisation.
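
As a sketch of the kind of analysis the tidy output is meant to support, the snippet below averages the estimated vote share per candidacy. It runs on a small invented data frame that only borrows the column names listed under Value, so it does not require the package or any real survey data.

```r
# Invented rows mimicking the long output (column names taken from "Value").
polls <- data.frame(
  polling_firm           = c("FirmA", "FirmA", "FirmB", "FirmB"),
  abbrev_candidacies     = c("PP", "PSOE", "PP", "PSOE"),
  estimated_porc_ballots = c(33, 28, 35, 30),
  stringsAsFactors = FALSE
)

# Unweighted mean of the estimates for each candidacy across surveys.
avg <- aggregate(
  estimated_porc_ballots ~ abbrev_candidacies,
  data = polls,
  FUN  = mean
)

print(avg)
```

A weighted average (for example by sample_size) would be a natural refinement, but the unweighted mean keeps the sketch short.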

Author

Javier Alvarez-Liebana and David Pereiro-Pol.

Examples


## Correct examples

# Summary for all 2019-2023 surveys
summary_survey_data(year = c(2019, 2023))
#> Summary survey data
#>    [x] Checking if parameters are allowed...
#>    [x] Computing some statistics...
#>    [x] Filtering surveys...
#> ! A short version was asked (if you want all variables, run with `short_version = FALSE`)
#> # A tibble: 8,515 × 9
#>    id_survey       id_elec polling_firm n_field_days sample_size fieldwork_start
#>    <chr>           <chr>   <chr>               <dbl>       <dbl> <date>         
#>  1 ElectoPanel-20… 02-202… ElectoPanel             4        1250 2019-12-25     
#>  2 ElectoPanel-20… 02-202… ElectoPanel             4        1250 2019-12-25     
#>  3 ElectoPanel-20… 02-202… ElectoPanel             4        1250 2019-12-25     
#>  4 ElectoPanel-20… 02-202… ElectoPanel             4        1250 2019-12-25     
#>  5 ElectoPanel-20… 02-202… ElectoPanel             4        1250 2019-12-25     
#>  6 ElectoPanel-20… 02-202… ElectoPanel             4        1250 2019-12-25     
#>  7 ElectoPanel-20… 02-202… ElectoPanel             4        1250 2019-12-25     
#>  8 ElectoPanel-20… 02-202… ElectoPanel             4        1250 2019-12-25     
#>  9 ElectoPanel-20… 02-202… ElectoPanel             4        1250 2019-12-25     
#> 10 ElectoPanel-20… 02-202… ElectoPanel             4        1250 2019-12-25     
#> # ℹ 8,505 more rows
#> # ℹ 3 more variables: fieldwork_end <date>, abbrev_candidacies <chr>,
#> #   estimated_porc_ballots <dbl>

# Summary for 2019-2023 surveys in a full version, keeping
# just 40DB and GAD3 polling firms, the last 30 days before the
# election, and a sample size of at least 500 people
summary_survey_data(year = c(2019, 2023), short_version = FALSE,
                    filter_polling_firm = c("GAD3", "40DB"),
                    filter_days_until_elec = 30,
                    lower_sample_size =  500)
#> Summary survey data
#>    [x] Checking if parameters are allowed...
#>    [x] Computing some statistics...
#>    [x] Filtering surveys...
#> # A tibble: 299 × 14
#>    id_survey          id_elec polling_firm media n_polls_by_elec fieldwork_start
#>    <chr>              <chr>   <chr>        <chr>           <int> <date>         
#>  1 GAD3-2023-07-10-2… 02-202… GAD3         Medi…              47 2023-07-10     
#>  2 GAD3-2023-07-10-2… 02-202… GAD3         Medi…              47 2023-07-10     
#>  3 GAD3-2023-07-10-2… 02-202… GAD3         Medi…              47 2023-07-10     
#>  4 GAD3-2023-07-10-2… 02-202… GAD3         Medi…              47 2023-07-10     
#>  5 GAD3-2023-07-10-2… 02-202… GAD3         Medi…              47 2023-07-10     
#>  6 GAD3-2023-07-10-2… 02-202… GAD3         Medi…              47 2023-07-10     
#>  7 GAD3-2023-07-10-2… 02-202… GAD3         Medi…              47 2023-07-10     
#>  8 GAD3-2023-07-10-2… 02-202… GAD3         Medi…              47 2023-07-10     
#>  9 GAD3-2023-07-10-2… 02-202… GAD3         Medi…              47 2023-07-10     
#> 10 GAD3-2023-07-10-2… 02-202… GAD3         Medi…              47 2023-07-10     
#> # ℹ 289 more rows
#> # ℹ 8 more variables: fieldwork_end <date>, n_field_days <dbl>,
#> #   days_until_elec <dbl>, sample_size <dbl>, abbrev_candidacies <chr>,
#> #   id_candidacies_nat <chr>, name_candidacies_nat <chr>,
#> #   estimated_porc_ballots <dbl>

# Summary for surveys for the forthcoming elections, filtering
# by parties ("PP" and "PSOE") and with a sample size of at least
# 500 people
summary_survey_data(forthcoming = TRUE,
                    filter_abbrev_candidacies = c("PP", "PSOE"),
                    lower_sample_size =  500)
#> Summary survey data
#>    [x] Checking if parameters are allowed...
#> Be careful! No date neither year was provided: just surveys for the forthcoming elections were provided
#>    [x] Computing some statistics...
#>    [x] Filtering surveys...
#> ! A short version was asked (if you want all variables, run with `short_version = FALSE`)
#> # A tibble: 607 × 9
#>    id_survey       id_elec polling_firm n_field_days sample_size fieldwork_start
#>    <chr>           <glue>  <chr>               <dbl>       <dbl> <date>         
#>  1 SocioMetrica-2… 02-202… SocioMetrica            7        2309 2023-12-25     
#>  2 SocioMetrica-2… 02-202… SocioMetrica            7        2309 2023-12-25     
#>  3 ElectoPanel-20… 02-202… ElectoPanel             7        1007 2023-12-23     
#>  4 ElectoPanel-20… 02-202… ElectoPanel             7        1007 2023-12-23     
#>  5 Celeste-Tel-20… 02-202… Celeste-Tel             7        1100 2023-12-21     
#>  6 Celeste-Tel-20… 02-202… Celeste-Tel             7        1100 2023-12-21     
#>  7 Sigma Dos-2023… 02-202… Sigma Dos              12        2992 2023-12-15     
#>  8 Sigma Dos-2023… 02-202… Sigma Dos              12        2992 2023-12-15     
#>  9 ElectoPanel-20… 02-202… ElectoPanel             7        1093 2023-12-16     
#> 10 ElectoPanel-20… 02-202… ElectoPanel             7        1093 2023-12-16     
#> # ℹ 597 more rows
#> # ℹ 3 more variables: fieldwork_end <date>, abbrev_candidacies <chr>,
#> #   estimated_porc_ballots <dbl>

# Summary for 2023 surveys in a full version, keeping
# just 40DB and GAD3 polling firms and surveys with fieldwork
# between 1 March 2020 and 1 August 2020
summary_survey_data(year = 2023, short_version = FALSE,
                    filter_polling_firm = c("GAD3", "40DB"),
                    lower_fieldwork_date = "2020-03-01",
                    upper_fieldwork_date = "2020-08-01")
#> Summary survey data
#>    [x] Checking if parameters are allowed...
#>    [x] Computing some statistics...
#>    [x] Filtering surveys...
#> # A tibble: 12 × 14
#>    id_survey          id_elec polling_firm media n_polls_by_elec fieldwork_start
#>    <chr>              <chr>   <chr>        <chr>           <int> <date>         
#>  1 GAD3-2020-07-06-2… 02-202… GAD3         ABC                47 2020-07-06     
#>  2 GAD3-2020-07-06-2… 02-202… GAD3         ABC                47 2020-07-06     
#>  3 GAD3-2020-07-06-2… 02-202… GAD3         ABC                47 2020-07-06     
#>  4 GAD3-2020-07-06-2… 02-202… GAD3         ABC                47 2020-07-06     
#>  5 GAD3-2020-05-18-2… 02-202… GAD3         ABC                47 2020-05-18     
#>  6 GAD3-2020-05-18-2… 02-202… GAD3         ABC                47 2020-05-18     
#>  7 GAD3-2020-05-18-2… 02-202… GAD3         ABC                47 2020-05-18     
#>  8 GAD3-2020-05-18-2… 02-202… GAD3         ABC                47 2020-05-18     
#>  9 GAD3-2020-05-04-2… 02-202… GAD3         ABC                47 2020-05-04     
#> 10 GAD3-2020-05-04-2… 02-202… GAD3         ABC                47 2020-05-04     
#> 11 GAD3-2020-05-04-2… 02-202… GAD3         ABC                47 2020-05-04     
#> 12 GAD3-2020-05-04-2… 02-202… GAD3         ABC                47 2020-05-04     
#> # ℹ 8 more variables: fieldwork_end <date>, n_field_days <dbl>,
#> #   days_until_elec <dbl>, sample_size <dbl>, abbrev_candidacies <chr>,
#> #   id_candidacies_nat <chr>, name_candidacies_nat <chr>,
#> #   estimated_porc_ballots <dbl>
if (FALSE) { # \dontrun{
# ----
# Incorrect examples
# ----

# Invalid year
summary_survey_data(year = 2018, short_version = FALSE)

# Invalid short version flag: short_version should be a
# logical variable
summary_survey_data(type_elec = "congress", year = 2019,
                    short_version = "yes")

# upper_fieldwork_date prior to lower_fieldwork_date
summary_survey_data(year = 2023,
                    lower_fieldwork_date = "2020-08-01",
                    upper_fieldwork_date = "2020-03-01")

# upper_sample_size smaller than lower_sample_size
summary_survey_data(year = 2023,
                    lower_sample_size = 500,
                    upper_sample_size = 200)

# Invalid class for argument filter_polling_firm
summary_survey_data(year = 2023,
                    filter_polling_firm = 700)

# Invalid class for argument filter_media
summary_survey_data(year = 2023,
                    filter_media = 700)


} # }