Import, preprocess and summarise survey data.
Usage
summary_survey_data(
  type_elec = "congress",
  year = NULL,
  date = NULL,
  verbose = TRUE,
  short_version = TRUE,
  format = "long",
  forthcoming = FALSE,
  filter_polling_firm = NULL,
  filter_media = NULL,
  filter_n_field_days = NULL,
  filter_days_until_elec = NULL,
  lower_sample_size = NULL,
  upper_sample_size = NULL,
  lower_fieldwork_date = NULL,
  upper_fieldwork_date = NULL,
  filter_abbrev_candidacies = NULL
)
Arguments
- type_elec
Type of election for which data is available. It should be one of the following values: "referendum", "congress", "senate", "local", "cabildo" (Canarian council) or "EU".
- year
A vector or single value representing the years of the elections to be considered. Please check in dates_elections_spain that elections of the specified type are available for the provided year.
- date
A vector or single value representing the dates of the elections to be considered. If date is provided, it should be in the format %Y-%m-%d (e.g., '2000-01-01'). Defaults to NULL. If no date is provided, year should be provided as a numeric value. Please check in dates_elections_spain that elections of the specified type are available.
- verbose
Flag to indicate whether detailed messages should be printed during execution. Defaults to TRUE.
- short_version
Flag to indicate whether a short version of the data (just key variables) should be returned. Defaults to TRUE.
- format
Do you want the output in long or wide format? Defaults to "long".
- forthcoming
A flag indicating whether to include surveys for the forthcoming elections. Defaults to FALSE. If TRUE, neither date nor year is required (in that case, only surveys for the next elections are returned).
- filter_polling_firm, filter_media, filter_abbrev_candidacies
Do you want to filter surveys by polling firm, media outlet or abbreviation of the candidacies? A character vector should be provided. Defaults to NULL.
- filter_n_field_days, filter_days_until_elec
Do you want to filter surveys by number of fieldwork days or days until the election? A single numeric value should be provided. Defaults to NULL.
- lower_sample_size, upper_sample_size
Do you want to filter surveys by sample size? A single numeric value should be provided for each bound. Defaults to NULL in both cases.
- lower_fieldwork_date, upper_fieldwork_date
Do you want to filter surveys by fieldwork date? A single date value should be provided for each bound. Defaults to NULL in both cases.
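The filter arguments above can be pictured with a small base R sketch. It does not call the package: `toy_surveys` and `filter_toy_surveys()` are hypothetical names, all values are made up, and two behaviours are assumptions of the sketch rather than documented facts (bounds are treated as inclusive, and the fieldwork date bounds are compared against `fieldwork_start`).

```r
# Toy survey table: column names follow the Value section of this page,
# but every value below is made up for illustration.
toy_surveys <- data.frame(
  polling_firm    = c("GAD3", "40DB", "GAD3", "ElectoPanel"),
  sample_size     = c(1200, 450, 800, 2000),
  fieldwork_start = as.Date(c("2020-03-15", "2020-05-04",
                              "2020-09-01", "2020-07-06")),
  stringsAsFactors = FALSE
)

# Hypothetical helper (NOT part of the package): keep the rows that pass
# every filter that is not NULL. Inclusive bounds are an assumption.
filter_toy_surveys <- function(data,
                               filter_polling_firm = NULL,
                               lower_sample_size = NULL,
                               upper_sample_size = NULL,
                               lower_fieldwork_date = NULL,
                               upper_fieldwork_date = NULL) {
  keep <- rep(TRUE, nrow(data))
  if (!is.null(filter_polling_firm)) {
    keep <- keep & data$polling_firm %in% filter_polling_firm
  }
  if (!is.null(lower_sample_size)) {
    keep <- keep & data$sample_size >= lower_sample_size
  }
  if (!is.null(upper_sample_size)) {
    keep <- keep & data$sample_size <= upper_sample_size
  }
  if (!is.null(lower_fieldwork_date)) {
    keep <- keep & data$fieldwork_start >= as.Date(lower_fieldwork_date)
  }
  if (!is.null(upper_fieldwork_date)) {
    keep <- keep & data$fieldwork_start <= as.Date(upper_fieldwork_date)
  }
  data[keep, , drop = FALSE]
}

# Combine a firm filter, a lower sample-size bound and a fieldwork window.
filtered <- filter_toy_surveys(
  toy_surveys,
  filter_polling_firm  = c("GAD3", "40DB"),
  lower_sample_size    = 500,
  lower_fieldwork_date = "2020-03-01",
  upper_fieldwork_date = "2020-08-01"
)
print(filtered)
```

Leaving a filter as NULL skips it entirely, which mirrors the NULL defaults listed above; the filters that are supplied are combined with a logical AND in this sketch.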
Value
A tibble with rows corresponding to the estimated results for each party for each election given by a particular pollster, including the following variables:
- id_survey
survey's id constructed from the polling firm and the dates for the start and end of the fieldwork.
- id_elec
election's id constructed from the election code and date.
- polling_firm
organisation conducting the poll.
- media
commissioning organisation / media outlet. Variable not available for short version.
- n_polls_by_elec
Number of polls done by each pollster for a particular election. Variable not available for short version.
- fieldwork_start, fieldwork_end, n_field_days
fieldwork period and its length (days).
- days_until_elec
number of days until the next election, counting from the start of the fieldwork. Variable not available for short version.
- sample_size
sample size of the survey.
- abbrev_candidacies, name_candidacies_nat
acronym and full name of the candidacies at national level. The latter is not available for short version.
- id_candidacies_nat
id for candidacies at national level.
- estimated_porc_ballots
estimated percentage of ballots for each party.
Details
This function uses the helper function import_survey_data(), which imports and cleans survey data and adds other relevant variables for analysis. It returns a tidy table of the survey data that is ready for analysis or visualisation.
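As an illustration of the kind of analysis a long-format table supports, the sketch below averages the estimated vote share by party with base R. The data frame is a made-up stand-in that only borrows three column names from the Value section; the firm names and percentages are invented and are not package output.

```r
# Made-up stand-in for a long-format result: one row per survey and party.
# Column names come from the Value section; the values are invented.
toy_summary <- data.frame(
  polling_firm           = c("Firm A", "Firm A", "Firm B", "Firm B"),
  abbrev_candidacies     = c("PP", "PSOE", "PP", "PSOE"),
  estimated_porc_ballots = c(32, 28, 34, 30),
  stringsAsFactors = FALSE
)

# Simple (unweighted) polling average per party.
avg_by_party <- aggregate(
  estimated_porc_ballots ~ abbrev_candidacies,
  data = toy_summary,
  FUN  = mean
)
print(avg_by_party)
```

An unweighted mean is used only to keep the sketch short; weighting by sample size or recency would be a separate design choice.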
Examples
## Correct examples
# Summary for all 2019-2023 surveys
summary_survey_data(year = c(2019, 2023))
#> Summary survey data
#> [x] Checking if parameters are allowed...
#> [x] Computing some statistics...
#> [x] Filtering surveys...
#> ! A short version was asked (if you want all variables, run with `short_version = FALSE`)
#> # A tibble: 8,515 × 9
#> id_survey id_elec polling_firm n_field_days sample_size fieldwork_start
#> <chr> <chr> <chr> <dbl> <dbl> <date>
#> 1 ElectoPanel-20… 02-202… ElectoPanel 4 1250 2019-12-25
#> 2 ElectoPanel-20… 02-202… ElectoPanel 4 1250 2019-12-25
#> 3 ElectoPanel-20… 02-202… ElectoPanel 4 1250 2019-12-25
#> 4 ElectoPanel-20… 02-202… ElectoPanel 4 1250 2019-12-25
#> 5 ElectoPanel-20… 02-202… ElectoPanel 4 1250 2019-12-25
#> 6 ElectoPanel-20… 02-202… ElectoPanel 4 1250 2019-12-25
#> 7 ElectoPanel-20… 02-202… ElectoPanel 4 1250 2019-12-25
#> 8 ElectoPanel-20… 02-202… ElectoPanel 4 1250 2019-12-25
#> 9 ElectoPanel-20… 02-202… ElectoPanel 4 1250 2019-12-25
#> 10 ElectoPanel-20… 02-202… ElectoPanel 4 1250 2019-12-25
#> # ℹ 8,505 more rows
#> # ℹ 3 more variables: fieldwork_end <date>, abbrev_candidacies <chr>,
#> # estimated_porc_ballots <dbl>
# Summary for 2019-2023 surveys in a full version, keeping
# just the 40DB and GAD3 polling firms, the last 30 days before the
# election, and a sample size of at least 500 people
summary_survey_data(year = c(2019, 2023), short_version = FALSE,
                    filter_polling_firm = c("GAD3", "40DB"),
                    filter_days_until_elec = 30,
                    lower_sample_size = 500)
#> Summary survey data
#> [x] Checking if parameters are allowed...
#> [x] Computing some statistics...
#> [x] Filtering surveys...
#> # A tibble: 299 × 14
#> id_survey id_elec polling_firm media n_polls_by_elec fieldwork_start
#> <chr> <chr> <chr> <chr> <int> <date>
#> 1 GAD3-2023-07-10-2… 02-202… GAD3 Medi… 47 2023-07-10
#> 2 GAD3-2023-07-10-2… 02-202… GAD3 Medi… 47 2023-07-10
#> 3 GAD3-2023-07-10-2… 02-202… GAD3 Medi… 47 2023-07-10
#> 4 GAD3-2023-07-10-2… 02-202… GAD3 Medi… 47 2023-07-10
#> 5 GAD3-2023-07-10-2… 02-202… GAD3 Medi… 47 2023-07-10
#> 6 GAD3-2023-07-10-2… 02-202… GAD3 Medi… 47 2023-07-10
#> 7 GAD3-2023-07-10-2… 02-202… GAD3 Medi… 47 2023-07-10
#> 8 GAD3-2023-07-10-2… 02-202… GAD3 Medi… 47 2023-07-10
#> 9 GAD3-2023-07-10-2… 02-202… GAD3 Medi… 47 2023-07-10
#> 10 GAD3-2023-07-10-2… 02-202… GAD3 Medi… 47 2023-07-10
#> # ℹ 289 more rows
#> # ℹ 8 more variables: fieldwork_end <date>, n_field_days <dbl>,
#> # days_until_elec <dbl>, sample_size <dbl>, abbrev_candidacies <chr>,
#> # id_candidacies_nat <chr>, name_candidacies_nat <chr>,
#> # estimated_porc_ballots <dbl>
# Summary of surveys for the forthcoming elections, filtering
# by parties ("PP" and "PSOE") and with a sample size of at least
# 500 people
summary_survey_data(forthcoming = TRUE,
                    filter_abbrev_candidacies = c("PP", "PSOE"),
                    lower_sample_size = 500)
#> Summary survey data
#> [x] Checking if parameters are allowed...
#> Be careful! No date neither year was provided: just surveys for the forthcoming elections were provided
#> [x] Computing some statistics...
#> [x] Filtering surveys...
#> ! A short version was asked (if you want all variables, run with `short_version = FALSE`)
#> # A tibble: 607 × 9
#> id_survey id_elec polling_firm n_field_days sample_size fieldwork_start
#> <chr> <glue> <chr> <dbl> <dbl> <date>
#> 1 SocioMetrica-2… 02-202… SocioMetrica 7 2309 2023-12-25
#> 2 SocioMetrica-2… 02-202… SocioMetrica 7 2309 2023-12-25
#> 3 ElectoPanel-20… 02-202… ElectoPanel 7 1007 2023-12-23
#> 4 ElectoPanel-20… 02-202… ElectoPanel 7 1007 2023-12-23
#> 5 Celeste-Tel-20… 02-202… Celeste-Tel 7 1100 2023-12-21
#> 6 Celeste-Tel-20… 02-202… Celeste-Tel 7 1100 2023-12-21
#> 7 Sigma Dos-2023… 02-202… Sigma Dos 12 2992 2023-12-15
#> 8 Sigma Dos-2023… 02-202… Sigma Dos 12 2992 2023-12-15
#> 9 ElectoPanel-20… 02-202… ElectoPanel 7 1093 2023-12-16
#> 10 ElectoPanel-20… 02-202… ElectoPanel 7 1093 2023-12-16
#> # ℹ 597 more rows
#> # ℹ 3 more variables: fieldwork_end <date>, abbrev_candidacies <chr>,
#> # estimated_porc_ballots <dbl>
# Summary for 2023 surveys in a full version, filtering
# just 40DB and GAD3 polling firms, surveys that go from the
# first of March of 2020 to the first of August of the same year
summary_survey_data(year = 2023, short_version = FALSE,
                    filter_polling_firm = c("GAD3", "40DB"),
                    lower_fieldwork_date = "2020-03-01",
                    upper_fieldwork_date = "2020-08-01")
#> Summary survey data
#> [x] Checking if parameters are allowed...
#> [x] Computing some statistics...
#> [x] Filtering surveys...
#> # A tibble: 12 × 14
#> id_survey id_elec polling_firm media n_polls_by_elec fieldwork_start
#> <chr> <chr> <chr> <chr> <int> <date>
#> 1 GAD3-2020-07-06-2… 02-202… GAD3 ABC 47 2020-07-06
#> 2 GAD3-2020-07-06-2… 02-202… GAD3 ABC 47 2020-07-06
#> 3 GAD3-2020-07-06-2… 02-202… GAD3 ABC 47 2020-07-06
#> 4 GAD3-2020-07-06-2… 02-202… GAD3 ABC 47 2020-07-06
#> 5 GAD3-2020-05-18-2… 02-202… GAD3 ABC 47 2020-05-18
#> 6 GAD3-2020-05-18-2… 02-202… GAD3 ABC 47 2020-05-18
#> 7 GAD3-2020-05-18-2… 02-202… GAD3 ABC 47 2020-05-18
#> 8 GAD3-2020-05-18-2… 02-202… GAD3 ABC 47 2020-05-18
#> 9 GAD3-2020-05-04-2… 02-202… GAD3 ABC 47 2020-05-04
#> 10 GAD3-2020-05-04-2… 02-202… GAD3 ABC 47 2020-05-04
#> 11 GAD3-2020-05-04-2… 02-202… GAD3 ABC 47 2020-05-04
#> 12 GAD3-2020-05-04-2… 02-202… GAD3 ABC 47 2020-05-04
#> # ℹ 8 more variables: fieldwork_end <date>, n_field_days <dbl>,
#> # days_until_elec <dbl>, sample_size <dbl>, abbrev_candidacies <chr>,
#> # id_candidacies_nat <chr>, name_candidacies_nat <chr>,
#> # estimated_porc_ballots <dbl>
if (FALSE) { # \dontrun{
# ----
# Incorrect examples
# ----
# Invalid year
summary_survey_data(year = 2018, short_version = FALSE)
# Invalid short version flag: short_version should be a
# logical variable
summary_survey_data(type_elec = "congress", year = 2019,
                    short_version = "yes")
# upper_fieldwork_date prior to lower_fieldwork_date
summary_survey_data(year = 2023,
                    lower_fieldwork_date = "2020-08-01",
                    upper_fieldwork_date = "2020-03-01")
# upper_sample_size smaller than lower_sample_size
summary_survey_data(year = 2023,
                    lower_sample_size = 500,
                    upper_sample_size = 200)
# Invalid class for argument filter_polling_firm
summary_survey_data(year = 2023,
                    filter_polling_firm = 700)
# Invalid class for argument filter_media
summary_survey_data(year = 2023,
                    filter_media = 700)
} # }