Import, preprocess and summarise survey data for given election types and dates. This function supports both single values and vector inputs, so data for multiple elections can be fetched and combined at once. Surveys for the forthcoming elections can also be requested. Several filtering arguments (polling firm, days until elections, candidacies, sample size, etc.) are included to build a suitable query.
Usage
summary_survey_data(
  type_elec = "congress",
  year = NULL,
  date = NULL,
  verbose = TRUE,
  short_version = TRUE,
  format = "long",
  forthcoming = FALSE,
  rm_exit_polls = TRUE,
  rm_unpublish_polls = FALSE,
  filter_polling_firm = NULL,
  filter_media = NULL,
  filter_n_field_days = NULL,
  filter_days_until_elec = NULL,
  lower_sample_size = NULL,
  upper_sample_size = NULL,
  lower_fieldwork_date = NULL,
  upper_fieldwork_date = NULL,
  filter_abbrev_candidacies = NULL
)
Arguments
- type_elec
Type of elections for which data is available. It should be one of the following values: "referendum", "congress", "senate", "local", "cabildo" (Canarian council) or "EU".
- year
A vector or single value representing the years of the elections to be considered. Please check in dates_elections_spain that elections of the specified type are available for the provided year.
- date
A vector or single value representing the dates of the elections to be considered. If date is provided, it should be in %Y-%m-%d format (e.g., '2000-01-01'). Defaults to NULL. If no date is provided, year should be provided as a numeric value. Please check in dates_elections_spain that elections of the specified type are available.
- verbose
Flag to indicate whether detailed messages should be printed during execution. Defaults to TRUE.
- short_version
Flag to indicate whether a short version of the data (just key variables) should be returned. Defaults to TRUE.
- format
Do you want the output in long (format = "long") or wide (format = "wide") format? Defaults to "long".
- forthcoming
Flag to indicate whether surveys for the forthcoming elections should be included. Defaults to FALSE. If TRUE, neither date nor year is required (in that case, just surveys for the next elections are provided).
- rm_exit_polls
Flag to indicate whether exit polls should be removed or not. Defaults to TRUE.
- rm_unpublish_polls
Flag to indicate whether unpublished polls before elections should be removed or not. In Spain, the Electoral Law (LOREG) establishes that it is forbidden to publish, disseminate, or reproduce electoral polls during the five days prior to election day. Defaults to FALSE.
- filter_polling_firm, filter_media, filter_abbrev_candidacies
Do you want to filter surveys by polling firm, media or abbreviation of candidacies? A character vector should be provided. Defaults to NULL.
- filter_n_field_days, filter_days_until_elec
Do you want to filter surveys by number of fieldwork days or days until election? A single numeric value should be provided. Defaults to NULL.
- lower_sample_size, upper_sample_size
Do you want to filter surveys by sample size? A single numeric value should be provided for each one. Defaults to NULL in both cases.
- lower_fieldwork_date, upper_fieldwork_date
Do you want to filter surveys by fieldwork date? A single date value should be provided for each one. Defaults to NULL in both cases.
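As a rough illustration of the format argument, the sketch below reshapes a toy long table into a wide one. It is not the package's internal code: the data values and firm names are made up, and the wide layout (one row per survey, one column per candidacy) is an assumption about what format = "wide" looks like. Only tibble::tibble() and tidyr::pivot_wider() are used.

```r
library(tibble)
library(tidyr)

# Toy long-format data: one row per survey and candidacy (made-up values,
# hypothetical firm names; column names follow the Value section)
surveys_long <- tibble(
  id_survey = c("FirmA-s1", "FirmA-s1", "FirmB-s2", "FirmB-s2"),
  polling_firm = c("FirmA", "FirmA", "FirmB", "FirmB"),
  abbrev_candidacies = c("PP", "PSOE", "PP", "PSOE"),
  estimated_porc_ballots = c(30.1, 28.4, 31.0, 27.5)
)

# Assumed wide layout: one row per survey, one column per candidacy
surveys_wide <- pivot_wider(
  surveys_long,
  names_from = abbrev_candidacies,
  values_from = estimated_porc_ballots
)

surveys_wide
```

The long layout is usually easier to filter and plot by candidacy, while the wide layout is convenient for comparing candidacies side by side within a survey.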
Value
A tibble with rows corresponding to the estimated results for each party for each election given by a particular pollster, including the following variables:
- id_survey
survey's id constructed from the polling firm and the dates for the start and end of the fieldwork.
- id_elec
election's id constructed from the election code and date.
- polling_firm
organisation conducting the poll.
- media
commissioning organisation / media outlet. Variable not available for short version.
- n_polls_by_elec
Number of polls done by each pollster for a particular election. Variable not available for short version.
- fieldwork_start, fieldwork_end, n_field_days
fieldwork period and its length (days).
- days_until_elec
number of days until the next election, counting from the start of the fieldwork. Variable not available for short version.
- sample_size
sample size of the survey.
- abbrev_candidacies, name_candidacies_nat
acronym and full name of the candidacies at national level. The latter is not available for short version.
- id_candidacies_nat
id for candidacies at national level.
- estimated_porc_ballots
estimated percentage of ballots for each party.
Details
This function uses the helper function import_survey_data(), which imports and cleans survey data, and adds other relevant variables for analysis. This function gives a tidy table of the survey data for analysis or visualization. Note that the dates and years provided should correspond to dates and years in which elections were held.
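The kind of derived variable and filtering described above can be sketched in base R on a toy table. This is only an illustration under stated assumptions, not the package's implementation: the survey rows are made up, and treating the "days until election" filter as an inclusive upper bound is an assumption.

```r
# Toy survey table (made-up values, hypothetical firm names)
surveys <- data.frame(
  polling_firm = c("FirmA", "FirmB", "FirmC"),
  fieldwork_start = as.Date(c("2023-05-02", "2023-06-30", "2023-07-15")),
  sample_size = c(1200, 450, 2000)
)

# Election date the surveys refer to
date_elec <- as.Date("2023-07-23")

# Days until the election, counted from the start of the fieldwork
surveys$days_until_elec <- as.numeric(date_elec - surveys$fieldwork_start)

# Keep surveys started within the last 30 days before the election
# (inclusive bound is an assumption) with a sample size of at least 500
filtered <- subset(surveys, days_until_elec <= 30 & sample_size >= 500)

filtered
```

Only the third survey passes both conditions here: the first started too early, and the second has too small a sample.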
Examples
## Correct examples
# Summary of surveys for 2016 and 2023, and 2019-04-28 elections.
summary_survey_data(year = c(2016, 2023), date = "2019-04-28")
#> Summary survey data
#> [x] Checking if parameters are allowed...
#> [x] Computing some statistics...
#> ! A short version was asked (if you want all variables, run with `short_version = FALSE`)
#> # A tibble: 11,940 × 11
#> id_survey id_elec polling_firm n_field_days sample_size fieldwork_start
#> <chr> <chr> <chr> <dbl> <dbl> <date>
#> 1 ElectoPanel-20… 02-202… ElectoPanel 4 1250 2019-12-25
#> 2 ElectoPanel-20… 02-202… ElectoPanel 4 1250 2019-12-25
#> 3 ElectoPanel-20… 02-202… ElectoPanel 4 1250 2019-12-25
#> 4 ElectoPanel-20… 02-202… ElectoPanel 4 1250 2019-12-25
#> 5 ElectoPanel-20… 02-202… ElectoPanel 4 1250 2019-12-25
#> 6 ElectoPanel-20… 02-202… ElectoPanel 4 1250 2019-12-25
#> 7 ElectoPanel-20… 02-202… ElectoPanel 4 1250 2019-12-25
#> 8 ElectoPanel-20… 02-202… ElectoPanel 4 1250 2019-12-25
#> 9 ElectoPanel-20… 02-202… ElectoPanel 4 1250 2019-12-25
#> 10 ElectoPanel-20… 02-202… ElectoPanel 4 1250 2019-12-25
#> # ℹ 11,930 more rows
#> # ℹ 5 more variables: fieldwork_end <date>, abbrev_candidacies <chr>,
#> # id_candidacies_nat <chr>, name_candidacies_nat <chr>,
#> # estimated_porc_ballots <dbl>
# Summary of surveys for the 2019 and 2023 elections in a full version,
# keeping just the 40DB and GAD3 polling firms, the last 30 days before
# the election, and a sample size of at least 500 people
summary_survey_data(year = c(2019, 2023), short_version = FALSE,
                    filter_polling_firm = c("GAD3", "40DB"),
                    filter_days_until_elec = 30,
                    lower_sample_size = 500)
#> Summary survey data
#> [x] Checking if parameters are allowed...
#> [x] Computing some statistics...
#> [x] Filtering surveys...
#> # A tibble: 388 × 14
#> id_survey id_elec polling_firm media n_polls_by_elec fieldwork_start
#> <chr> <chr> <chr> <chr> <int> <date>
#> 1 GAD3-2023-07-10-2… 02-202… GAD3 Medi… 47 2023-07-10
#> 2 GAD3-2023-07-10-2… 02-202… GAD3 Medi… 47 2023-07-10
#> 3 GAD3-2023-07-10-2… 02-202… GAD3 Medi… 47 2023-07-10
#> 4 GAD3-2023-07-10-2… 02-202… GAD3 Medi… 47 2023-07-10
#> 5 GAD3-2023-07-10-2… 02-202… GAD3 Medi… 47 2023-07-10
#> 6 GAD3-2023-07-10-2… 02-202… GAD3 Medi… 47 2023-07-10
#> 7 GAD3-2023-07-10-2… 02-202… GAD3 Medi… 47 2023-07-10
#> 8 GAD3-2023-07-10-2… 02-202… GAD3 Medi… 47 2023-07-10
#> 9 GAD3-2023-07-10-2… 02-202… GAD3 Medi… 47 2023-07-10
#> 10 GAD3-2023-07-10-2… 02-202… GAD3 Medi… 47 2023-07-10
#> # ℹ 378 more rows
#> # ℹ 8 more variables: fieldwork_end <date>, n_field_days <dbl>,
#> # sample_size <dbl>, abbrev_candidacies <chr>, id_candidacies_nat <chr>,
#> # name_candidacies_nat <chr>, estimated_porc_ballots <dbl>,
#> # days_until_elec <dbl>
# Summary of surveys for the forthcoming elections, filtering
# by parties ("PP" and "PSOE") and with a sample size of at least
# 500 people
summary_survey_data(forthcoming = TRUE,
                    filter_abbrev_candidacies = c("PP", "PSOE"),
                    lower_sample_size = 500)
#> Summary survey data
#> [x] Checking if parameters are allowed...
#> Be careful! No date neither year was provided: just surveys for the forthcoming elections were provided
#> [x] Computing some statistics...
#> [x] Filtering surveys...
#> Be careful! No data was found for the `lower_sample_size` and `upper_sample_size` provided. No filtering was performed
#> Be careful! No data was found for the `filter_abbrev_candidacies` provided. No filtering was performed
#> ! A short version was asked (if you want all variables, run with `short_version = FALSE`)
#> # A tibble: 0 × 11
#> # ℹ 11 variables: id_survey <chr>, id_elec <glue>, polling_firm <chr>,
#> # n_field_days <dbl>, sample_size <dbl>, fieldwork_start <date>,
#> # fieldwork_end <date>, abbrev_candidacies <chr>, id_candidacies_nat <chr>,
#> # name_candidacies_nat <chr>, estimated_porc_ballots <dbl>
# Summary of surveys for the 2023 elections in a full version, keeping
# just the 40DB and GAD3 polling firms and surveys with fieldwork
# between 1 March 2020 and 1 August 2020
summary_survey_data(year = 2023, short_version = FALSE,
                    filter_polling_firm = c("GAD3", "40DB"),
                    lower_fieldwork_date = "2020-03-01",
                    upper_fieldwork_date = "2020-08-01")
#> Summary survey data
#> [x] Checking if parameters are allowed...
#> [x] Computing some statistics...
#> [x] Filtering surveys...
#> # A tibble: 12 × 14
#> id_survey id_elec polling_firm media n_polls_by_elec fieldwork_start
#> <chr> <chr> <chr> <chr> <int> <date>
#> 1 GAD3-2020-07-06-2… 02-202… GAD3 ABC 47 2020-07-06
#> 2 GAD3-2020-07-06-2… 02-202… GAD3 ABC 47 2020-07-06
#> 3 GAD3-2020-07-06-2… 02-202… GAD3 ABC 47 2020-07-06
#> 4 GAD3-2020-07-06-2… 02-202… GAD3 ABC 47 2020-07-06
#> 5 GAD3-2020-05-18-2… 02-202… GAD3 ABC 47 2020-05-18
#> 6 GAD3-2020-05-18-2… 02-202… GAD3 ABC 47 2020-05-18
#> 7 GAD3-2020-05-18-2… 02-202… GAD3 ABC 47 2020-05-18
#> 8 GAD3-2020-05-18-2… 02-202… GAD3 ABC 47 2020-05-18
#> 9 GAD3-2020-05-04-2… 02-202… GAD3 ABC 47 2020-05-04
#> 10 GAD3-2020-05-04-2… 02-202… GAD3 ABC 47 2020-05-04
#> 11 GAD3-2020-05-04-2… 02-202… GAD3 ABC 47 2020-05-04
#> 12 GAD3-2020-05-04-2… 02-202… GAD3 ABC 47 2020-05-04
#> # ℹ 8 more variables: fieldwork_end <date>, n_field_days <dbl>,
#> # sample_size <dbl>, abbrev_candidacies <chr>, id_candidacies_nat <chr>,
#> # name_candidacies_nat <chr>, estimated_porc_ballots <dbl>,
#> # days_until_elec <dbl>
if (FALSE) { # \dontrun{
# ----
# Incorrect examples
# ----
# Invalid year
summary_survey_data(year = 2018, short_version = FALSE)
# Invalid short version flag: short_version should be a
# logical variable
summary_survey_data(type_elec = "congress", year = 2019,
short_version = "yes")
# upper_fieldwork_date prior to lower_fieldwork_date
summary_survey_data(year = 2023,
lower_fieldwork_date = "2020-08-01",
upper_fieldwork_date = "2020-03-01")
# upper_sample_size smaller than lower_sample_size
summary_survey_data(year = 2023, lower_sample_size = 500,
                    upper_sample_size = 200)
# Invalid class for argument filter_polling_firm
summary_survey_data(year = 2023, filter_polling_firm = 700)
# Invalid class for argument filter_media
summary_survey_data(year = 2023, filter_media = 700)
# Invalid filter_n_field_days
summary_survey_data(year = 2023, filter_n_field_days = -1)
} # }