Title: | Tidy Finance Helper Functions |
---|---|
Description: | Helper functions for empirical research in financial economics, addressing a variety of topics covered in Scheuch, Voigt, and Weiss (2023) <doi:10.1201/b23237>. The package is designed to provide shortcuts for issues extensively discussed in the book, facilitating easier application of its concepts. For more information and resources related to the book, visit <https://www.tidy-finance.org/r/index.html>. |
Authors: | Christoph Scheuch [aut, cre, cph] , Stefan Voigt [aut, cph] , Patrick Weiss [aut, cph] , Maximilian Mücke [ctb] |
Maintainer: | Christoph Scheuch <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.4.1.9003 |
Built: | 2024-11-18 15:51:54 UTC |
Source: | https://github.com/tidy-finance/r-tidyfinance |
This function adds lagged versions of specified columns to a data frame. Optionally,
the operation can be grouped by another column and allows for flexible handling
of missing values. The lag is applied based on the date
column in the data frame.
add_lag_columns( data, cols, by = NULL, lag, max_lag = lag, drop_na = TRUE, data_options = NULL )
data |
A data frame containing the columns to be lagged. |
cols |
A character vector specifying the names of the columns to lag. |
by |
An optional column by which to group the data when applying the lag. Default is NULL. |
lag |
The number of periods to lag the columns by. Must be non-negative. |
max_lag |
An optional maximum lag period. The default is equal to lag. |
drop_na |
A logical value indicating whether to drop rows with missing values in the lagged columns. Default is TRUE. |
data_options |
A list of additional options for data processing, such as
the |
A data frame with lagged versions of the specified columns appended, optionally grouped by another column.
# Create a sample data frame
data <- tibble::tibble(
  permno = rep(1:2, each = 10),
  date = rep(seq.Date(as.Date('2023-01-01'), by = "month", length.out = 10), 2),
  bm = runif(20, 0.5, 1.5),
  size = runif(20, 100, 200)
)

# Add lagged columns for 'bm' and 'size' with a 3-month lag, grouped by 'permno'
data |>
  add_lag_columns(c("bm", "size"), lag = months(3), by = "permno")

# Introduce missing values in the data
data$bm[c(3, 5, 7, 15, 18)] <- NA
data$size[c(2, 4, 8, 13)] <- NA

# Add lagged columns with NA values removed
data |>
  add_lag_columns(c("bm", "size"), lag = months(3), by = permno)
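To illustrate what a date-based lag means, the following base R sketch shifts each observation date forward by one month and merges the shifted values back onto the panel. It is not the package's implementation: the helper add_one_month and the columns id and bm_lag are hypothetical names chosen for this illustration.

```r
# Toy panel: two ids with five monthly observations each
panel <- data.frame(
  id = rep(1:2, each = 5),
  date = rep(seq(as.Date("2023-01-01"), by = "month", length.out = 5), 2),
  bm = c(1.0, 1.1, 1.2, 1.3, 1.4, 2.0, 2.1, 2.2, 2.3, 2.4)
)

# Hypothetical helper: move every date forward by one calendar month
add_one_month <- function(d) {
  do.call(c, lapply(seq_along(d), function(i) {
    seq(d[i], by = "month", length.out = 2)[2]
  }))
}

# Values observed at t - 1 are re-dated to t, then joined by id and date
lagged <- data.frame(
  id = panel$id,
  date = add_one_month(panel$date),
  bm_lag = panel$bm
)
result <- merge(panel, lagged, by = c("id", "date"), all.x = TRUE)
result <- result[order(result$id, result$date), ]

# Analogue of dropping rows with missing lagged values
complete <- result[!is.na(result$bm_lag), ]
```

Because the join is on dates rather than row positions, a gap in the panel produces a missing lagged value instead of silently pulling in an older observation.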
This function assigns data points to portfolios based on a specified sorting variable and a function that computes breakpoints. The breakpoint function must take data and sorting_variable as its first two arguments; additional arguments are passed through the named list breakpoint_options. The function needs to return an ascending vector of breakpoints. By default, breakpoints are computed with compute_breakpoints. The default column names can be modified using data_options.
assign_portfolio( data, sorting_variable, breakpoint_options = NULL, breakpoint_function = compute_breakpoints, data_options = NULL )
data |
A data frame containing the dataset for portfolio assignment. |
sorting_variable |
A string specifying the column name in |
breakpoint_options |
An optional named list of arguments passed to
|
breakpoint_function |
A function to compute breakpoints. The default is set to compute_breakpoints. |
data_options |
A named list of data_options with characters,
indicating the column names required to run this function. The required
column names identify dates. Defaults to |
A vector of portfolio assignments for each row in the input data.
data <- data.frame(
  id = 1:100,
  exchange = sample(c("NYSE", "NASDAQ"), 100, replace = TRUE),
  market_cap = 1:100
)

assign_portfolio(data, "market_cap", breakpoint_options(n_portfolios = 5))

assign_portfolio(
  data, "market_cap",
  breakpoint_options(percentiles = c(0.2, 0.4, 0.6, 0.8), breakpoint_exchanges = c("NYSE"))
)
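The contract for a custom breakpoint function (data and sorting_variable first, ascending breakpoints returned) can be sketched in base R as follows. The function median_breakpoints is hypothetical, and mapping values to portfolios with findInterval is an assumption made for illustration, not a statement about the package internals.

```r
# Hypothetical breakpoint function: splits the sample at the median.
# It takes `data` and `sorting_variable` first and returns ascending values.
median_breakpoints <- function(data, sorting_variable, ...) {
  x <- data[[sorting_variable]]
  c(min(x, na.rm = TRUE), stats::median(x, na.rm = TRUE), max(x, na.rm = TRUE))
}

toy <- data.frame(id = 1:10, market_cap = c(5, 3, 9, 1, 7, 2, 8, 4, 10, 6))
breakpoints <- median_breakpoints(toy, "market_cap")

# Map each observation to the interval it falls into; all.inside = TRUE keeps
# the maximum in the top portfolio instead of opening a new one.
toy$portfolio <- findInterval(toy$market_cap, breakpoints, all.inside = TRUE)
```

A function with this shape could then be supplied as breakpoint_function, provided it also tolerates whatever further arguments the package passes along.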
This function generates a structured list of options for defining breakpoints in portfolio sorting. It includes parameters for the number of portfolios, percentile thresholds, exchange-specific breakpoints, and smooth bunching, along with additional optional parameters.
breakpoint_options( n_portfolios = NULL, percentiles = NULL, breakpoint_exchanges = NULL, smooth_bunching = FALSE, ... )
n_portfolios |
Integer, optional. The number of portfolios to create. Must be a positive integer. If not provided, defaults to NULL. |
percentiles |
Numeric vector, optional. A vector of percentile thresholds for defining breakpoints. Each value should be between 0 and 1. If not provided, defaults to NULL. |
breakpoint_exchanges |
Character, optional. A non-empty string specifying the exchange for which the breakpoints apply. If not provided, defaults to NULL. |
smooth_bunching |
Logical, optional. Indicates whether smooth bunching should be applied. Defaults to FALSE. |
... |
Additional optional arguments. These will be captured in the resulting structure as a list. |
A list of class "tidyfinance_breakpoint_options" containing the provided breakpoint options, including any additional arguments passed via the ... argument.
breakpoint_options( n_portfolios = 5, percentiles = c(0.2, 0.4, 0.6, 0.8), breakpoint_exchanges = "NYSE", smooth_bunching = TRUE, custom_threshold = 0.5, another_option = "example" )
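As a rough illustration of how an options constructor can capture named arguments together with anything passed via ..., consider the sketch below. The names make_options and "example_options" are hypothetical and do not reflect the package's actual code or validation logic.

```r
# Hypothetical constructor: named options plus free-form extras in one list
make_options <- function(n_portfolios = NULL, percentiles = NULL, ...) {
  opts <- c(
    list(n_portfolios = n_portfolios, percentiles = percentiles),
    list(...)  # extras such as custom_threshold are kept under their own names
  )
  structure(opts, class = "example_options")
}

opts <- make_options(n_portfolios = 5, custom_threshold = 0.5)
names(opts)
```

Attaching a class lets downstream functions check that they received a properly constructed options object rather than an arbitrary list.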
This function checks if a given dataset type is supported by verifying against a list of all supported dataset types from different domains. If the specified type is not supported, it stops execution and returns an error message listing all supported types.
check_supported_type(type)
type |
The dataset type to check for support. |
Does not return a value; instead, it either passes silently if the type is supported or stops execution with an error message if the type is unsupported.
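The pass-silently-or-stop behavior described above can be sketched as follows. The function check_type_sketch and its short list of supported types are assumptions for illustration; the package maintains its own, longer list.

```r
# Hypothetical validator: returns invisibly if supported, errors otherwise
check_type_sketch <- function(type,
                              supported = c("factors_ff_3_monthly",
                                            "factors_q5_daily",
                                            "macro_predictors_monthly")) {
  if (!type %in% supported) {
    stop(
      "Unsupported type '", type, "'. Supported types: ",
      paste(supported, collapse = ", "),
      call. = FALSE
    )
  }
  invisible(NULL)
}

check_type_sketch("factors_ff_3_monthly")  # passes silently

# Capture the error message for an unsupported type
msg <- tryCatch(
  check_type_sketch("not_a_type"),
  error = function(e) conditionMessage(e)
)
```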
This function computes breakpoints based on a specified sorting variable. It can optionally filter the data by exchanges before computing the breakpoints. The function requires either the number of portfolios to be created or specific percentiles for the breakpoints, but not both. The function also optionally handles cases where the sorting variable clusters on the edges, by assigning all extreme values to the edges and attempting to compute equally populated breakpoints with the remaining values.
compute_breakpoints( data, sorting_variable, breakpoint_options, data_options = NULL )
data |
A data frame containing the dataset for breakpoint computation. |
sorting_variable |
A string specifying the column name in |
breakpoint_options |
A named list of breakpoint_options for the breakpoints. The arguments include
|
data_options |
A named list of data_options with characters, indicating the column names
required to run this function. The required column names identify dates. Defaults to |
A vector of breakpoints of the desired length.
This function will stop and throw an error if both n_portfolios and percentiles are provided or if neither is provided. Ensure that you only use one of these parameters.
data <- data.frame(
  id = 1:100,
  exchange = sample(c("NYSE", "NASDAQ"), 100, replace = TRUE),
  market_cap = 1:100
)

compute_breakpoints(data, "market_cap", breakpoint_options(n_portfolios = 5))

compute_breakpoints(
  data, "market_cap",
  breakpoint_options(percentiles = c(0.2, 0.4, 0.6, 0.8), breakpoint_exchanges = c("NYSE"))
)
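A minimal base R sketch of quantile-based breakpoints with an optional exchange filter is shown below. It assumes that breakpoints include the sample minimum and maximum and does not attempt the smooth bunching logic; breakpoints_sketch and its arguments are hypothetical names.

```r
breakpoints_sketch <- function(data, sorting_variable, n_portfolios = NULL,
                               percentiles = NULL, exchanges = NULL) {
  # Exactly one of n_portfolios and percentiles may be supplied
  if (is.null(n_portfolios) == is.null(percentiles)) {
    stop("Provide exactly one of n_portfolios and percentiles.")
  }
  # Optionally compute breakpoints on a subset of exchanges only
  if (!is.null(exchanges)) {
    data <- data[data$exchange %in% exchanges, , drop = FALSE]
  }
  probs <- if (!is.null(n_portfolios)) {
    seq(0, 1, length.out = n_portfolios + 1)
  } else {
    c(0, percentiles, 1)
  }
  unname(stats::quantile(data[[sorting_variable]], probs = probs, na.rm = TRUE))
}

toy <- data.frame(
  exchange = rep(c("NYSE", "NASDAQ"), each = 50),
  market_cap = 1:100
)

all_stocks <- breakpoints_sketch(toy, "market_cap", n_portfolios = 5)
nyse_only <- breakpoints_sketch(toy, "market_cap", percentiles = 0.5, exchanges = "NYSE")
```

Restricting the breakpoint sample to one exchange while assigning all stocks afterwards is the usual reason for separating breakpoint computation from portfolio assignment.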
This function calculates long-short returns based on the returns of portfolios. The long-short return is computed as the difference between the returns of the "top" and "bottom" portfolios. The direction argument controls the sign: either the "bottom" return is subtracted from the "top" return (top minus bottom) or the reverse (bottom minus top).
compute_long_short_returns( data, direction = "top_minus_bottom", data_options = NULL )
data |
A data frame containing portfolio returns. The data frame must include columns for the portfolio identifier, date, and return measurements. The portfolio column should indicate different portfolios, and there should be columns for return measurements prefixed with "ret_excess". |
direction |
A character string specifying the direction of the long-short return calculation. It can be either "top_minus_bottom" or "bottom_minus_top". Default is "top_minus_bottom". If set to "bottom_minus_top", the return will be computed as (bottom - top). |
data_options |
A named list of data_options with characters, indicating the column
names required to run this function. The required column names identify dates. Defaults to
|
A data frame with columns for date, return measurement types (from the "ret_measure" column), and the computed long-short returns. The data frame is arranged by date and pivoted to have return measurement types as columns with their corresponding long-short returns.
data <- data.frame(
  permno = 1:100,
  date = rep(seq.Date(from = as.Date("2020-01-01"), by = "month", length.out = 100), each = 10),
  mktcap_lag = runif(100, 100, 1000),
  ret_excess = rnorm(100),
  size = runif(100, 50, 150)
)

portfolio_returns <- compute_portfolio_returns(
  data, "size", "univariate",
  breakpoint_options_main = breakpoint_options(n_portfolios = 5)
)

compute_long_short_returns(portfolio_returns)
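The top-minus-bottom calculation itself is simple arithmetic, as the following base R sketch with made-up returns shows. The column long_short_ew and the merge-based approach are assumptions for illustration only.

```r
# Toy portfolio returns: three portfolios over two months (illustrative values)
portfolio_returns <- data.frame(
  portfolio = rep(1:3, times = 2),
  date = rep(as.Date(c("2020-01-31", "2020-02-29")), each = 3),
  ret_excess_ew = c(0.01, 0.02, 0.04, -0.01, 0.00, 0.03)
)

# Keep only the extreme portfolios and line them up by date
is_top <- portfolio_returns$portfolio == max(portfolio_returns$portfolio)
is_bottom <- portfolio_returns$portfolio == min(portfolio_returns$portfolio)
long_short <- merge(
  portfolio_returns[is_top, ],
  portfolio_returns[is_bottom, ],
  by = "date",
  suffixes = c("_top", "_bottom")
)

# direction = "top_minus_bottom"; swap the operands for "bottom_minus_top"
long_short$long_short_ew <-
  long_short$ret_excess_ew_top - long_short$ret_excess_ew_bottom
```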
This function computes individual portfolio returns based on specified sorting variables and sorting methods. The portfolios can be rebalanced every period or on an annual frequency by specifying a rebalancing month, which is only applicable at a monthly return frequency. The function supports univariate and bivariate sorts, with the latter supporting dependent and independent sorting methods.
compute_portfolio_returns( sorting_data, sorting_variables, sorting_method, rebalancing_month = NULL, breakpoint_options_main, breakpoint_options_secondary = NULL, breakpoint_function_main = compute_breakpoints, breakpoint_function_secondary = compute_breakpoints, min_portfolio_size = 0, data_options = NULL )
sorting_data |
A data frame containing the dataset for portfolio
assignment and return computation. Following CRSP naming conventions, the
panel data must identify individual stocks with |
sorting_variables |
A character vector specifying the column names in
|
sorting_method |
A string specifying the sorting method to be used. Possible values are:
For bivariate sorts, the portfolio returns are averaged over the
controlling sorting variable (i.e., the second sorting variable) and only
portfolio returns for the main sorting variable (given as the first element
of |
rebalancing_month |
An integer between 1 and 12 specifying the month in
which to form portfolios that are held constant for one year. For example,
setting it to |
breakpoint_options_main |
A named list of breakpoint_options passed to
|
breakpoint_options_secondary |
An optional named list of breakpoint_options
passed to |
breakpoint_function_main |
A function to compute the breakpoints for the main sorting variable. The default is set to compute_breakpoints. |
breakpoint_function_secondary |
A function to compute the breakpoints for the secondary sorting variable. The default is set to compute_breakpoints. |
min_portfolio_size |
An integer specifying the minimum number of portfolio constituents (default is set to 0). |
data_options |
A named list of data_options with characters, indicating
the column names required to run this function. The required column names identify dates,
the stocks, and returns. Defaults to |
The function checks for consistency in the provided arguments. For univariate sorts, a single sorting variable and a corresponding number of portfolios must be provided. For bivariate sorts, two sorting variables and two corresponding numbers of portfolios (or percentiles) are required. The sorting method determines how portfolios are assigned and returns are computed. The function handles missing and extreme values appropriately based on the specified sorting method and rebalancing frequency.
A data frame with computed portfolio returns, containing the following columns:
portfolio: The portfolio identifier.
date: The date of the portfolio return.
ret_excess_vw: The value-weighted excess return of the portfolio (only computed if the sorting_data contains mktcap_lag).
ret_excess_ew: The equal-weighted excess return of the portfolio.
Ensure that the sorting_data contains all the required columns: the specified sorting variables and ret_excess. The function will stop and throw an error if any required columns are missing.
# Univariate sorting with periodic rebalancing
data <- data.frame(
  permno = 1:500,
  date = rep(seq.Date(from = as.Date("2020-01-01"), by = "month", length.out = 100), each = 10),
  mktcap_lag = runif(500, 100, 1000),
  ret_excess = rnorm(500),
  size = runif(500, 50, 150)
)

compute_portfolio_returns(
  data, "size", "univariate",
  breakpoint_options_main = breakpoint_options(n_portfolios = 5)
)

# Bivariate independent sorting with annual rebalancing in July
compute_portfolio_returns(
  data, c("size", "mktcap_lag"), "bivariate-independent", 7,
  breakpoint_options_main = breakpoint_options(n_portfolios = 5),
  breakpoint_options_secondary = breakpoint_options(n_portfolios = 3)
)
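The difference between the equal-weighted and value-weighted return columns can be made concrete with a small base R sketch. The numbers are made up, and the aggregation shown here is only an assumption about what the two columns represent, using lagged market capitalization as weights.

```r
# Toy cross-section for one date: two portfolios with two stocks each
stocks <- data.frame(
  portfolio = c(1, 1, 2, 2),
  ret_excess = c(0.10, 0.00, 0.02, 0.04),
  mktcap_lag = c(100, 300, 50, 50)
)

# Equal-weighted: simple average of returns within each portfolio
ret_excess_ew <- tapply(stocks$ret_excess, stocks$portfolio, mean)

# Value-weighted: average weighted by lagged market capitalization
ret_excess_vw <- sapply(
  split(stocks, stocks$portfolio),
  function(g) stats::weighted.mean(g$ret_excess, g$mktcap_lag)
)
```

In portfolio 1 the large stock earns nothing, so the value-weighted return (0.025) is half the equal-weighted return (0.05); in portfolio 2 both stocks have equal size and the two measures coincide.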
Computes a set of summary statistics for numeric, integer, and logical variables in a data frame. This function allows users to select specific variables for summarization and can calculate statistics for the whole dataset or within groups specified by the by argument. Additional quantiles can be included via the detail argument.
create_summary_statistics( data, ..., by = NULL, detail = FALSE, drop_na = FALSE )
data |
A data frame containing the variables to be summarized. |
... |
Comma-separated list of unquoted variable names in the data frame to summarize. These variables must be either numeric, integer, or logical. |
by |
An optional unquoted variable name to group the data before
summarizing. If |
detail |
A logical flag indicating whether to compute detailed summary statistics including additional quantiles. Defaults to FALSE, which computes basic statistics (n, mean, sd, min, median, max). When TRUE, additional quantiles (1%, 5%, 10%, 25%, 75%, 90%, 95%, 99%) are computed. |
drop_na |
A logical flag indicating whether to drop missing values for each variable (default is FALSE). |
The function first checks that all specified variables are of type numeric, integer, or logical. If any variables do not meet this criterion, the function stops and returns an error message indicating the non-conforming variables.
The basic set of summary statistics includes the count of non-NA values (n), mean, standard deviation (sd), minimum (min), median (q50), and maximum (max). If detail is TRUE, the function also computes the 1st, 5th, 10th, 25th, 75th, 90th, 95th, and 99th percentiles.
Summary statistics are computed for each variable specified in the ... argument. If a by variable is provided, statistics are computed within each level of the by variable.
A tibble with summary statistics for each selected variable. If by is specified, the output includes the grouping variable as well. Each row represents a variable (and a group if by is used), and columns include the computed statistics.
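For a single variable, the basic and detailed statistics listed above could be computed along the following lines. This is a base R sketch under assumptions: summarize_sketch is a hypothetical helper, and propagating NA when drop_na is FALSE is one plausible reading of that argument rather than confirmed package behavior.

```r
summarize_sketch <- function(x, detail = FALSE, drop_na = FALSE) {
  if (drop_na) x <- x[!is.na(x)]
  out <- c(
    n = sum(!is.na(x)),
    mean = mean(x),
    sd = stats::sd(x),
    min = min(x),
    q50 = stats::median(x),
    max = max(x)
  )
  if (detail) {
    probs <- c(0.01, 0.05, 0.10, 0.25, 0.75, 0.90, 0.95, 0.99)
    # quantile() refuses missing values, so return NA quantiles in that case
    q <- if (anyNA(x)) rep(NA_real_, length(probs)) else stats::quantile(x, probs)
    names(q) <- paste0("q", sprintf("%02d", round(probs * 100)))
    out <- c(out, q)
  }
  out
}

x <- c(1, 2, 3, 4, NA)
with_na <- summarize_sketch(x)                       # mean, sd, ... are NA
without_na <- summarize_sketch(x, drop_na = TRUE)    # computed on 1, 2, 3, 4
detailed <- summarize_sketch(x, detail = TRUE, drop_na = TRUE)
```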
Downloads the WRDS dummy database from the respective Tidy Finance GitHub repository and saves it to the specified path. If the file already exists, the user is prompted before it is replaced.
create_wrds_dummy_database(path)
path |
The file path where the SQLite database should be saved. If not provided, the default path is "data/tidy_finance_r.sqlite". |
Invisible NULL. Side effect: downloads a file to the specified path.
path <- paste0(tempdir(), "/tidy_finance_r.sqlite")
create_wrds_dummy_database(path)
This function creates a list of data options used in financial data analysis, specifically for TidyFinance-related functions. It allows users to specify key parameters such as id, date, exchange, mktcap_lag, and ret_excess along with other additional options passed through the ... argument.
data_options( id = "permno", date = "date", exchange = "exchange", mktcap_lag = "mktcap_lag", ret_excess = "ret_excess", portfolio = "portfolio", ... )
id |
A character string representing the identifier variable (e.g., "permno"). |
date |
A character string representing the date variable (e.g., "date"). |
exchange |
A character string representing the exchange variable (e.g., "exchange"). |
mktcap_lag |
A character string representing the market capitalization lag variable (e.g., "mktcap_lag"). |
ret_excess |
A character string representing the excess return variable (e.g., "ret_excess"). |
portfolio |
A character string representing the portfolio variable (e.g., "portfolio"). |
... |
Additional arguments to be included in the data options list. |
A list of class tidyfinance_data_options containing the specified data options.
data_options( id = "permno", date = "date", exchange = "exchange" )
This function safely disconnects an established database connection using the DBI package.
disconnection_connection(con)
con |
A database connection object created by DBI::dbConnect or any similar function that establishes a connection to a database. |
A logical value: TRUE if disconnection was successful, FALSE otherwise.
Downloads and processes data based on the specified type (e.g., Fama-French factors, Global Q factors, or macro predictors), and date range. This function checks if the specified type is supported and then delegates to the appropriate function for downloading and processing the data.
download_data(type, start_date = NULL, end_date = NULL, ...)
type |
The type of dataset to download, indicating either factor data or macroeconomic predictors. |
start_date |
Optional. A character string or Date object in "YYYY-MM-DD" format specifying the start date for the data. If not provided, the full dataset or a subset is returned, depending on the dataset type. |
end_date |
Optional. A character string or Date object in "YYYY-MM-DD" format specifying the end date for the data. If not provided, the full dataset or a subset is returned, depending on the dataset type. |
... |
Additional arguments passed to specific download functions depending on the |
A tibble with processed data, including dates and the relevant financial metrics, filtered by the specified date range.
download_data("factors_ff_3_monthly", "2000-01-01", "2020-12-31")
download_data("macro_predictors_monthly", "2000-01-01", "2020-12-31")
download_data("constituents", index = "DAX")
download_data("fred", series = c("GDP", "CPIAUCNS"))
download_data("stock_prices", symbols = c("AAPL", "MSFT"))
This function downloads and processes the constituent data for a specified financial index. The data is fetched from a remote CSV file, filtered, and cleaned to provide relevant information about constituents.
download_data_constituents(index)
index |
A character string specifying the name of the financial index for which to download constituent data. The index must be one of the supported indexes listed by list_supported_indexes. |
The function retrieves the URL of the CSV file for the specified index from ETF sites, then sends an HTTP GET request to download the CSV file, and processes the CSV file to extract equity constituents.
The approach is inspired by tidyquant::tq_index(), which uses a different wrapper around other ETFs.
A tibble with four columns:
The ticker symbol of the equity constituent.
The name of the equity constituent.
The location where the company is based.
The exchange where the equity is traded.
The tibble is filtered to exclude non-equity entries, blacklisted symbols, empty names, and any entries containing the index name or "CASH".
download_data_constituents("DAX")
Downloads and processes factor data based on the specified type (Fama-French or Global Q), and date range. This function delegates to specific functions based on the type of factors requested: Fama-French or Global Q. It checks if the specified type is supported before proceeding with the download and processing.
download_data_factors(type, start_date = NULL, end_date = NULL)
type |
The type of dataset to download, indicating the factor model and frequency. |
start_date |
The start date for filtering the data, in "YYYY-MM-DD" format. |
end_date |
The end date for filtering the data, in "YYYY-MM-DD" format. |
A tibble with processed factor data, including dates, risk-free rates, market excess returns, and other factors, filtered by the specified date range.
download_data_factors("factors_ff_3_monthly", "2000-01-01", "2020-12-31")
download_data_factors("factors_ff_3_daily")
download_data_factors("factors_q5_daily", "2020-01-01", "2020-12-31")
Downloads and processes Fama-French factor data based on the specified type (e.g., "factors_ff_3_monthly"), and date range. The function first checks if the specified type is supported and requires the 'frenchdata' package to download the data. It processes the raw data into a structured format, including date conversion, scaling factor values, and filtering by the specified date range.
download_data_factors_ff(type, start_date = NULL, end_date = NULL)
type |
The type of dataset to download, corresponding to the specific Fama-French model and frequency. |
start_date |
Optional. A character string or Date object in "YYYY-MM-DD" format specifying the start date for the data. If not provided, the full dataset is returned. |
end_date |
Optional. A character string or Date object in "YYYY-MM-DD" format specifying the end date for the data. If not provided, the full dataset is returned. |
If there are multiple tables in the raw Fama-French data (e.g., value-weighted and equal-weighted returns), then the function only returns the first table because it is the most commonly used. Please use the frenchdata package directly if you need less commonly used tables.
A tibble with processed factor data, including the date, risk-free rate, market excess return, and other factors, filtered by the specified date range.
download_data_factors_ff("factors_ff_3_monthly", "2000-01-01", "2020-12-31")
download_data_factors_ff("factors_ff_10_industry_portfolios_monthly", "2000-01-01", "2020-12-31")
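The processing steps named in the description (date conversion, scaling, and date filtering) might look roughly like this in base R. The raw layout is an assumption: YYYYMM integer dates and returns quoted in percent, with illustrative values and hypothetical column names.

```r
# Hypothetical raw table: YYYYMM integer dates and returns quoted in percent
raw <- data.frame(
  date = c(202001L, 202002L),
  mkt_excess = c(-0.11, -8.13),
  risk_free = c(0.13, 0.12)
)

clean <- data.frame(
  # Append "01" so that each month maps to its first calendar day
  date = as.Date(paste0(raw$date, "01"), format = "%Y%m%d"),
  # Rescale from percent to decimal returns
  mkt_excess = raw$mkt_excess / 100,
  risk_free = raw$risk_free / 100
)

# Keep only observations inside the requested date range
start_date <- as.Date("2020-02-01")
end_date <- as.Date("2020-12-31")
clean <- clean[clean$date >= start_date & clean$date <= end_date, ]
```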
Downloads and processes Global Q factor data based on the specified type (daily, monthly, etc.), date range, and source URL. The function first checks if the specified type is supported, identifies the dataset name from the supported types, then downloads and processes the data from the provided URL. The processing includes date conversion, renaming variables to a standardized format, scaling factor values, and filtering by the specified date range.
download_data_factors_q( type, start_date = NULL, end_date = NULL, url = "https://global-q.org/uploads/1/2/2/6/122679606/" )
type |
The type of dataset to download (e.g., "factors_q5_daily", "factors_q5_monthly"). |
start_date |
Optional. A character string or Date object in "YYYY-MM-DD" format specifying the start date for the data. If not provided, the full dataset is returned. |
end_date |
Optional. A character string or Date object in "YYYY-MM-DD" format specifying the end date for the data. If not provided, the full dataset is returned. |
url |
The base URL from which to download the dataset files, with a specific path for Global Q datasets. |
A tibble with processed factor data, including the date, risk-free rate, market excess return, and other factors, filtered by the specified date range.
download_data_factors_q("factors_q5_daily", "2020-01-01", "2020-12-31")
download_data_factors_q("factors_q5_annual")
This function downloads a specified data series from the Federal Reserve Economic Data (FRED) website, processes the data, and returns it as a tibble.
download_data_fred(series, start_date = NULL, end_date = NULL)
series |
A character vector specifying the FRED series ID to download. |
start_date |
The start date for filtering the data, in "YYYY-MM-DD" format. |
end_date |
The end date for filtering the data, in "YYYY-MM-DD" format. |
This function constructs the URL based on the provided FRED series ID, performs an HTTP GET request to download the data in CSV format, and processes it to a tidy tibble format. The resulting tibble includes the date, value, and the series ID.
This approach is inspired by quantmod::getSymbolsFRED(), which uses a different wrapper around the same FRED download data site. If you want to systematically download FRED data via API, please consider using the fredr package.
A tibble containing the processed data with three columns:
The date corresponding to the data point.
The value of the data series at that date.
The FRED series ID corresponding to the data.
download_data_fred("CPIAUCNS")
download_data_fred(c("GDP", "CPIAUCNS"), "2010-01-01", "2010-12-31")
Downloads and processes macroeconomic predictor data based on the specified type (monthly, quarterly, or annual), date range, and Google Sheets ID. The function first checks if the specified type is supported, then downloads the data from the Google Sheets export link identified by sheet_id. It processes the raw data into a structured format, calculating additional financial metrics and filtering by the specified date range.
download_data_macro_predictors( type, start_date = NULL, end_date = NULL, sheet_id = "1bM7vCWd3WOt95Sf9qjLPZjoiafgF_8EG" )
type |
The type of dataset to download ("macro_predictors_monthly", "macro_predictors_quarterly", "macro_predictors_annual"). |
start_date |
Optional. A character string or Date object in "YYYY-MM-DD" format specifying the start date for the data. If not provided, the full dataset is returned. |
end_date |
Optional. A character string or Date object in "YYYY-MM-DD" format specifying the end date for the data. If not provided, the full dataset is returned. |
sheet_id |
The Google Sheets ID from which to download the dataset, with the default "1bM7vCWd3WOt95Sf9qjLPZjoiafgF_8EG". |
A tibble with processed data, filtered by the specified date range and including financial metrics.
macro_predictors_monthly <- download_data_macro_predictors("macro_predictors_monthly")
This function downloads the data from Open Source Asset Pricing from Google Sheets using a specified sheet ID, processes the data by converting column names to snake_case, and optionally filters the data based on a provided date range.
download_data_osap( start_date = NULL, end_date = NULL, sheet_id = "1JyhcF5PRKHcputlioxlu5j5GyLo4JYyY" )
start_date |
Optional. A character string or Date object in "YYYY-MM-DD" format specifying the start date for the data. If not provided, the full dataset is returned. |
end_date |
Optional. A character string or Date object in "YYYY-MM-DD" format specifying the end date for the data. If not provided, the full dataset is returned. |
sheet_id |
A character string representing the Google Sheet ID from which to download the data. Default is "1JyhcF5PRKHcputlioxlu5j5GyLo4JYyY". |
A tibble containing the processed data. The column names are converted to snake_case, and the data is filtered by the specified date range if start_date and end_date are provided.
osap_monthly <- download_data_osap(start_date = "2020-01-01", end_date = "2020-06-30")
Downloads historical stock data from Yahoo Finance for given symbols and date range.
download_data_stock_prices(symbols, start_date = NULL, end_date = NULL)
symbols |
A character vector of stock symbols to download data for. At least one symbol must be provided. |
start_date |
Optional. A character string or Date object in "YYYY-MM-DD" format specifying the start date for the data. If not provided, a subset of the dataset is returned. |
end_date |
Optional. A character string or Date object in "YYYY-MM-DD" format specifying the end date for the data. If not provided, a subset of the dataset is returned. |
A tibble containing the downloaded stock data with columns: symbol, date, volume, open, low, high, close, and adjusted_close.
download_data_stock_prices(c("AAPL", "MSFT"))
download_data_stock_prices("GOOGL", "2021-01-01", "2022-01-01")
This function acts as a wrapper to download data from various WRDS datasets including CRSP, Compustat, and CCM links based on the specified type. It is designed to handle different data types by redirecting to the appropriate specific data download function.
download_data_wrds(type, start_date = NULL, end_date = NULL, ...)
type |
A string specifying the type of data to download. It should match one of the predefined patterns to indicate the dataset: "wrds_crsp" for CRSP data, "wrds_compustat" for Compustat data, or "wrds_ccm_links" for CCM links data. |
start_date |
Optional. A character string or Date object in "YYYY-MM-DD" format specifying the start date for the data. If not provided, a subset of the dataset is returned. |
end_date |
Optional. A character string or Date object in "YYYY-MM-DD" format specifying the end date for the data. If not provided, a subset of the dataset is returned. |
... |
Additional arguments passed to specific download functions depending on the |
A data frame containing the requested data, with the structure and contents depending on the specified type.
crsp_monthly <- download_data_wrds("wrds_crsp_monthly", "2020-01-01", "2020-12-31")
compustat_annual <- download_data_wrds("wrds_compustat_annual", "2020-01-01", "2020-12-31")
ccm_links <- download_data_wrds("wrds_ccm_links", "2020-01-01", "2020-12-31")
fisd <- download_data_wrds("wrds_fisd")
trace_enhanced <- download_data_wrds("wrds_trace_enhanced", cusips = "00101JAH9")
This function downloads data from the WRDS CRSP/Compustat Merged (CCM) links database. It allows users to specify the type of links (linktype) and the primacy of the link (linkprim).
download_data_wrds_ccm_links(linktype = c("LU", "LC"), linkprim = c("P", "C"))
linktype |
A character vector indicating the type of link to download. The default is c("LU", "LC"). |
linkprim |
A character vector indicating the primacy of the link. Default is c("P", "C"). |
A data frame with the columns permno, gvkey, linkdt, and linkenddt, where linkenddt is the end date of the link, and missing end dates are replaced with today's date.
ccm_links <- download_data_wrds_ccm_links(linktype = "LU", linkprim = "P")
This function downloads financial data from the WRDS Compustat database for a given type of financial data, start date, and end date. It filters the data according to industry format, data format, and consolidation level, and returns the most current data for each reporting period. Additionally, the annual data includes calculated book equity (be), operating profitability (op), and investment (inv) for each company.
download_data_wrds_compustat( type, start_date = NULL, end_date = NULL, additional_columns = NULL )
type |
The type of financial data to download (e.g., "wrds_compustat_annual" or "wrds_compustat_quarterly"). |
start_date |
Optional. A character string or Date object in "YYYY-MM-DD" format specifying the start date for the data. If not provided, a subset of the dataset is returned. |
end_date |
Optional. A character string or Date object in "YYYY-MM-DD" format specifying the end date for the data. If not provided, a subset of the dataset is returned. |
additional_columns |
Additional columns from the Compustat table as a character vector. |
A data frame with financial data for the specified period, including variables for book equity (be), operating profitability (op), investment (inv), and others.
download_data_wrds_compustat("wrds_compustat_annual", "2020-01-01", "2020-12-31")
download_data_wrds_compustat("wrds_compustat_quarterly", "2020-01-01", "2020-12-31")

# Add additional columns
download_data_wrds_compustat("wrds_compustat_annual", additional_columns = c("aodo", "aldo"))
This function downloads and processes stock return data from the CRSP database for a specified period. Users can choose between monthly and daily data types. The function also adjusts returns for delisting and calculates market capitalization and excess returns over the risk-free rate.
download_data_wrds_crsp( type, start_date = NULL, end_date = NULL, batch_size = 500, version = "v2", additional_columns = NULL )
type |
A string specifying the type of CRSP data to download, monthly or daily (e.g., "wrds_crsp_monthly" or "wrds_crsp_daily", as used in the examples). |
start_date |
Optional. A character string or Date object in "YYYY-MM-DD" format specifying the start date for the data. If not provided, a subset of the dataset is returned. |
end_date |
Optional. A character string or Date object in "YYYY-MM-DD" format specifying the end date for the data. If not provided, a subset of the dataset is returned. |
batch_size |
An optional integer specifying the batch size for processing daily data, with a default of 500. |
version |
An optional character specifying which CRSP version to use. "v2" (the default) uses the updated second version of CRSP, and "v1" downloads the legacy version of CRSP. |
additional_columns |
Additional columns from the CRSP monthly or daily data as a character vector. |
A data frame containing CRSP stock returns, adjusted for delistings, along with calculated market capitalization and excess returns over the risk-free rate. The structure of the returned data frame depends on the selected data type.
crsp_monthly <- download_data_wrds_crsp("wrds_crsp_monthly", "2020-11-01", "2020-12-31")
crsp_daily <- download_data_wrds_crsp("wrds_crsp_daily", "2020-12-01", "2020-12-31")

# Add additional columns
download_data_wrds_crsp("wrds_crsp_monthly", "2020-11-01", "2020-12-31",
  additional_columns = c("mthvol", "mthvolflg"))
Establishes a connection to the WRDS database to download a filtered subset of the FISD (Fixed Income Securities Database). The function filters the fisd_mergedissue and fisd_mergedissuer tables based on several criteria related to the securities, such as security level, bond type, coupon type, and others, focusing on specific attributes that denote the nature of the securities. It finally returns a data frame with selected fields from the fisd_mergedissue table after joining it with issuer information from the fisd_mergedissuer table for issuers domiciled in the USA.
download_data_wrds_fisd(additional_columns = NULL)
additional_columns |
Additional columns from the FISD table as a character vector. |
A data frame containing a subset of FISD data with fields related to the bond's characteristics and issuer information. This includes the complete CUSIP, maturity date, offering amount, offering date, dated date, interest frequency, coupon, last interest date, issue ID, issuer ID, and the SIC code of the issuer.
fisd <- download_data_wrds_fisd()
fisd_extended <- download_data_wrds_fisd(additional_columns = c("asset_backed", "defeased"))
Establishes a connection to the WRDS database to download trade messages for the specified CUSIPs from the Trade Reporting and Compliance Engine (TRACE). The trade data is cleaned as suggested by Dick-Nielsen (2009, 2014).
download_data_wrds_trace_enhanced(cusips, start_date = NULL, end_date = NULL)
cusips |
A character vector specifying the 9-digit CUSIPs to download. |
start_date |
Optional. A character string or Date object in "YYYY-MM-DD" format specifying the start date for the data. If not provided, a subset of the dataset is returned. |
end_date |
Optional. A character string or Date object in "YYYY-MM-DD" format specifying the end date for the data. If not provided, a subset of the dataset is returned. |
A data frame containing the cleaned trade messages from TRACE for the selected CUSIPs over the time window specified. Output variables include identifying information (i.e., CUSIP, trade date/time) and trade-specific information (i.e., price/yield, volume, counterparty, and reporting side).
trace_enhanced <- download_data_wrds_trace_enhanced("00101JAH9", "2019-01-01", "2021-12-31")
This function estimates rolling betas for a given model using the provided data. It supports parallel processing for faster computation using the furrr package.
estimate_betas( data, model, lookback, min_obs = NULL, use_furrr = FALSE, data_options = NULL )
data |
A tibble containing the data with a date identifier (defaults to |
model |
A formula representing the model to be estimated (e.g.,
|
lookback |
A Period object specifying the number of months, days, hours, minutes, or seconds to look back when estimating the rolling model. |
min_obs |
An integer specifying the minimum number of observations required to estimate
the model. Defaults to 80% of |
use_furrr |
A logical indicating whether to use the |
data_options |
A named list of data_options with characters, indicating the column
names required to run this function. The required column names identify dates and the stocks.
Defaults to |
A tibble with the estimated betas for each time period.
# Estimate monthly betas using monthly return data
set.seed(1234)
data_monthly <- tibble::tibble(
  date = rep(seq.Date(from = as.Date("2020-01-01"), to = as.Date("2020-12-01"), by = "month"), each = 50),
  permno = rep(1:50, times = 12),
  ret_excess = rnorm(600, 0, 0.1),
  mkt_excess = rnorm(600, 0, 0.1),
  smb = rnorm(600, 0, 0.1),
  hml = rnorm(600, 0, 0.1),
)

estimate_betas(data_monthly, "ret_excess ~ mkt_excess", months(3))
estimate_betas(data_monthly, "ret_excess ~ mkt_excess + smb + hml", months(6))

data_monthly |>
  dplyr::rename(id = permno) |>
  estimate_betas("ret_excess ~ mkt_excess", months(3), data_options = data_options(id = "id"))

# Estimate monthly betas using daily return data and parallelization
data_daily <- tibble::tibble(
  date = rep(seq.Date(from = as.Date("2020-01-01"), to = as.Date("2020-12-31"), by = "day"), each = 50),
  permno = rep(1:50, times = 366),
  ret_excess = rnorm(18300, 0, 0.02),
  mkt_excess = rnorm(18300, 0, 0.02),
  smb = rnorm(18300, 0, 0.02),
  hml = rnorm(18300, 0, 0.02),
)

data_daily <- data_daily |>
  dplyr::mutate(date = lubridate::floor_date(date, "month"))

# Change settings via future::plan(strategy = "multisession", workers = 4)
estimate_betas(data_daily, "ret_excess ~ mkt_excess", lubridate::days(90), use_furrr = TRUE)
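To make the idea of a rolling estimation window concrete, the following base R sketch computes a rolling beta for a single stock. It is an illustration only and not the package implementation: `rolling_beta_sketch()` is a hypothetical name, the lookback is measured in days, and the window is assumed to cover the dates in (date - lookback, date].

```r
# Hypothetical base R sketch of a rolling beta for a single stock; not the
# package implementation. Assumes daily dates and a lookback measured in days.
rolling_beta_sketch <- function(data, model, lookback_days, min_obs) {
  formula <- stats::as.formula(model)
  dates <- sort(unique(data$date))
  betas <- vapply(seq_along(dates), function(i) {
    # Keep observations in the window (date - lookback_days, date]
    in_window <- data$date > dates[i] - lookback_days & data$date <= dates[i]
    window <- data[in_window, , drop = FALSE]
    if (nrow(window) < min_obs) {
      return(NA_real_)
    }
    # The second coefficient is the slope on the first regressor
    unname(stats::coef(stats::lm(formula, data = window))[2])
  }, numeric(1))
  data.frame(date = dates, beta = betas)
}

# Toy data with a known slope of 2
toy <- data.frame(
  date = as.Date("2020-01-01") + 0:9,
  mkt_excess = c(0.01, -0.02, 0.03, 0.00, 0.02, -0.01, 0.04, -0.03, 0.01, 0.02)
)
toy$ret_excess <- 0.001 + 2 * toy$mkt_excess

betas <- rolling_beta_sketch(toy, "ret_excess ~ mkt_excess", lookback_days = 5, min_obs = 3)
```

The first two dates have fewer than `min_obs` observations in their window, so their beta is NA in this sketch; how the package itself handles short windows is not shown here.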
This function estimates Fama-MacBeth regressions by first running cross-sectional regressions for each time period and then aggregating the results over time to obtain average risk premia and corresponding t-statistics.
estimate_fama_macbeth( data, model, vcov = "newey-west", vcov_options = NULL, data_options = NULL )
data |
A data frame containing the data for the regression. It must include a column
representing the time periods (defaults to |
model |
A formula representing the regression model to be estimated in each cross-section. |
vcov |
A character string indicating the type of standard errors to compute. Options are
|
vcov_options |
A list of additional arguments to be passed to the
|
data_options |
A named list of data_options with characters, indicating the column
names required to run this function. The required column names identify dates. Defaults to
|
A data frame with the estimated risk premiums, the number of observations, standard errors, and t-statistics for each factor in the model.
set.seed(1234)
data <- tibble::tibble(
  date = rep(seq.Date(from = as.Date("2020-01-01"), to = as.Date("2020-12-01"), by = "month"), each = 50),
  permno = rep(1:50, times = 12),
  ret_excess = rnorm(600, 0, 0.1),
  beta = rnorm(600, 1, 0.2),
  bm = rnorm(600, 0.5, 0.1),
  log_mktcap = rnorm(600, 10, 1)
)

estimate_fama_macbeth(data, "ret_excess ~ beta + bm + log_mktcap")
estimate_fama_macbeth(data, "ret_excess ~ beta + bm + log_mktcap", vcov = "iid")
estimate_fama_macbeth(data, "ret_excess ~ beta + bm + log_mktcap",
  vcov = "newey-west", vcov_options = list(lag = 6, prewhite = FALSE))

# Use different column name for date
data |>
  dplyr::rename(month = date) |>
  estimate_fama_macbeth(
    "ret_excess ~ beta + bm + log_mktcap",
    data_options = data_options(date = "month")
  )
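The two-step procedure described above (cross-sectional regressions per period, then time-series averages) can be sketched in base R. This is a simplified illustration with iid standard errors only; `fama_macbeth_sketch()` and its output column names are hypothetical and not the package's API.

```r
# Illustrative two-step Fama-MacBeth sketch in base R (iid standard errors
# only); `fama_macbeth_sketch` is a hypothetical name, not the package function.
fama_macbeth_sketch <- function(data, model) {
  formula <- stats::as.formula(model)

  # Step 1: one cross-sectional regression per date
  by_date <- split(data, data$date)
  coefs <- t(sapply(by_date, function(d) stats::coef(stats::lm(formula, data = d))))

  # Step 2: time-series average of each coefficient and its t-statistic
  n_periods <- nrow(coefs)
  risk_premium <- colMeans(coefs)
  standard_error <- apply(coefs, 2, stats::sd) / sqrt(n_periods)

  data.frame(
    factor = colnames(coefs),
    risk_premium = unname(risk_premium),
    n_periods = n_periods,
    standard_error = unname(standard_error),
    t_statistic = unname(risk_premium / standard_error)
  )
}

set.seed(1234)
toy <- data.frame(
  date = rep(seq(as.Date("2020-01-01"), by = "month", length.out = 12), each = 50),
  ret_excess = rnorm(600, 0, 0.1),
  beta = rnorm(600, 1, 0.2),
  bm = rnorm(600, 0.5, 0.1)
)

fm <- fama_macbeth_sketch(toy, "ret_excess ~ beta + bm")
```

A Newey-West adjustment would replace the simple standard error in step 2; it is left out here to keep the sketch free of extra dependencies.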
This function estimates the coefficients of a linear model specified by one or more independent variables. It checks for the presence of the specified independent variables in the dataset and whether the dataset has a sufficient number of observations. It returns the model's coefficients as either a numeric value (for a single independent variable) or a data frame (for multiple independent variables).
estimate_model(data, model, min_obs = 1)
data |
A data frame containing the dependent variable and one or more independent variables. |
model |
A character that describes the model to estimate (e.g.
|
min_obs |
The minimum number of observations required to estimate the model. Defaults to 1. |
A data frame with a row for each coefficient and column names corresponding to the independent variables.
See stats::lm() for details on the underlying linear model fitting used.
data <- data.frame(
  ret_excess = rnorm(100),
  mkt_excess = rnorm(100),
  smb = rnorm(100),
  hml = rnorm(100)
)

# Estimate model with a single independent variable
estimate_model(data, "ret_excess ~ mkt_excess")

# Estimate model with multiple independent variables
estimate_model(data, "ret_excess ~ mkt_excess + smb + hml")

# Estimate model without intercept
estimate_model(data, "ret_excess ~ mkt_excess - 1")
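The checks described above (variables present, enough observations, then a linear fit) can be sketched with stats::lm() directly. `estimate_model_sketch()` is a hypothetical stand-in, and returning NA when there are too few observations is an assumption rather than documented package behavior.

```r
# Hypothetical stand-in for the behaviour described above; not the package code.
estimate_model_sketch <- function(data, model, min_obs = 1) {
  formula <- stats::as.formula(model)

  # Check that every variable named in the formula exists in the data
  missing_vars <- setdiff(all.vars(formula), colnames(data))
  if (length(missing_vars) > 0) {
    stop("Missing variables: ", paste(missing_vars, collapse = ", "))
  }

  # Assumption: return NA when there are too few observations
  if (nrow(data) < min_obs) {
    return(NA_real_)
  }

  stats::coef(stats::lm(formula, data = data))
}

# Toy data with a known intercept of 2 and slope of 3
toy <- data.frame(mkt_excess = 1:10)
toy$ret_excess <- 2 + 3 * toy$mkt_excess

coefs <- estimate_model_sketch(toy, "ret_excess ~ mkt_excess")
too_few <- estimate_model_sketch(toy, "ret_excess ~ mkt_excess", min_obs = 20)
```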
This internal function selects and returns a random user agent string from a predefined list. The list contains user agents for various operating systems and browsers, including Windows, macOS, Linux, Android, iPhone, Chrome, Safari, Firefox, and Edge.
get_random_user_agent()
A character string representing a randomly selected user agent.
This function establishes a connection to the Wharton Research Data Services (WRDS) database using the RPostgres package. It requires that the RPostgres package is installed and that valid WRDS credentials are set as environment variables.
get_wrds_connection()
The function checks if the RPostgres package is installed before attempting to establish a connection. It uses the host, dbname, port, and sslmode as fixed parameters for the connection. Users must set their WRDS username and password as environment variables WRDS_USER and WRDS_PASSWORD, respectively, before using this function.
An object of class DBIConnection representing the connection to the WRDS database. This object can be used with other DBI-compliant functions to interact with the database.
See Postgres and dbDisconnect for more information on managing database connections.
# Before using this function, set your WRDS credentials:
# Sys.setenv(WRDS_USER = "your_username", WRDS_PASSWORD = "your_password")
con <- get_wrds_connection()

# Use `con` with DBI-compliant functions to interact with the WRDS database

# Remember to disconnect after use:
# disconnect_connection(con)
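Because the connection relies on the WRDS_USER and WRDS_PASSWORD environment variables, it can help to verify that both are set before connecting. The helper below is a hypothetical convenience sketch, not a function exported by the package.

```r
# Hypothetical helper: check that the environment variables named above are set.
has_wrds_credentials <- function() {
  user <- Sys.getenv("WRDS_USER")
  password <- Sys.getenv("WRDS_PASSWORD")
  nzchar(user) && nzchar(password)
}

if (!has_wrds_credentials()) {
  message("Set WRDS_USER and WRDS_PASSWORD before calling get_wrds_connection().")
}
```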
This function generates a lagged version of a given column based on a date variable, with the ability to specify a range of lags. It also allows for the optional removal of NA values.
lag_column(column, date, lag, max_lag = lag, drop_na = TRUE)
column |
A numeric vector or column to be lagged. |
date |
A vector representing dates corresponding to the |
lag |
An integer specifying the minimum lag (in days, hours, etc.) to apply to |
max_lag |
An integer specifying the maximum lag (in days, hours, etc.) to apply to |
drop_na |
A logical value indicating whether to drop |
A vector of the same length as column, containing the lagged values. If no matching dates are found within the lag window, NA is returned for that position.
# Basic example with a vector
dates <- as.Date("2023-01-01") + 0:9
values <- rnorm(10)
lagged_values <- lag_column(values, dates, lag = 1, max_lag = 3)

# Example using a tibble and dplyr::group_by
data <- tibble::tibble(
  permno = rep(1:2, each = 10),
  date = rep(seq.Date(as.Date('2023-01-01'), by = "month", length.out = 10), 2),
  size = runif(20, 100, 200),
  bm = runif(20, 0.5, 1.5)
)

data |>
  dplyr::group_by(permno) |>
  dplyr::mutate(
    across(c(size, bm), \(x) lag_column(x, date, months(3), months(6), drop_na = TRUE))
  ) |>
  dplyr::ungroup()
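The lag window can be illustrated with a small base R sketch. It is not the package implementation: `lag_by_date()` is a hypothetical name, lags are assumed to be whole days, and picking the most recent value inside the window [date - max_lag, date - lag] is an assumption about how ties between several candidates are resolved.

```r
# Hypothetical base R sketch of a date-based lag; not the package implementation.
# Assumes lags measured in days and picks the most recent match in the window.
lag_by_date <- function(column, date, lag, max_lag = lag, drop_na = TRUE) {
  candidates <- seq_along(column)
  if (drop_na) {
    # Ignore missing values when searching for a lagged observation
    candidates <- candidates[!is.na(column)]
  }
  vapply(seq_along(column), function(i) {
    in_window <- candidates[
      date[candidates] >= date[i] - max_lag & date[candidates] <= date[i] - lag
    ]
    if (length(in_window) == 0) {
      return(NA_real_)
    }
    column[in_window[which.max(as.numeric(date[in_window]))]]
  }, numeric(1))
}

dates <- as.Date("2023-01-01") + 0:4
values <- c(10, NA, 30, 40, 50)
lagged <- lag_by_date(values, dates, lag = 1, max_lag = 2)
# lagged is NA, 10, 10, 30, 40
```

The third position shows the effect of dropping NA values: the one-day lag is missing, so the sketch falls back to the two-day lag.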
This function returns a tibble containing information about supported financial indexes. Each index is associated with a URL that points to a CSV file containing the holdings of the index. Additionally, each index has a corresponding skip value, which indicates the number of lines to skip when reading the CSV file.
list_supported_indexes()
A tibble with three columns:
The name of the financial index (e.g., "DAX", "S&P 500").
The URL to the CSV file containing the holdings data for the index.
The number of lines to skip when reading the CSV file.
supported_indexes <- list_supported_indexes()
print(supported_indexes)
This function aggregates and returns a comprehensive tibble of all supported dataset types from different domains. It includes various datasets across different frequencies (daily, weekly, monthly, quarterly, annual) and models (e.g., q5 factors, Fama-French 3 and 5 factors, macro predictors).
list_supported_types(domain = NULL, as_vector = FALSE)
domain |
A character vector to filter for domain specific types (e.g. c("WRDS", "Fama-French")) |
as_vector |
Logical indicating whether types should be returned as a character vector instead of data frame. |
A tibble aggregating all supported dataset types with columns: type (the type of dataset), dataset_name (a descriptive name or file name of the dataset), and domain (the domain to which the dataset belongs, e.g., "Global Q", "Fama-French", "Goyal-Welch").
# List all supported types as a data frame
list_supported_types()

# Filter by domain
list_supported_types(domain = "WRDS")

# List supported types as a vector
list_supported_types(as_vector = TRUE)
This function returns a tibble with the supported Fama-French dataset types, including their names and frequencies (daily, weekly, monthly). Each dataset type is associated with a specific Fama-French model (e.g., 3 factors, 5 factors). Additionally, it annotates each dataset with the domain "Fama-French".
list_supported_types_ff()
A tibble with columns: type (the type of dataset), dataset_name (a descriptive name of the dataset), and domain (the domain to which the dataset belongs, always "Fama-French").
This function returns a tibble with the legacy names of initially supported Fama-French dataset types, including their names and frequencies (daily, weekly, monthly). Each dataset type is associated with a specific Fama-French model (e.g., 3 factors, 5 factors). Additionally, it annotates each dataset with the domain "Fama-French". Not included in the exported list_supported_types() function.
list_supported_types_ff_legacy()
A tibble with columns: type (the type of dataset), dataset_name (a descriptive name of the dataset), and domain (the domain to which the dataset belongs, always "Fama-French").
This function returns a tibble with the supported macro predictor dataset types provided by Goyal-Welch, including their frequencies (monthly, quarterly, annual). All dataset types reference the same source file "PredictorData2022.xlsx" for the year 2022. Additionally, it annotates each dataset with the domain "Goyal-Welch".
list_supported_types_macro_predictors()
A tibble with columns: type (the type of dataset), dataset_name (the file name of the dataset, which is the same for all types), and domain (the domain to which the dataset belongs, always "Goyal-Welch").
Returns a tibble listing the supported other data types and their corresponding dataset names.
list_supported_types_other()
A tibble with columns type and dataset_name, where type indicates the code used to specify the data source and dataset_name provides the name of the data source.
This function returns a tibble with the supported Global Q dataset types, including their names and frequencies (daily, weekly, weekly week-to-week, monthly, quarterly, annual). Each dataset type is associated with the Global Q model, specifically the q5 factors model for the year 2023. Additionally, it annotates each dataset with the domain "Global Q".
list_supported_types_q()
A tibble with columns: type (the type of dataset), dataset_name (the file name of the dataset), and domain (the domain to which the dataset belongs, always "Global Q").
This function returns a tibble with the supported dataset types provided via WRDS. Additionally, it annotates each dataset with the domain "WRDS".
list_supported_types_wrds()
A tibble with columns: type (the type of dataset), dataset_name (the file name of the dataset), and domain (the domain to which the dataset belongs, always "WRDS").
Returns a character vector containing the names of the chapters available in the Tidy Finance resource. This function provides a quick reference to the various topics covered.
list_tidy_finance_chapters()
A character vector where each element is the name of a chapter available in the Tidy Finance resource. These names correspond to specific chapters in Tidy Finance with R.
list_tidy_finance_chapters()
Opens the main Tidy Finance website or a specific chapter within the site in the user's default web browser. If a chapter is specified, the function constructs the URL to access the chapter directly.
open_tidy_finance_website(chapter = NULL)
chapter |
An optional character string specifying the chapter to open.
If |
Invisible NULL. The function is called for its side effect of opening a web page.
open_tidy_finance_website()
open_tidy_finance_website("beta-estimation")
This function prompts the user to input their WRDS (Wharton Research Data Services) username and password, and stores these credentials in a .Renviron file. The user can choose to store the .Renviron file in either the project directory or the home directory. If the .Renviron file already contains WRDS credentials, the user will be asked if they want to overwrite the existing credentials. Additionally, the user has the option to add the .Renviron file to the .gitignore file to prevent it from being tracked by version control.
set_wrds_credentials()
Invisibly returns TRUE. Displays messages to the user based on their input and actions taken.
## Not run:
set_wrds_credentials()
## End(Not run)
Removes the values in a numeric vector that are beyond the specified quantiles, effectively trimming the distribution based on the cut parameter. This process reduces the length of the vector, excluding extreme values from both tails of the distribution.
trim(x, cut)
x |
A numeric vector to be trimmed. |
cut |
The proportion of data to be trimmed from both ends of the
distribution. For example, a |
A numeric vector with the extreme values removed.
set.seed(123)
data <- rnorm(100)
trimmed_data <- trim(x = data, cut = 0.05)
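Quantile-based trimming can be sketched in base R as follows. `trim_sketch()` is a hypothetical stand-in rather than the package code, and it assumes the default quantile definition of stats::quantile() and that values equal to a bound are kept.

```r
# Hypothetical base R sketch of two-sided trimming; not the package code.
trim_sketch <- function(x, cut) {
  bounds <- stats::quantile(x, probs = c(cut, 1 - cut), na.rm = TRUE, names = FALSE)
  # Keep only values inside the quantile bounds (bounds included, by assumption)
  x[x >= bounds[1] & x <= bounds[2]]
}

# Toy vector with one extreme value in each tail
x <- c(-100, 1:98, 1000)
trimmed <- trim_sketch(x, cut = 0.05)
```

With 100 values and `cut = 0.05`, the sketch drops the five smallest and five largest observations, so the result is shorter than the input.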
Replaces the values in a numeric vector that are beyond the specified quantiles with the boundary values of those quantiles. This is done for both tails of the distribution based on the cut parameter.
winsorize(x, cut)
x |
A numeric vector to be winsorized. |
cut |
The proportion of data to be winsorized from both ends of the
distribution. For example, a |
A numeric vector with the extreme values replaced by the corresponding quantile values.
set.seed(123)
data <- rnorm(100)
winsorized_data <- winsorize(data, 0.05)
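Winsorizing differs from trimming in that extreme values are clamped rather than removed. A minimal base R sketch, assuming the default quantile definition of stats::quantile(), is shown below; `winsorize_sketch()` is a hypothetical name and not the package code.

```r
# Hypothetical base R sketch of two-sided winsorizing; not the package code.
winsorize_sketch <- function(x, cut) {
  bounds <- stats::quantile(x, probs = c(cut, 1 - cut), na.rm = TRUE, names = FALSE)
  # Clamp values below the lower bound and above the upper bound
  pmin(pmax(x, bounds[1]), bounds[2])
}

# Toy vector with one extreme value in each tail
x <- c(-100, 1:98, 1000)
winsorized <- winsorize_sketch(x, cut = 0.05)
```

The output has the same length as the input, and values inside the bounds are left unchanged.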