Spline-based representation of functional data

Represent curves as a weighted sum of spline basis functions.

Usage

tfb_spline(data, ...)

# S3 method for data.frame
tfb_spline(
  data,
  id = 1,
  arg = 2,
  value = 3,
  domain = NULL,
  penalized = TRUE,
  global = FALSE,
  verbose = TRUE,
  ...
)

# S3 method for matrix
tfb_spline(
  data,
  arg = NULL,
  domain = NULL,
  penalized = TRUE,
  global = FALSE,
  verbose = TRUE,
  ...
)

# S3 method for numeric
tfb_spline(
  data,
  arg = NULL,
  domain = NULL,
  penalized = TRUE,
  global = FALSE,
  verbose = TRUE,
  ...
)

# S3 method for list
tfb_spline(
  data,
  arg = NULL,
  domain = NULL,
  penalized = TRUE,
  global = FALSE,
  verbose = TRUE,
  ...
)

# S3 method for tfd
tfb_spline(
  data,
  arg = NULL,
  domain = NULL,
  penalized = TRUE,
  global = FALSE,
  verbose = TRUE,
  ...
)

# S3 method for tfb
tfb_spline(
  data,
  arg = NULL,
  domain = NULL,
  penalized = TRUE,
  global = FALSE,
  verbose = TRUE,
  ...
)

# S3 method for default
tfb_spline(
  data,
  arg = NULL,
  domain = NULL,
  penalized = TRUE,
  global = FALSE,
  verbose = TRUE,
  ...
)

Arguments

data: a matrix, data.frame or list of suitable shape, or another tf-object containing functional data.
...: arguments to the calls to mgcv::s() setting up the basis (and to mgcv::magic() or mgcv::gam.fit() if penalized = TRUE). Uses k = 25 cubic regression spline basis functions (bs = "cr") by default, but should be set appropriately by the user. See Details and examples in the vignettes.
id: The name or number of the column defining which data belong to which function.
arg: numeric, or list of numerics. The evaluation grid. For the data.frame-method: the name/number of the column defining the evaluation grid. The matrix method will try to guess suitable arg-values from the column names of data if arg is not supplied. Other methods fall back on integer sequences (1:<length of data>) as the default if not provided.
value: The name or number of the column containing the function evaluations.
domain: range of the arg.
penalized: TRUE (default) estimates regularized/penalized basis coefficients via mgcv::magic() or mgcv::gam.fit(), FALSE yields ordinary least squares / ML estimates for basis coefficients. FALSE is much faster but will overfit for noisy data if k is (too) large.
global: Defaults to FALSE. If TRUE and penalized = TRUE, all functions share the same smoothing parameter (see Details).
verbose: TRUE (default) outputs statistics about the fit achieved by the basis and other diagnostic messages.

Value

a tfb-object

Details

The basis to be used is set up via a call to mgcv::s() and all the spline bases discussed in mgcv::smooth.terms() are available, in principle. Depending on the value of the penalized- and global-flags, the coefficient vectors for each observation are then estimated via fitting a GAM (separately for each observation, if !global) via mgcv::magic() (least square error, the default) or mgcv::gam() (if a family argument was supplied) or unpenalized least squares / maximum likelihood.

After the "smoothed" representation is computed, the amount of smoothing that was performed is reported in terms of the "percentage of variability preserved", which is the variance (or the explained deviance, in the general case if family was specified) of the smoothed function values divided by the variance of the original values (the null deviance, in the general case). Reporting can be switched off with verbose = FALSE.

The ... arguments supplies arguments to both the spline basis (via mgcv::s()) and the estimation (via mgcv::magic() or mgcv::gam()), the most important arguments are:

k: how many basis functions should the spline basis use, default is 25.
bs: which type of spline basis should be used, the default is cubic regression splines (bs = "cr")
family argument: use this if minimizing squared errors is not a reasonable criterion for the representation accuracy (see mgcv::family.mgcv() for what's available) and/or if function values are restricted to be e.g. positive (family = Gamma()/tw()/...), in \([0,1]\) (family = betar()), etc.
sp: numeric value for the smoothness penalty weight, for manually setting the amount of smoothing for all curves, see mgcv::s(). This (drastically) reduces computation time. Defaults to -1, i.e., automatic optimization of sp using mgcv::magic() (LS fits) or mgcv::gam() (GLM), source code in R/tfb-spline-utils.R.

If global == TRUE, this uses a small subset of curves (10% of curves, at least 5, at most 100; non-random sample using every j-th curve in the data) on which smoothing parameters per curve are estimated and then takes the mean of the log smoothing parameter of those as sp for all curves. This is much faster than optimizing for each curve on large data sets. For very sparse or noisy curves, estimating a common smoothing parameter based on the data for all curves simultaneously is likely to yield better results, this is not what's implemented here.

Methods (by class)

tfb_spline(data.frame): convert data frames
tfb_spline(matrix): convert matrices
tfb_spline(numeric): convert matrices
tfb_spline(list): convert lists
tfb_spline(tfd): convert tfd (raw functional data)
tfb_spline(tfb): convert tfb: modify basis representation, smoothing.
tfb_spline(default): convert tfb: default method, returning prototype when data is missing