Represent curves as a weighted sum of spline basis functions.
Usage
tfb_spline(data, ...)
# S3 method for data.frame
tfb_spline(
data,
id = 1,
arg = 2,
value = 3,
domain = NULL,
penalized = TRUE,
global = FALSE,
verbose = TRUE,
...
)
# S3 method for matrix
tfb_spline(
data,
arg = NULL,
domain = NULL,
penalized = TRUE,
global = FALSE,
verbose = TRUE,
...
)
# S3 method for numeric
tfb_spline(
data,
arg = NULL,
domain = NULL,
penalized = TRUE,
global = FALSE,
verbose = TRUE,
...
)
# S3 method for list
tfb_spline(
data,
arg = NULL,
domain = NULL,
penalized = TRUE,
global = FALSE,
verbose = TRUE,
...
)
# S3 method for tfd
tfb_spline(
data,
arg = NULL,
domain = NULL,
penalized = TRUE,
global = FALSE,
verbose = TRUE,
...
)
# S3 method for tfb
tfb_spline(
data,
arg = NULL,
domain = NULL,
penalized = TRUE,
global = FALSE,
verbose = TRUE,
...
)
# S3 method for default
tfb_spline(
data,
arg = NULL,
domain = NULL,
penalized = TRUE,
global = FALSE,
verbose = TRUE,
...
)
Arguments
- data
a matrix, data.frame or list of suitable shape, or another tf-object containing functional data.
- ...
arguments to the calls to mgcv::s() setting up the basis (and to mgcv::magic() or mgcv::gam.fit() if penalized = TRUE). Uses k = 25 cubic regression spline basis functions (bs = "cr") by default, but should be set appropriately by the user. See Details and examples in the vignettes.
- id
The name or number of the column defining which data belong to which function.
- arg
numeric, or list of numerics. The evaluation grid. For the data.frame-method: the name/number of the column defining the evaluation grid. The matrix method will try to guess suitable arg-values from the column names of data if arg is not supplied. Other methods fall back on integer sequences (1:<length of data>) as the default if not provided.
- value
The name or number of the column containing the function evaluations.
- domain
range of the arg.
- penalized
TRUE (default) estimates regularized/penalized basis coefficients via mgcv::magic() or mgcv::gam.fit(), FALSE yields ordinary least squares / ML estimates for basis coefficients. FALSE is much faster but will overfit for noisy data if k is (too) large.
- global
Defaults to FALSE. If TRUE and penalized = TRUE, all functions share the same smoothing parameter (see Details).
- verbose
TRUE (default) outputs statistics about the fit achieved by the basis and other diagnostic messages.
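The following is a minimal sketch of how the matrix and data.frame methods described above might be called. The simulated curves, the object names (m, df, f_mat, f_df) and the choice k = 8 are illustrative assumptions, not part of the package documentation.

```r
library(tf)

set.seed(1)
# Assumed toy data: 3 noisy sine curves observed on a common grid of 21 points
arg <- seq(0, 1, length.out = 21)
m <- t(replicate(3, sin(2 * pi * arg) + rnorm(21, sd = 0.1)))

# matrix method: one row per curve; arg-values are guessed from the column names
colnames(m) <- arg
f_mat <- tfb_spline(m, k = 8, verbose = FALSE)

# data.frame method: long format, columns identified by name via id/arg/value
df <- data.frame(
  id = rep(1:3, each = 21),
  arg = rep(arg, times = 3),
  value = as.vector(t(m)) # curve 1's values first, then curve 2, then curve 3
)
f_df <- tfb_spline(df, id = "id", arg = "arg", value = "value",
                   k = 8, verbose = FALSE)
```

Both calls should yield a tfb object holding three curves. k is set well below the default of 25 here because each curve has only 21 observations.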
Details
The basis to be used is set up via a call to mgcv::s(), and all the spline bases discussed in mgcv::smooth.terms() are available, in principle. Depending on the values of the penalized and global flags, the coefficient vector for each observation is then estimated either by fitting a GAM (separately for each observation if global = FALSE) via mgcv::magic() (least squares, the default) or mgcv::gam() (if a family argument was supplied), or by unpenalized least squares / maximum likelihood.
After the "smoothed" representation is computed, the amount of smoothing that was performed is reported in terms of the "percentage of variability preserved", which is the variance (or the explained deviance, in the general case if family was specified) of the smoothed function values divided by the variance of the original values (the null deviance, in the general case). Reporting can be switched off with verbose = FALSE.
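To make the "percentage of variability preserved" concrete, here is a rough sketch for a single curve in the least squares case. It uses mgcv::gam() as a stand-in for the internal fitting routine, and the simulated data and variable names are assumptions for illustration only.

```r
library(mgcv)

set.seed(1)
# Assumed toy curve: a sine wave observed with Gaussian noise
arg <- seq(0, 1, length.out = 101)
y <- sin(2 * pi * arg) + rnorm(length(arg), sd = 0.2)

# Penalized cubic regression spline fit with the documented defaults k = 25, bs = "cr"
fit <- gam(y ~ s(arg, k = 25, bs = "cr"))
smoothed <- fitted(fit)

# Variance of the smoothed values relative to the variance of the original values
pct_preserved <- 100 * var(smoothed) / var(y)
pct_preserved
```

A value well below 100 indicates that a noticeable share of the observed variation was treated as noise and smoothed away.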
The ... arguments supply arguments to both the spline basis (via mgcv::s()) and the estimation (via mgcv::magic() or mgcv::gam()). The most important arguments are:
- k: how many basis functions the spline basis should use; the default is 25.
- bs: which type of spline basis to use; the default is cubic regression splines (bs = "cr").
- family: use this if minimizing squared errors is not a reasonable criterion for the representation accuracy (see mgcv::family.mgcv() for what's available) and/or if function values are restricted to be, e.g., positive (family = Gamma()/tw()/...), in \([0,1]\) (family = betar()), etc.
- sp: numeric value for the smoothness penalty weight, for manually setting the amount of smoothing for all curves, see mgcv::s(). This (drastically) reduces computation time. Defaults to -1, i.e., automatic optimization of sp using mgcv::magic() (LS fits) or mgcv::gam() (GLM), source code in R/tfb-spline-utils.R.
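The arguments listed above might be combined as in the following sketch. The example curves come from tf_rgp(), and the specific values (k = 10, bs = "ps", sp = 0.1) are arbitrary illustrative choices, not recommendations.

```r
library(tf)

set.seed(1)
# Assumed example data: 5 random smooth curves
x <- tf_rgp(5)

# Defaults: k = 25 cubic regression spline basis functions, penalized fit
fit_default <- tfb_spline(x, verbose = FALSE)

# Smaller basis of a different type (P-splines), still penalized
fit_ps <- tfb_spline(x, k = 10, bs = "ps", verbose = FALSE)

# Fixed smoothing parameter: skips the automatic optimization of sp
fit_fixed <- tfb_spline(x, sp = 0.1, verbose = FALSE)

# Unpenalized least squares: fast, but prone to overfitting if k is large
fit_ols <- tfb_spline(x, k = 10, penalized = FALSE, verbose = FALSE)
```

Rerunning any of these calls with verbose = TRUE prints the fit statistics, which is one way to judge whether k or sp is set sensibly for the data at hand.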
If global == TRUE, this uses a small subset of curves (10% of curves, at least 5, at most 100; non-random sample using every j-th curve in the data) on which smoothing parameters per curve are estimated and then takes the mean of the log smoothing parameter of those as sp for all curves. This is much faster than optimizing for each curve on large data sets. For very sparse or noisy curves, estimating a common smoothing parameter based on the data for all curves simultaneously is likely to yield better results, but this is not what is implemented here.
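The subset rule and the averaging step described above can be sketched in base R as follows. The helper names pick_subset() and global_sp() are hypothetical, and this is one plausible reading of the description rather than the package's actual code, which lives in R/tfb-spline-utils.R. In particular, back-transforming the mean of the logs (a geometric mean) is an assumption.

```r
# Hypothetical helper: indices of a regularly spaced, non-random subset of curves
# (10% of curves, at least 5, at most 100, never more than available)
pick_subset <- function(n_curves) {
  n_sub <- min(max(5, ceiling(0.1 * n_curves)), 100, n_curves)
  unique(round(seq(1, n_curves, length.out = n_sub)))
}

# Hypothetical helper: average the per-curve smoothing parameters on the log
# scale and transform back (assumption: equivalent to a geometric mean)
global_sp <- function(sp_per_curve) {
  exp(mean(log(sp_per_curve)))
}

pick_subset(20)       # 5 regularly spaced curve indices out of 20
global_sp(c(1, 100))  # geometric mean of 1 and 100
```

Averaging on the log scale keeps a single very large smoothing parameter from dominating the shared value, which matters because smoothing parameters can vary over several orders of magnitude.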
Methods (by class)
- tfb_spline(data.frame): convert data frames
- tfb_spline(matrix): convert matrices
- tfb_spline(numeric): convert numeric vectors
- tfb_spline(list): convert lists
- tfb_spline(tfd): convert tfd (raw functional data)
- tfb_spline(tfb): convert tfb: modify basis representation, smoothing
- tfb_spline(default): default method, returning a prototype when data is missing
See also
mgcv::smooth.terms() for spline basis options.