** denotes quite substantial/important changes
*** denotes really big changes

Liable to change in future release:
- method="GCV.Cp" as default in gam. Default will change to "REML". (1.8-24, June 2018)

Currently deprecated and liable to be removed:
- gam.fit and full.score (1.9-0)

Issues:

* openblas 0.3.x x<7 is not thread safe if itself compiled for single thread use and then called from multiple threads (unlike the reference BLAS, say). 0.2.20 appears to be OK. For 0.3.x x>6 use 'make USE_THREAD=0 USE_LOCKING=1' to ensure that openblas is thread safe.
* t2 in bam(...,discrete=TRUE) - not treated as tensor products at present, and reparameterization needs checking (also for bam).
* bam(...,discrete=TRUE) has no option to use the identical discretization to that used in fitting when predicting.

1.9-1

* Revised bam discrete code to allow indexing vectors to be long vectors, thereby allowing much larger datasets to be used. n limit is still .Machine$integer.max.
* Revised C code underlying mvn family to allow larger problems, via use of R long vectors.
* Revised gfam to work correctly with bam(...,discrete=FALSE). Subsetting in bgam.fit was incorrect for gfam. Thanks to Dave Miller for reporting it.
* Correction of ziplss residual computation - incorrect probability of non zero used. Thanks to Xiao Liang.
* correction of ginla integration weights (correcting typo from Rue et al. 2009). Thanks to Paul Van Dam-Bates.
* C code modification to remove some new compiler warnings.

1.9-0

*** NCV smoothing parameter estimation now available for most models. See ?NCV.
*** 'gfam' allows response variables to be from several different families.
* Restriction that number of coefficients must be fewer than number of data removed.
* Fix to tensor product constructor to allow matrix factor arguments to be handled correctly (previously failed). Thanks to Dave Miller.
* removed argument 'pers' from 'plot.gam' (deprecated since 1.8-23, Nov 2017).
* removed "nlm.fd" optimizer (deprecated since 1.8-19, Sept 2017).
* removed discrete method prediction from discrete bam objects fitted before 1.8-32 (deprecated since then).
* 'multinom' variance fix for K>=3 cases. A counter was initialized in the wrong place. Thanks to Max Goplerud for finding and reporting this.
* NCV variance estimation improvements + na handling.
* 'in.out' argument added for 'bam'.
* slight internal change to mgcv:::get.var to enable user defined functions in arguments of smooths to work.
* smooth2random tweak to guarantee same eigenvector sign every time for same problem (needed by some developers who don't want to carry the original transform around).
* Some unregistered methods fixes.
* tests directory removed, as check time was tripping up every CRAN submission.
* coxph memory error fix - could make bfgs fitting fail.

1.8-42

* One remaining old style C declaration fixed.

1.8-41

** 'cnorm' family added for left, right, interval or uncensored Gaussian data. Useful for log normal Accelerated Failure Time models, Tobit regression, rounded data etc.
** 'sz' factor smooth interaction class added for implementing models with main effect smooths and difference smooths for levels of a factor. See ?factor.smooth.
** 'NCV' smoothing parameter method added, but still experimental.
* replacement of some old (K&R) style C function declarations.
* mono.con corrected for cases with upper bounds. Thanks to Sean Wu.
* plot.gam(...,seWithMean=TRUE) modified to only include mean uncertainty for the linear predictor of which the smooth is a part, when there are multiple linear predictors. Thanks to Gavin Simpson.
* modifications of sparse matrix coercions, to avoid deprecated direct coercions. as(as(as(a, "dMatrix"), "generalMatrix"), "XsparseMatrix") in place of as(a,"dgXMatrix") where 'X' is 'C' or 'T'. Actually this requires Matrix 1.4-2 to work (will be added to dependencies in future).
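The stepwise coercion in the sparse matrix entry above can be sketched as follows. This is a minimal illustration rather than the code used inside mgcv: the matrix 'a' and the object names are made up for the example, and it assumes Matrix 1.4-2 or later is installed.

```r
library(Matrix)

# Small dense base-R matrix containing some zeros (illustrative data only).
a <- matrix(c(1, 0, 0, 2, 0, 3), nrow = 2, ncol = 3)

# Coerce in three steps: numeric ("d") storage, then general (no symmetric or
# triangular structure), then the required sparse representation.
A_col <- as(as(as(a, "dMatrix"), "generalMatrix"), "CsparseMatrix")  # 'X' is 'C'
A_tri <- as(as(as(a, "dMatrix"), "generalMatrix"), "TsparseMatrix")  # 'X' is 'T'

class(A_col)
class(A_tri)
```

The chained form names each property (value type, structure, storage) separately, which is why it replaces the single direct coercion to a concrete class name.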
* vis.gam now deals properly with models with more than one linear predictor.
* slight change in bgam.fitd to check scale parameter estimate converged when using bam(...,discrete=TRUE), otherwise scale could be wrong for all fixed smoothing parameters.
* predict.gam modified so that 'terms' and 'exclude' control all terms, smooth or parametric, in the same way. Including the "(Intercept)" term.
* Warnings from 'model.matrix' suppressed in 'terms2tensor' called by e.g. predict.bamd. There is a warning if any extra contrasts are supplied to model matrix that do not relate to a term in the model (which contradicts the documentation). Doc vs code bug report also filed.
* Fix of broken rank deficiency handling in gam.fit5. Thanks to Cesko Voeten.
* trind.generator modified to allow return of index functions in place of index arrays.
* summary.gam (recov/reTest) modified to deal with 'fs' smooths fitted using 'gamm'.
* gam.fit4 convergence testing improved and bug fix in computation of dVkk matrix used to check for converged 'infinite' smoothing parameters in bfgs.

1.8-40

* Small gam.fit5 convergence change to reduce chance of repeating step failures.
* ExtractData fix avoiding assumption that 'xt' argument to a smooth is always a list (could cause setup failure with e.g. fs smooth). Thanks Keith Woolner.
* Minor changes to mat.c and tprs.c to allow DEFS = -DSTRICT_R_HEADERS build.

1.8-39

* gam.fit5 convergence logic tightened up, to avoid pointlessly lengthy iterations when can't meet convergence tolerance. Faster, more reliable, but also generates more warnings when tolerances not met.
* gam.fit3 convergence test modified to more reliable version.
* Some modification of warning handling to only print inner optimization warnings if they occur at final call to inner optimizer.
* newton and bfgs optimizers reset inner loop tolerance to 1e-2 of outer loop tolerance if it is larger than this, to avoid inner loop being too inaccurate for outer tolerance.
* minor R 4.2.0 C compatibility change.

1.8-38

* uniquecombs fix to reduce memory footprint for text data.

1.8-37

* minor update for clang compatibility.
* PredictMat fix of bug in which dimensions could be dropped for 1 row matrix as part of constraint handling (leading to disaster in predict.bamd). Thanks to Shawn Ligocki.

1.8-36

* 'fs' smooth construction modified to (approximately) orthogonalise smoothing penalty penalized and unpenalized bases. This makes the assumption that the associated variance components are independent more natural. Thanks to Matteo Fasiolo and Harald Bayaan for showing how the original construction could be problematic.
* correction to 'fs' smooth handling in gamm to allow correlation structures to be used in the same model without causing an error.
* AIC calculation modified for non-Gaussian families with a scale parameter, so that the model estimated scale parameter is used, rather than the deviance estimator employed with glm exponential families. This matters for avoiding substantial bias with very low mean count data (particularly 'tw' and 'Tweedie' families).
* ziP family response scale se fix for case when b>0 (thanks to Mark Donoghoe)
* prediction on response scale could fail for models fitted by bam(...,discrete=TRUE) - fixed.
* bug fix to multi-model anova.gam code to deal properly with extended family scale parameters (still not the recommended way of testing!)

1.8-35

* Fix to bug in Fletcher scale parameter estimate with weighted data (e.g. quasibinomial with n>1). Gaussian case not affected. Thanks to John Maindonald.
* Italian translation updated thanks to Daniele Medri
* German translation updated thanks to Detlef Steuer
* French translation updated thanks to Philippe Grosjean
* gumbls initialization improved.
* Modification so that smooths can insist on not being reparameterized when called using default bam methods, by having a 'repara' element set to FALSE in the object returned by the smooth constructor.
* Modification so that smooths can have the summation convention for matrix arguments turned off, by having an 'xt$sumConv' element set to FALSE in the 'xt' argument of 's'.

1.8-34

* 'gam.mh' proposal df typo fix and help file '%*%' -> '\%*\%' fix. Thanks Len Thomas.
* various random number generation calls in rd functions in families could fail if simulating one point - fixed.
* gp smooths can now be specified to be strictly stationary (see ?gp.smooth)
* fix to predict.bamd bug in handling fixed effect contrasts (wrong contrast could be used for prediction, if non-default used in fitting). Thanks to Jalal Al-Tamimi.
* predict.bamd modified to catch case where all prediction data is NA
* predict.gam modified to allow NA to be a factor level.

1.8-33

* 'inline' -> 'static inline' error fix - caused installation failure on some platforms.

1.8-32

** bam(...,discrete=TRUE) now uses discretization on the parametric model components as well as the smooths.
* 'gumbls' family added for Gumbel location scale models.
* 'shash' location scale and shape family added.
* psum.chisq function added to compute c.d.f. of weighted sums of chi-squared r.v.s using method of Davies, 1980. This is now used for computing p-values in summary.gam in place of the Liu et al, 2009 approximation.
* score and Schoenfeld residuals added for cox.ph to facilitate PH assumption checking (see ?cox.ph).
* added 'gam.mh' for posterior sampling from models fitted with 'gam'.
* 'ocat' upgraded to not ignore weights.
* added multivariate t and Gaussian densities and multivariate t generation.
* fix to predict.bamd to correct handling of `terms' and `exclude' arguments.
* A recent lme change broke gamm offset handling. Fixed. Thanks Aaron Benjamin Shev.
* dgesvd LAPACK calls replaced by faster dgesdd, as some versions of MKL BLAS/LAPACK have broken dgesvd.
* Fix to derivative w.r.t.
scale parameter calculation for extended families ('tw' is the only one at present) when using 'ML' smoothing parameter estimation. Could lead to step failure with 'tw'. Thanks Kevin Hawkshaw.
* Fix to prediction/fake formula environment that could cause failure of gam/bam called from a function. Thanks to Duncan Murdoch.
* Fix to smoothing parameter uncertainty correction for general families.
* Much cleaner design for the discretization of covariates and associated discretization index matching in bam (predict.bamd and discrete.mf).
* Fix to jagam logic for identifying separable penalties which led to wrong partitioning of penalties for 'fs' smooths used with jagam. Thanks to Chris Jackson for spotting the problem.
* Internal Sl list handling changes.
* tensor.prod.model.matrix now also works with sparse matrices of class "dgCMatrix".

1.8-31

* fix in initialization in gammPQL
* fix of some C routines of type void in place of SEXP called by .Call.

1.8-30

* anova.gam now uses GLRT for multi-model comparisons in both extended and general family cases.
* Fix to bug in bam(...,discrete=TRUE) offset handling introduced in 1.8-29, which corrupted offset (and generated numerous warnings). Also fixes a less obvious bug introduced at the same time in predict.gam which could get the offset wrong when predicting with such models. Thanks to Brian Montgomery and Sean Wilson.
* Fix to problem in pen.reg (used to initialize location scale models in particular), which could lead to initialization failure for lightly penalized models. Thanks Matteo Fasiolo.
* Fix to predict.bamd handling of 'terms' and 'exclude' for models fit by bam(...,discrete=TRUE).
* Work around in predict.gam for a spurious model.matrix warning when a contrasts.arg relates to a variable in data not required by object.
* Fix to gammals family which had 'd2link' etc defined with argument 'eta' instead of the intended 'mu'. Thanks to Jim Stagge.
* 'in.out' modified to allow boundaries specified exactly as for a soap film smoother.
* soap film smoother constructor modified to check knots are within boundary on entry and drop those that are not, with a warning. Also halts if it detects that basis setup is catastrophically ill-conditioned.

1.8-29

* gammPQL modified to use standard GLM IRLS initialization (rather than glmmPQL method) to improve convergence of `gamm' fits.
* bam(...,discrete=TRUE) now drops rownames from parametric model matrix components, to save substantial memory.
* All BLAS/LAPACK calls from C now explicitly pass hidden string length arguments to avoid breakage by recent gfortran optimizations (stack corruption causing BLAS/LAPACK to return error code).
* predict.gam bug fix - parametric interaction terms could be dropped for type="terms" if there were no smooths. Knock on was that they were also dropped for all bam(...,discrete=TRUE) fits. (Thanks Justin Davis.)
* bam(...,discrete=TRUE) indexing bug in setup meant that models containing smooths with matrix arguments and other smooths with factor by variable would fail at the setup stage.
* gam.fit4 initial divergence bug fix.
* Gamma location-scale family 'gammals' added. See ?gammals.
* row-wise Kronecker product operator %.% added for convenience.
* changes to general families to allow return of first deriv of penalized Hessian component more easily.
* ocat bug fix. Response scale prediction was wrong - it ignored the estimated thresholds. Thanks to Fabian Scheipl.
* bam deviance could be wrongly returned, leading to 100% explained deviance. Fixed.

1.8-28

* fix of obscure sp naming bug.
* changed some contour default colours from green to blue (they overlay heatmaps, so green was not clever).
* Tweedie likelihood evaluation code made slightly more robust - for a model with machine zero scale parameter estimate it could segfault, as series maximum location could then overflow integer storage.
Fixed + upper limit imposed on series length (warning if it's not enough).

1.8-27

** Added routine 'ginla' for fully Bayesian inference based on an integrated nested Laplace approximation (INLA) approach. See ?ginla.
* Tweedie location scale family added: 'twlss'.
* gam.fit5 modified to distinguish more carefully between +ve semi definite and +ve definite. Previously could fail, claiming indefiniteness when it should not have. Affects general families.
* bam was ignoring supplied scale parameter in extended family cases - fixed.
* work around in list formula handling for reformulate sometimes setting response to a name in place of a call.
* preinitialize in general families is now a function, not an expression. See cox.ph for an example.
* added routine cholup for rank one modification of Cholesky factor.
* two stage gam/bam fitting now allows 'sp' to be modified.
* predict.gam could fail with type="response" for families requiring the response to be provided in this case (e.g. cox.ph). Fixed.
* sp.vcov defaults to extracting edge corrected log sp cov matrix, if gam(...,gam.control(edge.correct=TRUE)) used for fitting.
* gam(...,gam.control(edge.correct=TRUE)) could go into infinite loop if sp was effectively zero. Corrected.

1.8-26

* LINPACK dependency removed.
* Added service routine choldrop to down date a Cholesky factor on row/col deletion.
* liu2 had a check messed up when vectorized. Fix to stop vector being checked for equality to zero.

1.8-25

** bam(...,discrete=TRUE) methods improved. Cross products now usually faster (can be much faster) and code can now make better use of optimised BLAS.
* fix to 'fs' smooth.construct method and smooth2random method, to allow constructor to be called without a "gamm" attribute set on the smooth spec but still get a sensible result from smooth2random (albeit never using sparse matrices). Useful for other packages using constructors and smooth2random, for 'fs' smooths.
* The mrf smooth constructor contained an obsolete hack in which the term dimension was set to 2 to avoid plotting when used as a te marginal. This messed up side constraints for terms where a mrf smooth was a main effect and te marginal. Fixed.
* extract.lme.cov/2 documentation modified to cover NA handling properly, and routines modified to not require data to be supplied.
* fix of efsudr bug whereby extended families with no extra parameters to estimate could give incorrect results when using optimizer="efs" in 'gam'.
* negbin() corrected - it was declaring the log link to be canonical, leading to poor convergence and slight misfit.
* predict.bam(...,discrete=TRUE) now handles na.action properly, rather than always dropping NAs.
* Fix of very obscure bug in which very poor model of small dataset could end up with fewer `good' data than coefs, breaking an assumption of C code and segfaulting.
* fix of null deviance computation bug introduced with extended families in bam - null deviance was wrong for non-default methods.
* liu2 modified to deal with random effects estimated to be exactly 0, so that summary.gam does not fail in this case.

1.8-24

* Extended Fellner Schall optimizer now available for all families with 'gam' using gam(...,optimizer="efs").
* Change to default behaviour of plot.gam when 'seWithMean=TRUE', and of predict.gam when 'type="iterms"'. The extra uncertainty added to CIs or standard errors now reflects the uncertainty in the mean in all other model terms, not just the uncertainty in the mean of the fixed effects as before. See ?plot.gam and ?predict.gam (including for how to get the old behaviour).
* 're' smooths can now accept matrix arguments: see ?linear.functional.terms.
* cox.ph now allows an offset to be provided.
* Fix in smoothCon for bug in case in which only a single coefficient is involved in a sum-to-zero constraint. Could cause failure in e.g. t2 with cc marginal.
* Model terms s, te etc are now always evaluated in mgcv workspace explicitly to avoid masking problems in obscure circumstances.
* 'mrf' smooth documentation modified to make it clearer how to specify 'nb', and code modified so that it is now possible to specify the neighbour structure using names rather than indices.
* 'bfgs' fix to handle extended families.
* plot.gam modified to only prompt (via devAskNewPage) for a new page after the first page is used up.
* export 'k.check'.
* Fix to 'Rrank'. Previously a matrix R with more columns than rows could cause a segfault.
* Fix to non-finite likelihood handling in gam.fit5.
* Fix in bgam.fitd to step reduce under indefinite deviance and to ensure penalty evaluation is round off negative proof.
* newton slightly modified to avoid (small) chance of all sp's being dropped for indef likelihood.

1.8-23

* default plot methods added for smooths of 3 and 4 variables.
* The `gamma' control parameter for gam and bam can now be used with RE/ML smoothness selection, not just GCV/AIC. Essentially smoothing parameters are chosen as if the sample size was n/gamma instead of n.
* The "bs" basis now allows multiple penalties of different orders on the same spline. e.g. s(x,bs="bs",m=c(3,2,0)). See ?b.spline.
* bam(...,discrete=TRUE) can now use the smooth 'id' mechanism to link smoothing parameters, but note the method constraint that the linked bases are not forced to be identical in this case (unlike other fitting methods).
* summary.gam now allows random effects tests to be skipped (in some large models the test is costly and uninteresting).
* 'interpret.gam0' modified so that masked 's', 'te', etc from other packages does not cause failure.
* coxph fix of prediction bug introduced with stratified model (thanks Giampiero Marra)
* bam(...,discrete=TRUE) fix to handle nested matrix arguments to smooths.
* bam(...,discrete=TRUE) fix to by variable handling with fs and re smooths which could fail during re-representation as tensor smooths (for discretization purposes).
* bam extended family extension had introduced a bug in null deviance computation for Gaussian additive case when using methods other than fREML or GCV.Cp. Fixed.
* bam(...,discrete=TRUE) now defaults to discrete=FALSE if there are no smooths, rather than failing.
* bam was reporting wrong family under some smoothing parameter selection methods (not default).
* null deviance computation improved for extended families. Previous version used an approximation valid for most families, and corrected the rest - now replaced with exact computations for all cases.
* scat initialization tweaked to avoid -ve def problems at start.
* paraPen handling in bam was broken - fixed.
* slight adjustment to sp initialization for extended families - use observed information in weights if possible.

1.8-22

* Fix of bug whereby testing for OpenMP and nthreads>1 in bam, would fail if OpenMP was missing.

1.8-21

* When functions were added to families within mgcv some very large environments could end up attached to those functions, for no good reason. The problem originated from the dispatch of the generic 'fix.family.link' and then propagated via fix.family.var and fix.family.ls. This is now avoided, resulting in smaller gam objects on disk and lower R memory usage. Thanks to Niels Richard Hansen for uncovering this.
* Another bug fix for prediction from discrete fit bam models with an offset, this time when there were more than 50000 data. Also fix to bam fitting when the number of data was an integer multiple of the chunk size + 1.
* check.term was missing a 'stop' so that some unhandled nesting structures in bam(...,discrete=TRUE) failed with an unhelpful error, instead of a helpful one. Fixed.

1.8-20

* bam(,discrete=TRUE) could produce garbage with ti(x,z,k=c(6,5),mc=c(T,F)) because tensor re-ordering for efficiency failed to re-order mc (this is a *very* specialist bug!). Thanks to Fabian Scheipl.
* plot(...,residuals=TRUE) weighted the working residuals by the sqrt working weights divided by the mean sqrt working weight. The standardization by the mean sqrt weight was non standard and has been removed.
* Fix to bad bug in bam(...,discrete=TRUE) offset handling, and predict.bamd modified to avoid failure predicting with offset. Thanks to Paul Shearer.
* fix of typo in bgam.fit, which caused failure of extended families when dataset larger than chunk size. Thanks Martijn Wieling.
* bam(...,discrete=TRUE)/bgam.fitd modified to use fisher weights with extended families if rho!=0.

1.8-19

** bam() now accepts extended families (i.e. nb, tw, ocat etc)
* cox.ph now allows stratification (i.e. baseline hazard can differ between groups).
* Example code for matched case control in ?cox.ph was just plain wrong. Now fixed. Thanks to Patrick Farrell.
* bam(...,discrete=TRUE) tolerance for judging whether smoothing parameters are on boundary was far too low, so that sps could become so large that numerical instability set in. Fixed. Thanks to Paul Rosenfield.
* p.type!=0 removed in summary.gam (previously deprecated)
* single penalty tensor product smooths removed (previously deprecated).
* gam(...,optimizer="perf") deprecated.
* extra divergence check added to bam gam default gam fitting (similar to discrete method).
* preinitialize and postproc components of extended families are now functions, not expressions.
* coefficient divergence check was missing in bam(...,discrete=TRUE) release code - now fixed.
* gaulss family link derivatives modified to avoid overflow. Thanks to Kristen Beck for reporting the problem.
* redundant 'n' argument removed from extended family 'ls' functions.
* Convergence checking can step fail earlier in fast.REML.fit.
If trial step is no improvement and equal to previous best (to within a tolerance), then terminate with step failure after a few step halvings if situation persists. Thanks to Zheyuan Li for reporting problem.

1.8-18

* Tweak to 'newton' to further reduce chance of false convergence at indefinite point.
* Fix to bam.update to deal with NAs in response.
* 'scat' family now takes a 'min.df' argument which defaults to 3. Could otherwise occasionally have indefinite LAML problems as df headed towards 2.
* Fix to `gam.fit4' where in rare circumstances the PIRLS iteration could finish at an indefinite point, spoiling implicit differentiation.
* `gam.check' modified to fix a couple of issues with `gamm' fitted models, and to warn that interpretability is reduced for such models.
* `qq.gam' default method slight modification to default generation of reference quantiles. In theory previous method could cause a problem if enough residuals were exactly equal.
* Fix to `plot.mrf.smooth' to deal with models with by variables.
* `plot.gam' fix to handling of plot limits when using 'trans' (from 1.8-16 'trans' could be applied twice).
* `plot.gam' argument 'rug' now defaults to 'NULL' corresponding to 'rug=TRUE' if the number of data is <= 10000 and 'rug=FALSE' otherwise.
* bam(...,discrete=TRUE) could fail if NAs in the smooth terms caused data rows to be dropped which led to parametric term factors having unused levels (which were then not being dropped). Fixed (in discrete.mf).
* bam(...,discrete=TRUE,nthreads=n) now warns if n>1 and openMP is not available on the platform being used.
* Sl.addS modified to use C code for some otherwise very slow matrix subset and addition ops which could become rate limiting for bam(...,discrete=TRUE).
* Parallel solves in Sl.iftChol can speed up bam(...,discrete=TRUE) with large numbers of smoothing/variance parameters.
* 'gamm' now warns if called with extended families.
* disastrous 'te' in place of 'ti' typo in ?smooth.terms fixed thanks to John McKinlay.
* Some `internal' functions exported to facilitate quantile gam methods in separate package.
* Minor fix in gam.fit5 - 1 by 1 matrix coerced to scalar, to prevent failure in some circumstances.

1.8-17

* Export gamlss.etamu, gamlss.gH and trind.generator to facilitate user addition of new location-scale families.
* Re-ordering of initialization in gam.fit4 to avoid possible failure of dev.resids call before initialization.
* trap in fast.REML.fit for situation in which all smoothing parameters satisfy conditions for indefinite convergence on entry, with an immediate warning that this probably indicates iteration divergence (of bam).
* "bs" basis modified to allow easier control of the interval over which the spline penalty applies which in turn allows more sensible control of extrapolation behaviour, when this is unavoidable.
* Fix in uniquecombs - revised faster code (from 1.8-13) could occasionally generate false matches between different input combinations for integer variables or factors. Thanks to Rohan Sadler for reporting the issue that uncovered this.
* A very bad initial model for uninformative data could lead to a negative fletcher estimate of the scale parameter and optimizer failure - fixed.
* "fREML" allowed in sp.vcov so that it works with bam fitted models.
* 2 occurrences of 'return' replaced by (correct) return().

1.8-16

* slightly improved initial value heuristics for overlapping penalties in general family case.
* 'ocat' checks that response is numeric.
* plot.gam(...,scale=-1) now changes scale according to 'trans' and 'shift'.
* newton optimizer made slightly more cautious: contracts step if reduction in true objective too far different from reduction predicted by quadratic approximation underlying Newton step. Also leaves parameters unchanged in Newton step while their grad is less than 1% of max grad.
* Fix to Fisher weight computation in gam.fit4.
Previously a weight could (rarely) evaluate as negative machine prec instead of zero, get passed to gdi2 in C code, generate a NaN when square rooted, resulting in a NaN passed to the LAPACK dgeqp3 routine, which then hung in a non-interruptible way.
* Fix of 'sp' argument handling with multiple formulae. Allocation to terms could be incorrect.
* Option 'edge.correct' added to 'gam.control' to allow better correction of edge of smoothing parameter space effects with 'gam' when RE/ML used.
* Fix to setting of penalty rank in smooth.construct.mrf.smooth.spec. Previously this was wrong, which could cause failure with gamm if the penalty was rank deficient. Thanks Paul Buerkner.
* Fix to Vb.corr call from gam.fit3.post.proc to ensure that sp not dropped (wrongly treated as scale estimate) when P-REML or P-ML used. Could cause failure depending on BLAS. Thanks Matteo Fasiolo.
* Fix in gam.outer that caused failure with "efs" optimizer and fixed sps.
* Fix to `get.var' to drop matrix attributes of 1 column matrix variables.
* Extra argument added to `uniquecombs' to allow result to have same row ordering regardless of input data ordering. Now used by smooth constructors that subsample unique covariate values during basis setup to ensure invariance to data re-ordering.
* Correction of scaling error in spherical correlation structure GP smooth.
* qf and rd functions for binomial family fixed for zero n case.

1.8-15

* Fix of survival function prediction in cox.ph family. Code used expression (8.8.5) in Klein and Moeschberger (2003), which is missing a term. Correct expression is, e.g., (10) from Andersen, Weis Bentzon and Klein (1996) Scandinavian Journal of Statistics.
* Added help file 'cox.pht' for Cox PH regression with time dependent covariates.
* fix of potential seg fault in gdi.c:get_bSb if single smooth model rank deficient (insufficient workspace allocated).
* gam.fit5 modified to step half if trial penalized likelihood is infinite.
* Fix so that bam works properly with drop.intercept=TRUE.

1.8-14

* bug fix to smoothCon that could generate NAs in model matrix when using bam with numeric by variables. The problem was introduced as part of the bam(...,discrete=TRUE) coding.

1.8-13

* Added help file ?one.se.rule on the `one standard error rule' for obtaining smoother models.
* bam(...,discrete=TRUE) no longer complains about more coefficients than data.
* 's', 'te', 'ti' and 't2' modified to allow user to specify that the smooth should pass through zero at a specified point. See ?identifiability.
* anova.gam modified to use more appropriate reference degrees of freedom for multiple model call, where possible. Also fixed to allow multiple formulae models and to use -2*logLik in place of `deviance' for general.family models.
* offsets allowed with multinomial, ziplss and gaulss families.
* gevlss family implementing generalized extreme value location, scale and shape models.
* Faster code used in 'uniquecombs'. Speeds up discretization step in 'bam(...,discrete=TRUE)'. Could still be improved for multi-column case.
* modification to 'smoothCon' to allow resetting of smooth supplied constraints - enables fix of bug in bam handling of 't2' terms, where parameterization of penalty and model matrix did not previously match properly.
* clarification of `exclude' argument to predict.gam in help files.
* modification to 'plot.gam' etc, so that 'ylim' is no longer shifted by 'shift'.
* ylim and ... handling improved for 'fs' plot method (thanks Dave Miller)
* gam.check now recognises RStudio and plots appropriately.
* bam(...,sparse=TRUE) removed - not efficient, because of unavoidability of dense off diagonal terms in X'X or equivalent. Deprecated since 1.8-5.
* tweak to initial.sp/g to avoid infinite loop in s.p. initialization, in rather unusual circumstances. Thanks to Mark Bravington.
* bam and gam have `drop.intercept' argument to force the parametric terms not to include a constant in their span, even when there are factor variables.
* Fix in Vb.corr (2nd order edf correction) for fixed smoothing parameter case.
* added 'all.vars1' to enable terms like s(x$y) in model formulae.
* modification to gam.fit4 to ignore 'start' if it is immediately worse than 'null.coef'.
* cSplineDes can now accept a 'derivs' argument.
* added drop.intercept handling for multiple formulae (mod by Matteo Fasiolo).
* 'gam.side' fix to avoid duplication of smooth model matrices to be tested against, when checking for numerical degeneracy. Problem could occasionally cause a failure (especially with bam), when the total matrix to be tested against ended up with more columns than rows.
* 4 occurrences of as.name("model.frame") changed to quote(stats::model.frame)
* fix in predict.bamd discrete prediction code to be a bit more relaxed about use of as.factor, etc in formulae.
* fix in predict.gam handling of 'na.action' to avoid trying to get type of na.action from name of na.action function - this was fragile to nesting and could cause predict.bam to fail in the presence of NA's.
* fix of gam.outer so that general families (e.g. cox.ph) can have all their smoothing parameters supplied without then ignoring the penalties!
* fix in multiple formulae handling of fixed smoothing parameters.
* Fix of bug in zlim handling in vis.gam perspective plot with standard errors. Thanks Petra Kuhnert.
* probit link added to 'jagam' (suggested by Kenneth Knoblauch).
* 'Sl.' routines revised to allow operation with non-linearly parameterized smoothers.
* bug fix in Hessian computation in gam.fit5 - leading diagonal of Hessian of log|Hp| could be wrong where Hp is penalized Hessian.
* better use of crossprod in gamlss.gH

1.8-12

** "bs" B-spline smoothing basis. B-splines with derivative based penalties of various orders.
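As an illustration of the "bs" basis entry above, a smooth with this basis can be requested through the 'bs' and 'm' arguments of 's'. The sketch below uses simulated data and made-up object names; here m=c(3,2) is taken to mean a cubic B-spline with a second derivative penalty, as described in ?b.spline.

```r
library(mgcv)

set.seed(1)
n <- 200
dat <- data.frame(x = runif(n))
dat$y <- sin(2 * pi * dat$x) + rnorm(n, sd = 0.3)  # simulated response

# Cubic B-spline basis (m[1] = 3) with a second derivative penalty (m[2] = 2).
b <- gam(y ~ s(x, bs = "bs", m = c(3, 2)), data = dat, method = "REML")

summary(b)$edf  # effective degrees of freedom of the smooth
```

Changing the second element of 'm' changes the order of the derivative being penalized without changing the spline order.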
* 'gamm' now uses a fixed scale parameter in PQL estimation for Poisson and binomial data via the `sigma' option in lmeControl.
* bam null deviance computation was wrong with prior weights (including binomial other than binary case), and returned deviance was wrong for non-binary binomial. Fixed (did not affect estimation).
* improvements to "bfgs" optimizer to better deal with `infinite' smoothing parameters.
* changed scheme=3 in default 2-D plotting to grey scale version of scheme=2.
* 'trichol' and 'bandchol' added for banded Cholesky decompositions, plus 'sdiag' functions added for extracting and setting matrix sub- and super- diagonals.
* p-spline constructor and Predict.matrix.pspline.smooth now allow set up of SCOP-spline monotonic smoothers, and derivatives of smooths. Not used in modelling functions yet.
* s(...,bs="re") now allows known precision matrix structures to be defined using the `xt' argument of 's' see ?smooth.construct.re.smooth.spec for details and example.
* negbin() with a grid search for `theta' is no longer supported - use 'nb' instead.
* bug fix to bam aic computation with AR rho correction.

1.8-11

* bam(...,discrete=TRUE) can now handle matrix arguments to smooths (and hence linear functional terms).
* bam(...,discrete=TRUE) bug fix in fixed sp handling.
* bam(...,discrete = TRUE) db.drho reparameterization fix, fixing nonsensical edf2. Also bam edf2 limited to maximum of edf1.
* smoothCon rescaling of S changed to use efficient matrix norm in place of relatively slow computation involving model matrix crossproduct.
* bam aic corrected for AR model if present.
* Added select=TRUE argument to 'bam'.
* Several discrete prediction fixes including improved thread safety.
* bam/gam name gcv.ubre field by "method".
* gam.side modified so that if a smooth has 'side.constrain==FALSE' it is neither constrained, nor used in the computation of constraints for other terms (the latter part being new). Very limited impact!
* No longer checks if SUPPORT_OPENMP defined in Rconfig.h, but only if _OPENMP defined. No change in actual behaviour.

1.8-10

** 'multinom' family implemented for multinomial logistic regression.
* predict.bam now defaults to using efficient discrete prediction methods for models fit using discrete covariate methods (bam(...,discrete=TRUE)).
* with bam(...,discrete=TRUE) terms like s(a,b,bs="re") had wrong p-value computation applied, as a result of being treated as tensor product terms. Fixed.
* minor tweak to soap basis setup to avoid rounding error leading to 'approx' occasionally producing NA's with fixed boundaries.
* misc.c:rwMatrix made thread safe (had been using R_chk_calloc, which isn't).
* some upgrading for 64bit addressing.
* uniquecombs now preserves contrasts on factors.
* variable summary tweak so that 1 column matrices in parametric model are treated as regular numeric variables.

1.8-9

* C level fix in bam(...,discrete=TRUE) code. Some memory was mistakenly allocated via 'calloc' rather than 'R_chk_calloc', but was then freed via 'R_chk_free'. This could cause R to halt on some platforms.

1.8-8

** New "gp" smooth class (see ?gp.smooth) implementing the Matern covariance based Gaussian process model of Kammann and Wand (2003), and a variety of other simple GP smoothers.
* some smooth plot methods now accept 'colors' and 'contour.col' argument to set color palette in image plots and contour line colors.
* predict.gam and predict.bam now accept an 'exclude' argument allowing terms (e.g. random effects) to be zeroed for prediction. For efficiency, smooth terms not in 'terms' or in 'exclude' are no longer evaluated, and are instead set to zero or not returned. See ?predict.gam.
* ocat saturated likelihood definition changed to zero, leading to better comparability of deviance between model fits (thanks to Herwig Friedl).
* null.deviance calculation for extended families modified to make more sense when `mu' is the mean of a latent variable, rather than the response itself.
* bam now returns standardized residuals 'std.rsd' if `rho!=0'.
* bam(...,discrete=TRUE) can now handle 'fs' terms.
* bam(...,discrete=TRUE) now accepts 'by' variables. Thanks to Zheyuan Li for debugging on this.
* bam now works with drop.unused.levels == TRUE when random effects should have more levels than those that exist in data. (Thanks Alec Leh)
* bam chunk.size logic error fix - error could be triggered if chunk.size reset automatically to be larger than data size.
* uniquecombs can now accept a data frame with some or all factor columns, as well as purely numeric matrices.
* discrete.mf modified to avoid discretizing a covariate more than once, and to halt if a model requires the same covariate to be discretized two different ways (e.g. s(x) + s(x,z)). This affects only bam(...,discrete=TRUE).
* Some changes to ziP and ziplss families to improve numerical robustness, and to ziP help file to suggest appropriate checking. Thanks to Keren Raiter, for reporting problems.
* numerical robustness of extended gam methods (gam.fit4) improved for cases with many zero or near zero iterative weights. Handling of zero weights modified to avoid divide by (near) zero problems. Also tests for poor scaling of sqrt(abs(w))*z and substitutes computations based on w*z if detected. Also 'newton' routine now step halves if REML score not finite!
* Sl.setup (used by bam) modification to allow more efficient handling of terms with multiple diagonal penalties with no non-zero elements in common, but possibly with non zero elements `interleaved' between penalties.

1.8-7

** 'gam' default scale parameter changed to modified Pearson estimator developed by Fletcher 2012 Biometrika 99(1), 230-237. See ?gam.scale.
** 'bam' now has a 'discrete' argument to allow discretization of covariates for more efficient computation, with substantially more parallelization (via 'nthreads'). Still somewhat experimental.
* Slightly more accurate smoothing parameter uncertainty correction. Changes edf2 used for AIC (under RE/ML), and hence may change AIC values.
* jagam prior variance on fixed effects is now set with reference to data and model during initialization step.
* bam could lose offset for small datasets in gaussian additive case. fixed.
* gam.side now setup to include penalties in computations if fewer data than coefs (an exceedingly specialist topic).
* p-value computation for smooth terms modified to avoid an ambiguity in the choice of test statistic that could lead to p-value changing somewhat between platforms.
* gamm now warns if attempt is made to use extended family.
* step fail logic improved for "fREML" optimization in 'bam'.
* fix of openMP error in mgcv_pbsi, which could cause a problem in multi-threaded bam computation (failure to declare a variable as private).
* Smoothing parameter uncertainty corrected AIC calculations had an indexing problem in Sl.postproc, which could result in failure of bam with linked smooths.
* mroot patched for fact that chol(...,pivot=TRUE) does not operate as documented on rank deficient matrices: trailing block of triangular factor has to be zeroed for pivoted crossprod of factor to equal original matrix.
* bam(...,sparse=TRUE) deprecated as no examples found where it is really worthwhile (but let me know if this is a problem).
* marginal model matrices in tensor product smooths now stored in re-parameterized form, if re-parameterization happened (shouldn't change anything!).
* initial.spg could fail if response vector had dim attributes and extended family used. fixed.

1.8-6

* Generalization of list formula handling to allow linear predictors to share terms. e.g.
  gam(list(y1~s(x),y2~s(z),1+2~s(v)+w-1),family=mvn(d=2))
* New German translation thanks to Detlef Steuer.
* plot.gam now silently returns a list of plotting data, to help advanced users (Fabian Scheipl) to produce customized plot.
* bam can now set up an object suitable for fitting, but not actually do the fit, following a suggestion by Fabian Scheipl. See arguments 'fit' and 'G'.

1.8-5

* Korean translation added thanks to Chel Hee Lee.
* scale parameter handling in edf in logLik.gam made consistent with glm (affects AIC).
* 'bam', 'gam' and 'gamm' modified to often produce smaller files when models saved (and never to produce absurdly large files). Achieved by setting environment of formula, terms etc to .GlobalEnv. Previously 'save' could save entire contents of environment of formula/terms with fitted model object. Note that change can cause failure in user written functions calling gam/bam and then 'predict' without supplying all prediction variables (fix obvious).
* A help file 'single.index' supplied illustrating how single index models can be estimated in mgcv.
* predict.gam now only creates a "constant" attribute if the model has one.
* gam.fit4 convergence testing of coefs modified to more robust test of gradients of penalized dev w.r.t. params, rather than change in params, which can fail under rank deficiency.
* mgcv_qrqy was not thread safe. Not noticeable on many platforms as all threads did exactly the same thing to the same matrix, but very noticeable on Windows. Thread safe mgcv_qrqy0 added and used in any parallel sections.
* Allow openMP support if compiler supports it and provides pre-defined macro _OPENMP, even if SUPPORT_OPENMP undefined. (Allows multi-threading on Windows, for example.)
* 'eps' is now an argument to 'betar' allowing some control on how to handle response values too close to 0 or 1. Help file expanded to emphasise the problems with using beta regression with 0s and 1s in the data.
* fix of bug in multi-formula contrast handling, causing failure of prediction in some cases.
* ziP and ziplss now check for non-integer (or binary) responses and produce an error message if these are found. Previously this was not trapped and could lead to a segfault.

1.8-4

** JAGS/BUGS support added, enabling auto-generation of code and data required to use mgcv type GAMs with JAGS. Useful for complex random effects structures, for example.
* smoothCon failed if selection penalties requested, but term was unpenalized. Now fixed (no selection penalties on unpenalized terms.)
* gam.check would fail for tensor product smooths with by variables - fixed.
* predict.gam would fail when predicting for more data than the blocksize but selecting only some terms. Fixed thanks to Scott Kostyshak.
* smoothCon now has an argument `diagonal.penalty' allowing single penalty smooths to be re-parameterized in order to diagonalize the penalty matrix. PredictMat is modified to apply the same reparameterization, making it user transparent. Facilitates the setup of smooths for export to other packages.
* predict.bam now exported in response to a request from another package maintainer.
* 1.8 allows some prediction tasks for some families (e.g. cox.ph) to require response variables to be supplied. NAs in these then messed up prediction when they were not needed (e.g. if response variables with NAs were provided to predict.gam for a simple exponential family GAM). Response NAs now passed to the family specific prediction code, restoring the previous behaviour for most models. Thanks Casper Wilestofte Berg.
* backend parallel QR code used by gam modified to use a pivoted block algorithm.
* nthreads argument added to 'bam' to allow for parallel computation for computations in the main process (serial on any cluster nodes). e.g. QR based combination of results from cluster nodes is now parallel.
* fREML computation now partly in parallel (controlled by 'nthreads' argument to 'bam')
* slanczos now accepts an nt argument allowing parallel computation of main O(n^2) step.
* fix to newton logic problem, which could cause an attempt to use 'score2' before definition.
* fix to fREML code which could cause matrix square root to lose dimensions and cause an error.
* initial.sp could perform very poorly for very low basis dimensions - could set initial sp to effective infinity.

1.8-3

* Fix of two illegal read/write bugs with extended family models with no smooths. (Thanks to Julian Faraway for reporting beta regr problem).
* bam now checks that chunk.size > number of parameters and resets the chunk.size if not.
* Examples of use of smoothCon and PredictMat for setting up bases for use outside mgcv (and then predicting) added to ?smoothCon.

1.8-2

* For exponential family gams, fitted by outer iteration, a warning is now generated if the Pearson scale parameter estimate is more than 4 times a robust estimate. This may indicate an unstable Pearson estimate.
* 'gam.control' now has an option 'scale.est' to allow selection of the estimator to use for the scale parameter in exponential family GAMs. See ?gam.scale. Thanks to Trevor Davies for providing a clear unstable Pearson estimate example.
* drop.unused.levels argument added to gam, bam and gamm to allow "mrf" (and "re") terms to have unobserved factor levels.
* "mrf" constructor modified to deal properly with regions that contain no observations.
* "fs" smooths are no longer eligible to have side conditions set, since they are fully penalized terms and hence always identifiable (in theory).
* predict.bam was not declared as a method in NAMESPACE - fixed
* predict.bam modified to strip down object to save memory (especially in parallel).
* predict.gam now has block.size=NULL as default. This implies a block size of 1000 when newdata supplied, and use of a single block if no new data was supplied.
* some messages were not printing correctly after a change in message handling to facilitate easier translation. Now fixed.

1.8-1

* bam modified so that choleski based fitting works properly with rank deficient model matrix (without regularization).
* fix of 1.8-0 bug - gam prior weights mishandled in computation of cov matrix, resulting in incorrect variance estimates (even without prior weights specified). Thanks Fabian Scheipl.

1.8-0

*** Cox Proportional Hazard family 'cox.ph' added as example of general penalized likelihood families now useable with 'gam'.
*** 'ocat', 'tw', 'nb', 'betar', 'ziP' and 'scat' families added for ordered categorical data, Tweedie with estimation of 'p', negative binomial with (fast) estimation of 'theta', beta regression for proportions, simple zero inflated Poisson regression and heavy tailed regression with scaled t distribution. These are all examples of 'extended families' now useable with 'gam'.
*** 'gaulss' and 'ziplss' families, implementing models with multiple linear predictors. For gaulss there is a linear predictor for the Gaussian mean and another for the standard deviation. For ziplss there is a linear predictor controlling `presence' and another controlling the Poisson parameter, given presence.
*** 'mvn' family for multivariate normal additive models.
** AIC computation changed for bam and gam models estimated by REML/ML to account for smoothing parameter uncertainty in degrees of freedom term.
* With REML/ML smoothness selection in gam/bam an extra covariance matrix 'Vc' is now computed which allows for smoothing parameter uncertainty. See the 'unconditional' arguments to 'predict.gam' and 'plot.gam' to use this.
* 'gam.vcomp' bug fix. Computed intervals for families with fixed scale parameter were too wide.
* gam now defaults to the Pearson estimator of the scale parameter to avoid poor scale estimates in the quasipoisson case with low counts (and possibly elsewhere).
  Gaussian, Poisson and binomial inference invariant to change. Thanks to Greg Dropkin, for reporting the issue.
* Polish translation added thanks to Lukasz Daniel.
* gam.fit3 now forces eta and mu to be consistent with coef and valid on return (previously could happen that if step halving was used in final iteration then eta or mu could be invalid, e.g. when using identity link with non-negative data)
* gam.fit3 now bases its convergence criteria on grad deviance w.r.t. model coefs, rather than changes in model coefs. This prevents problems when there is rank deficiency but different coefs get dropped at different iterations. Thanks to Kristynn Sullivan.
* If mgcv is not on the search path then interpret.gam now tries to evaluate in namespace of mgcv with environment of formula as enclosing environment, if evaluation in the environment of the formula fails.
* bug fix to sos plotting method so that it now works with 'by' variables.
* 'plot.gam' now weights partial residuals by *normalized* square root iterative weights so that the average weight is 1 and the residuals should have constant variance if all is ok.
* 'pcls' now reports if the initial point is not feasible.
* 'print.gam' and 'summary.gam' now report the rank of the model if it is rank deficient. 'gam.check' reports the model rank whenever it is available.
* fix of bug in 'k.check' called by 'gam.check' that gave an error for smooths with by variables.
* predict.gam now checks that factors in newdata do not contain more levels than those used in fitting.
* predict.gam could fail for type "terms" with no intercept - fixed.
* 'bfgs' now uses a finite difference approximation for the initial inverse Hessian.

1.7-29

* Single character change to Makevars file so that openMP multi-threading actually works.

1.7-28

* exclude.too.far updated to use kd-tree instead of inefficient search for neighbours. This can make plot.gam *much* faster for large datasets.
* Change in smoothCon, so that sweep and drop constraints (default for bam for efficiency reasons) are no longer allowed with by variables and matrix arguments (could lead to confusing results with factor by variables in bam).
* 'ti' terms now allow control of which marginals to constrain, via 'mc'. Allows e.g. y ~ ti(x) + ti(x,z,mc=c(0,1)) - for experts only!
* tensor.prod.model.matrix re-written to call C code. Around 5-10 times faster than old version for large data sets.
* re-write of mini.mf function used by bam to generate a reduced size model frame for model setup. New version ensures that all factor levels are present in reduced frame, and avoids production of unrealistic combinations of variables in multi-dimensional smooths which could occur with old version.
* bam models could fail if a penalty matrix was 1 by 1, or if multiple penalties on a smooth were in fact separable into single penalties. Fixed. Thanks to Martijn Wieling for reporting.
* Constant in tps basis computation was different to published version for odd dimensions - makes no difference to fit, but annoying if you are trying to test a re-implementation. Thanks to Weijie Cai at SAS.
* prediction for "cc" and "cp" classes is now cyclic - values outside the range of knots are wrapped back into the interval.
* ldTweedie now returns derivatives w.r.t. a transform of p as well as w.r.t log of scale parameter phi.
* gamm can now handle 'varComb' variance functions (thanks Sven Neulinger for reporting that it didn't).
* fix of a bug which could cause bam to seg fault for a model with no smooths (insufficient storage allocated in C in this case). Thanks Martijn Wieling.

1.7-27

* Further multi-threading in gam fits - final two leading order matrix operations parallelized using openMP.
* Export of smooth.construct.t2.smooth.spec and Predict.matrix.t2.smooth, and Rrank.
* Fix of missing [,,drop=FALSE] in predict.gam that could cause problems with single row prediction when 'terms' supplied (thanks Yang Yang).

1.7-26

* Namespace fixes.

1.7-25

* code added to allow openMP based multi-threading in gam fits (see ?gam.control and ?"mgcv-parallel").
* bam now allows AR1 error model to be split blockwise. See argument 'AR.start'.
* magic.post.proc made more efficient (one of two O(np^2) steps removed).
* var.summary now coerces character to factor.
* bugs fixed whereby etastart etc were not passed to initial.spg and get.null.coefs. Thanks to Gavin Simpson.
* reformulate removed from predict.gam to avoid (slow) repeated parser calls.
* gaussian(link="log") initialization fixed so that negative data does not make it fail, via fix.family patching function.
* bug fix in plot method for "fs" basis - ignored any side conditions. Thanks to Martijn Wieling and Jacolien van Rij.
* gamm now checks whether smooths nested in factors have illegal side conditions, and halts if so (re-ordering formula can help).
* anova.glmlist no longer called.
* Compiled code now uses R_chk_calloc and R_chk_free for memory management to avoid the possibility of unfriendly exit on running out of memory.
* fix in gam.side which would fail with unpenalized interactions in the presence of main effects.

1.7-24

* Examples pruned in negbin, smooth.construct.ad.smooth.spec and bam help files to reduce CRAN checking load.
* gam.side now warns if only repeated 1-D smooths of the same variable are encountered, but does not halt.
* Bug fix in C code for "cr" basis, that could cause a memory violation during prediction, when an extrapolation was immediately followed by a prediction that lay exactly on the upper boundary knot. Thanks to Keith Woolner for reporting this.
* Fix for bug in fast REML code that could cause bam to fail with ti/te only models. Thanks to Martijn Wieling.
* Fix of bug in extract.lme.cov2, which could cause gamm to fail when a correlation structure was nested inside a grouping factor finer than the finest random effect grouping factor.
* Fix for an interesting feature of lme that getGroups applied to the corStruct that is part of the fitted lme object returns groups in sorted order, not data frame order, and without an index from one order to the other. (Oddly, the same corStruct initialized outside lme has its groups in data frame order.) This feature could cause gamm to fail, complaining that the grouping factors for the correlation did not appear to be nested inside the grouping structure of the random effects. A bunch of ordering sensitivity tests have been added to the mgcv test suite. Thanks to Dave Miller for reporting the bug.

1.7-23

*** Fix of severe bug introduced with R 2.15.2 LAPACK change. The shipped version of dsyevr can fail to produce orthogonal eigenvectors when uplo='U' (upper triangle of symmetric matrix used), as opposed to 'L'. This led to a substantial number of gam smoothing parameter estimation convergence failures, as the key stabilizing re-parameterization was substantially degraded. The issue did not affect gaussian additive models with GCV model selection. Other models could fail to converge any further as soon as any smoothing parameter became `large', as happens when a smooth is estimated as a straight line. gam.check reported the lack of full convergence, but the issue could also generate complete fit failures. Picked up late as full test suite had only been run on R > 2.15.1 with an external LAPACK.
** 'ti' smooth specification introduced, which provides a much better (and very simple) way of allowing nested models based on 'te' type tensor product smooths. 'ti' terms are used to set up smooth interactions excluding main effects (so ti(x,z) is like x:z while te(x,z) is more like x*z, although the analogy is not exact).
* summary.gam now uses a more efficient approach to p-value computation for smooths, using the factor R from the QR factorization of the weighted model matrix produced during fitting. This is a weighted version of the Wood (2013) statistic used previously - simulations in that paper essentially unchanged by the change.
* summary.gam now deals gracefully with terms such as "fs" smooths estimated using gamm, for which p-values can not be computed. (thanks to Gavin Simpson).
* gam.check/qq.gam now uses a normal QQ-plot when the model has been fitted using gamm or gamm4, since qq.gam cannot compute correct quantiles in the presence of random effects in these cases.
* gamm could fail with fixed smooths while assembling total penalty matrix, by attempting to access non-existent penalty matrix. (Thanks Ainars Aunins for reporting this.)
* stripped rownames from model matrix, eta, linear predictor etc. Saves memory and time.
* plot.soap.film could switch axis ranges. Fixed.
* plot.mgcv.smooth now sets smooth plot range on basis of xlim and ylim if present.
* formXtViX documentation fixed + return matrix labels.
* fixDependence related negative index failures for completely confounded terms - now fixed.
* sos smooth model matrix re-scaled for better conditioning.
* sos plot method could produce NaNs by a rounding error in argument to acos - fixed.

1.7-22

* Predict.matrix.pspline.smooth now allows prediction outside range of knots, and uses linear extrapolation in this case.
* missing drop=FALSE in reTest called by summary.gam caused 1-D random effect p-value computation to fail. Fixed (thanks Silje Skår).

1.7-21

** soap film smoother class added. See ?soap
* Polish translation added thanks to Lukasz Daniel.
* mgcv/po/R-mgcv.pot up-dated.
* plot methods for smooths modified slightly to allow methods to return plot data directly, without a prediction matrix.

1.7-20

* '...' now passed to termplot by plot.gam (thanks Andreas Eckner).
* fix to null deviance computation for binomial when n>1, matrix response used and an offset is present. (Thanks to Tim Miller)
* Some pruning of unused code from recov and reTest.
* recov modified to stop it returning a numerically non-symmetric Ve, and causing occasional failures of summary.gam with "re" terms.
* MRF smooth bug. Region ordering could become confused under some circumstances due to incorrect setting of factor levels. Corrected thanks to detailed bug report from Andreas Bender.
* polys.plot colour/grey scale bug. Could ask for colour 0 from colour scheme, and therefore fail. Fixed.

1.7-19

** summary.gam and anova.gam now use an improved p-value computation for smooth terms with a zero dimensional penalty null space (including random effects). The new scheme has been tested by full replication of the simulation study in Scheipl (2008,CSDA) to compare it to the best method therein. In these tests it is at least as powerful as the best method given there, and usually indistinguishable, but it gives slightly too low null p-values when smoothing parameters are very poorly identified. Note that the new p-values can not be computed from old fitted gam objects. Thanks to Martijn Wieling for pointing out how bad the p-values for regular smooths could be with random effects.
* t2 terms now take an argument `ord' that allows orders of interaction to be selected.
* "tp" smooths can now drop the null space from their construction via a vector m argument, to allow testing against polynomials in the null space.
* Fix of vicious little bug in gamm tensor product handling that could have a te term pick up the wrong model matrix and fail.
* bam now resets method="fREML" to "REML" if there are no free smoothing parameters, since there is no advantage to the "fREML" optimizer in this case, and it assumes there is at least one free smoothing parameter.
* print.gam modified to print effective degrees of freedom more prettily.
* testStat bug fix.
  qr was called with default arguments, which includes tol=1e-7...
* bam now correctly returns fitting weights (rather than prior) in weights field.

1.7-18

* Embarrassingly, the adjusted r^2 computation in summary.gam was wrong for models with prior weights. Now fixed, thanks to Antony Unwin.
* bam(...,method="fREML") could give incorrect edfs for "re" terms as a result of a matrix indexing error in Sl.initial.repara. Now fixed. Thanks to Martijn Wieling for reporting this.
* summary.gam had freq=TRUE set as default in 1.7-17. This gave better p-values for paraPen terms, but spoiled p-values for fixed effects in the presence of "re" terms (a rather more common setup). Default now reset to freq=FALSE.
* bam(...,method="fREML") made fully compatible with gam.vcomp.
* bam and negbin examples speeded up
* predict.gam could fail for models of the form y~1 when newdata are supplied. (Could make some model averaging methods fail). Fixed.
* plot.gam had an overzealous check for availability of variance estimates, which could make rank deficient models fail to plot CIs. fixed.

1.7-17

** p-values for terms with no un-penalized components were poor. The theory on which the p-value computation for other terms is based shows why this is, and allows fixes to be made. These are now implemented.
* summary p value bug fix --- smooths with no null space had a bug in lower tail of p-value computation, yielding far too low values. Fixed.
* bam now outputs frequentist cov matrix Ve and alternative effective degrees of freedom edf1, in all cases.
* smoothCon now adjusts null.space.dim on constraint absorption.
* Prediction with matrix arguments (i.e. for models using summation convention) could be very memory hungry. This in turn meant that bam could run out of memory when fitting models with such terms. The problem was memory inefficient handling of duplicate evaluations. Now fixed by modification of PredictMat
* bam could fail if the response vector was of class matrix. fixed.
* reduced rank mrf smooths with supplied penalty could use the incorrect penalty rank when computing the reduced rank basis and fail. fixed thanks to Fabian Scheipl.
* a cr basis efficiency change could lead to old fitted model objects causing segfaults when used with current mgcv version. This is now caught.

1.7-16

* There was an uninitialized variable bug in the 1.7-14 re-written "cr" basis code for the case k=3. Fixed.
* gam.check modified slightly so that k test only applied to smooths of numeric variables, not factors.

1.7-15

* Several packages had documentation linking to the 'mgcv' function help page (now removed), when a link to the package was meant. An alias has been added to mgcv-package.Rd to fix/correct these links.

1.7-14

** predict.bam now added as a wrapper for predict.gam, allowing parallel computation
** bam now has method="fREML" option which uses faster REML optimizer: can make a big difference on parameter rich models.
* bam can now use a cross product and Choleski based method to accumulate the required model matrix factorization. Faster, but less stable than the QR based default.
* bam can now obtain starting values using a random sub sample of the data. Useful for seriously large datasets.
* check of adequacy of basis dimensions added to gam.check
* magic can now deal with model matrices with more columns than rows.
* p-value reference distribution approximations improved.
* bam returns objects of class "bam" inheriting from "gam"
* bam now uses newdata.guaranteed=TRUE option when predicting as part of model matrix decomposition accumulation. Speeds things up.
* More efficient `sweep and drop' centering constraints added as default for bam. Constraint null space unchanged, but computation is faster.
* Underlying "cr" basis code re-written for greater efficiency.
* routine mgcv removed, it now being many years since there has been any reason to use it. C source code heavily pruned as a result.
* coefficient name generation moved from estimate.gam to gam.setup.
* smooth2random.tensor.smooth had a bug that could produce a nonsensical penalty null space rank and an error, in some cases (e.g. "cc" basis) causing te terms to fail in gamm. Fixed.
* minor change to te constructor. Any unpenalized margin now has corresponding penalty rank dropped along with penalty.
* Code for handling sp's fixed at exactly zero was badly thought out, and could easily fail. fixed.
* TPRS prediction code made more efficient, partly by use of BLAS. Large dataset setup also made more efficient using BLAS.
* smooth.construct.tensor.smooth.spec now handles marginals with factor arguments properly (there was a knot generation bug in this case)
* bam now uses LAPACK version of qr, for model matrix QR, since it's faster and uses BLAS.

1.7-13

** The Lanczos routine in mat.c was using a stupidly inefficient check for convergence of the largest magnitude eigenvectors. This resulted in far too many Lanczos steps being used in setting up thin plate regression splines, and a noticeable speed penalty. This is now fixed, with many thanks to David Shavlik for reporting the slow down.
* Namespace modified to import from methods. Dependency on stats and graphics made explicit.
* "re" smooths are no longer subject to side constraint under nesting (since this is almost always un-necessary and undesirable, and often unexpected).
* side.con modified to allow smooths to be excluded and to allow side constraint computation to take account of penalties (unused at present).

1.7-12

* bam can now compute the leading order QR decomposition on a cluster set up using the parallel package.
* Default k for "tp" and "ds" modified so that it doesn't exceed 100 + the null space dimension (to avoid complaints from users smoothing in quite a lot of dimensions). Also default sub-sample size reduced to 2000.
* Greater use of BLAS routines in the underlying method code.
In particular all leading order operations count steps for gam fitting now use BLAS. You'll need R to be using a rather fancy BLAS to see much difference, however.
* Amusingly, some highly tuned BLAS libraries can result in LAPACK not always giving identical eigenvalues when called twice with the same matrix. The `newton' optimizer had assumed this wouldn't happen: not any more.
* Now byte compiled by default. Turn this off in DESCRIPTION if it interferes with debugging.
* summary.gam p-value computation options modified (default remains the same).
* summary.gam default p-value computation made more computationally efficient.
* gamm and bam could fail under some options for specifying binomial models. Now fixed.

1.7-11
* smoothCon bug fix to avoid NA labels for matrix arguments when no by variable provided.
* modification to p-value computation in summary.gam: `alpha' argument removed (was set to zero anyway); computation now deals with possibility of rank deficiency computing pseudo-inverse of cov matrix for statistic. Previously p-value computation could fail for random effect smooths with large datasets, when a random effect has many levels. Also for large data sets test statistic is now based on randomly sampling max(1000,np*2) model matrix rows, where np is number of model coefficients (random number generator state unchanged by this), previous sample size was 3000.
* plot.mrf.smooth modified to allow passing '...' argument.
* 'negbin' modified to avoid spurious warnings on initialization call.

1.7-10
* fix stupid bug in 1.7-9 that lost term labels in plot.gam.

1.7-9
* rather lovely plot method added for splines on the sphere.
* plot.gam modified to allow 'scheme' to be specified for plots, to easily select different plot looks.
* schemes added for default smooth plotting method, modified for mrfs and factor-smooth interactions.
* mgcv function deprecated, since magic and gam are much better (let me know if this is really a problem).

1.7-8
* gamm.setup fix.
Bug introduced in 1.7-7 whereby gamm with no smooths would fail.
* gamm gives returned object a class "gamm"

1.7-7
* "fs" smooth factor interaction class introduced, for smooth factor interactions where smoothing parameters are same at each factor level. Very efficient with gamm, so good for e.g. individual subject smooths.
* qq.gam default method modified for increased power.
* "re" terms now allowed as tensor product marginals.
* log saturated likelihoods modified w.r.t. weight handling, so that weights are treated as modifying the scale parameter, when scale parameter is free. i.e. obs specific scale parameter is overall scale parameter divided by obs weight. This ensures that when the scale parameter is free, RE/ML based inference is invariant to multiplicative rescaling of weights.
* te and t2 now accept lists for 'm'. This allows more flexibility with marginals that can have vector 'm' arguments (Duchon splines, P splines).
* minor mroot fix/gam.reparam fix. Could declare symmetric matrix not symmetric and halt gam fit.
* argument sparse added to bam to allow exploitation of sparsity in fitting, but results disappointing.
* "mrf" now evaluates rank of penalty null space numerically (previously assumed it was always one, which it need not be with e.g. a supplied penalty).
* gam.side now corrects the penalty rank in smooth objects that have been constrained, to account for the constraint. Avoids some nested model failures.
* gamm and gamm.setup code restructured to allow smooths nested in factors and for cleaner object oriented conversion of smooths to random effects.
* gam.fit3 bug. Could fail on immediate divergence as null.eta was matrix.
* slanczos bug fixes --- could segfault if k negative. Could also fail to return correct values when k small and kl < 0 (due to a convergence testing bug, now fixed)
* gamm bug --- could fail if only smooth was a fixed one, by looking for non-existent sp vector. Fixed.
* 'cc' Predict.matrix bug fix - prediction failed for single points.
* summary.gam failed for single coefficient random effects. Fixed.
* gam returns rV, where t(rV)%*%rV*scale is Bayesian cov matrix.

1.7-6
** factor `by' variable handling extended: if a by variable is an ordered factor then the first level is treated as a reference level and smooths are only generated for the other levels. This is useful for avoiding identifiability issues in complex models with factor by variables.
* bam bug fix. aic was reported incorrectly (too low).

1.7-5
* gam.fit3 modified to converge more reliably with links that don't guarantee feasible mu (e.g. poisson(link="identity")). One vulnerability removed + a new approach taken, which restarts the iteration from null model coefficients if the original start values lead to an infinite deviance.
* Duchon spline bug fix (could fail to create model matrix if number of data was one greater than number of unique data).
* fix so that 'main' is not ignored by plot.gam (got broken in 1.7-0 object orientation of smooth plotting)
* Duchon spline constructor now catches k > number of data errors.
* fix of a gamm bug whereby a model with no smooths would fail after fitting because of a missing smoothing parameter vector.
* fix to bug introduced to gam/bam in 1.7-3, whereby '...' were passed to gam.control, instead of passing on to fitting routines.
* fix of some compiler warnings in matrix.c
* fix to indexing bug in monotonic additive model example in ?pcls.

1.7-4
* Fix for single letter typo bug in C code called by slanczos, could actually segfault on matrices of less than 10 by 10.
* matrix.c:Rlanczos memory error fix in convergence testing of -ve eigenvalues.
* Catch for min.sp vector all zeroes, which could cause an ungraceful failure.

1.7-3
** "ds" (Duchon splines) smooth class added. See ?Duchon.spline
** "sos" (spline on the sphere) smooth class added. See ?Spherical.Spline.
* Extended quasi-likelihood used with RE/ML smoothness selection and quasi families.
* random subsampling code in bam, sos and tp smooths modified a little, so that .Random.seed is set if it doesn't exist.
* `control' argument changed for gam/bam/gamm to a simple list, which is then passed to gam.control (or lmeControl), to match `glm'.
* Efficiency of Lanczos iteration code improved, by restructuring, and calling LAPACK for the eigen decomposition of the working tri-diagonal matrix.
* Slight modification to `t2' marginal reparameterization, so that `main effects' can be extracted more easily, if required.

1.7-2
* `polys.plot' now exported, to facilitate plotting of results for models involving mrf terms.
* bug fix in plot.gam --- too.far had stopped working in 1.7-0.

1.7-1
* post fitting constraint modification would fail if model matrix was rank deficient until penalized. This was an issue when mixing new t2 terms with "re" type random effects. Fixed.
* plot.mrf.smooth bug fix. There was an implicit assumption that the `polys' list was ordered in the same way as the levels of the covariate of the smooth. Fixed.
* gam.side intercept detection could occasionally fail. Improved.
* concurvity would fail if model matrix contained NA's. Fixed.

1.7-0
** `t2' alternative tensor product smooths added. These can be used with gamm4.
** "mrf" smooth class added (at the suggestion of Thomas Kneib). Implements smoothing over discrete geographic districts using a Markov random field penalty. See ?mrf
* qq.gam added to allow better checking of distribution of residuals.
* gam.check modified to use qq.gam for QQ plots of deviance residuals. Also, it now works with gam(*, na.action = "na.replace") and NAs.
* `concurvity' function added to provide simple concurvity measures.
* plot.gam automatic layout modified to be a bit more sensible (i.e. to recognise that most screens are landscape, and that usually squarish plots are wanted).
* Plot method added for mrf smooths.
* in.out function added to test whether points are interior to a region defined by a set of polygons. Useful when working with MRFs.
* `plot.gam' restructured so that smooths are plotted by smooth specific plot methods.
* Plot method added for "random.effect" smooth class.
* `pen.edf' function added to extract EDF associated with each penalty. Useful with t2 smooths.
* Facility provided to allow different identifiability constraints to be used for fitting and prediction. This allows t2 smooths to be fitted with a constraint that allows fitting by gamm4, but still perform inference with the componentwise optimal sum to zero constraints.
* mgcv-FAQ.Rd added.
* paraPen works properly with `gam.vcomp' and full.sp names returned correctly.
* bam (and bam.update) can now employ an AR1 error model in the gaussian-identity case.
* bam.update modified for faster updates (initial scale parameter estimate now supplied in RE/ML case)
* Absorption of identifiability constraints modified to allow constraints that only affect some parameters to leave rest of parameters completely unchanged.
* rTweedie added for quick simulation of Tweedie random deviates when 1
pmin)
* color example added to plot.gam.Rd
* bug fix in `smooth.construct.tensor.smooth.spec' - class "cyclic.smooth" marginals no longer re-parameterized.
* `te' documentation modified to mention that marginal reparameterization can destabilize tensor products.

1.3-17
* print.summary.gam prints estimated ranks more prettily (thanks Martin Maechler)
** `fix.family.link' can now handle the `cauchit' link, and also appends a third derivative of link function to the family (not yet used).
* `fix.family.var' now adds a second derivative of the link function to the family (not yet used).
** `magic' modified to (i) accept an argument `rss.extra' which is added to the RSS (squared norm) term in the GCV/UBRE or scale calculation; (ii) accept argument `n.score' (defaults to number of data), the number to use in place of the number of data in the GCV/UBRE calculation. These are useful for dealing with very large data sets using pseudo-model approaches.
* `trans' and `shift' arguments added to `plot.gam': allows, e.g. single smooth models to be easily plotted on uncentred response scale.
* Some .Rd bug fixes.
** Addition of choose.k.Rd helpfile, including example code for diagnosing overly restrictive choice of smoothing basis dimension `k'.

1.3-16
* bug fix in predict.gam documentation + example of how to predict from a `gam' outside `R'.

1.3-15
* chol(A,pivot=TRUE) now (R 2.3.0) generates a warning if `A' is not +ve definite. `mroot' modified to suppress this (since it only calls `chol(A,pivot=TRUE)' because `A' is usually +ve semi-definite).

1.3-14
* mat.c:mgcv_symeig modified to allow selection of the LAPACK routine actually used: dsyevd is the routine used previously, and seems very reliable. dsyevr is the faster, smaller more modern version, which it seems possible to break... rest of code still calls dsyevd.
* Symbol registration added (thanks largely to Brian Ripley).
Version depends on R >= 2.3.0

1.3-13
* some doc changes
** The p-values for smooth terms had too low power sometimes. Modified testing procedure so that testing rank is at most ceiling(2*edf.for.term). This gives quite close to uniform p-value distributions when the null is true, in simulations, without excessive inflation of the p-values, relative to parametric equivalents when it is not. Still not really satisfactory.

1.3-12
* vis.gam could fail if the original model formula contained functions of covariates, since vis.gam calls predict.gam with a newdata argument based on the *model frame* of the model object. predict.gam now recognises that this has happened and doesn't fail if newdata is a model frame which contains, e.g. log(x) rather than x itself. offset handling simplified as a result.
* prediction from te smooths could fail because of a bug in handling the list of re-parameterization matrices for 1-D terms in Predict.matrix.tensor.smooth. Fixed. (tensor product docs also updated)
* gamm did not handle s(...,fx=TRUE) terms properly, due to several failures to count s(...,fx=FALSE) terms properly if there were fixed terms present. Now fixed.
* In the gaussian additive mixed model case `gamm' now allows "ML" or "REML" to be selected (and is slightly more self consistent in handling the results of the two alternatives).

1.3-11
* added package doc file
* added French error message support (thanks to Philippe Grosjean), and error message quotation characters (thanks to Brian Ripley.)

1.3-10
* a `constant' attribute has been added to the object returned by predict.gam(...,type="terms"), although what is returned is still not an exact match to what `predict.lm' would do.
** na.action handling made closer to glm/lm functions. In particular, default for predict.gam is now to pad predictions with NA's as opposed to dropping rows of newdata containing NA's.
* interpret.gam had a bug caused by a glitch in the terms.object documentation (R <=2.2.0).
Formulae such as y ~ a + b:a + s(x) could cause failure. This was because attr(tf,"specials") is documented as returning indices of specials in `terms'. It doesn't, it indexes specials in the variables dimension of the attr(tf,"factors") table: latter now used to translate.
* `by' variable use could fail unreasonably if a `by' variable was not of mode `numeric': now coerced to numeric at appropriate times in smooth constructors.

1.3-9
* constants multiplying TPRS basis functions were `unconventional' for d odd in function eta() in tprs.c. The constants are immaterial if you are using gam, gamm etc, but matter if you are trying to get out the explicit representation of a TPRS term yourself (e.g. to differentiate a smooth exactly).

1.3-8
* get.var() now checks that result is numeric or factor (avoids occasional problems with variable names that are functions - e.g. `t')
* fix.family.var and fix.family.link now pass through unaltered any family already containing the extra derivative functions. Usually, to make a family work with gam.fit2 it is only necessary to add a dvar function.
* defaults modified so that when using outer iteration, several performance iteration steps are now used for initialization of smoothing parameters etc. The number is controlled by gam.control(outerPIsteps). This tends to lead to better starting values, especially with binary data. gam, gam.fit and gam.control are modified.
* initial.sp modified to allow a more expensive initialization method, but this is not currently used by gam.
* minor documentation changes (e.g. removal of full stops from titles)

1.3-7
* change to `pcls' example to account for model matrix rescaling changing smoothing parameter sizes.
* `gamm' `control' argument set to use "L-BFGS-B" method if `lme' is using `optim' (only does this if `nlminb' not present). Consequently `mgcv' now depends on nlme_3.1-64 or above.
* improvement of the algorithm in `initial.sp'. Previously it was possible for very low rank smoothers (e.g.
k=3) to cause the initialization to fail, because of poor handling of unpenalized parameters.

1.3-6
* pdIdnot class changed so that parameters are variances not standard deviations - this makes for greater consistency with pdTens class, and means that limits on notLog2 parameterization should mean the same thing for both classes.
** niterEM set to 0 in lme calls. This is because EM steps in lme are not set up to deal properly with user defined pdMat classes (latter confirmed by DB).

1.3-5
** Improvements to anova and summary functions by Henric Nilsson incorporated. Functions are now closer to glm equivalents, and printing is more informative. See ?anova.gam and ?summary.gam.
* nlme 3.1-62 changed the optimizer underlying lme, so that indefinite likelihoods cause problems. See ?logExp2 for the workaround.
  - niterEM now reset to 25, since parameterization prevents parameters wandering to +/- infinity (this is important as starting values for Newton steps are now more critical, since reparameterization introduces new local minima).
** smoothCon modified to rescale penalty coefficient matrices to have similar `size' to X'X for each term. This is to try and ensure that gamm is reasonably scale invariant in its behaviour, given the logExp2 re-parameterization.
* magic dropped dimensions of an array inappropriately - fixed.
* gam now checks that model does not have more coefficients than data.

1.3-4
* inst/CITATION file added. Some .Rd fixes

30/6/2005 1.3-3
* te() smooths were not always estimated correctly by gamm(): invariance lost and different results to equivalent s() smooths. The problem seems to lie in a sensitivity of lme() estimation to the absolute size of the `S' attribute matrices of a pdTens class pdMat object: the problem did not occur at the last revision of the pdTens class, and there are no changes logged for nlme that could have caused it, so I guess it's down to a change in something that lme calls in the base distribution.
To avoid the problem, smooth.construct.tensor.smooth.spec has been modified to scale all marginal penalty matrices so that they have largest singular value 1.
* Changes to GLMs in R 2.1.1 mean that if the response is an array, gam could fail, due to failure of terms like w * X when w is an array rather than a vector. Code modified accordingly.
* Outer iteration now suppresses some warnings, until the final fitted model is obtained, in order to avoid printing warnings that actually don't apply to the final fit.
* Version number reporting made (hopefully) more robust.
* pdconstruct.pdTens removed absolute lower limit on coef - replaced with relative lower limit.
* moved tensor product constraint construction to BEFORE by variable stuff in smooth.construct.tensor.smooth.spec.

1.3-1
* vcov had been left out of namespace - fixed.
* cr and cc smooths now trap the case in which the incorrect number of knots are supplied to them.
* `s(.)' in a formula could cause a segfault, it gets trapped now, hopefully it will be handled nicely at some point in the future. Thanks Martin Maechler.
* wrong n reported in summary.gam() in the generalized case - fixed. Thanks YK Chau.

1.3-0
*** The GCV/UBRE score used in the generalized case when fitting by outer iteration (the default) in version 1.2 was based on the Pearson statistic. It is prone to serious undersmoothing, particularly of binary data. The default is now to use a GCV/UBRE score based on the deviance: this performs much better, while still maintaining the enhanced numerical convergence performance of outer iteration.
* The Pearson based scores are still available as an option (see ?gam.method)
* For the known scale parameter case the default UBRE score is now just a linearly rescaled AIC criterion.
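
For orientation, here is a rough sketch of the scores that the 1.3-0 entry contrasts. The notation is assumed here and does not come from this file: n is the number of data, A the influence matrix of the fitted model, D the deviance at the fitted coefficients, and \phi a known scale parameter. The deviance based GCV and UBRE scores are commonly written as below. The Pearson based versions put the Pearson statistic in place of D.

```latex
\mathcal{V}_g = \frac{n\, D(\hat\beta)}{\{n - \mathrm{tr}(A)\}^{2}},
\qquad
\mathcal{V}_u = \frac{D(\hat\beta)}{n} + \frac{2\,\phi\,\mathrm{tr}(A)}{n} - \phi .

% Known scale case: D(\hat\beta) = 2\phi\{l_{\mathrm{sat}} - l(\hat\beta)\}, so with
% \mathrm{AIC} = -2\,l(\hat\beta) + 2\,\mathrm{tr}(A),
\mathcal{V}_u = \frac{\phi}{n}\,\bigl\{\mathrm{AIC} + 2\,l_{\mathrm{sat}}\bigr\} - \phi .
```

Since the saturated log likelihood l_sat does not depend on the smoothing parameters, the last line is a linear rescaling of AIC under these assumptions, which is the sense of the final 1.3-0 item.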

1.2-6
* Two bugs in smooth.construct.tensor.smooth.spec: (i) incorrect testing of class of smooth before re-parameterizing, so that cr smooths were re-parameterized, when there is no need to; (ii) knots used in re-parameterization were based on quantiles of the relevant marginal covariate, which meant that repeated knots could be generated: now uses quantiles of unique covariate values.
* Thanks to Henric Nilsson a bug in the documentation of magic.post.proc has been fixed.

1.2-5
** Bug fix in gam.fit2: prior weights not subsetted for non-informative data in GCV/UBRE calculation. Also plot.gam modified to allow for consequent NA working residuals. Thanks to B. Stollenwerk for reporting this bug.
** vcov.gam written by Henric Nilsson included... see ?vcov.gam
* Some minor documentation fixes.
* Some tweaking of tolerances for outer iteration (was too lax).
** Modification of the way predict.gam picks up variables. (complication is that it should behave like other predict functions, but warn if an incomplete prediction data frame is supplied - since latter violates what white book says).

1.2-2
*** An alternative approach to GCV/UBRE optimization in the *generalized* additive model case has been implemented. It leads to more reliable convergence for models with concurvity problems, but is slower than the old default `performance iteration'. Basically the GAM IRLS process is iterated to convergence for each trial set of smoothing parameters, and the derivatives of the GCV/UBRE score w.r.t. smoothing parameters are calculated explicitly as part of the IRLS iteration. This means that the GCV/UBRE optimization is now `outer' to the IRLS iteration, rather than being performed on each working model of the IRLS iteration. The faster `performance iteration' is still available as an option.
As a side effect, when using outer iteration, it is not possible to find smoothing parameters that marginally improve on the GCV/UBRE scores of the estimated ones by hand tuning: this improves the logical self consistency of using GCV/UBRE scores for model selection purposes.
* To facilitate the expanded list of fitting methods, `gam' now has a `method' argument requiring a 3 item list, specifying which method to use for additive models, which for generalized additive models and if using outer iteration, which optimization routine to use. See ?gam.method for details. `gam.control' has also been modified accordingly.
*** By default all smoothing bases are now automatically re-parameterized to absorb centering constraints on smooths into the basis. This makes everything more modular, and is usually user transparent. See ?gam.control to get the old behaviour.
** Tensor product smooths (te) now use a reparameterization of the marginal smoothing bases, which ensures that the penalties of a tensor product smooth retain the interpretation, in terms of function shape, of the marginal penalties from which they are induced. In practice this almost always improves MSE performance (at least for smooth underlying functions.) See ?te to turn this off.
*** P-values reported by anova.gam and summary.gam are now based on strictly frequentist calculations. This means that they are much better justified theoretically, and are interpretable as ordinary frequentist p-values. They are still conditional on smoothing parameters, however, and are hence underestimates when smoothing parameters have been estimated.
** Identifiability side conditions modified to work with all smooths (including user defined). Now works by identifying possible dependencies symbolically, but dealing with the resulting degeneracies numerically. This allows full ANOVA decompositions of functions using tensor product smooths, for example.
* summary.gam modified to deal with prior weights in adjusted r^2 calculation.
** `gam' object now contains `Ve' the frequentist covariance matrix of the parameter estimators, which is useful for p-value calculation. See ?gamObject and ?magic.post.proc for details.
* Now depends on R >=2.0.0
* Default residual plots modified in `gam.check'
** Added `cooks.distance.gam' function.
* Bug whereby te smooths ignored `by' variables is now fixed.

1.1-6
* Smoothing parameter initialization method changed in magic, to allow better initialization of te() terms. This affects default gam fits.
* gamm and extract.lme.cov2 modified to work correctly when the correlation structure applies to a finer grouping than the random effects. (Example of this added to gamm help file)
* modifications of pdTens class. pdFactor.pdTens now returns a vector, not a matrix in accordance with documentation (in nlme 3.1-52). Factors are now always of form A=B'B (previously, could be A=BB') in accordance with documentation (nlme 3.1-52). pdConstruct.pdTens now tests whether initializing matrix is proportional to r.e. cov matrix or its inverse and initializes appropriately. gamm fitting with te() class tested extensively with modifications and nlme 3.1-52, and lme fits with pdTens class tested against equivalent fits made using re-parameterization and pdIdent class. In particular for gamm testing: model fits with single argument te() terms now match their equivalent models using s() terms; models fitted using gam() and gamm() match if gam() is called with the gamm() estimated smoothing parameters.
* modifications of gamm() for compatibility with nlme 3.1-52: in particular a work around to allow everything to work correctly with a constructed formula object in lme call.
* some modifications of plot.gam to allow greater control of appearance of plots of smooths of 2 variables.
* added argument `offset' to gam for further compatibility with glm/lm.
* change to safe prediction for parametric terms had a bug in offset handling (offset not picked up if no newdata supplied, since model frame not created in this case). Fixed. (thanks to Jim Young for this)

1.1-5
* predict.gam had a further bug introduced with parametric safe prediction. Fixed by using a formula only containing the actual variable names when collecting data for prediction (i.e. no terms like `offset(x)')

1.1-5
* partial argument matching made col.shade be matched by col passed in `...' in plot.gam, taking away user control of colors.

1.1-5
* 2d smooth plotting in plot.gam modified.
* plot.gam could fail with residuals=TRUE due to incorrect counting in the code allowing use of termplot. plot.gam failed to prompt before a newpage if there was only one smooth. gam and gamm .Rd files updated slightly.

1.1-3
* extract.lme.cov2 could fail for random effect group sizes of 1 because submatrices with only a row or column lose their dimensions, and because single number calls to diag() result in an identity matrix.

1.1-2
* Some model formulae constructed in interpret.gam and used in facilitating safe prediction for parametric terms had the wrong environment - this could cause gam to fail to find data when e.g. lm, would find it. (thanks Thomas Maiwald)
* Some items were missing from the NAMESPACE file. (thanks Kurt Hornik)
* A very simple formula.gam function added, purely to facilitate better printing of anova method results under R 2.0.0.

1.1-1
* Due, no doubt, to gross moral turpitude on the part of the author, gamm() calculated the complete estimated covariance matrix of the response data explicitly, despite the fact that this matrix is usually rather sparse. For large datasets this could easily require more memory than was available, and huge computational expense to find the choleski decomposition of the matrix. This has now been rectified: when the covariance matrix has diagonal or block diagonal structure, then this is exploited.
* Better examples have been added to gamm().
* Some documentation bugs were fixed.

1.1-0
Main changes are as follows. Note that `gam' object has been modified, so old objects will not always work with version 1.1 functions.
** Two new smooth classes "cs" and "ts": these are like "cr" and "tp" but can be penalized all the way down to zero degrees of freedom to allow fully automatic model selection (more self consistent than having a step.gam function).
* The gam object expanded to allow inheritance from type lm and type glm, although QR related components of glm and lm are not available because of the difference in fitting method between glm/lm and gam.
** An anova method for gam objects has been added, for *approximate* hypothesis testing with GAMs.
** logLik.gam added (logLik.glm with df's fixed): enables AIC() to be used with gam objects.
** plot.gam modified to allow plotting of order 1 parametric terms via call to termplot.
* Thanks to Henric Nilsson option `shade' added to plot.gam
* predict.gam modified to allow safe prediction of parametric model components (such as poly() terms).
* predict.gam type="terms" now works like predict.glm for parametric components. (also some enhancements to facilitate calling from termplot())
* Range of smoothing parameter estimation iteration methods expanded to help with non-convergent cases --- see ?gam.convergence
* monotonic smoothing examples modified in light of above changes.
* gamm modified to allow offset terms.
* gamm bug fixed whereby terms in a model formula could get lost if there were too many of them.
* gamm object modified in light of changes to gam object.

1.0-7
* Allows a model frame to be passed as `newdata' to predict.gam: it must contain all the terms in the gam object's model frame, `model'.
* vis.gam() now passes a model frame to predict.gam and should be more robust as a result. `view' and `cond' must contain names from `names(x$model)' where x is the gam object.
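
A minimal sketch of the 1.0-7 behaviour described above. It is written against a current mgcv rather than the 1.0-7 code, so details may differ from that release. The simulated data from gamSim and the object names dat, b and p are illustrative assumptions, not part of the original entry.

```r
library(mgcv)

set.seed(1)
## simulated example data (assumption: any data frame holding y, x0 and x1 would do)
dat <- gamSim(1, n = 200, verbose = FALSE)
b <- gam(y ~ s(x0) + s(x1), data = dat)

## b$model is the model frame stored in the fitted object; passing it as
## `newdata' gives predictions at the original covariate values
p <- predict(b, newdata = b$model)

## `view' must name variables that appear in names(b$model)
stopifnot(all(c("x0", "x1") %in% names(b$model)))
vis.gam(b, view = c("x0", "x1"))
```

The check before vis.gam makes the naming requirement on `view' explicit. The same requirement applies to the names of a `cond' list.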

1.0-6/5/4
* partial residuals modified to be IRLS residuals, weighted by IRLS weights. This is a much better reflection of the influence of residuals than the raw IRLS residuals used before.
* gamm summary sorted out by using NextMethod to get around fact that summary.pdMat can't be called directly (not in nlme namespace exports).
* niterPQL and verbosePQL arguments added to gamm to allow more control of PQL iteration.
* backquote=TRUE added when deparsing to allow non-standard names. (thanks: Brian Ripley)
* bug in gam corrected: now gives correct null deviance when an offset is present. (thanks: Louise Burt)
* bug in smooth.construct.tp.smooth.spec corrected: k=2 caused a segfault as the C code was resetting k to 3 (actually null space dimension +1), and not enough space was being allocated in R to handle the resulting returned objects. k reset in R code, with warning. (Thanks: Jari Oksanen)
* predict.gam() now has "standard" data searching using a model frame based on a fake formula produced from full.formula in the fitted object. However it also warns if newdata is present but incomplete. This means that if newdata does not meet White book specifications, you get a warning, but the function behaves like predict.lm etc. predict.gam had been segfaulting if variables were missing from newdata (Thanks: Andy Liaw and BR)
* contour option added to vis.gam
* te smooths can be forced to use only a single penalty (theoretical interest only - not recommended for practical use)

1.0-3
* Fixes bugs in handling graphics parameters in plot.gam()
* Adds option of partial residuals to plot.gam()

1.0-2/1
* Fixes a bug in evaluating variables of smooths, knots and by-variables.

1.0-0
*** Tensor product smooths - any bases available via s() terms in a gam formula can be used as the basis for tensor product smooths of multiple covariates. A separate wiggliness penalty and smoothing parameter is associated with each `marginal' basis.
** Cyclic smoothers: penalized cubic regression splines which have the same value and first two derivatives at their first and last knots.
*** An object oriented approach to handling smooth terms which allows the user to add their own smooths. Smooth terms are constructed using smooth.construct method functions, while predictions from individual smooth terms are handled by predict.matrix method functions.
** p-splines implemented as the illustrative example for the above in the help files.
*** A generalized additive mixed model function gamm() with estimation via lme() in the normal-identity case and glmmPQL() otherwise. The main aim of the function is to allow a defensible way of modelling correlated error structures while using a GAM.
* The gam object itself has changed to facilitate the above. Most information pertaining to smooth terms is now stored in a list of smooth objects, whose classes depend on the bases used. The objects are not back compatible, and neither are the new method functions. This has been done in an attempt to minimize the scope for bugs, given the amount of time available for maintenance.
** s() no longer supports old style (version <0.6) specification of smooths (e.g. s(x,10|f)). This is in order to reduce the scope for problems with user defined smooth classes.
* The mgcv() function now has an argument list more similar to magic().
* Function GAMsetup() has been removed.
* I've made a general attempt to make the R code a bit less like a simultaneous translation from C.

0.9-5/4/3/2/1
* Mixtures of fixed degree of freedom and estimated degree of freedom smooths did not work correctly with the perf.iter=FALSE option. Fixed.
* fx=TRUE not handled correctly by fit.method="magic": fixed.
* some fixes to GAMsetup and gam documentation.
* call re-instated to the fitted gam object to allow updating
* -Wall and -pedantic removed from Makevars as they are gcc specific.
* isolated call to Stop() replaced by call to stop()!
0.9-0

*** There is a new underlying smoothing parameter selection method, based on pivoted QR decomposition and SVD methods implemented in LAPACK. The method is more stable than the Wood (2000) method and allows the user to fix some smoothing parameters while estimating others, regularize the GAM fit in non-convergent cases and put lower bounds on smoothing parameters. The new method can deal with rank deficient problems, for example if there is a lack of identifiability between the parametric and smooth parts of the model. See ?magic for fuller details. The old method is still available, but gam() defaults to the new method.
* Note that the new method calls LAPACK routines directly, which means that the package now depends on external linear algebra libraries, rather than relying entirely on my linear algebra routines. This is a good thing in terms of numerical robustness and speed, but does mean that to install the package from source you need a BLAS library installed and accessible to the linker. If you successfully installed R by building from source then you should have no problem: you have everything already installed, but occasionally users may have to install ATLAS in order to install from source.
* Negative binomial GAMs now use the families supplied by the MASS library and employ a fast integrated GCV based method for estimating the negative binomial parameter. See ?gam.neg.bin for details. The new method seems to converge slightly more often than the old method, and does so more quickly.
* persp.gam() has been replaced by a new routine vis.gam() which is prettier, simpler and deals better with factor covariates, and deals with `by' variables at all.
* NA's can now be handled properly in a manner consistent with lm() and glm() [thanks to Brian Ripley for pointing me in the right direction here] and there is some internal tidying of GAM so that its behaviour is more similar to glm() and lm().
* Users can now choose to `polish' gam model fits by adding an nlm() based optimization after the usual Gu (2002) style `power iteration' to find smoothing parameters. This second stage will typically result in a slightly lower final GCV/UBRE score than the default method, but is much slower. See ?gam.control for more information.
* The option to add a ridge penalty to the GAM fitting objective has been added to help deal with some convergence issues that occur when the linear predictor is essentially un-identifiable. See ?gam.control.

0.8-7

* There was a bug in the calculation of identifiability side conditions that could lead to over constraint of smooths using `by' variables in models with mixtures of smooths of different numbers of variables. This has been fixed.

0.8-6

* Fixes a bug which occurred with user supplied smoothing parameters, in which the weight vector was omitted from part of the influence (hat) matrix calculation. This could result in nonsensical variance estimates.
* Stronger consistency checks introduced on estimated degrees of freedom.

0.8-5

* mgcv was using Machine(), which is deprecated from R 1.6.0; this version uses .Machine instead.

0.8-4

* There was a memory bug which could occur with the "cr" basis, in which un-allocated memory was written to in the tps_g() routine in the compiled C code - this occurred when that routine was asked to clean up its memory, when there was nothing to clean up. Thanks to Luke Tierney for finding this problem and locating it to tps_g()!
* A very minor memory leak which occurred when knots are used to start a tps basis was fixed.

0.8-3

* Elements on leading diagonal of Hat/Influence matrix are now returned in gam object.
* Over-zealous error trap introduced at 0.8-2 caused failure with smoothless models.

0.8-2

* User can now supply smoothing parameters for all smooth terms (can't have a mixture of supplied and estimated smoothing parameters). Feature is useful if e.g. GCV/UBRE fails to produce sensible estimates.
* svd() replaced by La.svd() in summary.gam().
* a bug in the Lanczos iteration code meant that smooths behaved poorly if the smooth had exactly one less degree of freedom than the number of data (the wrong eigenvectors were retained in this case) - this was a rather rare bug in practice!
* pcls() was not using sensible tolerances and svdroot() was using tolerances incorrectly, leading to problems with pcls(), now fixed.
* prior weights were missing from the Pearson residuals.
* Faulty by variable documentation fixed (have lost name of person who let me know this, but thanks!)
* Scale factor removed from Pearson residual calculation for consistency with a higher proportion of authors.
* The proportion deviance explained has been added to summary.gam() as a better measure than r-squared in most cases.
* Routine SANtest() has been removed (obsolete).
* A bug in the select option of plot.gam has been fixed.

0.8-1

* The GCV/UBRE score can develop phantom minima for some models: these are minima in the score for the IRLS problem which suggest large parameter changes, but which disappear if those large changes are actually made. This problem occurs in some logistic regression models. To aid convergence in such cases, gam.fit now switches to a cautious mgcv optimization method if convergence has not been obtained in a user defined number of iterations. The cautious mode selects the local minimum of the GCV/UBRE closest to the previous minimum if multiple minima are present. See gam.control for details about controlling iterations.
* Option trace in gam.control now prints and plots more useful information for diagnosing convergence problems.
* The one explicit formation of an inverse in the underlying multiple GCV optimization has been replaced with something more stable (and quicker).
* A bug in the calculation of side conditions has been fixed - this caused a failure with models having parametric terms and terms like: s(x)+s(z)+s(z,x).
* A bug whereby predict.gam simply failed to pick up offset terms has been fixed.
* gam() now drops unused levels in factors.
* A bug in the conversion of svd convergence criteria between version 0.7-2 and 0.8-0 has been fixed.
* Memory leaks have been removed from the C code (thanks to the superb dmalloc library).
* A bug that caused an undignified exit when 1-d smoothing with full splines in 0.8-0 has been fixed.

0.8-0

* There was a problem on some platforms resulting from the default compiler optimizations used by R. Specifically: floating point registers can be used to store local variables. If the register is larger than a double (as is the case for Intel 486 and up), this means that: double a,b; a=b; if (a==b) can evaluate as FALSE. The mgcv source code assumed that this could never happen (it wouldn't under strict IEEE fp compliance, for example). As a result, for some models using the package compiled using some compiler versions, the one dimensional "overall" smoothing parameter search could fail, resulting in convergence failure, or undersmoothing. The Windows version from CRAN was OK, but versions installed under Linux could have problems. Version 0.8 does not make the problematic assumption.
* The search for the optimal overall smoothing parameter has been improved, providing better protection against local minima in the GCV/UBRE score.
* Extra GCV/UBRE diagnostics are provided, along with a function gam.check() for checking them.
* It is now possible for the user to supply "knots" to be used when producing the t.p.r.s. basis, or for the cubic regression spline basis. This makes it feasible to work with very large datasets using the of the data. It also provides a mechanism for obtaining purely "knot based" thin plate regression splines.
* A new mechanism is provided for allowing a smooth term to be multiplied by a covariate within the model. Such "by" variables allow smooths to be conditional on factors, for example.
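The floating point pitfall described in the first 0.8-0 item can be avoided by never relying on exact equality between doubles. The notes do not say how version 0.8 does this, so the following is only a minimal sketch of one common approach (a relative tolerance test), not the mgcv code. The function name nearly_equal and the choice of tolerance are assumptions made for illustration.

```c
#include <assert.h>
#include <float.h>

/* Hypothetical helper (not taken from mgcv): treat a and b as equal when
   they differ by at most rel_tol times the larger of their magnitudes,
   with the scale floored at 1 so values near zero compare sensibly.
   Unlike a == b, the result does not change if one operand is held in
   a register that is wider than a double. */
int nearly_equal(double a, double b, double rel_tol)
{
    double diff = a - b;
    double abs_a = a < 0.0 ? -a : a;
    double abs_b = b < 0.0 ? -b : b;
    double scale = abs_a > abs_b ? abs_a : abs_b;

    if (diff < 0.0) diff = -diff;
    if (scale < 1.0) scale = 1.0;
    return diff <= rel_tol * scale;
}
```

A caller would write, for example, nearly_equal(a, b, 4 * DBL_EPSILON) in place of a == b when deciding whether a search has stopped moving.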
* Formulae such as y~s(x)+s(z)+s(x,z) can now be used.
* The package now reports the UBRE score of a fitted model if UBRE was used for smoothing parameter selection, and the GCV score otherwise.
* A new help page gam.models has been added.
* A bug whereby offsets in model formulae only worked if they were at the end of the formulae has been fixed.
* A bug whereby weights could not be supplied in the model data frame has been fixed.
* gam.fit has been upgraded using the R 1.5.0 version of glm.fit.
* An error in the documentation of xp in the gam object has been fixed, in addition to numerous other changes to the documentation.
* The scoping rules employed by gam() have been brought into line with lm() and glm() by searching for variables in the environment of the model formula rather than in the environment from which gam() was called - usually these are the same, but not always.
* A bug in persp.gam() has been fixed, whereby slice information had to be supplied in a particular order.
* All compiled code calls now specify package mgcv to avoid any possibility of calling the wrong function.
* All examples now set the random number generator seed to facilitate cross platform comparisons.

0.7-2

* T and F changed to TRUE and FALSE in code and examples.
* Minor predict.gam error fixed (didn't get correct fitted values if called without new data and model contained multi-dimensional smooths).

0.7-1

* There was a somewhat over-zealous warning message in the single smoothing parameter selection code - gave a warning every time that GCV suggested a smoothing parameter at the boundary of the search interval - even if this GCV function was also flat. Fixed.
* The search range for 1-d smoothing parameter selection was too wide - it was possible to give so little weight to the data that numerical problems caused all parameters to be estimated as zero (along with the edf for the term!).
  The range has been narrowed to something more sensible [above warning should still be triggered if it is ever too narrow - but this should not be possible].
* summary.gam() documentation extended a bit. p-values for smooths are slightly improved, and an example included that shows the user how to check them!

0.7-0

* The underlying multiple GCV/UBRE optimization method has been considerably strengthened, as follows:
   o First and second guess starting values for the relative smoothing parameters have been improved.
   o Steepest descent is used if either: (i) the Hessian of the objective is not positive definite, or (ii) steps in the Newton direction fail to improve the GCV/UBRE score after 4 step halvings (since in this case the quadratic model is clearly poor).
   o Newton steps are rescaled so that the largest step component (in log relative smoothing parameters) is of size 5 if any step components are >5. This avoids very large Newton steps that can occur in flat regions of the objective.
   o All steepest descent steps are initially scaled so that their longest component is 1; this avoids long steps into flat regions of the objective.
   o MGCV convergence diagnostics are returned from routines mgcv and gam.
   o In gam.fit() smoothing parameters are re-auto-initialized during IRLS if they have become so far apart that some are likely to be in flat parts of the GCV/UBRE score.
   o A bug whereby poor second guesses at relative smoothing parameters could lead to acceptance of the first guess at these parameters has been removed.
   o The user is warned if the initial smoothing parameter guesses are not improved upon (can happen legitimately if all s.p.s should be very high or very low.)
  The end result of these changes is to make fits from gam much more reliable (particularly when using the tprs basis available from version 0.6).
* A summary.gam and associated print function are provided. These provide approximate p-values for all model terms.
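The two step-scaling rules listed under 0.7-0 (cap Newton steps at size 5, scale steepest descent steps to size 1) can be sketched in C as below. This is an illustrative sketch only and is not the mgcv source. The function names max_abs, cap_newton_step and normalise_descent_step are hypothetical, and "largest" or "longest" component is assumed to mean the component with the largest absolute value.

```c
#include <assert.h>
#include <stddef.h>

/* Largest absolute value among the n components of step. */
double max_abs(const double *step, size_t n)
{
    double m = 0.0;
    size_t i;
    for (i = 0; i < n; i++) {
        double a = step[i] < 0.0 ? -step[i] : step[i];
        if (a > m) m = a;
    }
    return m;
}

/* Newton rule: if any component exceeds max_size in absolute value
   (5 in the notes), shrink the whole step so that its largest
   component has size max_size. The direction is unchanged, and
   steps that are already small enough are left alone. */
void cap_newton_step(double *step, size_t n, double max_size)
{
    double m = max_abs(step, n);
    size_t i;
    if (m > max_size) {
        double factor = max_size / m;
        for (i = 0; i < n; i++) step[i] *= factor;
    }
}

/* Steepest descent rule: scale the step so that its largest
   component has size 1. A zero step is left untouched. */
void normalise_descent_step(double *step, size_t n)
{
    double m = max_abs(step, n);
    size_t i;
    if (m > 0.0) {
        for (i = 0; i < n; i++) step[i] /= m;
    }
}
```

Both rules rescale the whole vector by a single factor, so only the step length changes and never its direction.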
* plot.gam now provides a mechanism for selecting single plots, and allows jittering of rug plots.
* A bug that prevented models with no smooth terms from being fitted has been removed.
* A scoping bug in gam.setup has been fixed.
* A bug preventing certain mixtures of the bases from being used has been fixed.
* The neg.bin family has been renamed neg.binom to avoid masking a function in the MASS library.

0.6-2 revisions from 0.6.1

* Relatively important fix in low level numerics. Under some circumstances the Lanczos routines used to find the thin plate regression spline basis could fail to converge or give wrong answers (many thanks to Charles Paxton for spotting this). The problem was with an insufficiently stable inverse iteration scheme used to find eigenvectors as part of the Lanczos scheme. The scheme had been used because it was very fast: unfortunately stabilizing it is as computationally costly as simply accumulating eigen-vectors with the eigen-values - hence the latter has now been done. Some further examples also added.

0.6-1

* Junk files removed from src directory.
* 3 C++ style comments removed from tprs.c.

0.6-0

* Multi-dimensional smoothing is now available, using "thin plate regression splines" (MS submitted). These are based on optimal approximations to the thin-plate splines.
* gam formula syntax upgraded (see ?s ). Old syntax still works, with the exception that if no df specified then the tprs basis is always used by default.
* plot.gam can now deal with two dimensional smooth terms as well as one dimensional smooths.
* persp.gam added to allow user to visualize slices through a gam [Mike Lonergan]
* negative binomial family added [Mike Lonergan] - not quite as robust as rest of families though [can have convergence problems].
* predict.gam now has an option to return the matrix mapping the parameters to the linear predictor at the supplied covariate values.
* Variance calculation has been made more robust.
* Routine pcls added, for penalized, linearly constrained optimization (e.g. monotonic splines).
* Residual method provided (there was a bug in the default - Thanks Carmen Fernandez).
* The cubic regression spline basis behaved wrongly when extrapolating [thanks Sharon Hedley]. This is now fixed.
* Tests included to check that there are enough unique covariate combinations to support the user's choice of smoothing basis dimension.
* Internal storage improved so that large numbers of zeroes are no longer stored in arrays of matrices.
* Some method argument lists brought into line with the R default versions.

0.5

* There was a bug in gam.fit(). The square roots of the correct iterative weights were being used in place of the weights: the bug was apparent because the sum of fitted values didn't always equal the sum of the response data when using the canonical link (which it should as a result of X'f=X'y when canonical link used and unpenalized). The bug has been corrected, and the correction tested. This problem did not affect (unweighted) additive models, only generalized additive models.
* There was a bug that caused a crash in the compiled code when there were more than 8000 datapoints to fit. This has been fixed.
* The package now reports its version number when loaded into R.
* predict.gam() now returns predictions for the original covariate values (used to fit the model) when called without new data.
* predict.gam() now allows type="response" as an argument - returning predictions on the scale of the response variable.
* plot.gam() no longer defaults to automatic page layout, use argument pages=1 to get the old default behaviour.
* A bug that could cause a crash with the model formula y~s(x)-1 has been fixed.
* Yet more sloppy practices are now allowed for naming variables in model formulae. e.g. d$y ~ s(d$x) now works, although it's not recommended.
* The GCV score is now reported by print.gam() (whether or not GCV was actually used - it isn't the default for Poisson or binomial).
* plot.gam() modified to avoid prompting for input when not used interactively.

0.4

* Transformations allowed on lhs of gam formulae.
* Argument order same as Splus gam.
* Search for data now designed to be like lm(), so you can now be quite sloppy about where your data are.
* The above means that Venables and Ripley examples can be run without having to read the documentation for gam() so carefully!
* A bug in the standard error calculations for parametric terms in predict.gam() is fixed.
* A serious bug in the handling of factors was fixed - it was previously possible to obtain a rank deficient design matrix when using factors, despite having specified an identifiable model.
* Some glitches when dealing with formulae containing offset() and/or I() have been fixed.
* Fitting defaults can now be altered using gam.control when calling gam().

0.3-3

* Documentation updated, including removal of wrong information about constraints and mgcv. Also some readability changes in the code, and models with no smooths are now allowed.

0.3-2/1

* Allows all ways of specifying a family that glm() allows (previously family=poisson or family="poisson" would fail). Some more documentation fixes.
* 0.2 lost the end of long formulae (because of a difference in the way that R and Splus deal with formulae). This is now fixed.
* A minor error that meant that QT() failed under some versions of Windows is now fixed.
* All package functions now have help(). Also the help files have been more carefully checked - version 0.2 actually contained no information on how to write a GAM formula as a result of a single missing '}' in the help file!

0.2

* Fixed d.f. regression splines allowed as part of gam() model specification.
* Bug in knot placement algorithm fixed (caused crash with df close to number of data).
* Replicate covariate values dealt with properly in gam()!
* Data search method in gam() revised - now looks in frame from which gam() called.
* plot.gam() can now deal with missing variance estimates gracefully.
* Low (1,2) d.f. smooths dealt with gracefully by gam() - no longer cause freeze or crash.
* Confidence intervals tested by simulation for normal(identity), poisson(log), binomial(logit) and gamma(log) cases. Average coverage probabilities from 0.89 to 0.97 term by term, 0.93 to 0.96 "across the model", for nominal 0.95.
* R documentation updated and tidied.