crs: Categorical Regression Splines


January 7, 2023

Version: 0.15-37
Date: 2022-12-31
Imports: boot, stats, np, quantreg
Suggests: logspline, quadprog, rgl
Title: Categorical Regression Splines
Description: Regression splines that handle a mix of continuous and categorical (discrete) data often encountered in applied settings. I would like to gratefully acknowledge support from the Natural Sciences and Engineering Research Council of Canada (NSERC, <https://www.nserc-crsng.gc.ca>), the Social Sciences and Humanities Research Council of Canada (SSHRC, <https://www.sshrc-crsh.gc.ca>), and the Shared Hierarchical Academic Research Computing Network (SHARCNET, <https://www.sharcnet.ca>). We would also like to acknowledge the contributions of the GNU GSL authors. In particular, we adapt the GNU GSL B-spline routine gsl_bspline.c, adding automated support for quantile knots (in addition to uniform knots), providing missing functionality for derivatives, and extending the splines beyond their endpoints.
License: GPL (>= 3)
URL: https://github.com/JeffreyRacine/R-Package-crs
BugReports: https://github.com/JeffreyRacine/R-Package-crs/issues
Repository: CRAN
NeedsCompilation: yes
Author: Jeffrey S. Racine [aut, cre], Zhenghua Nie [aut], Brian D. Ripley [ctb] (stepCV.R)
Maintainer: Jeffrey S. Racine <racinej@mcmaster.ca>
Date/Publication: 2023-01-07 00:20:20 UTC

R topics documented:

    crs-package
    clsd
    cps71
    crs
    crsiv
    crsivderiv
    crssigtest
    Engel95
    frscv
    frscvNOMAD
    glp.model.matrix
    gsl.bs
    krscv
    krscvNOMAD
    npglpreg
    snomadr
    tensor.prod.model.matrix
    uniquecombs
    wage1
    Index

crs-package    Nonparametric Regression Splines with Continuous and Categorical Predictors

Description

This package provides a method for nonparametric regression that combines the (global) approximation power of regression splines for continuous predictors (x) with the (local) power of kernel methods for categorical predictors (z). The user also has the option of instead using indicator bases for the categorical predictors. When the predictors contain both continuous and categorical (discrete) data types, both approaches offer more efficient estimation than the traditional sample-splitting (i.e. 'frequency') approach where the data is first broken into subsets governed by the categorical z.

To cite the crs package type: 'citation("crs")' (without the single quotes).

For a listing of all routines in the crs package type: 'library(help="crs")'.

For a listing of all demos in the crs package type: 'demo(package="crs")'.

For a 'vignette' that presents an overview of the crs package type: 'vignette("crs")'.

Details

For the continuous predictors the regression spline model employs the B-spline basis matrix using the B-spline routines in the GNU Scientific Library (https://www.gnu.org/software/gsl/). The tensor.prod.model.matrix function is used to construct multivariate tensor spline bases when basis="tensor", while additive B-splines are used otherwise (i.e. when basis="additive"). For the discrete predictors the product kernel function is of the 'Li-Racine' type (see Li and Racine (2007) for details), which is formed by constructing products of one of the following univariate kernels:
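As a quick sketch of the two basis choices described above (the simulated data-generating process and all settings below are our own illustrative assumptions, not taken from the package's examples):

```r
## Sketch: additive versus tensor-product B-spline bases in crs().
## The DGP and settings here are illustrative assumptions only.
library(crs)

set.seed(42)
n  <- 500
x1 <- runif(n)
x2 <- runif(n)
y  <- cos(2 * pi * x1) * sin(2 * pi * x2) + rnorm(n, sd = 0.25)

## Additive basis: no interaction/tensor terms between x1 and x2
model.add <- crs(y ~ x1 + x2, basis = "additive")

## Tensor-product basis: fully interacting multivariate spline
model.ten <- crs(y ~ x1 + x2, basis = "tensor")

## Compare the cross-validation scores reported by summary()
summary(model.add)
summary(model.ten)
```

Since this DGP is multiplicative in x1 and x2, we would expect the tensor-product basis to attain the lower cross-validation score here.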


When basis="additive" the model is additive in nature (i.e. there are no interaction/tensor terms, so the model is semiparametric rather than fully nonparametric). When basis="tensor" the model uses the multivariate tensor product basis.

Value

clsd returns a clsd object. The generic functions coef, fitted, plot and summary support objects of this type (er=FALSE plots the density on the sample realizations (default is 'extended range' data), see er above; distribution=TRUE plots the distribution). The returned object has the following components:

density           estimates of the density function at the sample points
density.er        the density evaluated on the 'extended range' of the data
density.deriv     estimates of the derivative of the density function at the sample points
density.deriv.er  estimates of the derivative of the density function evaluated on the 'extended range' of the data
distribution      estimates of the distribution function at the sample points
distribution.er   the distribution evaluated on the 'extended range' of the data
xer               the 'extended range' of the data
degree            integer/vector specifying the degree of the B-spline basis for each dimension of the continuous x
segments          integer/vector specifying the number of segments of the B-spline basis for each dimension of the continuous x
xq                vector of quantiles
tau               vector generated by quantile.seq or input by the user (lying in [0,1]) from which the quantiles xq are obtained

Usage Issues

This function should be considered to be in 'beta' status until further notice. If smoother estimates are desired and degree=degree.min, increase degree.min to, say, degree.min=3. The use of 'regression' B-splines can lead to undesirable behavior at the endpoints of the data (i.e. when monotone=FALSE). The default 'density' B-splines ought to be well-behaved in these regions.
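A minimal sketch of working with the returned components listed above (the simulated data are our own assumption):

```r
## Sketch: extracting components from a fitted clsd object.
## Assumes the crs package is installed; the data are simulated.
library(crs)

set.seed(42)
x <- rnorm(250)
model <- clsd(x)

## Density and distribution estimates on the 'extended range' of the data
plot(model$xer, model$density.er, type = "l",
     xlab = "x", ylab = "Density")
plot(model$xer, model$distribution.er, type = "l",
     xlab = "x", ylab = "Distribution")

## Degree and segments selected for the B-spline basis
model$degree
model$segments
```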


See Also

logspline

Examples

## Not run:
## Old Faithful eruptions data histogram and clsd density
library(MASS)
data(faithful)
attach(faithful)

model <- clsd(eruptions)

ylim <- c(0,max(model$density,hist(eruptions,breaks=20,plot=FALSE)$density))

plot(model,ylim=ylim)

hist(eruptions,breaks=20,freq=FALSE,add=TRUE,lty=2)
rug(eruptions)

summary(model)
coef(model)

## Simulated data
set.seed(42)
require(logspline)

## Example - simulated data
n <- 250
x <- sort(rnorm(n))
f.dgp <- dnorm(x)

model <- clsd(x)

## Standard (cubic) estimate taken from the logspline package
## Compute MSEs
mse.clsd <- mean((fitted(model)-f.dgp)^2)


model.logspline <- logspline(x)
mse.logspline <- mean((dlogspline(x,model.logspline)-f.dgp)^2)

ylim <- c(0,max(fitted(model),dlogspline(x,model.logspline),f.dgp))

plot(model,
     ylim=ylim,
     sub=paste("MSE: logspline = ",format(mse.logspline),
               ", clsd = ",format(mse.clsd)),
     lty=3,
     col=3)

xer <- model$xer
lines(xer,dlogspline(xer,model.logspline),col=2,lty=2)
lines(xer,dnorm(xer),col=1,lty=1)
rug(x)

legend("topright",c("DGP",
       paste("Cubic Logspline Density (package logspline, knots = ",
             model.logspline$nknots,")",sep=""),
       paste("clsd Density (degree = ",model$degree,", segments = ",
             model$segments,", penalty = ",round(model$penalty,2),")",sep="")),
       lty=1:3,
       col=1:3,
       bty="n",
       cex=0.75)

summary(model)
coef(model)

## Simulate data with known bounds
set.seed(42)
n <- 10000
x <- runif(n,0,1)

model <- clsd(x,lbound=0,ubound=1)

plot(model)

## End(Not run)

cps71 Canadian High School Graduate Earnings


Examples

## Example - we compare the nonparametric local linear kernel regression
## method with the regression spline for the cps71 data. Note that there
## are no categorical predictors in this dataset so we are merely
## comparing and contrasting the two nonparametric estimates.

data(cps71)
attach(cps71)
require(np)

model.crs <- crs(logwage~age,complexity="degree-knots")
model.np <- npreg(logwage~age,regtype="ll")

plot(age,logwage,cex=0.25,col="grey",
     sub=paste("crs-CV = ",
               formatC(model.crs$cv.score,format="f",digits=3),
               ", npreg-CV = ",
               formatC(model.np$bws$fval,format="f",digits=3),sep=""))
lines(age,fitted(model.crs),lty=1,col=1)
lines(age,fitted(model.np),lty=2,col=2)

crs.txt <- paste("crs (R-squared = ",
                 formatC(model.crs$r.squared,format="f",digits=3),")",sep="")
np.txt <- paste("ll-npreg (R-squared = ",
                formatC(model.np$R2,format="f",digits=3),")",sep="")

legend(22.5,15,c(crs.txt,np.txt),lty=c(1,2),col=c(1,2),bty="n")

summary(model.crs)
summary(model.np)
detach("package:np")


knots  a character string (default knots="quantiles") specifying where knots are to be placed. 'quantiles' specifies knots placed at equally spaced quantiles (an equal number of observations lie in each segment) while 'uniform' specifies knots placed at equally spaced intervals. If knots="auto", the knot type will be automatically determined by cross-validation

basis  a character string (default basis="auto") indicating whether the additive or tensor product B-spline basis matrix for a multivariate polynomial spline or generalized B-spline polynomial basis should be used. Note this can be automatically determined by cross-validation if cv="nomad" or cv="exhaustive" and basis="auto", and is an 'all or none' proposition (i.e. interaction terms for all predictors or for no predictors given the nature of 'tensor products'). Note also
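A short sketch contrasting the two knot-placement rules described above (the skewed simulated data and settings are our own illustrative assumptions):

```r
## Sketch: quantile versus uniform knot placement in crs().
## Simulated skewed data; all settings are illustrative assumptions.
library(crs)

set.seed(42)
n <- 500
x <- rexp(n)                        # right-skewed predictor
y <- log(1 + x) + rnorm(n, sd = 0.1)

## Quantile knots: an equal number of observations lies in each
## segment, so knots cluster where the data are dense
model.q <- crs(y ~ x, knots = "quantiles")

## Uniform knots: equally spaced over the range of x
model.u <- crs(y ~ x, knots = "uniform")

summary(model.q)
summary(model.u)
```

With skewed data such as these, quantile knots place more flexibility where observations concentrate, which is why they are the default.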


When basis="additive" the model is additive in nature (i.e. there are no interaction/tensor terms, so the model is semiparametric rather than fully nonparametric).

When basis="tensor" the model uses the multivariate tensor product basis.

When kernel=FALSE the model uses indicator basis functions for the nominal/ordinal (factor/ordered) predictors rather than kernel weighting.

When kernel=TRUE the product kernel function for the discrete predictors is of the 'Li-Racine' type (see Li and Racine (2007) for details).

When cv="nomad", numerical search is undertaken using Nonsmooth Optimization by Mesh Adaptive Direct Search (Abramson, Audet, Couture, Dennis, Jr., and Le Digabel (2011)).

When kernel=TRUE and cv="exhaustive", numerical search is undertaken using optim and the box-constrained L-BFGS-B method (see optim for details). The user may restart the algorithm as many times as desired via the restarts argument (default restarts=0). The approach ascends from degree=0 (or segments=0) through degree.max and, for each value of degree (or segments), searches for the optimal bandwidths. After the most complex model has been searched, the optimal degree/segments/lambda combination is selected. If any element of the optimal degree (or segments) vector coincides with degree.max (or segments.max), a warning is produced and the user ought to restart their search with a larger value of degree.max (or segments.max).
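The kernel-versus-indicator choice above can be sketched as follows (the simulated data and settings are our own assumptions, chosen for brevity):

```r
## Sketch: Li-Racine kernel weighting versus indicator bases for a
## factor predictor. Simulated data; settings are illustrative only.
library(crs)

set.seed(42)
n <- 500
x <- runif(n)
z <- factor(rbinom(n, 1, 0.5))
y <- cos(2 * pi * x) + as.numeric(z == "1") + rnorm(n, sd = 0.25)

## Kernel weighting for the factor z: the bandwidth lambda is chosen
## jointly with degree/segments by cross-validation
model.kernel <- crs(y ~ x + z, kernel = TRUE)

## Indicator basis functions for z instead of kernel weighting
model.indicator <- crs(y ~ x + z, kernel = FALSE)

summary(model.kernel)
summary(model.indicator)
```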


See Also

smooth.spline, loess, npreg


set.seed(42)
## Example - simulated data
n <- 1000
num.eval <- 50
x1 <- runif(n)
x2 <- runif(n)
z <- rbinom(n,1,.5)
dgp <- cos(2*pi*x1)+sin(2*pi*x2)+z
z <- factor(z)
y <- dgp + rnorm(n,sd=.5)

## Estimate a model with specified degree, segments, and bandwidth
model <- crs(y~x1+x2+z,
             degree=c(5,5),
             segments=c(1,1),
             lambda=0.1,
             cv="none",
             kernel=TRUE)

summary(model)

## Perspective plot
x1.seq <- seq(min(x1),max(x1),length=num.eval)
x2.seq <- seq(min(x2),max(x2),length=num.eval)
x.grid <- expand.grid(x1.seq,x2.seq)
newdata <- data.frame(x1=x.grid[,1],x2=x.grid[,2],
                      z=factor(rep(0,num.eval**2),levels=c(0,1)))
z0 <- matrix(predict(model,newdata=newdata),num.eval,num.eval)
newdata <- data.frame(x1=x.grid[,1],x2=x.grid[,2],
                      z=factor(rep(1,num.eval**2),levels=c(0,1)))
z1 <- matrix(predict(model,newdata=newdata),num.eval,num.eval)
zlim <- c(min(z0,z1),max(z0,z1))

persp(x=x1.seq,y=x2.seq,z=z0,
      xlab="x1",ylab="x2",zlab="y",zlim=zlim,
      ticktype="detailed",
      border="red",
      theta=45,phi=45)
par(new=TRUE)
persp(x=x1.seq,y=x2.seq,z=z1,
      xlab="x1",ylab="x2",zlab="y",zlim=zlim,
      theta=45,phi=45,
      ticktype="detailed",
      border="blue")

## Partial regression surface plot
plot(model,mean=TRUE,ci=TRUE)

## Not run:
## A plot example where we extract the partial surfaces, confidence
## intervals etc. automatically generated by plot(mean=TRUE,...) but do
## not plot, rather save for separate use.
pdat <- plot(model,mean=TRUE,ci=TRUE,plot.behavior="data")

## Column 1 is the (evaluation) predictor ([,1]), 2-4 ([,-1]) the mean,


## lwr, and upr (note the returned value is a list hence pdat[[1]] is
## data for the first predictor, pdat[[2]] the second etc). Note that
## matplot() can plot this nicely.
matplot(pdat[[1]][,1],pdat[[1]][,-1],
        xlab=names(pdat[[1]][1]),ylab=names(pdat[[1]][2]),
        lty=c(1,2,2),col=c(1,2,2),type="l")

## End(Not run)

crsiv    Nonparametric Instrumental Regression

Description

crsiv computes nonparametric estimation of an instrumental regression function ϕ defined by conditional moment restrictions stemming from a structural econometric model: E[Y − ϕ(Z, X) | W] = 0, and involving endogenous variables Y and Z, exogenous variables X, and instruments W. The function ϕ is the solution of an ill-posed inverse problem.

When method="Tikhonov", crsiv uses the approach of Darolles, Fan, Florens and Renault (2011) modified for regression splines (Darolles et al use local constant kernel weighting). When method="Landweber-Fridman", crsiv uses the approach of Horowitz (2011) using the regression spline methodology implemented in the crs package.

Usage

crsiv(y,
      z,
      w,
      x = NULL,
      zeval = NULL,
      weval = NULL,
      xeval = NULL,
      alpha = NULL,
      alpha.min = 1e-10,
      alpha.max = 1e-01,
      alpha.tol = .Machine$double.eps^0.25,
      deriv = 0,
      iterate.max = 1000,
      iterate.diff.tol = 1.0e-08,
      constant = 0.5,
      penalize.iteration = TRUE,
      smooth.residuals = TRUE,
      start.from = c("Eyz","EEywz"),
      starting.values = NULL,
      stop.on.increase = TRUE,
      method = c("Landweber-Fridman","Tikhonov"),
      opts = list("MAX_BB_EVAL"=10000,
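The two estimation approaches above can be sketched side by side (the simulated endogenous design below is our own assumption, loosely in the spirit of the package's examples):

```r
## Sketch: Landweber-Fridman versus Tikhonov regularization in crsiv().
## Simulated endogenous design; all settings here are illustrative.
library(crs)

set.seed(42)
n <- 1000
v <- rnorm(n, sd = 0.27)
u <- -0.5 * v + rnorm(n, sd = 0.05)  # u correlated with z through v
w <- rnorm(n)                        # instrument, E(u|w) = 0
z <- 0.2 * w + v                     # endogenous predictor
y <- z^2 + u

## Iterative Landweber-Fridman regularization (the default)
model.lf <- crsiv(y = y, z = z, w = w, method = "Landweber-Fridman")

## Tikhonov regularization with data-driven alpha
model.tik <- crsiv(y = y, z = z, w = w, method = "Tikhonov")

summary(model.lf)
summary(model.tik)
```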


See Also

npreg, crs

Examples

## Not run:
## This illustration was made possible by Samuele Centorrino
## <samuele.centorrino@univ-tlse1.fr>

set.seed(42)
n <- 1500

## The DGP is as follows:
## 1) y = phi(z) + u
## 2) E(u|z) != 0 (endogeneity present)
## 3) Suppose there exists an instrument w such that z = f(w) + v and
## E(u|w) = 0
## 4) We generate v, w, and generate u such that u and z are
## correlated. To achieve this we express u as a function of v (i.e. u =
## gamma v + eps)

v <- rnorm(n,mean=0,sd=0.27)
eps <- rnorm(n,mean=0,sd=0.05)
u <- -0.5*v + eps
w <- rnorm(n,mean=0,sd=1)

## In Darolles et al (2011) there exist two DGPs. The first is
## phi(z)=z^2 and the second is phi(z)=exp(-abs(z)) (which is
## discontinuous and has a kink at zero).

fun1 <- function(z) { z^2 }
fun2 <- function(z) { exp(-abs(z)) }

z <- 0.2*w + v

## Generate two y vectors for each function.
y1 <- fun1(z) + u
y2 <- fun2(z) + u

## You set y to be either y1 or y2 (ditto for phi) depending on which
## DGP you are considering:
y <- y1
phi <- fun1

## Create an evaluation dataset sorting on z (for plotting)


evaldata <- data.frame(y,z,w)
evaldata <- evaldata[order(evaldata$z),]

## Compute the non-IV regression spline estimator of E(y|z)
model.noniv <- crs(y~z,opts=opts)
mean.noniv <- predict(model.noniv,newdata=evaldata)

## Compute the IV-regression spline estimator of phi(z)
model.iv <- crsiv(y=y,z=z,w=w)
phi.iv <- predict(model.iv,newdata=evaldata)

## For the plots, restrict focal attention to the bulk of the data
## (i.e. for the plotting area trim out 1/4 of one percent from each
## tail of y and z)
trim <- 0.0025

curve(phi,min(z),max(z),
      xlim=quantile(z,c(trim,1-trim)),
      ylim=quantile(y,c(trim,1-trim)),
      ylab="Y",
      xlab="Z",
      main="Nonparametric Instrumental Spline Regression",
      sub=paste("Landweber-Fridman: iterations = ",
                model.iv$num.iterations,sep=""),
      lwd=1,lty=1)

points(z,y,type="p",cex=.25,col="grey")
lines(evaldata$z,evaldata$z^2 - 0.325*evaldata$z,lwd=1,lty=1)
lines(evaldata$z,phi.iv,col="blue",lwd=2,lty=2)
lines(evaldata$z,mean.noniv,col="red",lwd=2,lty=4)

legend(quantile(z,trim),quantile(y,1-trim),
       c(expression(paste(varphi(z),", E(y|z)",sep="")),
         expression(paste("Nonparametric ",hat(varphi)(z))),
         "Nonparametric E(y|z)"),
       lty=c(1,2,4),
       col=c("black","blue","red"),
       lwd=c(1,2,2))

## End(Not run)

crsivderiv Nonparametric Instrumental Derivatives


See Also

npreg, crsiv, crs

Examples

## Not run:
## This illustration was made possible by Samuele Centorrino
## <samuele.centorrino@univ-tlse1.fr>

set.seed(42)
n <- 1000

## For trimming the plot (trim .5% from each tail)


trim <- 0.005

## The DGP is as follows:
## 1) y = phi(z) + u
## 2) E(u|z) != 0 (endogeneity present)
## 3) Suppose there exists an instrument w such that z = f(w) + v and
## E(u|w) = 0
## 4) We generate v, w, and generate u such that u and z are
## correlated. To achieve this we express u as a function of v (i.e. u =
## gamma v + eps)

v <- rnorm(n,mean=0,sd=0.27)
eps <- rnorm(n,mean=0,sd=0.05)
u <- -0.5*v + eps
w <- rnorm(n,mean=0,sd=1)

## In Darolles et al (2011) there exist two DGPs. The first is
## phi(z)=z^2 and the second is phi(z)=exp(-abs(z)) (which is
## discontinuous and has a kink at zero).

fun1 <- function(z) { z^2 }
fun2 <- function(z) { exp(-abs(z)) }

z <- 0.2*w + v

## Generate two y vectors for each function.
y1 <- fun1(z) + u
y2 <- fun2(z) + u

## You set y to be either y1 or y2 (ditto for phi) depending on which
## DGP you are considering:
y <- y1
phi <- fun1

## Sort on z (for plotting)
ivdata <- data.frame(y,z,w,u,v)
ivdata <- ivdata[order(ivdata$z),]
rm(y,z,w,u,v)
attach(ivdata)

model.ivderiv <- crsivderiv(y=y,z=z,w=w)

ylim <- c(quantile(model.ivderiv$phi.prime,trim),
          quantile(model.ivderiv$phi.prime,1-trim))


plot(z,model.ivderiv$phi.prime,
     xlim=quantile(z,c(trim,1-trim)),
     main="",
     ylim=ylim,
     xlab="Z",
     ylab="Derivative",
     type="l",
     lwd=2)
rug(z)

## End(Not run)

crssigtest    Regression Spline Significance Test with Mixed Data Types

Description

crssigtest implements a consistent test of significance of an explanatory variable in a nonparametric regression setting that is analogous to a simple t-test in a parametric regression setting. The test is based on Ma and Racine (2011).

Usage

crssigtest(model = NULL,
           index = NULL,
           boot.num = 399,
           boot.type = c("residual","reorder"),
           random.seed = 42,
           boot = TRUE)

Arguments

model        a crs model object.
index        a vector of indices for the columns of model$xz for which the test of significance is to be conducted. Defaults to (1,2,...,p) where p is the number of columns in model$xz.
boot.num     an integer value specifying the number of bootstrap replications to use. Defaults to 399.
boot.type    whether to conduct 'residual' bootstrapping (iid) or permute (reorder) in place the predictor being tested when imposing the null.
random.seed  an integer used to seed R's random number generator. This is to ensure replicability. Defaults to 42.
boot         a logical value (default TRUE) indicating whether to compute the bootstrap P-value or simply return the asymptotic P-value.
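The boot and boot.type arguments above can be sketched as follows (the model and simulated data are our own illustrative assumptions):

```r
## Sketch: asymptotic-only versus bootstrap P-values from crssigtest().
## Simulated data; the model specification is an illustrative assumption.
library(crs)

set.seed(42)
n <- 500
x <- runif(n)
z <- factor(rbinom(n, 1, 0.5))   # irrelevant predictor
y <- cos(2 * pi * x) + rnorm(n)

model <- crs(y ~ x + z)

## Fast asymptotic P-values only (no bootstrap replications)
sig.asy <- crssigtest(model, boot = FALSE)
summary(sig.asy)

## Bootstrap P-values, permuting the tested predictor under the null
sig.boot <- crssigtest(model, boot.type = "reorder")
summary(sig.boot)
```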


## Not run:
options(crs.messages=FALSE)
set.seed(42)
n <- 1000


z <- rbinom(n,1,.5)
x1 <- rnorm(n)
x2 <- runif(n,-2,2)
z <- factor(z)
## z is irrelevant
y <- x1 + x2 + rnorm(n)

model <- crs(y~x1+x2+z,complexity="degree",segments=c(1,1))
summary(model)

model.sigtest <- crssigtest(model)
summary(model.sigtest)

## End(Not run)

Engel95    1995 British Family Expenditure Survey

Description

British cross-section data consisting of a random sample taken from the British Family Expenditure Survey for 1995. The households consist of married couples with an employed head-of-household between the ages of 25 and 55 years. There are 1655 household-level observations in total.

Usage

data("Engel95")

Format

A data frame with 10 columns, and 1655 rows.

food      expenditure share on food, of type numeric
catering  expenditure share on catering, of type numeric
alcohol   expenditure share on alcohol, of type numeric
fuel      expenditure share on fuel, of type numeric
motor     expenditure share on motor, of type numeric
fares     expenditure share on fares, of type numeric
leisure   expenditure share on leisure, of type numeric
logexp    logarithm of total expenditure, of type numeric
logwages  logarithm of total earnings, of type numeric
nkids     number of children, of type numeric

Source

Richard Blundell and Dennis Kristensen


Examples

## Not run:
## Example - we compute nonparametric instrumental regression of an
## Engel curve for food expenditure shares using Landweber-Fridman
## iteration of Fredholm integral equations of the first kind.

## We consider an equation with an endogenous predictor (z) and an
## instrument (w). Let y = phi(z) + u where phi(z) is the function of
## interest. Here E(u|z) is not zero hence the conditional mean E(y|z)
## does not coincide with the function of interest, but if there exists
## an instrument w such that E(u|w) = 0, then we can recover the
## function of interest by solving an ill-posed inverse problem.

data(Engel95)

## Sort on logexp (the endogenous predictor) for plotting purposes
## (i.e. so we can plot a curve for the fitted values versus logexp)
Engel95 <- Engel95[order(Engel95$logexp),]
attach(Engel95)

model.iv <- crsiv(y=food,z=logexp,w=logwages,method="Landweber-Fridman")
phihat <- model.iv$phi

## Compute the non-IV regression (i.e. regress y on z)
ghat <- crs(food~logexp)

## For the plots, we restrict focal attention to the bulk of the data
## (i.e. for the plotting area trim out 1/4 of one percent from each
## tail of y and z). This is often helpful as estimates in the tails of
## the support are less reliable (i.e. more variable) so we are
## interested in examining the relationship where the action is.
trim <- 0.0025

plot(logexp,food,
     ylab="Food Budget Share",
     xlab="log(Total Expenditure)",
     xlim=quantile(logexp,c(trim,1-trim)),
     ylim=quantile(food,c(trim,1-trim)),
     main="Nonparametric Instrumental Regression Splines",
     type="p",


     cex=.5,
     col="lightgrey")

lines(logexp,phihat,col="blue",lwd=2,lty=2)
lines(logexp,fitted(ghat),col="red",lwd=2,lty=4)

legend(quantile(logexp,trim),quantile(food,1-trim),
       c(expression(paste("Nonparametric IV: ",hat(varphi)(logexp))),
         "Nonparametric Regression: E(food | logexp)"),
       lty=c(2,4),
       col=c("blue","red"),
       lwd=c(2,2),
       bty="n")

## End(Not run)

frscv    Categorical Factor Regression Spline Cross-Validation

Description

frscv computes exhaustive cross-validation directed search for a regression spline estimate of a one (1) dimensional dependent variable on an r-dimensional vector of continuous predictors and nominal/ordinal (factor/ordered) predictors.

Usage

frscv(xz,
      y,
      degree.max = 10,
      segments.max = 10,
      degree.min = 0,
      segments.min = 1,
      complexity = c("degree-knots","degree","knots"),
      knots = c("quantiles","uniform","auto"),
      basis = c("additive","tensor","glp","auto"),
      cv.func = c("cv.ls","cv.gcv","cv.aic"),
      degree = degree,
      segments = segments,
      tau = NULL,
      weights = NULL,
      singular.ok = FALSE)

Arguments

y   continuous univariate vector
xz  continuous and/or nominal/ordinal (factor/ordered) predictors


See Also

loess, npregbw,


Examples

set.seed(42)
## Simulated data
n <- 1000

x <- runif(n)
z <- round(runif(n,min=-0.5,max=1.5))
z.unique <- uniquecombs(as.matrix(z))
ind <- attr(z.unique,"index")
ind.vals <- sort(unique(ind))
dgp <- numeric(length=n)
for(i in 1:nrow(z.unique)) {
  zz <- ind == ind.vals[i]
  dgp[zz] <- z[zz]+cos(2*pi*x[zz])
}
y <- dgp + rnorm(n,sd=.1)

xdata <- data.frame(x,z=factor(z))

## Compute the optimal K and I, determine optimal number of knots, set
## spline degree for x to 3
cv <- frscv(x=xdata,y=y,complexity="knots",degree=c(3))
summary(cv)

frscvNOMAD    Categorical Factor Regression Spline Cross-Validation

Description

frscvNOMAD computes NOMAD-based (Nonsmooth Optimization by Mesh Adaptive Direct Search, Abramson, Audet, Couture and Le Digabel (2011)) cross-validation directed search for a regression spline estimate of a one (1) dimensional dependent variable on an r-dimensional vector of continuous predictors and nominal/ordinal (factor/ordered) predictors.

Usage

frscvNOMAD(xz,
           y,
           degree.max = 10,
           segments.max = 10,
           degree.min = 0,
           segments.min = 1,
           cv.df.min = 1,
           complexity = c("degree-knots","degree","knots"),
           knots = c("quantiles","uniform","auto"),
           basis = c("additive","tensor","glp","auto"),


See Also

loess, npregbw,

Examples

set.seed(42)
## Simulated data
n <- 1000

x <- runif(n)
z <- round(runif(n,min=-0.5,max=1.5))
z.unique <- uniquecombs(as.matrix(z))
ind <- attr(z.unique,"index")
ind.vals <- sort(unique(ind))
dgp <- numeric(length=n)
for(i in 1:nrow(z.unique)) {
  zz <- ind == ind.vals[i]
  dgp[zz] <- z[zz]+cos(2*pi*x[zz])
}
y <- dgp + rnorm(n,sd=.1)

xdata <- data.frame(x,z=factor(z))

## Compute the optimal K and I, determine optimal number of knots, set
## spline degree for x to 3
cv <- frscvNOMAD(x=xdata,y=y,complexity="knots",degree=c(3),segments=c(5))
summary(cv)

glp.model.matrix    Utility function for constructing generalized polynomial smooths

Description

Produce model matrices for a generalized polynomial smooth from the model matrices for the marginal bases of the smooth.

Usage

glp.model.matrix(X)

Arguments

X  a list of model matrices for the marginal bases of a smooth

Details

This function computes a generalized polynomial where the orders of each term entering the polynomial may vary.


Examples

X <- list(matrix(1:4,2,2),matrix(5:10,2,3))
glp.model.matrix(X)

gsl.bs    GSL (GNU Scientific Library) B-spline/B-spline Derivatives

Description

gsl.bs generates the B-spline basis matrix for a polynomial spline and (optionally) the B-spline basis matrix derivative of a specified order with respect to each predictor.

Usage

gsl.bs(...)

## Default S3 method:
gsl.bs(x,
       degree = 3,
       nbreak = 2,
       deriv = 0,
       x.min = NULL,
       x.max = NULL,
       intercept = FALSE,
       knots = NULL,
       ...)

Arguments

x       the predictor variable. Missing values are not allowed
degree  degree of the piecewise polynomial - default is '3' (cubic spline)
nbreak  number of breaks in each interval - default is '2'
deriv   the order of the derivative to be computed - default is 0
x.min   the lower bound on which to construct the spline - defaults to min(x)
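A short sketch of pairing the basis matrix with its derivative counterpart, as described above (the regression setup and all settings are our own illustrative assumptions):

```r
## Sketch: fit a spline regression with gsl.bs() and recover the
## derivative of the fit from the derivative basis. Illustrative only.
library(crs)

set.seed(42)
n <- 1000
x <- sort(runif(n))
y <- sin(2 * pi * x) + rnorm(n, sd = 0.1)

## Basis matrix and its first-derivative counterpart (same degree/knots)
B  <- gsl.bs(x, degree = 4, nbreak = 3)
B1 <- gsl.bs(x, degree = 4, nbreak = 3, deriv = 1)

fit <- lm(y ~ B)

## The derivative of the fitted spline is the derivative basis times
## the spline coefficients (dropping the lm intercept)
dy.dx <- B1 %*% coef(fit)[-1]

plot(x, dy.dx, type = "l", ylab = "dy/dx")
lines(x, 2 * pi * cos(2 * pi * x), lty = 2)  # true derivative of the DGP
```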


See Also

bs

Examples

## Plot the spline bases and their first order derivatives
x <- seq(0,1,length=100)
matplot(x,gsl.bs(x,degree=5),type="l")
matplot(x,gsl.bs(x,degree=5,deriv=1),type="l")

## Regression example
n <- 1000
x <- sort(runif(n))
y <- cos(2*pi*x) + rnorm(n,sd=.25)


B <- gsl.bs(x,degree=5,intercept=FALSE)
plot(x,y,cex=.5,col="grey")
lines(x,fitted(lm(y~B)))

krscv    Categorical Kernel Regression Spline Cross-Validation

Description

krscv computes exhaustive cross-validation directed search for a regression spline estimate of a one (1) dimensional dependent variable on an r-dimensional vector of continuous and nominal/ordinal (factor/ordered) predictors.

Usage

krscv(xz,
      y,
      degree.max = 10,
      segments.max = 10,
      degree.min = 0,
      segments.min = 1,
      restarts = 0,
      complexity = c("degree-knots","degree","knots"),
      knots = c("quantiles","uniform","auto"),
      basis = c("additive","tensor","glp","auto"),
      cv.func = c("cv.ls","cv.gcv","cv.aic"),
      degree = degree,
      segments = segments,
      tau = NULL,
      weights = NULL,
      singular.ok = FALSE)

Arguments

y             continuous univariate vector
xz            continuous and/or nominal/ordinal (factor/ordered) predictors
degree.max    the maximum degree of the B-spline basis for each of the continuous predictors (default degree.max=10)
segments.max  the maximum segments of the B-spline basis for each of the continuous predictors (default segments.max=10)
degree.min    the minimum degree of the B-spline basis for each of the continuous predictors (default degree.min=0)
segments.min  the minimum segments of the B-spline basis for each of the continuous predictors (default segments.min=1)


See Also

loess, npregbw,

Examples

set.seed(42)
## Simulated data
n <- 1000

x <- runif(n)
z <- round(runif(n,min=-0.5,max=1.5))
z.unique <- uniquecombs(as.matrix(z))
ind <- attr(z.unique,"index")
ind.vals <- sort(unique(ind))
dgp <- numeric(length=n)
for(i in 1:nrow(z.unique)) {
  zz <- ind == ind.vals[i]
  dgp[zz] <- z[zz]+cos(2*pi*x[zz])
}
y <- dgp + rnorm(n,sd=.1)

xdata <- data.frame(x,z=factor(z))

## Compute the optimal K and lambda, determine optimal number of knots, set
## spline degree for x to 3
cv <- krscv(x=xdata,y=y,complexity="knots",degree=c(3))
summary(cv)

krscvNOMAD    Categorical Kernel Regression Spline Cross-Validation

Description

krscvNOMAD computes NOMAD-based (Nonsmooth Optimization by Mesh Adaptive Direct Search, Abramson, Audet, Couture and Le Digabel (2011)) cross-validation directed search for a regression spline estimate of a one (1) dimensional dependent variable on an r-dimensional vector of continuous and nominal/ordinal (factor/ordered) predictors.


See Also

loess, npregbw,

Examples

set.seed(42)
## Simulated data
n <- 1000

x <- runif(n)
z <- round(runif(n,min=-0.5,max=1.5))
z.unique <- uniquecombs(as.matrix(z))


ind <- attr(z.unique,"index")
ind.vals <- sort(unique(ind))
dgp <- numeric(length=n)
for(i in 1:nrow(z.unique)) {
  zz <- ind == ind.vals[i]
  dgp[zz] <- z[zz]+cos(2*pi*x[zz])
}
y <- dgp + rnorm(n,sd=.1)

xdata <- data.frame(x,z=factor(z))

## Compute the optimal K and lambda, determine optimal number of knots, set
## spline degree for x to 3
cv <- krscvNOMAD(x=xdata,y=y,complexity="knots",degree=c(3),segments=c(5))
summary(cv)

npglpreg    Generalized Local Polynomial Regression

Description

npglpreg computes a generalized local polynomial kernel regression estimate (Hall and Racine (2015)) of a one (1) dimensional dependent variable on an r-dimensional vector of continuous and categorical (factor/ordered) predictors.

Usage

npglpreg(...)

## Default S3 method:
npglpreg(tydat = NULL,
         txdat = NULL,
         eydat = NULL,
         exdat = NULL,
         bws = NULL,
         degree = NULL,
         leave.one.out = FALSE,
         ckertype = c("gaussian", "epanechnikov", "uniform", "truncated gaussian"),
         ckerorder = 2,
         ukertype = c("liracine", "aitchisonaitken"),
         okertype = c("liracine", "wangvanryzin"),
         bwtype = c("fixed", "generalized_nn", "adaptive_nn", "auto"),
         gradient.vec = NULL,
         gradient.categorical = FALSE,
         cv.shrink = TRUE,
         cv.maxPenalty = sqrt(.Machine$double.xmax),
         cv.warning = FALSE,


See Also

npreg

Examples

## Not run:
set.seed(42)
n <- 100

x1 <- runif(n,-2,2)
x2 <- runif(n,-2,2)
y <- x1^3 + rnorm(n,sd=1)

## Ideally the method should choose large bandwidths for x1 and x2 and a
## generalized polynomial that is a cubic for x1 and degree 0 for x2.
model <- npglpreg(y~x1+x2,nmulti=1)
summary(model)

## Plot the partial means and percentile confidence intervals
plot(model,ci=TRUE)

## Extract the data from the plot object and plot it separately
myplot.dat <- plot(model,plot.behavior="data",ci=TRUE)
matplot(myplot.dat[[1]][,1],myplot.dat[[1]][,-1],type="l")


matplot(myplot.dat[[2]][,1],myplot.dat[[2]][,-1],type="l")

## End(Not run)

snomadr    R interface to NOMAD

Description

snomadr is an R interface to NOMAD (Nonsmooth Optimization by Mesh Adaptive Direct Search, Abramson, Audet, Couture and Le Digabel (2011)), an open source software C++ implementation of the Mesh Adaptive Direct Search (MADS, Le Digabel (2011)) algorithm designed for constrained optimization of blackbox functions.

NOMAD is designed to find (local) solutions of mathematical optimization problems of the form

    min  f(x),  x in R^n
    s.t. g(x) <= 0
         x_L <= x <= x_U

where f(x): R^n -> R^k is the objective function and g(x): R^n -> R^m are the constraint functions. The vectors x_L and x_U are the bounds on the variables x. The functions f(x) and g(x) can be nonlinear and nonconvex. The variables can be integer, continuous real number, binary, and categorical.

Kindly see https://www.gerad.ca/en/software/nomad and the references below for details.

Usage

snomadr(eval.f,
        n,
        bbin = NULL,
        bbout = NULL,
        x0 = NULL,
        lb = NULL,
        ub = NULL,
        nmulti = 0,
        random.seed = 0,
        opts = list(),
        print.output = TRUE,
        information = list(),
        snomadr.environment = new.env(),
        ...)



x0: vector with starting values for the optimization. If provided when nmulti is greater than 1, x0 will be the first of the multiple initial points.

lb: vector with lower bounds of the controls (use -1.0e19 for controls without a lower bound).

ub: vector with upper bounds of the controls (use 1.0e19 for controls without an upper bound).

nmulti: when missing, or equal to 0 with x0 provided, snomadRSolve will be called to solve the problem; otherwise, smultinomadRSolve will be called.

random.seed: when not missing and not equal to 0, the initial points will be generated using this seed when nmulti > 0.

opts: list of options for NOMAD; see the NOMAD user guide https://nomad-4-user-guide.readthedocs.io/en/latest/#. Options can also be set in a nomad.opt file, which should be in the folder where R (snomadr) is executed. Options that affect the solution, together with their defaults, include "MAX_BB_EVAL"=10000, "INITIAL_MESH_SIZE"=1, "MIN_MESH_SIZE"="r1.0e-10", and "MIN_POLL_SIZE"=1; a leading "r" denotes a relative measurement (relative to lb and ub). Note that decreasing the maximum number of blackbox evaluations ("MAX_BB_EVAL") will terminate the search sooner and may result in a less accurate solution; for complicated problems you may want to increase this value. When experimenting it is desirable to set "DISPLAY_DEGREE"=1 (or a larger integer) to get some sense of how the algorithm is progressing.

print.output: when FALSE, no output from snomadr is displayed on the screen. If the NOMAD option "DISPLAY_DEGREE"=0 is also set, there will be no output from NOMAD either; higher integer values of "DISPLAY_DEGREE" provide successively more detail.

information: a list. snomadr will call snomadRInfo to return information about NOMAD according to the values of "info", "version" and "help": "info"="-i" displays the usage and copyright of NOMAD; "version"="-v" displays the version of NOMAD you are using; "help"="-h" displays all options. You can also display a specific option, for example "help"="-h x0", which will tell you how to set x0.
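To make the option mechanism concrete, a minimal sketch of an unconstrained single-objective call; the bbin/bbout flags follow the convention used in the Examples below (0 marks an objective output), and the objective function and option values are illustrative assumptions, not package-endorsed settings:

```r
## Not run:
## Illustrative blackbox: a single objective value, no constraints
eval.f <- function(x) sum((x - 0.5)^2)

## Cap blackbox evaluations and report progress while experimenting,
## as discussed in the opts description above
opts <- list("MAX_BB_EVAL" = 2000,
             "DISPLAY_DEGREE" = 1)

snomadr(eval.f = eval.f, n = 2,
        x0 = c(0, 0), bbin = c(0, 0), bbout = c(0),
        lb = c(-1, -1), ub = c(1, 1),
        opts = opts)
## End(Not run)
```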


See Also

optim, nlm, nlminb


## Not run:
## List all options
snomadr(information=list("help"="-h"))

## Print a given option, for example, MESH_SIZE
snomadr(information=list("help"="-h MESH_SIZE"))

## Print the version of NOMAD
snomadr(information=list("version"="-v"))

## Print usage and copyright
snomadr(information=list("info"="-i"))

## This is the example found in
## NOMAD/examples/basic/library/single_obj/basic_lib.cpp
eval.f <- function(x) {
  f <- c(Inf, Inf, Inf)
  n <- length(x)
  if (n == 5 && (is.double(x) || is.integer(x))) {
    f[1] <- x[5]
    f[2] <- sum((x-1)^2) - 25
    f[3] <- 25 - sum((x+1)^2)
  }
  return(as.double(f))
}

## Initial values
x0 <- rep(0.0, 5)
bbin <- c(1, 1, 1, 1, 1)

## Bounds
lb <- rep(-6.0, 5)
ub <- c(5.0, 6.0, 7.0, 1000000, 100000)
bbout <- c(0, 2, 1)

## Options
opts <- list("MAX_BB_EVAL"=500,
             "MIN_MESH_SIZE"=0.001,
             "INITIAL_MESH_SIZE"=0.1,
             "MIN_POLL_SIZE"=1)

snomadr(eval.f=eval.f, n=5, x0=x0, bbin=bbin, bbout=bbout,
        lb=lb, ub=ub, opts=opts)

## How to transfer other parameters into eval.f
##
## First example: supply additional arguments in user-defined functions


##
## Objective function and gradient in terms of parameters
eval.f.ex1 <- function(x, params) {
  return(params[1]*x^2 + params[2]*x + params[3])
}

## Define parameters that we want to use
params <- c(1,2,3)

## Define initial value of the optimization problem
x0 <- 0

## Solve using snomadr
snomadr(n = 1,
        x0 = x0,
        eval.f = eval.f.ex1,
        params = params)

##
## Second example: define an environment that contains extra parameters
##
## Objective function and gradient in terms of parameters
## without supplying params as an argument
eval.f.ex2 <- function(x) {
  return(params[1]*x^2 + params[2]*x + params[3])
}

## Define initial value of the optimization problem
x0 <- 0

## Define a new environment that contains params
auxdata <- new.env()
auxdata$params <- c(1,2,3)

## Pass the environment that should be used to evaluate functions to snomadr
snomadr(n = 1,
        x0 = x0,
        eval.f = eval.f.ex2,
        snomadr.environment = auxdata)

## Solve using algebra
cat(paste("Minimizing f(x) = ax^2 + bx + c\n"))
cat(paste("Optimal value of control is -b/(2a) = ",
          -params[2]/(2*params[1]), "\n"))
cat(paste("With value of the objective function f(-b/(2a)) = ",
          eval.f.ex1(-params[2]/(2*params[1]), params), "\n"))

## The following example is NOMAD/examples/advanced/multi_start/multi.cpp
## This will call smultinomadRSolve to solve the problem.
eval.f.ex1 <- function(x, params) {
  M <- as.numeric(params$M)


  NBC <- as.numeric(params$NBC)
  f <- rep(0, M+1)
  x <- as.numeric(x)
  x1 <- rep(0.0, NBC)
  y1 <- rep(0.0, NBC)
  x1[1] <- x[1]
  x1[2] <- x[2]
  y1[3] <- x[3]
  x1[4] <- x[4]
  y1[4] <- x[5]
  epi <- 6
  for (i in 5:NBC) {
    x1[i] <- x[epi]
    epi <- epi + 1
    y1[i] <- x[epi]
    epi <- epi + 1
  }
  constraint <- 0.0
  ic <- 1
  f[ic] <- constraint
  ic <- ic + 1
  constraint <- as.numeric(1.0)
  distmax <- as.numeric(0.0)
  avg_dist <- as.numeric(0.0)
  dist1 <- as.numeric(0.0)
  for (i in 1:(NBC-1)) {
    for (j in (i+1):NBC) {
      dist1 <- as.numeric((x1[i]-x1[j])*(x1[i]-x1[j])+(y1[i]-y1[j])*(y1[i]-y1[j]))
      if (dist1 > distmax) distmax <- as.numeric(dist1)
      if (dist1[1] < 1) {
        constraint <- constraint*sqrt(dist1)
      } else if (dist1 > 14) {
        avg_dist <- avg_dist + sqrt(dist1)
      }
    }
  }
  if (constraint < 0.9999) {
    constraint <- 1001.0 - constraint
  } else {
    constraint <- sqrt(distmax) + avg_dist/(10.0*NBC)
  }
  f[2] <- 0.0
  f[M+1] <- constraint
  return(as.numeric(f))
}

## Define parameters that we want to use
params <- list()


NBC <- 5
M <- 2
n <- 2*NBC-3
params$NBC <- NBC
params$M <- M

## Define initial value of the optimization problem
x0 <- rep(0.1, n)
lb <- rep(0, n)
ub <- rep(4.5, n)
eval.f.ex1(x0, params)
bbout <- c(2, 2, 0)
nmulti <- 5
bbin <- rep(0, n)

## Solve using snomadRSolve
snomadr(n = as.integer(n),
        x0 = x0,
        eval.f = eval.f.ex1,
        bbin = bbin,
        bbout = bbout,
        lb = lb,
        ub = ub,
        params = params)

## Solve using smultinomadRSolve: if x0 is provided, x0 will
## be the first initial point; otherwise, the program will
## check best_x.txt and, if it exists, read it in as
## the first initial point. Other initial points will be
## generated from a uniform distribution.
## nmulti represents the number of MADS runs.
snomadr(n = as.integer(n),
        eval.f = eval.f.ex1,
        bbin = bbin,
        bbout = bbout,
        lb = lb,
        ub = ub,
        nmulti = as.integer(nmulti),
        print.output = TRUE,
        params = params)

## End(Not run)

tensor.prod.model.matrix Utility functions for constructing tensor product smooths


See Also

te, smooth.construct.tensor.smooth.spec

Examples

X <- list(matrix(1:4,2,2), matrix(5:10,2,3))
tensor.prod.model.matrix(X)
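As a check on what the example computes, each row of the result should be the Kronecker product of the corresponding rows of the component matrices (the row-wise tensor product); the sketch below assumes this row ordering and uses only base R:

```r
X <- list(matrix(1:4,2,2), matrix(5:10,2,3))
T <- tensor.prod.model.matrix(X)

## Row i of T should match the Kronecker product of row i of each
## component matrix, under the row-wise tensor product convention
all.equal(T[1,], as.numeric(X[[1]][1,] %x% X[[2]][1,]))
```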


unique can do the same thing, including for non-numeric matrices, but more slowly and without returning the index.

Examples

X <- matrix(c(1,2,3,1,2,3,4,5,6,1,3,2,4,5,6,1,1,1),6,3,byrow=TRUE)
print(X)
Xu <- uniquecombs(X); Xu
ind <- attr(Xu,"index")
## find the value for row 3 of the original from Xu
Xu[ind[3],]; X[3,]
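The remark above comparing unique with uniquecombs can be illustrated directly; this sketch assumes only base R alongside the package:

```r
X <- matrix(c(1,2,3,1,2,3,4,5,6,1,3,2,4,5,6,1,1,1),6,3,byrow=TRUE)

## unique() also returns the distinct rows, but without the "index"
## attribute that maps original rows back to the unique set
Xu.base <- unique(X)
Xu <- uniquecombs(X)
nrow(Xu.base) == nrow(Xu)  ## same number of distinct rows

## uniquecombs additionally provides the reverse mapping: indexing the
## unique rows by "index" should reconstruct the original matrix
ind <- attr(Xu, "index")
all(Xu[ind, ] == X)
```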


## Not run:
data(wage1)

## Cross-validated model selection for spline degree and bandwidths.
## Note: we override the default nmulti here to get a quick illustration
## (we don't advise doing this; in fact we advise using more restarts in
## serious applications)
model <- crs(lwage~married+
             female+
             nonwhite+
             educ+
             exper+
             tenure,
             basis="additive",
             complexity="degree",
             data=wage1,
             segments=c(1,1,1),
             nmulti=1)
summary(model)

## Residual plots
plot(model)
## Partial mean plots (control for non-axis predictors)
plot(model,mean=TRUE)
## Partial first derivative plots (control for non-axis predictors)
plot(model,deriv=1)
## Partial second derivative plots (control for non-axis predictors)
plot(model,deriv=2)

## Compare with local linear kernel regression
require(np)
model <- npreg(lwage~married+
               female+
               nonwhite+
               educ+
               exper+
               tenure,
               regtype="ll",
               bwmethod="cv.aic",
               data=wage1)


summary(model)

## Partial mean plots (control for non-axis predictors)
plot(model,common.scale=FALSE)
## Partial first derivative plots (control for non-axis predictors)
plot(model,gradients=TRUE,common.scale=FALSE)
detach("package:np")

## End(Not run)

Index

∗ datasets
    cps71, 8
    Engel95, 30
    wage1, 67
∗ instrument
    crsiv, 18
    crsivderiv, 23
∗ interface
    snomadr, 58
∗ models
    glp.model.matrix, 39
    tensor.prod.model.matrix, 64
    uniquecombs, 66
∗ nonparametric
    clsd, 3
    crs, 10
    crssigtest, 28
    frscv, 32
    frscvNOMAD, 35
    gsl.bs, 40
    krscv, 42
    krscvNOMAD, 45
    npglpreg, 50
∗ optimize
    snomadr, 58
∗ package
    crs-package, 2
∗ regression
    glp.model.matrix, 39
    npglpreg, 50
    tensor.prod.model.matrix, 64
    uniquecombs, 66
∗ smooth
    glp.model.matrix, 39
    tensor.prod.model.matrix, 64

bs, 41
class, 15
clsd, 3
coef, 6
cps71, 8
crs, 10, 20–22, 25, 26, 60
crs-package, 2
crsiv, 18, 26
crsivderiv, 21, 23
crssigtest, 28
Engel95, 30
extendrange, 5
factor, 10, 11, 14, 15, 32, 34–38, 42–46, 48, 50
fitted, 6, 15, 21, 55
frscv, 32, 60
frscvNOMAD, 13, 35, 60
glp.model.matrix, 39
gsl.bs, 40
hatvalues, 16
krscv, 42, 60
krscvNOMAD, 13, 45, 60
lm, 13, 15
loess, 16, 34, 39, 45, 49
logspline, 7
model.frame, 15
nlm, 60
nlminb, 60
npglpreg, 50
npreg, 16, 22, 26, 57
npregbw, 34, 39, 45, 49
optim, 5, 12, 14, 43, 44, 60
ordered, 10, 11, 14, 15, 32, 34–38, 42–46, 48, 50
plot, 6, 15, 21, 55
smooth.spline, 16
snomadr, 5, 12, 13, 20, 25, 37, 47, 48, 52, 53, 56, 58
stepAIC, 13, 15
summary, 6, 15, 16, 21, 29, 34, 38, 44, 48, 55
te, 65
tensor.prod.model.matrix, 2, 6, 12, 14, 34, 37, 44, 48, 64
unique, 66
uniquecombs, 66
vignette, 2
wage1, 67
