Purpose:

I wrote the following SAS code to implement a stepwise regression algorithm, specifically backward elimination, in SAS. The macro performs an automated backward elimination variable selection process for PROC GENMOD, which (at least in SAS versions prior to 9.4) does not come with model selection options.

Introduction:

Users of SAS 9.2 and earlier versions may find that some powerful options are available in certain SAS procedures but not in others. For example, model selection options are available in PROC REG, LOGISTIC, PHREG, etc., but not in PROC GENMOD, CATMOD, MIXED, etc. This backward selection macro could be adapted for use with procedures such as GENMOD, CATMOD, MIXED and GLIMMIX.

Illustration:

The following SAS statements simulate 5000 observations from an underlying Tweedie generalized linear model (GLM), using the representation of the Tweedie distribution as a compound Poisson-gamma distribution. A natural logarithm link function is assumed for the response variable (yTweedie). There are five categorical variables (C1–C5), each with four numeric levels (0 to 3), and two continuous variables (D1 and D2). By design, two of the categorical variables, C3 and C4, and one of the two continuous variables, D2, have no effect on the response. The dispersion parameter is set to 0.5, and the power parameter is set to 1.5.
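As a check on the parameterization used in the DATA step below, the compound Poisson representation draws N ~ Poisson(λ) and adds N independent gamma terms with shape α and scale γ. The step computes these as lambda, alpha and gamma. With those formulas, the mean and variance of yTweedie reduce to the Tweedie values μ and φμ^p:

$$
\lambda = \frac{\mu^{2-p}}{\phi(2-p)}, \qquad \alpha = \frac{2-p}{p-1}, \qquad \gamma = \phi(p-1)\mu^{p-1}
$$

$$
E[Y] = \lambda\alpha\gamma = \frac{\mu^{2-p}}{\phi(2-p)} \cdot \frac{2-p}{p-1} \cdot \phi(p-1)\mu^{p-1} = \mu
$$

$$
\operatorname{Var}[Y] = \lambda\,\alpha(\alpha+1)\gamma^{2} = \frac{\mu^{2-p}}{\phi(2-p)} \cdot \frac{2-p}{p-1} \cdot \frac{1}{p-1} \cdot \phi^{2}(p-1)^{2}\mu^{2p-2} = \phi\,\mu^{p}
$$

With φ = 0.5 and p = 1.5 this gives α = 1, so each gamma term is exponential. It also gives λ = 4μ^{1/2} and γ = μ^{1/2}/4. The probability of an exact zero response is e^{-λ}.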

%let nObs = 5000;
%let nClass = 5;
%let nLevs = 4;
%let seed = 1234;

data tmp1;
   array c{&nClass};

   keep c1-c&nClass yTweedie d1 d2;

   /* Tweedie parms */
   phi=0.5;
   p=1.5;

   do i=1 to &nObs;

      do j=1 to &nClass;
         c{j} = int(ranuni(1)*&nLevs);
      end;

      d1 = ranuni(&seed);
      d2 = ranuni(&seed);

      xBeta = 0.5*((c2<2) - 2*(c1=1) + 0.5*c&nClass + 0.05*d1);
      mu = exp(xBeta);

      /* Poisson distributions parms */
      lambda = mu**(2-p)/(phi*(2-p));
      /* Gamma distribution parms */
      alpha = (2-p)/(p-1);
      gamma = phi*(p-1)*(mu**(p-1));

      rpoi = ranpoi(&seed,lambda);
      if rpoi=0 then yTweedie=0;
      else do;
         yTweedie=0;
         do j=1 to rpoi;
         yTweedie = yTweedie + rangam(&seed,alpha);
         end;
         yTweedie = yTweedie * gamma;
      end;
      output;
   end;
run;

NOTE: The data set WORK.TMP1 has 5000 observations and 8 variables.

The following code generates a basic exploratory data analysis (EDA) of the dependent and independent variables. It produces one-way frequency tables and frequency plots for the categorical variables c1-c5, and summary statistics and histograms for the continuous variables yTweedie, d1 and d2:

/* EDA */

%let var_char = yTweedie c1 c2 c3 c4 c5 d1 d2;

%put &var_char;


data var_char;
    set tmp1
    (keep= &var_char);     
run;

proc contents data = var_char varnum nodetails noprint 
out=var_char_names (keep=name);
run;

data var_char_names;
    set var_char_names;
    j = _n_;
run;

* Store the number of variables to analyze (rows of var_char_names) in macro variable NROWS;
data _NULL_;
    if 0 then set var_char_names nobs=n;
    call symputx('nrows',n);
    stop;
run;

%put &nrows;

%macro do_eda_uni;
%do obs = 1 %to &nrows;

data _null_;
    set var_char_names;
    if j = &obs then call symputx("var", name);  /* NAME is character; SYMPUTX trims it */
run;
%if (%upcase(&var)=YTWEEDIE) or (%upcase(&var)=D1) or (%upcase(&var)=D2)   %then %do; 

    ods graphics on;
        proc means data=tmp1 fw=12 printalltypes chartype
            qmethod=os maxdec=2

            mean 
            min 
            max 
            mode 
            range 
            n 
            nmiss   
            p1 
            p5 
            median 
            p95 
            p99 ;
            var &var;
        run;

        title "histograms";
        proc univariate data=tmp1   noprint;
            var &var;
            histogram ;
        run; 
    ods graphics off;
    %end;

    %else %do;
    ods graphics on;
        proc freq data=tmp1
        order=internal;
        tables &var /  scores=table plots(only)=freq;
        run;
    ods graphics off;
    %end;

%end;
%mend do_eda_uni;

%do_eda_uni;

SAS Output

The SAS System

The FREQ Procedure

Table c1

One-Way Frequencies

c1	Frequency	Percent	Cumulative Frequency	Cumulative Percent
0	1218	24.36	1218	24.36
1	1233	24.66	2451	49.02
2	1263	25.26	3714	74.28
3	1286	25.72	5000	100.00

[Figure: frequency plot of c1]

The FREQ Procedure

Table c2

One-Way Frequencies

c2	Frequency	Percent	Cumulative Frequency	Cumulative Percent
0	1247	24.94	1247	24.94
1	1222	24.44	2469	49.38
2	1262	25.24	3731	74.62
3	1269	25.38	5000	100.00

[Figure: frequency plot of c2]

The FREQ Procedure

Table c3

One-Way Frequencies

c3	Frequency	Percent	Cumulative Frequency	Cumulative Percent
0	1209	24.18	1209	24.18
1	1340	26.80	2549	50.98
2	1254	25.08	3803	76.06
3	1197	23.94	5000	100.00

[Figure: frequency plot of c3]

The FREQ Procedure

Table c4

One-Way Frequencies

c4	Frequency	Percent	Cumulative Frequency	Cumulative Percent
0	1210	24.20	1210	24.20
1	1263	25.26	2473	49.46
2	1292	25.84	3765	75.30
3	1235	24.70	5000	100.00

[Figure: frequency plot of c4]

The FREQ Procedure

Table c5

One-Way Frequencies

c5	Frequency	Percent	Cumulative Frequency	Cumulative Percent
0	1249	24.98	1249	24.98
1	1208	24.16	2457	49.14
2	1267	25.34	3724	74.48
3	1276	25.52	5000	100.00

[Figure: frequency plot of c5]

The MEANS Procedure

Summary statistics

Analysis Variable : d1
Mean	Minimum	Maximum	Mode	Range	N	N Miss	1st Pctl	5th Pctl	Median	95th Pctl	99th Pctl
0.51	0.00	1.00	.	1.00	5000	0	0.01	0.06	0.52	0.95	0.99

histograms

The UNIVARIATE Procedure

d1

[Figure: histogram of d1]

histograms

The MEANS Procedure

Summary statistics

Analysis Variable : d2
Mean	Minimum	Maximum	Mode	Range	N	N Miss	1st Pctl	5th Pctl	Median	95th Pctl	99th Pctl
0.50	0.00	1.00	.	1.00	5000	0	0.01	0.05	0.49	0.95	0.99

histograms

The UNIVARIATE Procedure

d2

[Figure: histogram of d2]

histograms

The MEANS Procedure

Summary statistics

Analysis Variable : yTweedie
Mean	Minimum	Maximum	Mode	Range	N	N Miss	1st Pctl	5th Pctl	Median	95th Pctl	99th Pctl
1.72	0.00	12.78	0.00	12.78	5000	0	0.00	0.16	1.37	4.54	6.40

histograms

The UNIVARIATE Procedure

yTweedie

[Figure: histogram of yTweedie]
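The %do_eda_uni macro loops over the variables one at a time. The sketch below is a shorter alternative and is not the author's code. It assumes ODS Graphics is available, as in the run shown, and produces the same kinds of tables and plots with one call per procedure. The statistics list mirrors the PROC MEANS call in %do_eda_uni.

```sas
ods graphics on;

/* One-way frequency tables and frequency plots for the categorical variables */
proc freq data=tmp1 order=internal;
   tables c1-c5 / plots(only)=freqplot;
run;

/* Summary statistics for the continuous variables */
proc means data=tmp1 fw=12 qmethod=os maxdec=2
           mean min max mode range n nmiss p1 p5 median p95 p99;
   var yTweedie d1 d2;
run;

/* One histogram for each variable listed in the VAR statement */
title "histograms";
proc univariate data=tmp1 noprint;
   var yTweedie d1 d2;
   histogram;
run;
title;

ods graphics off;
```

The macro version is still useful when the variable list must be built dynamically. For a fixed list, the direct calls are easier to read and produce fewer procedure headers.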

The next lines contain the two SAS macros that carry out the backward elimination selection process with a Tweedie error distribution.

The first macro, %MdStmt, is a stand-alone macro. It fits one Tweedie model with PROC GENMOD and saves the Type 3 statistics to the data set pval. The main macro, %MdSelect, calls %MdStmt repeatedly.

/* Variable Selection Macro: Backwards elimination */
%let p=1.5;
options mlogic;
%macro MdStmt(
        resvar = /*response variable */
       ,expvar = /*list of explanatory variables, separated by ' ' */
       ,clsvar = /*classification variables in the CLASS statement separated by ' ' */
       ,p = 
       );

        ods output Type3=pval(rename=source=parm);
        proc genmod data=tmp1 NAMELEN=50; 
            if _resp_ > 0 then 
            d = 2*(_resp_*(_resp_**(1-&p)-_mean_**(1-&p))/
            (1-&p)-(_resp_**(2-&p)-_mean_**(2-&p))/(2-&p)); 
            else d = 2* _mean_**(2-&p)/(2-&p);
            variance var = _mean_**&p;
            deviance dev = d;
            class &clsvar;  
            model &resvar =  &expvar /link=log type3 scale=pearson;                 
            *scwgt expos;
            title "&resvar = &expvar";  
        run;
        ods output close;
 %mend MdStmt;
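For reference, the programming statements inside %MdStmt define a user-specified variance function and deviance. Write y for _RESP_ and μ for _MEAN_. For 1 < p < 2, the unit deviance these statements compute is the usual Tweedie deviance:

$$
d(y,\mu) = 2\left[\frac{y\,(y^{1-p}-\mu^{1-p})}{1-p} - \frac{y^{2-p}-\mu^{2-p}}{2-p}\right] = 2\left[\frac{y^{2-p}}{(1-p)(2-p)} - \frac{y\,\mu^{1-p}}{1-p} + \frac{\mu^{2-p}}{2-p}\right], \qquad y > 0
$$

$$
d(0,\mu) = \frac{2\,\mu^{2-p}}{2-p}, \qquad V(\mu) = \mu^{p}
$$

The y = 0 branch is the limit of the first expression as y goes to 0. The code tests _RESP_ > 0 because y^{1-p} cannot be evaluated at zero when p > 1. With p = 1.5, as in the example, both branches reduce to d(y, μ) = 4(√y - √μ)²/√μ.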


There are five macro parameters in the macro %MdSelect: &VAR, &INTVAR, &CATVAR, &SLSTAY and &POWER:

&VAR is the response variable, which is passed to &RESVAR when %MdStmt is called;
&INTVAR contains all the potential explanatory variables and is passed to &EXPVAR in %MdStmt for the first call only;
&CATVAR contains all the categorical explanatory variables and is passed to &CLSVAR in %MdStmt;
&SLSTAY is the significance level for staying in the model: at each step, the effect with the largest Type 3 p-value is removed if that p-value exceeds &SLSTAY;
and &POWER is the power parameter of the Tweedie distribution, which is passed to &P in %MdStmt.

%macro MdSelect(
       var= /*response variable */
       ,intvar= /*initial explanatory variables for full model */
       ,catvar= /*categorical explanatory variables */
       ,slstay= /*criterion for removing variable */
       ,power=
       );
    %let var=%upcase(&var);
    %let intvar=%upcase(&intvar);
    %let catvar=%upcase(&catvar);
    %let power =&power; 
%*-------------------------------------------------------------------------*;
%* Create an empty data set "step" (named step_ plus the response name)    *;
%* with only one column, "parm". It is merged with "pval" from PROC GENMOD *;
%* by "parm". Length 50 matches NAMELEN=50 in the MdStmt macro, so effect  *;
%* names are not truncated in the merge.                                   *;
%*-------------------------------------------------------------------------*;
 proc sql;
    create table step_&var (parm char(50));
 quit;
%*------------------------------------------------------------------------------*;
%* %do %until performs multivariate backward model selection: *;
%* In each iteration: *;
%* 1. Fit the Tweedie GLM with PROC GENMOD (MdStmt macro) *;
%* 2. Update the dataset "step_&var" *;
%* 3. Create &pmax as the maximum p-value, and &varlist as the list of *;
%* variables without the one with the max p-value *;
%* 4. Check whether the max p-value <= &SLSTAY *;
%* 5. If NO, then eliminate the variable with max p-value, repeat step 1 to 4.*;
%* If YES, the loop stops *;
%*------------------------------------------------------------------------------*;
 %let i=1;
 %do %until (%sysevalf(&pmax <= &slstay)); %* numeric (not text) comparison of p-values;

    %if &i = 1 %then
        %MdStmt(resvar=&var ,expvar=&intvar, clsvar=&catvar, p=&power); %*initial model;
    %else %do;
        %MdStmt(resvar=&var ,expvar=&varlist, clsvar=&catvar, p=&power); %*reduced model;
    %end;
    proc sort data=step_&var; by parm; run;
    proc sort data=pval; by parm; run;
    data step_&var;
        merge step_&var pval;
        by parm;
        p&i=put(ProbChiSq, pvalue6.3);
        drop ProbChiSq ChiSq DF;
    run;
    proc sql noprint;
        /* plain decimal format, so the loop test always receives a simple number */
        select max(ProbChiSq) format=12.10 into :pmax
        from pval;  
        select distinct parm into :varlist separated by ' '
        from pval
        having ProbChiSq^=max(ProbChiSq);
    quit;

    %let i=%eval(&i+1);

 %end;
 proc print data=step_&var;
    title "&var: model selection process";
 run;
%mend MdSelect; 
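The stopping rule in %MdSelect compares the largest p-value with &SLSTAY, and both are decimal numbers stored as macro text. A bare condition such as (&pmax<=&slstay) in %DO %UNTIL is evaluated with %EVAL, which compares operands containing a decimal point as character strings rather than as numbers. The snippet below uses illustrative values to show the difference. %SYSEVALF performs a true numeric comparison and is the safer choice for a p-value threshold.

```sas
/* Illustrative values: a p-value, and a threshold typed without a leading zero */
%let pmax   = 0.01;
%let slstay = .05;

/* %EVAL treats operands that contain a decimal point as text, so this   */
/* compares the strings 0.01 and .05 character by character.            */
%put EVAL comparison:     %eval(&pmax <= &slstay);

/* %SYSEVALF converts both operands to numbers first, so this writes 1.  */
%put SYSEVALF comparison: %sysevalf(&pmax <= &slstay);
```

On an ASCII host the character 0 sorts after the period, so the %EVAL line is expected to write 0 (false) even though 0.01 is smaller than 0.05. With a threshold written this way, a text comparison could keep removing effects that are in fact significant.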

%MdSelect(var=yTweedie, intvar=c1 c2 c3 c4 c5 d1 d2, catvar=c1 c2 c3 c4 c5, slstay=0.05, power=1.5);

SAS Output

YTWEEDIE = C1 C2 C3 C4 C5 D1 D2

The GENMOD Procedure

Model Information
Data Set	WORK.TMP1
Distribution	User
Link Function	Log
Dependent Variable	yTweedie

Number of Observations

Number of Observations Read	5000
Number of Observations Used	5000

Class Level Information
Class	Levels	Values
c1	4	0 1 2 3
c2	4	0 1 2 3
c3	4	0 1 2 3
c4	4	0 1 2 3
c5	4	0 1 2 3

Criteria For Assessing Goodness Of Fit
Criterion	DF	Value	Value/DF
Deviance	4982	2730.8750	0.5481
Scaled Deviance	4982	5581.9454	1.1204
Pearson Chi-Square	4982	2437.3615	0.4892
Scaled Pearson X2	4982	4982.0000	1.0000
Log Likelihood		-2790.9727
Full Log Likelihood		-2790.9727
AIC (smaller is better)		5617.9454
AICC (smaller is better)		5618.0827
BIC (smaller is better)		5735.2549

Convergence Status

Algorithm converged.

Analysis Of Parameter Estimates

Analysis Of Maximum Likelihood Parameter Estimates
Parameter		DF	Estimate	Standard Error	Wald 95% Confidence Limits		Wald Chi-Square	Pr > ChiSq
Intercept		1	0.7181	0.0405	0.6386	0.7975	313.63	<.0001
c1	0	1	-0.0347	0.0237	-0.0811	0.0116	2.15	0.1422
c1	1	1	-1.0170	0.0271	-1.0701	-0.9638	1405.32	<.0001
c1	2	1	-0.0091	0.0234	-0.0549	0.0367	0.15	0.6956
c1	3	0	0.0000	0.0000	0.0000	0.0000	.	.
c2	0	1	0.4966	0.0249	0.4478	0.5454	397.70	<.0001
c2	1	1	0.5139	0.0251	0.4648	0.5630	420.61	<.0001
c2	2	1	-0.0098	0.0264	-0.0615	0.0420	0.14	0.7118
c2	3	0	0.0000	0.0000	0.0000	0.0000	.	.
c3	0	1	0.0118	0.0255	-0.0382	0.0617	0.21	0.6439
c3	1	1	0.0154	0.0248	-0.0332	0.0640	0.38	0.5351
c3	2	1	0.0498	0.0251	0.0007	0.0990	3.95	0.0469
c3	3	0	0.0000	0.0000	0.0000	0.0000	.	.
c4	0	1	0.0060	0.0252	-0.0434	0.0553	0.06	0.8132
c4	1	1	0.0064	0.0248	-0.0423	0.0551	0.07	0.7977
c4	2	1	-0.0092	0.0248	-0.0578	0.0395	0.14	0.7113
c4	3	0	0.0000	0.0000	0.0000	0.0000	.	.
c5	0	1	-0.7479	0.0252	-0.7973	-0.6986	882.99	<.0001
c5	1	1	-0.4761	0.0245	-0.5240	-0.4281	378.73	<.0001
c5	2	1	-0.2523	0.0234	-0.2981	-0.2065	116.67	<.0001
c5	3	0	0.0000	0.0000	0.0000	0.0000	.	.
d1		1	0.0618	0.0308	0.0013	0.1222	4.01	0.0452
d2		1	0.0150	0.0303	-0.0445	0.0745	0.24	0.6212
Scale		0	0.6995	0.0000	0.6995	0.6995

Note:The scale parameter was estimated by the square root of Pearson's Chi-Square/DOF.

LR Statistics For Type 3 Analysis - Scaled

LR Statistics For Type 3 Analysis
Source	Num DF	Den DF	F Value	Pr > F	Chi-Square	Pr > ChiSq
c1	3	4982	607.21	<.0001	1821.63	<.0001
c2	3	4982	276.99	<.0001	830.98	<.0001
c3	3	4982	1.48	0.2168	4.45	0.2167
c4	3	4982	0.17	0.9148	0.52	0.9148
c5	3	4982	324.79	<.0001	974.37	<.0001
d1	1	4982	4.01	0.0452	4.01	0.0452
d2	1	4982	0.24	0.6212	0.24	0.6212

YTWEEDIE = c1 c2 c3 c5 d1 d2

The GENMOD Procedure

Model Information
Data Set	WORK.TMP1
Distribution	User
Link Function	Log
Dependent Variable	yTweedie

Number of Observations

Number of Observations Read	5000
Number of Observations Used	5000

Class Level Information
Class	Levels	Values
c1	4	0 1 2 3
c2	4	0 1 2 3
c3	4	0 1 2 3
c4	4	0 1 2 3
c5	4	0 1 2 3

Criteria For Assessing Goodness Of Fit
Criterion	DF	Value	Value/DF
Deviance	4985	2731.1286	0.5479
Scaled Deviance	4985	5584.6187	1.1203
Pearson Chi-Square	4985	2437.8882	0.4890
Scaled Pearson X2	4985	4985.0000	1.0000
Log Likelihood		-2792.3094
Full Log Likelihood		-2792.3094
AIC (smaller is better)		5614.6187
AICC (smaller is better)		5614.7150
BIC (smaller is better)		5712.3766

Convergence Status

Algorithm converged.

Analysis Of Parameter Estimates

Analysis Of Maximum Likelihood Parameter Estimates
Parameter		DF	Estimate	Standard Error	Wald 95% Confidence Limits		Wald Chi-Square	Pr > ChiSq
Intercept		1	0.7189	0.0378	0.6448	0.7931	361.45	<.0001
c1	0	1	-0.0348	0.0236	-0.0811	0.0116	2.16	0.1412
c1	1	1	-1.0168	0.0271	-1.0700	-0.9637	1405.64	<.0001
c1	2	1	-0.0091	0.0234	-0.0549	0.0367	0.15	0.6959
c1	3	0	0.0000	0.0000	0.0000	0.0000	.	.
c2	0	1	0.4965	0.0249	0.4477	0.5453	397.93	<.0001
c2	1	1	0.5139	0.0250	0.4648	0.5630	421.34	<.0001
c2	2	1	-0.0100	0.0264	-0.0617	0.0417	0.14	0.7049
c2	3	0	0.0000	0.0000	0.0000	0.0000	.	.
c3	0	1	0.0120	0.0255	-0.0379	0.0619	0.22	0.6363
c3	1	1	0.0154	0.0248	-0.0331	0.0640	0.39	0.5335
c3	2	1	0.0500	0.0251	0.0009	0.0991	3.98	0.0461
c3	3	0	0.0000	0.0000	0.0000	0.0000	.	.
c5	0	1	-0.7483	0.0252	-0.7976	-0.6990	884.42	<.0001
c5	1	1	-0.4764	0.0245	-0.5243	-0.4285	379.62	<.0001
c5	2	1	-0.2523	0.0234	-0.2981	-0.2065	116.72	<.0001
c5	3	0	0.0000	0.0000	0.0000	0.0000	.	.
d1		1	0.0619	0.0308	0.0015	0.1223	4.04	0.0445
d2		1	0.0148	0.0303	-0.0446	0.0742	0.24	0.6257
Scale		0	0.6993	0.0000	0.6993	0.6993

Note:The scale parameter was estimated by the square root of Pearson's Chi-Square/DOF.

LR Statistics For Type 3 Analysis - Scaled

LR Statistics For Type 3 Analysis
Source	Num DF	Den DF	F Value	Pr > F	Chi-Square	Pr > ChiSq
c1	3	4985	607.35	<.0001	1822.04	<.0001
c2	3	4985	277.51	<.0001	832.53	<.0001
c3	3	4985	1.49	0.2151	4.47	0.2149
c5	3	4985	325.45	<.0001	976.35	<.0001
d1	1	4985	4.04	0.0446	4.04	0.0445
d2	1	4985	0.24	0.6257	0.24	0.6257

YTWEEDIE = c1 c2 c3 c5 d1

The GENMOD Procedure

Model Information
Data Set	WORK.TMP1
Distribution	User
Link Function	Log
Dependent Variable	yTweedie

Number of Observations

Number of Observations Read	5000
Number of Observations Used	5000

Class Level Information
Class	Levels	Values
c1	4	0 1 2 3
c2	4	0 1 2 3
c3	4	0 1 2 3
c4	4	0 1 2 3
c5	4	0 1 2 3

Criteria For Assessing Goodness Of Fit
Criterion	DF	Value	Value/DF
Deviance	4986	2731.2449	0.5478
Scaled Deviance	4986	5584.3940	1.1200
Pearson Chi-Square	4986	2438.5792	0.4891
Scaled Pearson X2	4986	4986.0000	1.0000
Log Likelihood		-2792.1970
Full Log Likelihood		-2792.1970
AIC (smaller is better)		5612.3940
AICC (smaller is better)		5612.4783
BIC (smaller is better)		5703.6347

Convergence Status

Algorithm converged.

Analysis Of Parameter Estimates

Analysis Of Maximum Likelihood Parameter Estimates
Parameter		DF	Estimate	Standard Error	Wald 95% Confidence Limits		Wald Chi-Square	Pr > ChiSq
Intercept		1	0.7264	0.0346	0.6587	0.7942	441.89	<.0001
c1	0	1	-0.0352	0.0236	-0.0815	0.0111	2.22	0.1363
c1	1	1	-1.0172	0.0271	-1.0703	-0.9641	1407.64	<.0001
c1	2	1	-0.0091	0.0234	-0.0549	0.0366	0.15	0.6954
c1	3	0	0.0000	0.0000	0.0000	0.0000	.	.
c2	0	1	0.4963	0.0249	0.4475	0.5451	397.71	<.0001
c2	1	1	0.5137	0.0250	0.4646	0.5628	421.08	<.0001
c2	2	1	-0.0101	0.0264	-0.0618	0.0416	0.15	0.7020
c2	3	0	0.0000	0.0000	0.0000	0.0000	.	.
c3	0	1	0.0125	0.0254	-0.0374	0.0624	0.24	0.6234
c3	1	1	0.0155	0.0248	-0.0330	0.0641	0.39	0.5306
c3	2	1	0.0502	0.0251	0.0011	0.0993	4.01	0.0451
c3	3	0	0.0000	0.0000	0.0000	0.0000	.	.
c5	0	1	-0.7483	0.0252	-0.7977	-0.6990	884.56	<.0001
c5	1	1	-0.4767	0.0244	-0.5246	-0.4288	380.16	<.0001
c5	2	1	-0.2525	0.0234	-0.2983	-0.2068	116.92	<.0001
c5	3	0	0.0000	0.0000	0.0000	0.0000	.	.
d1		1	0.0621	0.0308	0.0018	0.1225	4.07	0.0437
Scale		0	0.6993	0.0000	0.6993	0.6993

Note:The scale parameter was estimated by the square root of Pearson's Chi-Square/DOF.

LR Statistics For Type 3 Analysis - Scaled

LR Statistics For Type 3 Analysis
Source	Num DF	Den DF	F Value	Pr > F	Chi-Square	Pr > ChiSq
c1	3	4986	607.82	<.0001	1823.47	<.0001
c2	3	4986	277.41	<.0001	832.22	<.0001
c3	3	4986	1.50	0.2132	4.49	0.2131
c5	3	4986	325.55	<.0001	976.64	<.0001
d1	1	4986	4.07	0.0437	4.07	0.0437

YTWEEDIE = c1 c2 c5 d1

The GENMOD Procedure

Model Information
Data Set	WORK.TMP1
Distribution	User
Link Function	Log
Dependent Variable	yTweedie

Number of Observations

Number of Observations Read	5000
Number of Observations Used	5000

Class Level Information
Class	Levels	Values
c1	4	0 1 2 3
c2	4	0 1 2 3
c3	4	0 1 2 3
c4	4	0 1 2 3
c5	4	0 1 2 3

Criteria For Assessing Goodness Of Fit
Criterion	DF	Value	Value/DF
Deviance	4989	2733.4414	0.5479
Scaled Deviance	4989	5582.3056	1.1189
Pearson Chi-Square	4989	2442.9224	0.4897
Scaled Pearson X2	4989	4989.0000	1.0000
Log Likelihood		-2791.1528
Full Log Likelihood		-2791.1528
AIC (smaller is better)		5604.3056
AICC (smaller is better)		5604.3585
BIC (smaller is better)		5675.9947

Convergence Status

Algorithm converged.

Analysis Of Parameter Estimates

Analysis Of Maximum Likelihood Parameter Estimates
Parameter		DF	Estimate	Standard Error	Wald 95% Confidence Limits		Wald Chi-Square	Pr > ChiSq
Intercept		1	0.7460	0.0309	0.6855	0.8066	583.67	<.0001
c1	0	1	-0.0346	0.0236	-0.0809	0.0116	2.15	0.1424
c1	1	1	-1.0170	0.0271	-1.0702	-0.9639	1406.61	<.0001
c1	2	1	-0.0092	0.0234	-0.0550	0.0366	0.15	0.6940
c1	3	0	0.0000	0.0000	0.0000	0.0000	.	.
c2	0	1	0.4973	0.0249	0.4485	0.5461	399.04	<.0001
c2	1	1	0.5144	0.0250	0.4653	0.5635	421.94	<.0001
c2	2	1	-0.0097	0.0264	-0.0615	0.0421	0.13	0.7136
c2	3	0	0.0000	0.0000	0.0000	0.0000	.	.
c5	0	1	-0.7486	0.0252	-0.7980	-0.6993	884.84	<.0001
c5	1	1	-0.4775	0.0245	-0.5254	-0.4295	381.13	<.0001
c5	2	1	-0.2528	0.0234	-0.2986	-0.2070	117.12	<.0001
c5	3	0	0.0000	0.0000	0.0000	0.0000	.	.
d1		1	0.0620	0.0308	0.0016	0.1224	4.05	0.0442
Scale		0	0.6998	0.0000	0.6998	0.6998

Note:The scale parameter was estimated by the square root of Pearson's Chi-Square/DOF.

LR Statistics For Type 3 Analysis - Scaled

LR Statistics For Type 3 Analysis
Source	Num DF	Den DF	F Value	Pr > F	Chi-Square	Pr > ChiSq
c1	3	4989	607.22	<.0001	1821.67	<.0001
c2	3	4989	277.87	<.0001	833.62	<.0001
c5	3	4989	325.70	<.0001	977.11	<.0001
d1	1	4989	4.05	0.0442	4.05	0.0442

YTWEEDIE: model selection process

The PRINT Procedure

Data Set WORK.STEP_YTWEEDIE

Obs	parm	NumDF	DenDF	FValue	ProbF	Method	p1	p2	p3	p4
1	c1	3	4989	607.22	<.0001	LR	<.001	<.001	<.001	<.001
2	c2	3	4989	277.87	<.0001	LR	<.001	<.001	<.001	<.001
3	c3	3	4986	1.50	0.2132	LR	0.217	0.215	0.213	.
4	c4	3	4982	0.17	0.9148	LR	0.915	.	.	.
5	c5	3	4989	325.70	<.0001	LR	<.001	<.001	<.001	<.001
6	d1	1	4989	4.05	0.0442	LR	0.045	0.045	0.044	0.044
7	d2	1	4985	0.24	0.6257	LR	0.621	0.626	.	.

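Running the two macros produces two outputs:

A summary table of the model selection process
The GENMOD output for each step of the model selection process

The summary table of the model selection process (data set WORK.STEP_YTWEEDIE) is the last table above. Each of the columns p1 to p4 holds the Type 3 p-values from one fit. The NumDF, DenDF, FValue and ProbF columns show the values from the last fit that included each effect. C4 (p = 0.915) is removed after the first fit, D2 (p = 0.626) after the second, and C3 (p = 0.213) after the third. In the fourth fit every remaining effect is significant at the 0.05 level; the largest p-value is 0.044, for d1. The algorithm therefore stops at the final main-effects model with c1, c2, c5 and d1.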
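The final estimates can be compared with the values implied by the simulation. Expanding the linear predictor from the DATA step gives the first expression below. Rewriting it against GENMOD's reference level (the last level, 3, of each CLASS variable) gives the second:

$$
x\beta = 0.5\,\mathbf{1}(c_2<2) - \mathbf{1}(c_1=1) + 0.25\,c_5 + 0.025\,d_1 = 0.75 - \mathbf{1}(c_1=1) + 0.5\,\mathbf{1}(c_2\in\{0,1\}) + 0.25\,(c_5-3) + 0.025\,d_1
$$

The true values are therefore:

- intercept: 0.75
- c1 = 1: -1
- c2 = 0 and c2 = 1: +0.5
- c5 = 0, 1, 2: -0.75, -0.50, -0.25
- d1: 0.025
- all other levels: 0

The final fit recovers these closely:

- intercept: 0.7460
- c1 = 1: -1.0170
- c2 = 0 and c2 = 1: 0.4973, 0.5144
- c5 = 0, 1, 2: -0.7486, -0.4775, -0.2528

The d1 estimate is 0.0620, with 95% Wald limits of 0.0016 to 0.1224; this interval covers the small true value of 0.025. Its p-value of 0.0442 sits just under the 0.05 threshold.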

Conclusion:

The output above shows the variable selection algorithm eliminating exactly those variables (C3, C4 and D2) that have no association with the dependent variable yTweedie. Recall that the illustrative data set was artificially created with this property. In this example, therefore, the macro recovers the true set of predictors.

The SAS macros %MdStmt and %MdSelect:

Perform a backward elimination variable selection process.
Show the selected model in the last step of the elimination process, followed by a summary table of the whole process.
Need around 15 minutes to produce results for a data set with one million observations and around 13 variables.
Use the p-values of the Type 3 analysis as the elimination criterion.
Can, with small changes, be used with PROC GENMOD under Gamma, Inverse Gaussian, Log-Normal, Binomial, Gaussian, Poisson, Negative Binomial, Zero-Inflated Poisson and Zero-Inflated Negative Binomial error functions.
Could serve as a template for forward and stepwise variable selection processes.
Fail if the full model with all potential main effects does not converge. This is one drawback of backward elimination, and one reason a forward option would be useful.
Use the same model specification as the Tweedie macro used in the NAR project.
Admit only main effects, so interactions cannot be written in the MODEL statement of PROC GENMOD. To include an interaction, create a new variable that represents it and pass that variable to the macros.
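As the last point notes, an interaction has to enter the macros as a variable of its own. The sketch below assumes a product term between the two continuous variables is wanted; the name d1d2 is hypothetical. %MdStmt reads the data set tmp1 directly, so the new column is added to tmp1 itself.

```sas
/* Hypothetical interaction term: product of the two continuous covariates */
data tmp1;
   set tmp1;
   d1d2 = d1*d2;
run;

/* The interaction is passed to %MdSelect like any other candidate effect */
%MdSelect(var=yTweedie,
          intvar=c1 c2 c3 c4 c5 d1 d2 d1d2,
          catvar=c1 c2 c3 c4 c5,
          slstay=0.05,
          power=1.5);
```

The macro treats d1d2 as an ordinary main effect, so it may drop d1 or d2 while keeping d1d2. If the final model should respect the usual hierarchy principle, check it by hand.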

References:

A detailed explanation of the algorithm and the code appears here:

Su, Jing, and Wei (Lisa) Lin. "Using Macro and ODS to Overcome Limitations of SAS® Procedures." Merck & Co., Inc., North Wales, PA.

The dataset for the example comes from here:

http://support.sas.com/documentation/cdl/en/statug/68162/HTML/default/viewer.htm#statug_genmod_examples12.htm

I made some changes to the original example in order to obtain coherent results.