SAS is short for Statistical Analysis System. It is widely used in statistics research and especially in industry. For a brief introduction to the SAS software system, refer to A SAS Tutorial for Programmers.
In that short tutorial, you'll see that a SAS program is usually composed of three types of paragraphs: one options section, one or more data sections, and one or more proc sections. The options section specifies your preferences, such as line size and page size. A data section lets you type in data or read data from input devices. A proc section applies one SAS procedure (usually predefined) to one of the datasets you defined in the data sections.
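As a rough sketch of this three-part layout, the following program has one section of each type. The data set name storage, its variables, and the values are made up for illustration only.

```sas
/* options section: output preferences */
OPTIONS linesize=80 pagesize=60;

/* data section: type in a small, hypothetical data set */
DATA storage;
    INPUT prodId amount;
    DATALINES;
101 25.50
102 13.00
103 48.75
;
RUN;

/* proc section: apply a predefined procedure to the data set */
PROC MEANS DATA=storage;
    VAR amount;
RUN;
```

Each section ends with its own RUN statement, which is what makes the three parts easy to tell apart when reading a longer program.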
Corresponding to the three types of paragraphs in SAS, this tutorial is organized as follows: Section 2 briefly introduces how to add comments to a SAS program. Section 3 covers the options section and how to change the output format. Section 4 focuses on how to define datasets. Section 5 lists major SAS procedures and what they are used for. Finally, Section 6 proposes a set of tentative SAS programming conventions (for reference only).

2 Add comments to your SAS program

Comments are important in any kind of programming, especially when other people are expected to use, read, or maintain your programs. SAS supports two types of comments: the block comment and the line comment. An asterisk (*) at the start of a statement begins a line comment. Like any other SAS statement, it ends at the next semicolon, so do not forget the closing semicolon. As in C, C++, and Java, a /* and */ pair encloses a block comment.

This is an example of a line comment:

    ...
    DATA storage;
        INPUT prodId amount;
        * amount is in dollars (this is a line comment);
    ...
    RUN;

This is a block comment example:

    /*
     * show the means of variables in storage dataset
     * and more comments here
     */
    PROC MEANS data=storage;
        VAR amount;
    RUN;

3 Specify Options

When writing a SAS program, you might want to customize the look and feel of your output.

3.1 Set characters per line

    OPTIONS linesize = 80;

or

    OPTIONS ls = 80;

3.2 Set lines per page

    OPTIONS pagesize = 60;

Note that you can put multiple options in one single OPTIONS statement:

    OPTIONS ls = 80 ps = 60;

3.3 Alignment

To left-align the output instead of centering it:

    OPTIONS nocenter;

3.4 Start from a specific page number

Start from page 100:

    OPTIONS PAGENO=100;

3.5 Disable date and time information

    OPTIONS nodate;

3.6 Disable page number in output

    OPTIONS nonumber;

3.7 Skip n lines

To skip 10 lines before printing a page:

    OPTIONS skip=10;

3.8 Define how to print missing numeric values

To print 'M' instead of the default '.':

    OPTIONS missing = 'M';

3.9 Title and footnote

Use TITLE and FOOTNOTE to set the default title and footnote for all output pages.
Note that you can always change the title for a specific PROC by adding a TITLE statement inside that PROC.

    OPTIONS ls = 80 ps = 60;
    TITLE "this is my title";
    FOOTNOTE "this is my footnote";

4 Define Data

The basic framework for a DATA section:

    DATA dataSetName;
        data manipulation statements;
    RUN;

4.1 Create data sets in program

To input data in your program, use the CARDS or DATALINES keyword. For example:

    DATA dataSetName;
        INPUT var1 var2$ var3;
        CARDS;
    1 abc 34
    2 dd 45
    4 eee 67
    ;
    RUN;

Note that in the INPUT statement, if a variable name is followed by a $ sign, then this variable is a string; otherwise the default type is numeric. String values are typed without quotation marks, because quotation marks in the data lines would be read as part of the value.

By default, each row after the CARDS statement corresponds to one record in your data set. Each record contains as many fields as you defined in your INPUT statement. In our case, each record contains three fields named var1, var2 (string), and var3.

To allow multiple records in one line, end your INPUT statement with @@;. For example:

    DATA dataSetName;
        INPUT var1 var2$ var3 @@;
        CARDS;
    1 abc 34 2 dd 45 4 eee 67
    ;
    RUN;

If the value for a specific variable is missing, put a dot where there should be a value:

    DATA dataSetName;
        INPUT var1 var2$ var3 @@;
        CARDS;
    1 abc 34 2 . 45 4 eee .
    ;
    RUN;

If each record in a data set is so large that you cannot put it on a single line, use #1, #2 to indicate lines:

    DATA dataSetName;
        INPUT #1 var1 var2$ var3
              #2 var4 var5;
        CARDS;
    1 abc 34
    234 123
    2 eee 35
    123 123
    ;
    RUN;

The following example reads variables from rows formatted by character position: characters 1-3 form id, characters 4-6 form age, and character 7 is gender (a string, hence the $).

    DATA dataSetName;
        INPUT id 1-3 age 4-6 gender $ 7;
        CARDS;
    100018f
    101020m
    104017f
    ;
    RUN;

4.2 Create data sets from data files

Use INFILE to name the data file and INPUT to name the variables. Variable names in INPUT are separated by spaces, not commas:

    DATA dataSetName;
        INFILE "your/file/name";
        INPUT var1 var2 var3 $;
    RUN;

Of course, if you have too many columns, you could give them generic names instead of naming them one by one.
The following example reads in a data file with 100 columns, names columns 1-99 var1 through var99, and names the last column score:

    DATA dataSetName;
        INFILE "your/file/name";
        INPUT var1-var99 score;
    RUN;

To write data into external files, use the PUT and FILE keywords. This seems natural given that we use INPUT and INFILE to read data from external files. The FILE statement must come before the PUT statement so that PUT knows where to write, and both must come before CARDS, which has to be the last statement of the data step.

    /* write data into external files */
    DATA dataSetName;
        INPUT id 1-3 age 4-6 gender $ 7;
        FILE "/your/file/name";
        PUT id gender;
        CARDS;
    100018f
    101020m
    104017f
    ;
    RUN;

Most of what you can do with INPUT has a counterpart in PUT. For example, "PUT var1 1-3 var2 4-6" formats your output by column, $ indicates a string-type variable, etc. An alternative to "PUT var1 1-3 var2 4-6" is "PUT @1 var1 @4 var2", where @x varName indicates that varName starts at character x.

4.3 Create data sets from data sets

4.3.1 Create data sets from data sets by extending one data set

Here is the most basic format for creating a data set from another data set. The following example creates an exact duplicate of dataSetName:

    DATA dataSetNameDup;
        SET dataSetName;
    RUN;

To add variables to your new data set:

    DATA dataSetNameDup;
        SET dataSetName;
        var4 = var1 + var3;
    RUN;

To remove certain variables from the data set:

    DATA dataSetNameDup;
        SET dataSetName;
        DROP var1;
    RUN;

To keep specific variables only:

    DATA dataSetNameDup;
        SET dataSetName;
        KEEP var2 var3;
    RUN;

Use FIRSTOBS and OBS to select a subset of rows from a data set. FIRSTOBS is the number of the first record to read and OBS is the number of the last record to read (not a count). The following example selects records 2 and 3 from dataSetName and creates another data set called dataSetSub:

    /* this is a subset of data from dataSetName */
    DATA dataSetSub;
        SET dataSetName (FIRSTOBS=2 OBS=3);
    RUN;

Or use an IF (or WHERE) statement to include records conditionally in the new data set. The following data set dataSetNameFiltered only keeps those rows with var1 > 100.

    DATA dataSetNameFiltered;
        SET dataSetName;
        IF var1 > 100;
    RUN;

The automatic variable _N_ is defined in every data step.
This variable is set to 1 at the beginning of every data section (aka data step) and increases by one each time the step loops through another record, so it serves as a counter. Therefore, the following selects the first 10 records from dataSetName:

    DATA dataSetNameN;
        SET dataSetName;
        IF _N_ <= 10;
    RUN;

To select a random sample (roughly 30% of the records) from another data set:

    DATA dataSetNameRandom;
        SET dataSetName;
        WHERE ranuni(1000) < .30;
    RUN;

Note that most data definition methods can be combined. For example, you could use both KEEP/DROP and FIRSTOBS/OBS to create a row-wise and column-wise subset of a data set.

4.3.2 Create data sets from data sets by concatenating two or more data sets

If you put more than one data set after the SET keyword, these data sets will be concatenated:

    /* products from USA */
    DATA productsUsa;
        INPUT id price @@;
        CARDS;
    100 987.12 108 999.98 187 1024.00
    ;
    RUN;

    /* products from Canada */
    DATA productsCan;
        INPUT id price @@;
        CARDS;
    200 987.12 208 999.98 675 1876.00
    ;
    RUN;

    /* products from USA or Canada */
    DATA products;
        SET productsUsa productsCan;
    RUN;

4.3.3 Create data sets from data sets by merging two or more data sets

You can also merge two or more data sets. To use MERGE with a BY statement, your input data sets must have at least one common variable, and all data sets to be merged must already be sorted by that variable (use PROC SORT if they are not).

    DATA product;
        INPUT productId productPrice;
        CARDS;
    1001 105.00
    1003 204.00
    2098 300.00
    ;
    RUN;

    DATA sales;
        INPUT productId quantity;
        CARDS;
    1001 30
    1003 50
    2098 10
    ;
    RUN;

    DATA productSales;
        MERGE product sales;
        BY productId;
    RUN;

By default, the merge result includes unmatched records: if a record in product has no match in sales, it is returned as (productId, productPrice, .), and if a record in sales has no match in product, it is returned as (productId, ., quantity). If you are familiar with database queries, this default behaves like a full outer join in SQL, as opposed to a left, inner, or right join.
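To make the default behavior concrete, here is a small sketch with made-up data in which one product has no sale and one sale has no product. The data set names productDemo, salesDemo, and mergedDemo and all values are hypothetical and chosen only for this illustration.

```sas
/* hypothetical products: 1003 has no matching sale */
DATA productDemo;
    INPUT productId productPrice;
    DATALINES;
1001 105.00
1003 204.00
;
RUN;

/* hypothetical sales: 3000 has no matching product */
DATA salesDemo;
    INPUT productId quantity;
    DATALINES;
1001 30
3000 10
;
RUN;

/* both inputs are already in productId order, so no PROC SORT is needed */
DATA mergedDemo;
    MERGE productDemo salesDemo;
    BY productId;
RUN;

/* prints three records:
   1001 with both price and quantity,
   1003 with a missing quantity,
   3000 with a missing productPrice */
PROC PRINT DATA=mergedDemo;
RUN;
```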
SAS also allows you to specify how to deal with these unmatched records. The IN= option creates a temporary variable that is 1 when the current record has a contribution from that data set and 0 otherwise. In the following example [UNC], the merge result only contains records that have a matching product record.

    DATA productSales;
        MERGE product(IN=x) sales(IN=y);
        BY productId;
        IF x=1;
    RUN;

Here is a more complex example that returns three different data sets, each containing a different type of result:

    DATA x1y1 x1y0 x0y1;
        MERGE product(IN=x) sales(IN=y);
        BY productId;
        IF x=1 AND y=1 THEN OUTPUT x1y1; /* write all matches to x1y1 */
        IF x=1 AND y=0 THEN OUTPUT x1y0; /* products without sales */
        IF x=0 AND y=1 THEN OUTPUT x0y1; /* sales without products */
    RUN;

4.4 Create/Use permanent data sets

During a SAS session, by default your data sets are stored in the temporary WORK library, which means you lose them after you shut down SAS (without saving them). You could choose to save your data sets to permanent storage devices such as a hard drive, a floppy drive, etc. To save your data sets, first use a LIBNAME statement to define where you want the data saved. Then, when you use the data sets, prefix the data set name with the library name. Note that a library name can be at most 8 characters long. The following statements save data sets into the "C:\sasdata\" directory:

    LIBNAME marine "c:\sasdata\";
    DATA marine.marineFish;
    ...
    RUN;
    PROC PRINT data=marine.marineFish;
    RUN;

To use previously stored data sets, load libraries at the beginning of your program:

    /* load data from ... */
    LIBNAME mylib "file/path";
    /* a data set in mylib */
    PROC PRINT data=mylib.dataset1;
    RUN;

5 Process Data

This is a list of frequently used SAS PROCs (procedures), with very brief introductions of what these PROCs do. These only serve as pointers to these PROCs. For more detailed syntax and examples, please use the SAS help documentation: SAS Menu->Help->Documentation, go to the Index sheet, and use the PROC name as the index to find detailed help information about the syntax of your specific SAS PROC.

ACCESS: input data from data sources including Excel and more.
BOXPLOT: the name tells all, draw a boxplot.
CONTENTS: summary of a data set.
CORR: compute covariance and correlation matrices for the variables in a data set.
DBF: read data from dBase files.
DISCRIM: discrimination and classification.
FACTOR: factor analysis.
FORMAT: translate values of a variable into another set of values.
FREQ: create frequency tables.
GCHART: draw histograms as well as other types of bar charts (vertical or horizontal), block charts, pie charts, etc.
GPLOT: draw scatter plots with or without confidence limits and regression options.
GLM: analysis of variance (ANOVA), one-way, two-way, or n-way as defined by the CLASS keyword.
LOGISTIC: logistic regression.
MEANS: calculate (univariate) means and variances of specific variables in a data set.
MIXED: similar to GLM, but designed for mixed effects (both fixed and random).
NPAR1WAY: non-parametric test for one-way ANOVA (Kruskal-Wallis test).
PLOT: text version of GPLOT.
PRINT: print out a data set.
RANK: get ranking information about a data set.
REPORT: summary of a data set.
REG: linear regression procedure.
SORT: sort a data set by a specific variable in it.
TRANSPOSE: exchange variables and observations.
UNIVARIATE: simple descriptive univariate statistics for numeric variables in a data set.

6 SAS programming conventions

People have different preferences about how they like to write their programs. This makes it tough for one programmer to read a program written by another, and, call them whatever you want, programmers tend to be stubborn about their own programming habits (conventions). As one of the developers of gaim[*1] put it, "Coding convention is like an asshole, everybody has one, but nobody likes anybody else's". However, people have to come up with some sort of commonly accepted programming conventions to reduce the large amount of time spent reading, and sometimes guessing at, other programmers' code. For example, in the Java programming community, people usually follow the Java Coding Conventions.
In the SAS programming community, people have also come up with various coding conventions. For example, Levin[LV] proposed a set of conventions for naming, appearance, documentation, etc. Levin's article also discusses the efficiency of SAS programs. Efficiency is usually considered a separate topic from coding conventions and therefore is not discussed here. However, for people who are interested in improving the performance of their SAS programs, Levin's article is a good reference. Martin et al.[MT] also have a nice article on how to write clean and easy-to-maintain SAS code. The naming conventions proposed here follow the popular Java coding conventions where possible, and are customized to SAS programming.

6.1 A few rules of thumb

One important thing to know before reading any conventions: a programming convention is never mandatory or exclusive. You do not have to follow one to make your program work. People propose conventions because they consider them good practice and recommend them to the community.

Add good comments. Note that this does not mean the more comments the better. For example, "this increases a by 1" is considered a bad comment for "a=a+1" because it is obvious to everybody. Add comments where you think people need a few words to better understand what you are doing there.

Be consistent about capitalization. Whether you like all SAS keywords in capital letters (e.g., PROC, DATA, IF, ELSE, etc.) or in lower case (proc, data, if, else), be consistent.

6.2 File names

Use .sas as the file extension for SAS source files. Name your file something meaningful. "a123.sas" is a bad file name; "storage.sas" is considered a better one because it tells what this SAS program does.
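As a small sketch of the commenting rule of thumb in Section 6.1, the following data step contrasts a comment that only restates the code with one that explains the intent. The data set storage, the variable amount (assumed to be in dollars), and the "billing report" mentioned in the comment are hypothetical and used only for illustration.

```sas
DATA storageCents;
    SET storage;

    /* unhelpful comment, it only restates the code: */
    /* multiply amount by 100 */

    /* more helpful comment, it explains why the code is there: */
    /* the billing report expects cents, but amount is stored in dollars */
    amountCents = amount * 100;
RUN;
```

Note that the keywords are capitalized the same way throughout, in line with the consistency rule above.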
6.2.1 File organization

Organize your SAS source file as follows:

1. Begin your SAS code file with a beginning comment stating what this program does.
2. The options section. Add comments if necessary.
3. Data sections. Add comments about what kind of data they are, and comment on how you filter the data sets if necessary.
4. Proc sections. Add comments on what you are doing with these PROCs. Note that this is not what the PROCs do in general, which is already in the SAS manual, but what you are specifically doing with them. Add comments on what kind of options you use if necessary.

6.3 Indentation

Avoid lines over 80 characters.
Break lines at meaningful breakpoints (e.g., after a comma, before a keyword, etc.).
Use tab-size=4.

6.4 Statements

Each line should contain at most one statement. Pay attention to the indentation.

6.4.1 Simple statements

For example:

    INPUT age bloodpressure height;

6.4.2 IF ELSE END

    IF condition THEN DO;
        statement1;
        statement2;
        ...;
    END;
    ELSE DO;
        statement;
        statement;
    END;

If there is only one statement in the then part *and* the else part, the format can be simplified into

    IF condition THEN statement;
    ELSE statement;

or even

    IF condition THEN statement;

6.4.3 DO - END statements

    DO index=1 to 100;
        statement1;
        ...;
    END;

6.5 White space

Use blank lines (i) between DATA sections and PROC sections, and also (ii) between code segments inside a DATA or PROC block. The following is an example program skeleton:

    /*
     * Comments on what this program does in general
     */

    /* what is this data set? */
    DATA dataSetName;
        data processing code segment1;

        data processing code segment2;
    RUN;

    /* what does this proc do */
    PROC PROCNAME OPTIONS;
        code segments, separated by blank lines;
    RUN;

As to white space, use it wherever you think it would make your program look clearer (of course, SAS grammar also requires white space in many places).

6.6 Naming conventions

Make sure that your data set names and variable names make sense.
For example, data1 is a poor name compared to guineas, and guinea1 and guinea2 are not as good as guineaOriginal and guineaReciprocal. Also, start your names with a lowercase word, and then capitalize the first letter of each following word to increase readability, for example, airQuality, productColors, etc.

7 External resources

SAS Code Fragments, a part of the UCLA SAS resources
Topics in SAS programming at UNC

Footnotes

[*1] gaim is a widely used open source instant-message (IM) tool. Many people like gaim because it is compatible with most mainstream IM tools, including Yahoo Messenger, AIM, MSN Messenger, etc.

References

[LV] http://www.nesug.org/html/Proceedings/nesug01/tr/tr1004.pdf
[MT] http://www.ats.ucla.edu/stat/sas/library/nesug00/ap2004.pdf
[UNC] http://www.cpc.unc.edu/services/computer/presentations/sasclass99/merge.html
Comments are important to any kind of programming especially when other people are supposed to use, read, or maintain your programs. SAS supports two types of comments: the block comment and the line comment. A charcter denotes that everything from * until the end of a line is comment. Like C, C++, and Java, /* and */ pairs allows you to add block comments. This is an example of line comments:
... DATA storage; INPUT prodId amount; * amount is in dollars (this is a line comment) ... RUN;
/* * show the means of variables in storage dataset * and more comments here */ PROC MEANS data=storage; VAR amount; RUN;
3 Specify Options When writing a SAS program, you might want to customize the look-and-feel of your output. 3.1 Set characters per line OPTIONS linesize = 80; or OPTIONS ls = 80; 3.2 Set lines per page OPTIONS pagesize = 60; Note that you can put multiple options in one single OPTIONS statement: OPTIONS ls = 80 ps = 60; 3.3 Allignment OPTIONS nocenter; 3.4 Start from a specific page number Start from page 100: OPTIONS PAGENO=100; 3.5 Disable time data information OPTIONS nodate; 3.6 Disable page number in output OPTIONS nonumber; 3.7 Skip n-lines To skip 10 lines before printing a page: OPTIONS skip=10; 3.8 Define how to print missing numeric values To print 'M' instead of the default '.': OPTIONS missing = 'M'; 3.9 Title and footnote Use TITLE and FOOTNOTE to set the default title and footnote for all output pages. Note that you can always change output of a specific PROC by setting adding a TITLE statement in that PROC. OPTIONS ls = 80 ps = 60; TITLE "this is my title"; FOOTNOTE "this is my footnote"; 4 Define Data The basic framework for DATA section: DATA dataSetName; data manipulation statements; RUN; 4.1 Create data sets in program To input data in your program, use CARDS or DATALINES keyword. For example DATA dataSetName; INPUT var1 var2$ var3; CARDS; 1 'abc' 34 2 'dd' 45 4 'eee' 67 ; RUN; Note that in the INPUT statement, if a variable name is followed by a $ sign, then this veriable is a string, otherwise the default type is numeric. By default, each row after the CARDS statement corresponds a record in your data set. Each record contains as many fields as you defined in your INPUT statement. For example, in our case, each record contains three fields named var1, var2(string), and var3. To allow multiple records in one line, end your INPUT statement with @@;. 
For example DATA dataSetName; INPUT var1 var2$ var3 @@; CARDS; 1 'abc' 34 2 'dd' 45 4 'eee' 67 ; RUN; If the value for a specific variable is missing, put a dot where there should be a value: DATA dataSetName; INPUT var1 var2$ var3 @@; CARDS; 1 'abc' 34 2 . 45 4 'eee' . ; RUN; If each record in a data set is so large that you cannot put them into a single line, then use #1, #2 to indicate lines: DATA dataSetName; INPUT #1 var1 var2 var3 #2 var4 var5; CARDS; 1 'abc' 34 234 123 2 'eee' 35 123 123 ; RUN; The following example reads variables from rows formatted by number of characters: characters 1-3 forms id, 4-6 is age, and 7 is gender. DATA dataSetName; INPUT id 1-3 age 4-6 gender 7; CARDS; 100018f 101020m 104017f ; RUN; 4.2 Create data sets from data files DATA dataSetName; INFILE "you/file/name"; INPUT var1, var2, var3 $; RUN; Of course, if you have too many columns, you could give them generic names instead of naming them one by one. The following example reads in a data file with 100 columns, and names columns 1-99 var1 - var99 respectively, and names the last column score: DATA dataSetName; INFILE "you/file/name"; INPUT var1-var99 score; RUN; To write data into extenal files, use PUT and FILE keywords. This seems natual given that we use INPUT and INFILE to read data from external files. /* write data into external files */ DATA dataSetName; INPUT id 1-3 age 4-6 gender 7; CARDS; 100018f 101020m 104017f ; PUT id gender; FILE "/your/file/name"; RUN; Anything you do with INPUT, you can do the same thing with PUT. For example, "PUT var1 1-3 var2 4-6" formats your output, $ indicates a string-type variable, etc. An alternative to "PUT var1 1-3 var2 4-6" is "PUT @1 var1 @4 var2", where @x varName indicates that varName starts from character x. 4.3 Create data sets from data sets 4.3.1 Create data sets from data sets by extending one data set Here is the most basic format for creating a data set from another dataset. 
The following example creates an exact duplicate of dataSetName: DATA dataSetNameDup; SET dataSetName; RUN; To add variables into your new data set: DATA dataSetNameDup; SET dataSetName; var4 = var1 + var3; RUN; To remove certain variables from the dataset: DATA dataSetNameDup; SET dataSetName; DROP var1; RUN; To keep specific variables only: DATA dataSetNameDup; SET dataSetName; KEEP var2 var3; RUN; Use FIRSTOBS and OBS to select a subset of rows from a data set. The following example selects records 2 and 3 from dataSetName and creates another dataset called dataSetSub: /* this is a sub set of data from dataSetName*/ DATA dataSetSub; SET dataSetName (FIRSTOBS=2 OBS=2); RUN; Or use IF (or WHERE) statement to include records conditionally into the new data set. The following dataset dataSetNameFiltered only keeps those rows with var1 > 100. DATA dataSetNameFiltered; SET dataSetName; IF var1 > 100; RUN; The implicit variable _n_ is defined for every SAS program. This variable is re-initialized to 0 at the beginning of every data section (aka data step). Then when the program loops thru the data, this variable serves as a counter and increments by one after each loop. Therefore, the following select the first 10 records from dataSetName: DATA dataSetNameN; SET dataSetName; IF _N_<10; RUN; To select a random sample from another data set: DATA dataSetNameRandom; SET dataSetName; WHERE ranuni(1000) < .30; RUN; Note that most data definition methods can be combined. For example, you could use both KEEP/DROP and FIRSTOBS/OBS to create a row-wise and col-wise subset of a data set. 
4.3.2 Create data sets from data sets by concatenating two or more data sets If you put more than one data set after SET keyword, these data sets will be concatenated: /*products from USA*/ DATA productsUsa; INPUT id, price @@; CARDS; 100 987.12 108 999.98 187 1024.00 ; RUN; /*products from Canada*/ DATA productsCan; INPUT id, price @@; CARDS; 200 987.12 208 999.98 675 1876.00 ; RUN; /*products from USA or Canada*/ DATA products; SET productsUsa productsCan; RUN; 4.3.3 Create data sets from data sets by merging two or more data sets You can also merge two or more data sets. To use MERGE, you input datasets must have at least one common variable, and all datasets to be merged must have been sorted by that variable. DATA products; INPUT productId productPrice; CARDS; 1001 105.00 1003 204.00 2098 300.00 ; RUN; DATA sales; INPUT productId quantity; CARDS; 1001 30 1003 50 2098 10 ; RUN; DATA productSales; MERGE product sales; BY productId; RUN; By default, SAS merge result includes non-perfect records, i.e., if a record in product has no match in sales, it will be returned as a record (productId, productPrice, .). If a record in sales has no match in product, it will be returned as (productId, . ,quantity). If you are familiar about database queries, this is similar to left join, innner join, and right join {refer to SQL). SAS also allows you to specify how to deal with these problems. In the following example, your merge result will only contain records where product records are not empty. 
[UNC] DATA productSales; MERGE product(IN=x) sales(IN=y); BY productId; IF x=1; RUN; Here is a more complex example where you return three different data sets each containing a different type of results; DATA x1y1 x1y0 x0y1; MERGE product(IN=x) sales(IN=y); BY productId; IF x=1 AND y=1 THEN OUTPUT x1y1; /* write all matches to x1y1 */ IF x=1 AND y=0 THEN OUTPUT x1y0; IF x=0 and y=1 THEN OUTPUT x0y1; RUN; 4.4 Create/Use permanent data sets During a SAS session, by default your data is stored in the memory, which means you lose them after you shutdown SAS (without saving them). You could choose to save your data sets to permanent storage devices such as hard drive, floppy drive, etc. To save your data sets, first, you need to use LIBNAME statement to define where you want the data saved. When you use the data sets, prefix your data set name with your LIBNAME. The following statements save data sets into "C:\sasdata\" directory: LIBNAME marineResearch "c:\sasdata\"; DATA marineResearch.marineFish; ... RUN; PROC PRINT data=marineResearch.marineFish; RUN; To use previously-stored data sets, load libraries at the beginning of your program: /* load data from ... */ LIBNAME mylib "file/path"; /* a data set in mylib */ PROC PRINT mylib.dataset1; RUN; 5 Process Data This is a list of frequently-used SAS PROCs(procedures), with very brief introductions of whey these PROCs do. These only serve as pointers to these PROCs. For more detailed syntax and examples, please use the SAS help documentations: SAS Menu->Help->Documentation, Go to the Index sheet, and use the PROC name as index to find detailed help information about the syntax of your specific SAS PROC. ACCESS: input data from data sources including excel and more. BOXPLOT: the name tells all, draw a boxplot. CONTENT: summary of data set. CORR: compare two data sets, get covariance and correlation matrices. DBF: read data from dbase files. DISCRIM: discrimination and classification. FACTOR: factor analysis. 
FORMAT: translate values of a variable into another set of values. FREQ: create frequency table. GCHART: draw histograms as well as other types of bar charts (vertical or horizontal), block charts, pie charts, etc. GPLOT: draw scatter plot with or without confidence limits and regresssion options. GLM: analysis of variance (ANOVA), one way, two-way, or n-way as defined by CLASS keyword. LOGISTIC: logistic regression. MEANS: calculate (univariate) means and variances of specific variables in a data set. MIXED: similar to GLM, but designed for mixed effects (both fixed and random). NPAR1WAY: non-parametric test for one-way anova (Kruskal-Wallis test). PLOT: text version of GPLOT PRINT: print out a dataset RANK: get ranking information about a data set. REPORT: summary of a data set. REG: linear regression procedure. SORT: sort a data set by a specific variable in it TRANSPOSE: exchange variables and observations. UNIVARIATE: simple descriptive univariate statistics for numeric variables in a data set. 6 SAS programming conventions People have different preferences over how they like to write their programs. This makes it tough for one programmer to read a program written by other programmers, and call them whatever you want, programmers tend to be stubborn about their own programming habits (conventions). As said by one of the developers of gaim[*1], "Coding convention is like an asshole, everybody has one, but nobody likes anybody else's". However, people have to come up with some sort of commonly-admitted programming conventions to reduce the large amount of time we've used in reading and soemtimes guessing other programmer's code. For example, in the Java programming community, people usually follow the Java Coding Conventions. In the SAS programming community, people have also come up with various coding conventions. For example, Levin[LV] proposed a set of conventions for naming, appearances, documentation, etc. 
Levin's article also mentioned efficiency of SAS programs. Efficiency is usually considered a separate topic from coding conventions and therefore is not discussed here. However, for people who are interested in improving performance of their SAS programs, Levin's article is definitely a good reference. Martin et.al.[MT] also has a nice article on how to write clean and easy-to-maintain SAS code. The naming conventions proposed here follows the popular Java coding convention where possible, and is customized to SAS programming. 6.1 A few rules of thumb One important thing to know before reading any conventions. Programming convention is never mandatory or exclusive, you do not have to follow programming conventions to make your program work. The reason why people come with programming conventions is because those are considered good practices and they recommend them to the community. They are never mandatory. Add good comments. Note that this doesnot mean the more comments the better. For example, a comment "this increases a by 1" is considered a bad comment for "a=a+1" because it's obvious to everybody. Add comment where you think people need a few words to better understand what youa are doing there. Be consistent about capitlization. No matter you like all SAS keywords in capital letters (e.g., PROC, DATA, IF, ELSE, etc) or lower case (proc data if else), be consistent. 6.2 File names Use .sas as your file extenion for SAS source file. Name your file something meaningful. "a123.sas" is a bad file name, "storage.sas" is considered a better one because it tells what this SAS program does. 6.2.1 File organization Organize your SAS source file as follows: Begin your SAS code file with a beginning comments stating what this program does The options section, add comments if necessary Data sections, add comments about what kind of data they are, and comment on how you filter the datasets if necessary. Proc sections, add comments on what you are doing with these PROCs. 
Note that it's not what the PROCs do, that's already in the SAS manual. It is what specifically doing with these procs. Add comments on what kind of options you have if necessary. 6.3 Indentation Avoid lines over 80 characters. Break lines at meaningful breakpoints (e.g., after a comma, before a keyword, etc) Use tab-size=4 6.4 Statements Each line should contain at most one statement. Pay attention to the indentations. 6.4.1 Simple statements For example, INPUT age bloodpressure height; 6.4.2 IF ELSE END IF condition THEN DO; statement1; statement2; ...; ELSE DO; statement; statement; END; If there is only one statement in the then part *and* else part, the format could be simplified into IF condition THEN statement; ELSE statement; or even IF condition THEN statement; 6.4.3 DO - END statements DO index=1 to 100; statement1; ...; END; 6.5 White space Use blank lines (i) between DATA sections and PROC sections, and also (ii) between code segments inside a DATA or PROC block. The following is an example program skeleton: /* * Comments on what this program does in general */ /* what is this data set?*/ DATA dataSetName; data processing code segment1; data processing code segment2; RUN; /* what does this proc do */ PROC PROCNAME OPTIONS; code segments, separated by blank lines; RUN; As to white spaces, use them wherever you think would make you program look clearer (of course, SAS grammar also requires white spaces in many places). 6.6 Naming conventions Make sure that your dataset names and variable names make sense. For example, these are bad names: data1 as compared to guineas, guinea1 and guinea2 is not as good as guineaOriginal and guineaReciprocal. Also, start your names with a lowercase word, and then capitalize the first letter of each following words to increase readability, for example, airQuality, productColors, etc. 
7 External resources

SAS Code Fragments, a part of the UCLA SAS resources
Topics in SAS programming at UNC

Footnotes

[*1] gaim is a widely used open-source instant-messaging (IM) tool. Many people like gaim because it is compatible with most mainstream IM tools, including Yahoo Messenger, AIM, and MSN Messenger.

References

[LV] http://www.nesug.org/html/Proceedings/nesug01/tr/tr1004.pdf
[MT] http://www.ats.ucla.edu/stat/sas/library/nesug00/ap2004.pdf
[UNC] http://www.cpc.unc.edu/services/computer/presentations/sasclass99/merge.html
Note that most data definition methods can be combined. For example, you could use both KEEP/DROP and FIRSTOBS/OBS to create a row-wise and col-wise subset of a data set. 4.3.2 Create data sets from data sets by concatenating two or more data sets If you put more than one data set after SET keyword, these data sets will be concatenated: /*products from USA*/ DATA productsUsa; INPUT id, price @@; CARDS; 100 987.12 108 999.98 187 1024.00 ; RUN; /*products from Canada*/ DATA productsCan; INPUT id, price @@; CARDS; 200 987.12 208 999.98 675 1876.00 ; RUN; /*products from USA or Canada*/ DATA products; SET productsUsa productsCan; RUN; 4.3.3 Create data sets from data sets by merging two or more data sets You can also merge two or more data sets. To use MERGE, you input datasets must have at least one common variable, and all datasets to be merged must have been sorted by that variable. DATA products; INPUT productId productPrice; CARDS; 1001 105.00 1003 204.00 2098 300.00 ; RUN; DATA sales; INPUT productId quantity; CARDS; 1001 30 1003 50 2098 10 ; RUN; DATA productSales; MERGE product sales; BY productId; RUN; By default, SAS merge result includes non-perfect records, i.e., if a record in product has no match in sales, it will be returned as a record (productId, productPrice, .). If a record in sales has no match in product, it will be returned as (productId, . ,quantity). If you are familiar about database queries, this is similar to left join, innner join, and right join {refer to SQL). SAS also allows you to specify how to deal with these problems. In the following example, your merge result will only contain records where product records are not empty. 
[UNC]

DATA productSales;
    MERGE product(IN=x) sales(IN=y);
    BY productId;
    IF x=1;
RUN;

Here is a more complex example that writes three different data sets, each containing a different type of result:

DATA x1y1 x1y0 x0y1;
    MERGE product(IN=x) sales(IN=y);
    BY productId;
    IF x=1 AND y=1 THEN OUTPUT x1y1; /* write all matches to x1y1 */
    IF x=1 AND y=0 THEN OUTPUT x1y0; /* in product only */
    IF x=0 AND y=1 THEN OUTPUT x0y1; /* in sales only */
RUN;

4.4 Create/Use permanent data sets

During a SAS session, by default your data sets are stored in the temporary WORK library, which means you lose them when you shut down SAS. You can choose to save your data sets to a permanent location such as a directory on your hard drive. To save your data sets, first use a LIBNAME statement to define where you want the data saved. Then, whenever you use a data set, prefix its name with the library name (libref) you defined. The following statements save a data set into the "C:\sasdata\" directory:

/* a libref can be at most 8 characters long */
LIBNAME marine "c:\sasdata\";

DATA marine.marineFish;
    ...
RUN;

PROC PRINT data=marine.marineFish;
RUN;

To use previously stored data sets, assign the library at the beginning of your program:

/* load data from ... */
LIBNAME mylib "file/path";

/* a data set in mylib */
PROC PRINT data=mylib.dataset1;
RUN;

5 Process Data

This is a list of frequently used SAS PROCs (procedures), with very brief descriptions of what these PROCs do. They only serve as pointers. For more detailed syntax and examples, use the SAS help documentation: SAS Menu->Help->Documentation, go to the Index sheet, and use the PROC name as the index to find detailed help on the syntax of your specific SAS PROC.

ACCESS: input data from data sources including Excel and more.
BOXPLOT: the name tells all, draw a boxplot.
CONTENTS: summary of a data set.
CORR: covariance and correlation matrices for the variables in a data set.
DBF: read data from dBASE files.
DISCRIM: discrimination and classification.
FACTOR: factor analysis.
FORMAT: translate values of a variable into another set of values.
FREQ: create frequency tables.
GCHART: draw histograms as well as other types of bar charts (vertical or horizontal), block charts, pie charts, etc.
GPLOT: draw scatter plots with or without confidence limits and regression options.
GLM: analysis of variance (ANOVA), one-way, two-way, or n-way as defined by the CLASS keyword.
LOGISTIC: logistic regression.
MEANS: calculate (univariate) means and variances of specific variables in a data set.
MIXED: similar to GLM, but designed for mixed effects (both fixed and random).
NPAR1WAY: non-parametric test for one-way ANOVA (Kruskal-Wallis test).
PLOT: text version of GPLOT.
PRINT: print out a data set.
RANK: get ranking information about a data set.
REPORT: summary of a data set.
REG: linear regression procedure.
SORT: sort a data set by one or more of its variables.
TRANSPOSE: exchange variables and observations.
UNIVARIATE: simple descriptive univariate statistics for numeric variables in a data set.

6 SAS programming conventions

People have different preferences about how they like to write their programs. This makes it tough for one programmer to read a program written by another, and, call them whatever you want, programmers tend to be stubborn about their own programming habits (conventions). As said by one of the developers of gaim[*1], "Coding convention is like an asshole, everybody has one, but nobody likes anybody else's".

However, people have to agree on some commonly accepted programming conventions to reduce the large amount of time spent reading, and sometimes guessing at, other programmers' code. For example, in the Java programming community, people usually follow the Java Coding Conventions.

In the SAS programming community, people have also come up with various coding conventions. For example, Levin[LV] proposed a set of conventions for naming, appearance, documentation, etc.
Levin's article also discusses the efficiency of SAS programs. Efficiency is usually considered a separate topic from coding conventions and is therefore not discussed here. However, for people who are interested in improving the performance of their SAS programs, Levin's article is a good reference. Martin et al.[MT] also have a nice article on how to write clean and easy-to-maintain SAS code.

The naming conventions proposed here follow the popular Java coding conventions where possible, customized to SAS programming.

6.1 A few rules of thumb

One important thing to know before reading any conventions: programming conventions are never mandatory or exclusive, and you do not have to follow them to make your program work. People come up with programming conventions because they consider them good practice and recommend them to the community.

Add good comments. Note that this does not mean the more comments the better. For example, the comment "this increases a by 1" is a bad comment for "a=a+1" because it is obvious to everybody. Add a comment where you think people need a few words to better understand what you are doing.

Be consistent about capitalization. Whether you like all SAS keywords in capital letters (e.g., PROC, DATA, IF, ELSE, etc.) or in lower case (proc, data, if, else), be consistent.

6.2 File names

Use .sas as the file extension for SAS source files. Name your file something meaningful. "a123.sas" is a bad file name; "storage.sas" is a better one because it tells what the SAS program does.

6.2.1 File organization

Organize your SAS source file as follows:

Begin your SAS code file with a comment stating what this program does.
The options section, add comments if necessary.
Data sections, add comments about what kind of data they are, and comment on how you filter the data sets if necessary.
Proc sections, add comments on what you are doing with these PROCs.
Note that this means not what the PROCs do in general, which is already in the SAS manual, but what you are specifically doing with them. Add comments on the options you use if necessary.

6.3 Indentation

Avoid lines over 80 characters.
Break lines at meaningful breakpoints (e.g., after a comma, before a keyword, etc.).
Use tab-size=4.

6.4 Statements

Each line should contain at most one statement. Pay attention to indentation.

6.4.1 Simple statements

For example,

INPUT age bloodpressure height;

6.4.2 IF ELSE END

IF condition THEN DO;
    statement1;
    statement2;
    ...;
END;
ELSE DO;
    statement;
    statement;
END;

If there is only one statement in the THEN part and in the ELSE part, the format can be simplified to

IF condition THEN statement;
ELSE statement;

or even

IF condition THEN statement;

6.4.3 DO - END statements

DO index=1 TO 100;
    statement1;
    ...;
END;

6.5 White space

Use blank lines (i) between DATA sections and PROC sections, and also (ii) between code segments inside a DATA or PROC block. The following is an example program skeleton:

/*
 * Comments on what this program does in general
 */

/* what is this data set? */
DATA dataSetName;
    data processing code segment1;

    data processing code segment2;
RUN;

/* what does this proc do */
PROC PROCNAME OPTIONS;
    code segments, separated by blank lines;
RUN;

As for white space within a line, use it wherever you think it makes your program look clearer (of course, SAS grammar also requires white space in many places).

6.6 Naming conventions

Make sure that your data set names and variable names make sense. For example, data1 is a bad name compared to guineas, and guinea1 and guinea2 are not as good as guineaOriginal and guineaReciprocal. Also, start your names with a lowercase word, and then capitalize the first letter of each following word to increase readability, for example, airQuality, productColors, etc.
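Putting the conventions of this section together, a small program might look like the following sketch. The file name storage.sas, the data sets storage and storageLarge, the variables prodId, amount and sizeClass, the data values, and the threshold of 100 are all illustrative assumptions, not part of any referenced convention.

```sas
/*
 * storage.sas
 * Summarizes the dollar amounts of the larger items in storage.
 */

/* output format: 80 characters per line, no date stamp */
OPTIONS ls = 80 nodate;

/* storage: one row per product, amount in dollars (made-up values) */
DATA storage;
    INPUT prodId amount;
    CARDS;
1001 250.00
1002 75.50
1003 120.25
;
RUN;

/* keep only items above an arbitrary threshold of 100 dollars */
DATA storageLarge;
    SET storage;

    IF amount > 100 THEN DO;
        sizeClass = 'large';
        OUTPUT;
    END;
RUN;

/* mean and spread of amount for the large items */
PROC MEANS DATA=storageLarge;
    VAR amount;
RUN;
```

The comments say what each block is for in this particular program instead of restating what DATA or PROC MEANS do in general, which is the distinction drawn in the file organization rules.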
7 External resources

SAS Code Fragments, a part of the UCLA SAS resources
Topics in SAS programming at UNC

Footnotes

[*1] gaim is a widely used open source instant-message (IM) tool. Many people like gaim because it is compatible with most mainstream IM tools including Yahoo Messenger, AIM, MSN Messenger, etc.

References

[LV] http://www.nesug.org/html/Proceedings/nesug01/tr/tr1004.pdf
[MT] http://www.ats.ucla.edu/stat/sas/library/nesug00/ap2004.pdf
[UNC] http://www.cpc.unc.edu/services/computer/presentations/sasclass99/merge.html
6.1 A few rules of thumb One important thing to know before reading any conventions. Programming convention is never mandatory or exclusive, you do not have to follow programming conventions to make your program work. The reason why people come with programming conventions is because those are considered good practices and they recommend them to the community. They are never mandatory. Add good comments. Note that this doesnot mean the more comments the better. For example, a comment "this increases a by 1" is considered a bad comment for "a=a+1" because it's obvious to everybody. Add comment where you think people need a few words to better understand what youa are doing there. Be consistent about capitlization. No matter you like all SAS keywords in capital letters (e.g., PROC, DATA, IF, ELSE, etc) or lower case (proc data if else), be consistent. 6.2 File names Use .sas as your file extenion for SAS source file. Name your file something meaningful. "a123.sas" is a bad file name, "storage.sas" is considered a better one because it tells what this SAS program does. 6.2.1 File organization Organize your SAS source file as follows: Begin your SAS code file with a beginning comments stating what this program does The options section, add comments if necessary Data sections, add comments about what kind of data they are, and comment on how you filter the datasets if necessary. Proc sections, add comments on what you are doing with these PROCs. Note that it's not what the PROCs do, that's already in the SAS manual. It is what specifically doing with these procs. Add comments on what kind of options you have if necessary. 6.3 Indentation Avoid lines over 80 characters. Break lines at meaningful breakpoints (e.g., after a comma, before a keyword, etc) Use tab-size=4 6.4 Statements Each line should contain at most one statement. Pay attention to the indentations. 
6.4.1 Simple statements For example, INPUT age bloodpressure height; 6.4.2 IF ELSE END IF condition THEN DO; statement1; statement2; ...; ELSE DO; statement; statement; END; If there is only one statement in the then part *and* else part, the format could be simplified into IF condition THEN statement; ELSE statement; or even IF condition THEN statement; 6.4.3 DO - END statements DO index=1 to 100; statement1; ...; END; 6.5 White space Use blank lines (i) between DATA sections and PROC sections, and also (ii) between code segments inside a DATA or PROC block. The following is an example program skeleton: /* * Comments on what this program does in general */ /* what is this data set?*/ DATA dataSetName; data processing code segment1; data processing code segment2; RUN; /* what does this proc do */ PROC PROCNAME OPTIONS; code segments, separated by blank lines; RUN; As to white spaces, use them wherever you think would make you program look clearer (of course, SAS grammar also requires white spaces in many places). 6.6 Naming conventions Make sure that your dataset names and variable names make sense. For example, these are bad names: data1 as compared to guineas, guinea1 and guinea2 is not as good as guineaOriginal and guineaReciprocal. Also, start your names with a lowercase word, and then capitalize the first letter of each following words to increase readability, for example, airQuality, productColors, etc. 7 External resources SAS Code Fragments, a part of the UCLA SAS resources Topics in SAS programming at UNC Footnotes[*1] gaim is a widely-used open source instant-message (IM) tool. Many people like gaim because it is compatible with most main stream IM tools including yahoo messenger, AIM, MSN messenger etc. Comments #1, at Apr 21, 2010 10:35:22, Rgurusamy said: This is simply excellent. 
References

[LV] http://www.nesug.org/html/Proceedings/nesug01/tr/tr1004.pdf
[MT] http://www.ats.ucla.edu/stat/sas/library/nesug00/ap2004.pdf
[UNC] http://www.cpc.unc.edu/services/computer/presentations/sasclass99/merge.html