@c PSPP - a program for statistical analysis.
@c Copyright (C) 2017 Free Software Foundation, Inc.
@c Permission is granted to copy, distribute and/or modify this document
@c under the terms of the GNU Free Documentation License, Version 1.3
@c or any later version published by the Free Software Foundation;
@c with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.
@c A copy of the license is included in the section entitled "GNU
@c Free Documentation License".
@c
@node Data Manipulation
@chapter Data transformations
@cindex transformations

The @pspp{} procedures examined in this chapter manipulate data and
prepare the active dataset for later analyses. They do not produce output,
as a rule.

@menu
* AGGREGATE::                   Summarize multiple cases into a single case.
* AUTORECODE::                  Automatic recoding of variables.
* COMPUTE::                     Assigning a variable a calculated value.
* COUNT::                       Counting variables with particular values.
* FLIP::                        Exchange variables with cases.
* IF::                          Conditionally assigning a calculated value.
* RECODE::                      Mapping values from one set to another.
* SORT CASES::                  Sort the active dataset.
@end menu

@node AGGREGATE
@section AGGREGATE
@vindex AGGREGATE

@display
AGGREGATE
        OUTFILE=@{*,'@var{file_name}',@var{file_handle}@} [MODE=@{REPLACE, ADDVARIABLES@}]
        /PRESORTED
        /DOCUMENT
        /MISSING=COLUMNWISE
        /BREAK=@var{var_list}
        /@var{dest_var}['@var{label}']@dots{}=@var{agr_func}(@var{src_vars}, @var{args}@dots{})@dots{}
@end display

@cmd{AGGREGATE} summarizes groups of cases into single cases.
Cases are divided into groups that have the same values for one or more
variables called @dfn{break variables}. Several functions are available
for summarizing case contents.

The @subcmd{OUTFILE} subcommand is required and must appear first.
Specify a
system file or portable file by file name or file
handle (@pxref{File Handles}), or a dataset by its name
(@pxref{Datasets}).
The aggregated cases are written to this file. If @samp{*} is
specified, then the aggregated cases replace the active dataset's data.
Use of @subcmd{OUTFILE} to write a portable file is a @pspp{} extension.

If @subcmd{OUTFILE=*} is given, then the subcommand @subcmd{MODE} may also be
specified.
The @subcmd{MODE} subcommand has two possible values: @subcmd{ADDVARIABLES} or @subcmd{REPLACE}.
In @subcmd{REPLACE} mode, the entire active dataset is replaced by a new dataset
which contains just the break variables and the destination variables.
In this mode, the new file will contain as many cases as there are
unique combinations of the break variables.
In @subcmd{ADDVARIABLES} mode, the destination variables will be appended to
the existing active dataset.
Cases which have identical combinations of values in their break
variables will receive identical values for the destination variables.
The number of cases in the active dataset will remain unchanged.
Note that if @subcmd{ADDVARIABLES} is specified, then the data @emph{must} be
sorted on the break variables.

By default, the active dataset will be sorted based on the break variables
before aggregation takes place. If the active dataset is already sorted
or otherwise grouped in terms of the break variables, specify
@subcmd{PRESORTED} to save time.
@subcmd{PRESORTED} is assumed if @subcmd{MODE=ADDVARIABLES} is used.

Specify @subcmd{DOCUMENT} to copy the documents from the active dataset into the
aggregate file (@pxref{DOCUMENT}). Otherwise, the aggregate file will
not contain any documents, even if the aggregate file replaces the
active dataset.

Normally, only a single case (for @subcmd{SD} and @subcmd{SD.}, two cases) need be
non-missing in each group for the aggregate variable to be
non-missing.
Specifying @subcmd{/MISSING=COLUMNWISE} inverts this behavior, so
that the aggregate variable becomes missing if any aggregated value is
missing.

If @subcmd{PRESORTED}, @subcmd{DOCUMENT}, or @subcmd{MISSING} are specified, they must appear
between @subcmd{OUTFILE} and @subcmd{BREAK}.

At least one break variable must be specified on @subcmd{BREAK}, a
required subcommand. The values of these variables are used to divide
the active dataset into groups to be summarized. In addition, at least
one @var{dest_var} must be specified.

One or more sets of aggregation variables must be specified. Each set
comprises a list of aggregation variables, an equals sign (@samp{=}),
the name of an aggregation function (see the list below), and a list
of source variables in parentheses. Some aggregation functions expect
additional arguments following the source variable names.

Aggregation variables typically are created with no variable label,
value labels, or missing values. Their default print and write
formats depend on the aggregation function used, with details given in
the table below. A variable label for an aggregation variable may be
specified just after the variable's name in the aggregation variable
list.

Each set must have exactly as many source variables as aggregation
variables. Each aggregation variable receives the results of applying
the specified aggregation function to the corresponding source
variable. The @subcmd{MEAN}, @subcmd{MEDIAN}, @subcmd{SD}, and @subcmd{SUM}
aggregation functions may only be
applied to numeric variables. All the rest may be applied to numeric
and string variables.

The available aggregation functions are as follows:

@table @asis
@item @subcmd{FGT(@var{var_name}, @var{value})}
Fraction of values greater than the specified constant. The default
format is F5.3.

@item @subcmd{FIN(@var{var_name}, @var{low}, @var{high})}
Fraction of values within the specified inclusive range of constants.
The default format is F5.3.

@item @subcmd{FLT(@var{var_name}, @var{value})}
Fraction of values less than the specified constant. The default
format is F5.3.

@item @subcmd{FIRST(@var{var_name})}
First non-missing value in break group. The aggregation variable
receives the complete dictionary information from the source variable.
The sort performed by @cmd{AGGREGATE} (and by @cmd{SORT CASES}) is stable, so that
the first case with particular values for the break variables before
sorting will also be the first case in that break group after sorting.

@item @subcmd{FOUT(@var{var_name}, @var{low}, @var{high})}
Fraction of values strictly outside the specified range of constants.
The default format is F5.3.

@item @subcmd{LAST(@var{var_name})}
Last non-missing value in break group. The aggregation variable
receives the complete dictionary information from the source variable.
The sort performed by @cmd{AGGREGATE} (and by @cmd{SORT CASES}) is stable, so that
the last case with particular values for the break variables before
sorting will also be the last case in that break group after sorting.

@item @subcmd{MAX(@var{var_name})}
Maximum value. The aggregation variable receives the complete
dictionary information from the source variable.

@item @subcmd{MEAN(@var{var_name})}
Arithmetic mean. Limited to numeric values. The default format is
F8.2.

@item @subcmd{MEDIAN(@var{var_name})}
The median value. Limited to numeric values. The default format is F8.2.

@item @subcmd{MIN(@var{var_name})}
Minimum value. The aggregation variable receives the complete
dictionary information from the source variable.

@item @subcmd{N(@var{var_name})}
Number of non-missing values.
The default format is F7.0 if weighting
is not enabled, F8.2 if it is (@pxref{WEIGHT}).

@item @subcmd{N}
Number of cases aggregated to form this group. The default format is
F7.0 if weighting is not enabled, F8.2 if it is (@pxref{WEIGHT}).

@item @subcmd{NMISS(@var{var_name})}
Number of missing values. The default format is F7.0 if weighting is
not enabled, F8.2 if it is (@pxref{WEIGHT}).

@item @subcmd{NU(@var{var_name})}
Number of non-missing values. Each case is considered to have a weight
of 1, regardless of the current weighting variable (@pxref{WEIGHT}).
The default format is F7.0.

@item @subcmd{NU}
Number of cases aggregated to form this group. Each case is considered
to have a weight of 1, regardless of the current weighting variable.
The default format is F7.0.

@item @subcmd{NUMISS(@var{var_name})}
Number of missing values. Each case is considered to have a weight of
1, regardless of the current weighting variable. The default format is F7.0.

@item @subcmd{PGT(@var{var_name}, @var{value})}
Percentage between 0 and 100 of values greater than the specified
constant. The default format is F5.1.

@item @subcmd{PIN(@var{var_name}, @var{low}, @var{high})}
Percentage of values within the specified inclusive range of
constants. The default format is F5.1.

@item @subcmd{PLT(@var{var_name}, @var{value})}
Percentage of values less than the specified constant. The default
format is F5.1.

@item @subcmd{POUT(@var{var_name}, @var{low}, @var{high})}
Percentage of values strictly outside the specified range of
constants. The default format is F5.1.

@item @subcmd{SD(@var{var_name})}
Standard deviation of the mean. Limited to numeric values. The
default format is F8.2.

@item @subcmd{SUM(@var{var_name})}
Sum. Limited to numeric values. The default format is F8.2.
@end table

Aggregation functions compare string values in terms of internal
character codes.
On most modern computers, this is @acronym{ASCII} or a superset thereof.

The aggregation functions listed above exclude all user-missing values
from calculations. To include user-missing values, insert a period
(@samp{.}) at the end of the function name (e.g.@: @samp{SUM.}).
(Be aware that specifying such a function as the last token on a line
will cause the period to be interpreted as the end of the command.)

@cmd{AGGREGATE} both ignores and cancels the current @cmd{SPLIT FILE}
settings (@pxref{SPLIT FILE}).

@node AUTORECODE
@section AUTORECODE
@vindex AUTORECODE

@display
AUTORECODE VARIABLES=@var{src_vars} INTO @var{dest_vars}
        [ /DESCENDING ]
        [ /PRINT ]
        [ /GROUP ]
        [ /BLANK = @{VALID, MISSING@} ]
@end display

The @cmd{AUTORECODE} procedure considers the @var{n} values that a variable
takes on and maps them onto values 1@dots{}@var{n} on a new numeric
variable.

Subcommand @subcmd{VARIABLES} is the only required subcommand and must come
first. Specify @subcmd{VARIABLES}, an equals sign (@samp{=}), a list of source
variables, @subcmd{INTO}, and a list of target variables. There must be the same
number of source and target variables. The target variables must not
already exist.

@cmd{AUTORECODE} ordinarily assigns each increasing non-missing value
of a source variable (for a string, this is based on character code
comparisons) to consecutive values of its target variable. For
example, the smallest non-missing value of the source variable is
recoded to value 1, the next smallest to 2, and so on. If the source
variable has user-missing values, they are recoded to
consecutive values just above the non-missing values.
For example, if
a source variable has seven distinct non-missing values, then the
smallest missing value would be recoded to 8, the next smallest to 9,
and so on.

Use @subcmd{DESCENDING} to reverse the sort order for non-missing
values, so that the largest non-missing value is recoded to 1, the
second-largest to 2, and so on. Even with @subcmd{DESCENDING},
user-missing values are still recoded in ascending order just above
the non-missing values.

The system-missing value is always recoded into the system-missing
value in target variables.

If a source value has a value label, then that value label is retained
for the new value in the target variable. Otherwise, the source value
itself becomes the new value's label.

Variable labels are copied from the source to target variables.

@subcmd{PRINT} is currently ignored.

The @subcmd{GROUP} subcommand is relevant only if more than one variable is to be
recoded. It causes a single mapping between source and target values to
be used, instead of one map per variable. With @subcmd{GROUP},
user-missing values are taken from the first source variable that has
any user-missing values.

If @subcmd{/BLANK=MISSING} is given, then string variables which contain only
whitespace are recoded as SYSMIS. If @subcmd{/BLANK=VALID} is given then they
will be allocated a value like any other. @subcmd{/BLANK} is not relevant
to numeric values. @subcmd{/BLANK=VALID} is the default.

@cmd{AUTORECODE} is a procedure. It causes the data to be read.

@node COMPUTE
@section COMPUTE
@vindex COMPUTE

@display
COMPUTE @var{variable} = @var{expression}.
@end display
  or
@display
COMPUTE vector(@var{index}) = @var{expression}.
@end display

@cmd{COMPUTE} assigns the value of an expression to a target
variable. For each case, the expression is evaluated and its value
assigned to the target variable.
Numeric and string
variables may be assigned. When a string expression's width differs
from the target variable's width, the string result of the expression
is truncated or padded with spaces on the right as necessary. The
expression and variable types must match.

For numeric variables only, the target variable need not already
exist. Numeric variables created by @cmd{COMPUTE} are assigned an
@code{F8.2} output format. String variables must be declared before
they can be used as targets for @cmd{COMPUTE}.

The target variable may be specified as an element of a vector
(@pxref{VECTOR}). In this case, an expression @var{index} must be
specified in parentheses following the vector name. The expression @var{index}
must evaluate to a numeric value that, after rounding down
to the nearest integer, is a valid index for the named vector.

Using @cmd{COMPUTE} to assign to a variable specified on @cmd{LEAVE}
(@pxref{LEAVE}) resets the variable's left state. Therefore,
@code{LEAVE} should be specified following @cmd{COMPUTE}, not before.

@cmd{COMPUTE} is a transformation. It does not cause the active dataset to be
read.

When @cmd{COMPUTE} is specified following @cmd{TEMPORARY}
(@pxref{TEMPORARY}), the @cmd{LAG} function may not be used
(@pxref{LAG}).

@node COUNT
@section COUNT
@vindex COUNT

@display
COUNT @var{var_name} = @var{var}@dots{} (@var{value}@dots{})
    [/@var{var_name} = @var{var}@dots{} (@var{value}@dots{})]@dots{}

Each @var{value} takes one of the following forms:
        @var{number}
        @var{string}
        @var{num1} THRU @var{num2}
        MISSING
        SYSMIS
where @var{num1} is a numeric expression or the words @subcmd{LO} or @subcmd{LOWEST}
      and @var{num2} is a numeric expression or @subcmd{HI} or @subcmd{HIGHEST}.
@end display

@cmd{COUNT} creates or replaces a numeric @dfn{target} variable that
counts the occurrence of a @dfn{criterion} value or set of values over
one or more @dfn{test} variables for each case.

The target variable values are always nonnegative integers. They are
never missing. The target variable is assigned an F8.2 output format.
@xref{Input and Output Formats}. Any variables, including
string variables, may be test variables.

User-missing values of test variables are treated just like any other
values. They are @strong{not} treated as system-missing values.
User-missing values that are criterion values or inside ranges of
criterion values are counted as any other values. However (for numeric
variables), keyword @subcmd{MISSING} may be used to refer to all system-
and user-missing values.

@cmd{COUNT} target variables are assigned values in the order
specified. In the command @subcmd{COUNT @var{A}=@var{A} @var{B}(1) /@var{B}=@var{A} @var{B}(2).}, the
following actions occur:

@itemize @minus
@item
The number of occurrences of 1 between @var{A} and @var{B} is counted.

@item
@var{A} is assigned this value.

@item
The number of occurrences of 2 between @var{B} and the @strong{new}
value of @var{A} is counted.

@item
@var{B} is assigned this value.
@end itemize

Despite this ordering, all @cmd{COUNT} criterion variables must exist
before the command is executed. They may not be created as target
variables earlier in the same command. Break such a command into two
separate commands.

The examples below may help to clarify.

@enumerate A
@item
Assuming @code{Q0}, @code{Q1}, @dots{}, @code{Q9} are numeric variables,
the following commands:

@enumerate
@item
Count the number of times the value 1 occurs through these variables
for each case and assign the count to variable @code{QCOUNT}.

@item
Print out the total number of times the value 1 occurs throughout
@emph{all} cases using @cmd{DESCRIPTIVES}. @xref{DESCRIPTIVES}, for
details.
@end enumerate

@example
COUNT QCOUNT=Q0 TO Q9(1).
DESCRIPTIVES QCOUNT /STATISTICS=SUM.
@end example

@item
Given these same variables, the following commands:

@enumerate
@item
Count the number of valid values of these variables for each case and
assign the count to variable @code{QVALID}.

@item
Multiply each value of @code{QVALID} by 10 to obtain a percentage of
valid values, using @cmd{COMPUTE}. @xref{COMPUTE}, for details.

@item
Print out the percentage of valid values across all cases, using
@cmd{DESCRIPTIVES}. @xref{DESCRIPTIVES}, for details.
@end enumerate

@example
COUNT QVALID=Q0 TO Q9 (LO THRU HI).
COMPUTE QVALID=QVALID*10.
DESCRIPTIVES QVALID /STATISTICS=MEAN.
@end example
@end enumerate

@node FLIP
@section FLIP
@vindex FLIP

@display
FLIP /VARIABLES=@var{var_list} /NEWNAMES=@var{var_name}.
@end display

@cmd{FLIP} transposes rows and columns in the active dataset. It
causes cases to be swapped with variables, and vice versa.

All variables in the transposed active dataset are numeric. String
variables take on the system-missing value in the transposed file.

No subcommands are required. If specified, the @subcmd{VARIABLES} subcommand
selects variables to be transformed into cases, and variables not
specified are discarded. If the @subcmd{VARIABLES} subcommand is omitted, all
variables are selected for transposition.

The variable specified by @subcmd{NEWNAMES}, which must be a
string variable, is
used to give names to the variables created by @cmd{FLIP}. Only the
first 8 characters of the variable are used. If
@subcmd{NEWNAMES} is not
specified then the default is a variable named CASE_LBL, if it exists.
If it does not then the variables created by @cmd{FLIP} are named VAR000
through VAR999, then VAR1000, VAR1001, and so on.

When a @subcmd{NEWNAMES} variable is available, the names must be canonicalized
before becoming variable names. Invalid characters are replaced by
letter @samp{V} in the first position, or by @samp{_} in subsequent
positions. If the name thus generated is not unique, then numeric
extensions are added, starting with 1, until a unique name is found or
there are no remaining possibilities. If the latter occurs then the
@cmd{FLIP} operation aborts.

The resultant dictionary contains a CASE_LBL variable, a string
variable of width 8, which stores the names of the variables in the
dictionary before the transposition. Variable names longer than 8
characters are truncated. If the active dataset is subsequently
transposed using @cmd{FLIP}, this variable can be used to recreate the
original variable names.

@cmd{FLIP} honors @cmd{N OF CASES} (@pxref{N OF CASES}). It ignores
@cmd{TEMPORARY} (@pxref{TEMPORARY}), so that ``temporary''
transformations become permanent.

@node IF
@section IF
@vindex IF

@display
IF @var{condition} @var{variable}=@var{expression}.
@end display
  or
@display
IF @var{condition} vector(@var{index})=@var{expression}.
@end display

The @cmd{IF} transformation conditionally assigns the value of a target
expression to a target variable, based on the truth of a test
expression.

Specify a boolean-valued expression (@pxref{Expressions}) to be tested
following the @cmd{IF} keyword. This expression is evaluated for each case.
If the value is true, then the value of the expression is computed and
assigned to the specified variable. If the value is false or missing,
nothing is done. Numeric and string variables may be
assigned.
When a string expression's width differs from the target
variable's width, the string result of the expression is truncated or
padded with spaces on the right as necessary. The expression and
variable types must match.

The target variable may be specified as an element of a vector
(@pxref{VECTOR}). In this case, a vector index expression must be
specified in parentheses following the vector name. The index
expression must evaluate to a numeric value that, after rounding down
to the nearest integer, is a valid index for the named vector.

Using @cmd{IF} to assign to a variable specified on @cmd{LEAVE}
(@pxref{LEAVE}) resets the variable's left state. Therefore,
@code{LEAVE} should be specified following @cmd{IF}, not before.

When @cmd{IF} is specified following @cmd{TEMPORARY}
(@pxref{TEMPORARY}), the @cmd{LAG} function may not be used
(@pxref{LAG}).

@node RECODE
@section RECODE
@vindex RECODE

The @cmd{RECODE} command is used to transform existing values into other,
user-specified values.
The general form is:

@display
RECODE @var{src_vars}
        (@var{src_value} @var{src_value} @dots{} = @var{dest_value})
        (@var{src_value} @var{src_value} @dots{} = @var{dest_value})
        (@var{src_value} @var{src_value} @dots{} = @var{dest_value}) @dots{}
         [INTO @var{dest_vars}].
@end display

Following the @cmd{RECODE} keyword itself comes @var{src_vars}, which is a list
of variables whose values are to be transformed.
These variables may be string variables or they may be numeric.
However, the list must be homogeneous; you may not mix string variables and
numeric variables in the same recoding.

After the list of source variables, there should be one or more @dfn{mappings}.
Each mapping is enclosed in parentheses, and contains the source values and
a destination value separated by a single @samp{=}.
The source values are used to specify the values in the dataset which
need to change, and the destination value specifies the new value
to which they should be changed.
Each @var{src_value} may take one of the following forms:
@table @asis
@item @var{number}
If the source variables are numeric then @var{src_value} may be a literal
number.
@item @var{string}
If the source variables are string variables then @var{src_value} may be a
literal string (like all strings, enclosed in single or double quotes).
@item @var{num1} THRU @var{num2}
This form is valid only when the source variables are numeric.
It specifies all values in the range between @var{num1} and @var{num2},
including both endpoints of the range. By convention, @var{num1}
should be less than @var{num2}.
Open-ended ranges may be specified using @samp{LO} or @samp{LOWEST}
for @var{num1}
or @samp{HI} or @samp{HIGHEST} for @var{num2}.
@item @samp{MISSING}
The literal keyword @samp{MISSING} matches both system missing and user
missing values.
It is valid for both numeric and string variables.
@item @samp{SYSMIS}
The literal keyword @samp{SYSMIS} matches system missing
values.
It is valid for numeric variables only.
@item @samp{ELSE}
The @samp{ELSE} keyword may be used to match any values which are
not matched by any other @var{src_value} appearing in the command.
If this keyword appears, it should be used in the last mapping of the
command.
@end table

After the source values comes an @samp{=} and then the @var{dest_value}.
The @var{dest_value} may take any of the following forms:
@table @asis
@item @var{number}
A literal numeric value to which the source values should be changed.
This implies the destination variable must be numeric.
@item @var{string}
A literal string value (enclosed in quotation marks) to which the source
values should be changed.
This implies the destination variable must be a string variable.
@item @samp{SYSMIS}
The keyword @samp{SYSMIS} changes the value to the system missing value.
This implies the destination variable must be numeric.
@item @samp{COPY}
The special keyword @samp{COPY} means that the source value should not be
modified, but
copied directly to the destination value.
This is meaningful only if @samp{INTO @var{dest_vars}} is specified.
@end table

Mappings are considered from left to right.
Therefore, if a value is matched by a @var{src_value} from more than
one mapping, the first (leftmost) mapping which matches will be considered.
Any subsequent matches will be ignored.

The clause @samp{INTO @var{dest_vars}} is optional.
The behaviour of the command is slightly different depending on whether it
appears or not.

If @samp{INTO @var{dest_vars}} does not appear, then values will be recoded
``in place''.
This means that the recoded values are written back to the
source variables from whence the original values came.
In this case, the @var{dest_value} for every mapping must imply a value which
has the same type as the @var{src_value}.
For example, if the source value is a string value, it is not permissible for
@var{dest_value} to be @samp{SYSMIS} or another form which implies a numeric
result.
It is also not permissible for @var{dest_value} to be longer than the width
of the source variable.

In the following example, two numeric variables @var{x} and @var{y} are recoded
in place.
Zero is recoded to 99, the values 1 to 10 inclusive are unchanged,
values 1000 and higher are recoded to the system-missing value and all other
values are changed to 999:
@example
recode @var{x} @var{y}
        (0 = 99)
        (1 THRU 10 = COPY)
        (1000 THRU HIGHEST = SYSMIS)
        (ELSE = 999).
@end example

If @samp{INTO @var{dest_vars}} is given, then recoded values are written
into the variables specified in @var{dest_vars}, which must therefore
contain a list of valid variable names.
The number of variables in @var{dest_vars} must be the same as the number
of variables in @var{src_vars}
and the respective order of the variables in @var{dest_vars} corresponds to
the order of @var{src_vars}.
That is to say, recoded values whose
original value came from the @var{n}th variable in @var{src_vars} will be
placed into the @var{n}th variable in @var{dest_vars}.
The source variables will be unchanged.
If any mapping implies a string as its destination value, then the respective
destination variable must already exist, or
have been declared using @cmd{STRING} or another transformation.
Numeric variables however will be automatically created if they don't already
exist.
The following example deals with two source variables, @var{a} and @var{b}
which contain string values. Hence there are two destination variables
@var{v1} and @var{v2}.
Any cases where @var{a} or @var{b} contain the values @samp{apple},
@samp{pear} or @samp{pomegranate} will result in @var{v1} or @var{v2} being
filled with the string @samp{fruit} whilst cases with
@samp{tomato}, @samp{lettuce} or @samp{carrot} will result in @samp{vegetable}.
Any other values will produce the result @samp{unknown}:
@example
string @var{v1} (a20).
string @var{v2} (a20).

recode @var{a} @var{b}
        ("apple" "pear" "pomegranate" = "fruit")
        ("tomato" "lettuce" "carrot" = "vegetable")
        (ELSE = "unknown")
        into @var{v1} @var{v2}.
@end example

There is one very special mapping, not mentioned above.
If the source variable is a string variable
then a mapping may be specified as @samp{(CONVERT)}.
This mapping, if it appears, must be the last mapping given and
the @samp{INTO @var{dest_vars}} clause must also be given and
must not refer to a string variable.
@samp{CONVERT} causes a number specified as a string to
be converted to a numeric value.
For example it will convert the string @samp{"3"} into the numeric
value 3 (note that it will not convert @samp{three} into 3).
If the string cannot be parsed as a number, then the system-missing value
is assigned instead.
In the following example, cases where the value of @var{x} (a string variable)
is the empty string are recoded to 999 and all others are converted to the
numeric equivalent of the input value. The results are placed into the
numeric variable @var{y}:
@example
recode @var{x}
       ("" = 999)
        (convert)
        into @var{y}.
@end example

It is possible to specify multiple recodings on a single command.
Introduce additional recodings with a slash (@samp{/}) to
separate them from the previous recodings:
@example
recode
        @var{a}  (2 = 22) (else = 99)
        /@var{b} (1 = 3) into @var{z}
        .
@end example
@noindent Here we have two recodings. The first affects the source variable
@var{a} and recodes in-place the value 2 into 22 and all other values to 99.
The second recoding copies the values of @var{b} into the variable @var{z},
changing any instances of 1 into 3.

@node SORT CASES
@section SORT CASES
@vindex SORT CASES

@display
SORT CASES BY @var{var_list}[(@{D|A@})] [ @var{var_list}[(@{D|A@})] ] ...
@end display

@cmd{SORT CASES} sorts the active dataset by the values of one or more
variables.

Specify @subcmd{BY} and a list of variables to sort by. By default, variables
are sorted in ascending order. To override sort order, specify @subcmd{(D)} or
@subcmd{(DOWN)} after a list of variables to get descending order, or @subcmd{(A)} or @subcmd{(UP)}
for ascending order.
These apply to all the listed variables
up until the preceding @subcmd{(A)}, @subcmd{(D)}, @subcmd{(UP)} or @subcmd{(DOWN)}.

The sort algorithms used by @cmd{SORT CASES} are stable. That is,
records that have equal values of the sort variables will have the
same relative order before and after sorting. As a special case,
re-sorting an already sorted file will not affect the ordering of
cases.

@cmd{SORT CASES} is a procedure. It causes the data to be read.

@cmd{SORT CASES} attempts to sort the entire active dataset in main memory.
If workspace is exhausted, it falls back to a merge sort algorithm that
involves creating numerous temporary files.

@cmd{SORT CASES} may not be specified following @cmd{TEMPORARY}.
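
As an illustrative sketch of the sort-order syntax described in this
section, assume the active dataset contains two hypothetical variables,
@code{region} and @code{sales} (these names are not part of @pspp{}; they
are chosen only for this example). The first command below sorts cases by
ascending @code{region} and, within each region, by descending
@code{sales}. In the second command a single @subcmd{(D)} follows both
names, so it should apply to both variables:
@example
SORT CASES BY region (A) sales (D).
SORT CASES BY region sales (D).
@end example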