1{- 2(c) The GRASP/AQUA Project, Glasgow University, 1993-1998 3 4 5 ----------------- 6 A demand analysis 7 ----------------- 8-} 9 10{-# LANGUAGE CPP #-} 11 12module DmdAnal ( dmdAnalProgram ) where 13 14#include "HsVersions.h" 15 16import GhcPrelude 17 18import DynFlags 19import WwLib ( findTypeShape, deepSplitProductType_maybe ) 20import Demand -- All of it 21import CoreSyn 22import CoreSeq ( seqBinds ) 23import Outputable 24import VarEnv 25import BasicTypes 26import Data.List ( mapAccumL, sortBy ) 27import DataCon 28import Id 29import CoreUtils ( exprIsHNF, exprType, exprIsTrivial, exprOkForSpeculation ) 30import TyCon 31import Type 32import Coercion ( Coercion, coVarsOfCo ) 33import FamInstEnv 34import Util 35import Maybes ( isJust ) 36import TysWiredIn 37import TysPrim ( realWorldStatePrimTy ) 38import ErrUtils ( dumpIfSet_dyn ) 39import Name ( getName, stableNameCmp ) 40import Data.Function ( on ) 41import UniqSet 42 43{- 44************************************************************************ 45* * 46\subsection{Top level stuff} 47* * 48************************************************************************ 49-} 50 51dmdAnalProgram :: DynFlags -> FamInstEnvs -> CoreProgram -> IO CoreProgram 52dmdAnalProgram dflags fam_envs binds 53 = do { 54 let { binds_plus_dmds = do_prog binds } ; 55 dumpIfSet_dyn dflags Opt_D_dump_str_signatures 56 "Strictness signatures" $ 57 dumpStrSig binds_plus_dmds ; 58 -- See Note [Stamp out space leaks in demand analysis] 59 seqBinds binds_plus_dmds `seq` return binds_plus_dmds 60 } 61 where 62 do_prog :: CoreProgram -> CoreProgram 63 do_prog binds = snd $ mapAccumL dmdAnalTopBind (emptyAnalEnv dflags fam_envs) binds 64 65-- Analyse a (group of) top-level binding(s) 66dmdAnalTopBind :: AnalEnv 67 -> CoreBind 68 -> (AnalEnv, CoreBind) 69dmdAnalTopBind env (NonRec id rhs) 70 = (extendAnalEnv TopLevel env id2 (idStrictness id2), NonRec id2 rhs2) 71 where 72 ( _, _, rhs1) = dmdAnalRhsLetDown TopLevel Nothing env cleanEvalDmd id rhs 73 ( _, id2, rhs2) = dmdAnalRhsLetDown TopLevel Nothing (nonVirgin env) cleanEvalDmd id rhs1 74 -- Do two passes to improve CPR information 75 -- See Note [CPR for thunks] 76 -- See Note [Optimistic CPR in the "virgin" case] 77 -- See Note [Initial CPR for strict binders] 78 79dmdAnalTopBind env (Rec pairs) 80 = (env', Rec pairs') 81 where 82 (env', _, pairs') = dmdFix TopLevel env cleanEvalDmd pairs 83 -- We get two iterations automatically 84 -- c.f. the NonRec case above 85 86{- Note [Stamp out space leaks in demand analysis] 87~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 88The demand analysis pass outputs a new copy of the Core program in 89which binders have been annotated with demand and strictness 90information. It's tiresome to ensure that this information is fully 91evaluated everywhere that we produce it, so we just run a single 92seqBinds over the output before returning it, to ensure that there are 93no references holding on to the input Core program. 94 95This makes a ~30% reduction in peak memory usage when compiling 96DynFlags (cf #9675 and #13426). 97 98This is particularly important when we are doing late demand analysis, 99since we don't do a seqBinds at any point thereafter. Hence code 100generation would hold on to an extra copy of the Core program, via 101unforced thunks in demand or strictness information; and it is the 102most memory-intensive part of the compilation process, so this added 103seqBinds makes a big difference in peak memory usage. 104-} 105 106 107{- 108************************************************************************ 109* * 110\subsection{The analyser itself} 111* * 112************************************************************************ 113 114Note [Ensure demand is strict] 115~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 116It's important not to analyse e with a lazy demand because 117a) When we encounter case s of (a,b) -> 118 we demand s with U(d1d2)... but if the overall demand is lazy 119 that is wrong, and we'd need to reduce the demand on s, 120 which is inconvenient 121b) More important, consider 122 f (let x = R in x+x), where f is lazy 123 We still want to mark x as demanded, because it will be when we 124 enter the let. If we analyse f's arg with a Lazy demand, we'll 125 just mark x as Lazy 126c) The application rule wouldn't be right either 127 Evaluating (f x) in a L demand does *not* cause 128 evaluation of f in a C(L) demand! 129-} 130 131-- If e is complicated enough to become a thunk, its contents will be evaluated 132-- at most once, so oneify it. 133dmdTransformThunkDmd :: CoreExpr -> Demand -> Demand 134dmdTransformThunkDmd e 135 | exprIsTrivial e = id 136 | otherwise = oneifyDmd 137 138-- Do not process absent demands 139-- Otherwise act like in a normal demand analysis 140-- See ↦* relation in the Cardinality Analysis paper 141dmdAnalStar :: AnalEnv 142 -> Demand -- This one takes a *Demand* 143 -> CoreExpr -- Should obey the let/app invariatn 144 -> (BothDmdArg, CoreExpr) 145dmdAnalStar env dmd e 146 | (dmd_shell, cd) <- toCleanDmd dmd 147 , (dmd_ty, e') <- dmdAnal env cd e 148 = ASSERT2( not (isUnliftedType (exprType e)) || exprOkForSpeculation e, ppr e ) 149 -- The argument 'e' should satisfy the let/app invariant 150 -- See Note [Analysing with absent demand] in Demand.hs 151 (postProcessDmdType dmd_shell dmd_ty, e') 152 153-- Main Demand Analsysis machinery 154dmdAnal, dmdAnal' :: AnalEnv 155 -> CleanDemand -- The main one takes a *CleanDemand* 156 -> CoreExpr -> (DmdType, CoreExpr) 157 158-- The CleanDemand is always strict and not absent 159-- See Note [Ensure demand is strict] 160 161dmdAnal env d e = -- pprTrace "dmdAnal" (ppr d <+> ppr e) $ 162 dmdAnal' env d e 163 164dmdAnal' _ _ (Lit lit) = (nopDmdType, Lit lit) 165dmdAnal' _ _ (Type ty) = (nopDmdType, Type ty) -- Doesn't happen, in fact 166dmdAnal' _ _ (Coercion co) 167 = (unitDmdType (coercionDmdEnv co), Coercion co) 168 169dmdAnal' env dmd (Var var) 170 = (dmdTransform env var dmd, Var var) 171 172dmdAnal' env dmd (Cast e co) 173 = (dmd_ty `bothDmdType` mkBothDmdArg (coercionDmdEnv co), Cast e' co) 174 where 175 (dmd_ty, e') = dmdAnal env dmd e 176 177dmdAnal' env dmd (Tick t e) 178 = (dmd_ty, Tick t e') 179 where 180 (dmd_ty, e') = dmdAnal env dmd e 181 182dmdAnal' env dmd (App fun (Type ty)) 183 = (fun_ty, App fun' (Type ty)) 184 where 185 (fun_ty, fun') = dmdAnal env dmd fun 186 187-- Lots of the other code is there to make this 188-- beautiful, compositional, application rule :-) 189dmdAnal' env dmd (App fun arg) 190 = -- This case handles value arguments (type args handled above) 191 -- Crucially, coercions /are/ handled here, because they are 192 -- value arguments (#10288) 193 let 194 call_dmd = mkCallDmd dmd 195 (fun_ty, fun') = dmdAnal env call_dmd fun 196 (arg_dmd, res_ty) = splitDmdTy fun_ty 197 (arg_ty, arg') = dmdAnalStar env (dmdTransformThunkDmd arg arg_dmd) arg 198 in 199-- pprTrace "dmdAnal:app" (vcat 200-- [ text "dmd =" <+> ppr dmd 201-- , text "expr =" <+> ppr (App fun arg) 202-- , text "fun dmd_ty =" <+> ppr fun_ty 203-- , text "arg dmd =" <+> ppr arg_dmd 204-- , text "arg dmd_ty =" <+> ppr arg_ty 205-- , text "res dmd_ty =" <+> ppr res_ty 206-- , text "overall res dmd_ty =" <+> ppr (res_ty `bothDmdType` arg_ty) ]) 207 (res_ty `bothDmdType` arg_ty, App fun' arg') 208 209dmdAnal' env dmd (Lam var body) 210 | isTyVar var 211 = let 212 (body_ty, body') = dmdAnal env dmd body 213 in 214 (body_ty, Lam var body') 215 216 | otherwise 217 = let (body_dmd, defer_and_use) = peelCallDmd dmd 218 -- body_dmd: a demand to analyze the body 219 220 env' = extendSigsWithLam env var 221 (body_ty, body') = dmdAnal env' body_dmd body 222 (lam_ty, var') = annotateLamIdBndr env notArgOfDfun body_ty var 223 in 224 (postProcessUnsat defer_and_use lam_ty, Lam var' body') 225 226dmdAnal' env dmd (Case scrut case_bndr ty [(DataAlt dc, bndrs, rhs)]) 227 -- Only one alternative with a product constructor 228 | let tycon = dataConTyCon dc 229 , isJust (isDataProductTyCon_maybe tycon) 230 , Just rec_tc' <- checkRecTc (ae_rec_tc env) tycon 231 = let 232 env_w_tc = env { ae_rec_tc = rec_tc' } 233 env_alt = extendEnvForProdAlt env_w_tc scrut case_bndr dc bndrs 234 (rhs_ty, rhs') = dmdAnal env_alt dmd rhs 235 (alt_ty1, dmds) = findBndrsDmds env rhs_ty bndrs 236 (alt_ty2, case_bndr_dmd) = findBndrDmd env False alt_ty1 case_bndr 237 id_dmds = addCaseBndrDmd case_bndr_dmd dmds 238 alt_ty3 | io_hack_reqd scrut dc bndrs = deferAfterIO alt_ty2 239 | otherwise = alt_ty2 240 241 -- Compute demand on the scrutinee 242 -- See Note [Demand on scrutinee of a product case] 243 scrut_dmd = mkProdDmd id_dmds 244 (scrut_ty, scrut') = dmdAnal env scrut_dmd scrut 245 res_ty = alt_ty3 `bothDmdType` toBothDmdArg scrut_ty 246 case_bndr' = setIdDemandInfo case_bndr case_bndr_dmd 247 bndrs' = setBndrsDemandInfo bndrs id_dmds 248 in 249-- pprTrace "dmdAnal:Case1" (vcat [ text "scrut" <+> ppr scrut 250-- , text "dmd" <+> ppr dmd 251-- , text "case_bndr_dmd" <+> ppr (idDemandInfo case_bndr') 252-- , text "id_dmds" <+> ppr id_dmds 253-- , text "scrut_dmd" <+> ppr scrut_dmd 254-- , text "scrut_ty" <+> ppr scrut_ty 255-- , text "alt_ty" <+> ppr alt_ty2 256-- , text "res_ty" <+> ppr res_ty ]) $ 257 (res_ty, Case scrut' case_bndr' ty [(DataAlt dc, bndrs', rhs')]) 258 259dmdAnal' env dmd (Case scrut case_bndr ty alts) 260 = let -- Case expression with multiple alternatives 261 (alt_tys, alts') = mapAndUnzip (dmdAnalAlt env dmd case_bndr) alts 262 (scrut_ty, scrut') = dmdAnal env cleanEvalDmd scrut 263 (alt_ty, case_bndr') = annotateBndr env (foldr lubDmdType botDmdType alt_tys) case_bndr 264 -- NB: Base case is botDmdType, for empty case alternatives 265 -- This is a unit for lubDmdType, and the right result 266 -- when there really are no alternatives 267 res_ty = alt_ty `bothDmdType` toBothDmdArg scrut_ty 268 in 269-- pprTrace "dmdAnal:Case2" (vcat [ text "scrut" <+> ppr scrut 270-- , text "scrut_ty" <+> ppr scrut_ty 271-- , text "alt_tys" <+> ppr alt_tys 272-- , text "alt_ty" <+> ppr alt_ty 273-- , text "res_ty" <+> ppr res_ty ]) $ 274 (res_ty, Case scrut' case_bndr' ty alts') 275 276-- Let bindings can be processed in two ways: 277-- Down (RHS before body) or Up (body before RHS). 278-- The following case handle the up variant. 279-- 280-- It is very simple. For let x = rhs in body 281-- * Demand-analyse 'body' in the current environment 282-- * Find the demand, 'rhs_dmd' placed on 'x' by 'body' 283-- * Demand-analyse 'rhs' in 'rhs_dmd' 284-- 285-- This is used for a non-recursive local let without manifest lambdas. 286-- This is the LetUp rule in the paper “Higher-Order Cardinality Analysis”. 287dmdAnal' env dmd (Let (NonRec id rhs) body) 288 | useLetUp id 289 = (final_ty, Let (NonRec id' rhs') body') 290 where 291 (body_ty, body') = dmdAnal env dmd body 292 (body_ty', id_dmd) = findBndrDmd env notArgOfDfun body_ty id 293 id' = setIdDemandInfo id id_dmd 294 295 (rhs_ty, rhs') = dmdAnalStar env (dmdTransformThunkDmd rhs id_dmd) rhs 296 final_ty = body_ty' `bothDmdType` rhs_ty 297 298dmdAnal' env dmd (Let (NonRec id rhs) body) 299 = (body_ty2, Let (NonRec id2 rhs') body') 300 where 301 (lazy_fv, id1, rhs') = dmdAnalRhsLetDown NotTopLevel Nothing env dmd id rhs 302 env1 = extendAnalEnv NotTopLevel env id1 (idStrictness id1) 303 (body_ty, body') = dmdAnal env1 dmd body 304 (body_ty1, id2) = annotateBndr env body_ty id1 305 body_ty2 = addLazyFVs body_ty1 lazy_fv -- see Note [Lazy and unleashable free variables] 306 307 -- If the actual demand is better than the vanilla call 308 -- demand, you might think that we might do better to re-analyse 309 -- the RHS with the stronger demand. 310 -- But (a) That seldom happens, because it means that *every* path in 311 -- the body of the let has to use that stronger demand 312 -- (b) It often happens temporarily in when fixpointing, because 313 -- the recursive function at first seems to place a massive demand. 314 -- But we don't want to go to extra work when the function will 315 -- probably iterate to something less demanding. 316 -- In practice, all the times the actual demand on id2 is more than 317 -- the vanilla call demand seem to be due to (b). So we don't 318 -- bother to re-analyse the RHS. 319 320dmdAnal' env dmd (Let (Rec pairs) body) 321 = let 322 (env', lazy_fv, pairs') = dmdFix NotTopLevel env dmd pairs 323 (body_ty, body') = dmdAnal env' dmd body 324 body_ty1 = deleteFVs body_ty (map fst pairs) 325 body_ty2 = addLazyFVs body_ty1 lazy_fv -- see Note [Lazy and unleashable free variables] 326 in 327 body_ty2 `seq` 328 (body_ty2, Let (Rec pairs') body') 329 330io_hack_reqd :: CoreExpr -> DataCon -> [Var] -> Bool 331-- See Note [IO hack in the demand analyser] 332io_hack_reqd scrut con bndrs 333 | (bndr:_) <- bndrs 334 , con == tupleDataCon Unboxed 2 335 , idType bndr `eqType` realWorldStatePrimTy 336 , (fun, _) <- collectArgs scrut 337 = case fun of 338 Var f -> not (isPrimOpId f) 339 _ -> True 340 | otherwise 341 = False 342 343dmdAnalAlt :: AnalEnv -> CleanDemand -> Id -> Alt Var -> (DmdType, Alt Var) 344dmdAnalAlt env dmd case_bndr (con,bndrs,rhs) 345 | null bndrs -- Literals, DEFAULT, and nullary constructors 346 , (rhs_ty, rhs') <- dmdAnal env dmd rhs 347 = (rhs_ty, (con, [], rhs')) 348 349 | otherwise -- Non-nullary data constructors 350 , (rhs_ty, rhs') <- dmdAnal env dmd rhs 351 , (alt_ty, dmds) <- findBndrsDmds env rhs_ty bndrs 352 , let case_bndr_dmd = findIdDemand alt_ty case_bndr 353 id_dmds = addCaseBndrDmd case_bndr_dmd dmds 354 = (alt_ty, (con, setBndrsDemandInfo bndrs id_dmds, rhs')) 355 356 357{- Note [IO hack in the demand analyser] 358~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 359There's a hack here for I/O operations. Consider 360 361 case foo x s of { (# s', r #) -> y } 362 363Is this strict in 'y'? Often not! If foo x s performs some observable action 364(including raising an exception with raiseIO#, modifying a mutable variable, or 365even ending the program normally), then we must not force 'y' (which may fail 366to terminate) until we have performed foo x s. 367 368Hackish solution: spot the IO-like situation and add a virtual branch, 369as if we had 370 case foo x s of 371 (# s, r #) -> y 372 other -> return () 373So the 'y' isn't necessarily going to be evaluated 374 375A more complete example (#148, #1592) where this shows up is: 376 do { let len = <expensive> ; 377 ; when (...) (exitWith ExitSuccess) 378 ; print len } 379 380However, consider 381 f x s = case getMaskingState# s of 382 (# s, r #) -> 383 case x of I# x2 -> ... 384 385Here it is terribly sad to make 'f' lazy in 's'. After all, 386getMaskingState# is not going to diverge or throw an exception! This 387situation actually arises in GHC.IO.Handle.Internals.wantReadableHandle 388(on an MVar not an Int), and made a material difference. 389 390So if the scrutinee is a primop call, we *don't* apply the 391state hack: 392 - If it is a simple, terminating one like getMaskingState, 393 applying the hack is over-conservative. 394 - If the primop is raise# then it returns bottom, so 395 the case alternatives are already discarded. 396 - If the primop can raise a non-IO exception, like 397 divide by zero or seg-fault (eg writing an array 398 out of bounds) then we don't mind evaluating 'x' first. 399 400Note [Demand on the scrutinee of a product case] 401~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 402When figuring out the demand on the scrutinee of a product case, 403we use the demands of the case alternative, i.e. id_dmds. 404But note that these include the demand on the case binder; 405see Note [Demand on case-alternative binders] in Demand.hs. 406This is crucial. Example: 407 f x = case x of y { (a,b) -> k y a } 408If we just take scrut_demand = U(L,A), then we won't pass x to the 409worker, so the worker will rebuild 410 x = (a, absent-error) 411and that'll crash. 412 413Note [Aggregated demand for cardinality] 414~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 415We use different strategies for strictness and usage/cardinality to 416"unleash" demands captured on free variables by bindings. Let us 417consider the example: 418 419f1 y = let {-# NOINLINE h #-} 420 h = y 421 in (h, h) 422 423We are interested in obtaining cardinality demand U1 on |y|, as it is 424used only in a thunk, and, therefore, is not going to be updated any 425more. Therefore, the demand on |y|, captured and unleashed by usage of 426|h| is U1. However, if we unleash this demand every time |h| is used, 427and then sum up the effects, the ultimate demand on |y| will be U1 + 428U1 = U. In order to avoid it, we *first* collect the aggregate demand 429on |h| in the body of let-expression, and only then apply the demand 430transformer: 431 432transf[x](U) = {y |-> U1} 433 434so the resulting demand on |y| is U1. 435 436The situation is, however, different for strictness, where this 437aggregating approach exhibits worse results because of the nature of 438|both| operation for strictness. Consider the example: 439 440f y c = 441 let h x = y |seq| x 442 in case of 443 True -> h True 444 False -> y 445 446It is clear that |f| is strict in |y|, however, the suggested analysis 447will infer from the body of |let| that |h| is used lazily (as it is 448used in one branch only), therefore lazy demand will be put on its 449free variable |y|. Conversely, if the demand on |h| is unleashed right 450on the spot, we will get the desired result, namely, that |f| is 451strict in |y|. 452 453 454************************************************************************ 455* * 456 Demand transformer 457* * 458************************************************************************ 459-} 460 461dmdTransform :: AnalEnv -- The strictness environment 462 -> Id -- The function 463 -> CleanDemand -- The demand on the function 464 -> DmdType -- The demand type of the function in this context 465 -- Returned DmdEnv includes the demand on 466 -- this function plus demand on its free variables 467 468dmdTransform env var dmd 469 | isDataConWorkId var -- Data constructor 470 = dmdTransformDataConSig (idArity var) (idStrictness var) dmd 471 472 | gopt Opt_DmdTxDictSel (ae_dflags env), 473 Just _ <- isClassOpId_maybe var -- Dictionary component selector 474 = dmdTransformDictSelSig (idStrictness var) dmd 475 476 | isGlobalId var -- Imported function 477 = let res = dmdTransformSig (idStrictness var) dmd in 478-- pprTrace "dmdTransform" (vcat [ppr var, ppr (idStrictness var), ppr dmd, ppr res]) 479 res 480 481 | Just (sig, top_lvl) <- lookupSigEnv env var -- Local letrec bound thing 482 , let fn_ty = dmdTransformSig sig dmd 483 = -- pprTrace "dmdTransform" (vcat [ppr var, ppr sig, ppr dmd, ppr fn_ty]) $ 484 if isTopLevel top_lvl 485 then fn_ty -- Don't record top level things 486 else addVarDmd fn_ty var (mkOnceUsedDmd dmd) 487 488 | otherwise -- Local non-letrec-bound thing 489 = unitDmdType (unitVarEnv var (mkOnceUsedDmd dmd)) 490 491{- 492************************************************************************ 493* * 494\subsection{Bindings} 495* * 496************************************************************************ 497-} 498 499-- Recursive bindings 500dmdFix :: TopLevelFlag 501 -> AnalEnv -- Does not include bindings for this binding 502 -> CleanDemand 503 -> [(Id,CoreExpr)] 504 -> (AnalEnv, DmdEnv, [(Id,CoreExpr)]) -- Binders annotated with stricness info 505 506dmdFix top_lvl env let_dmd orig_pairs 507 = loop 1 initial_pairs 508 where 509 bndrs = map fst orig_pairs 510 511 -- See Note [Initialising strictness] 512 initial_pairs | ae_virgin env = [(setIdStrictness id botSig, rhs) | (id, rhs) <- orig_pairs ] 513 | otherwise = orig_pairs 514 515 -- If fixed-point iteration does not yield a result we use this instead 516 -- See Note [Safe abortion in the fixed-point iteration] 517 abort :: (AnalEnv, DmdEnv, [(Id,CoreExpr)]) 518 abort = (env, lazy_fv', zapped_pairs) 519 where (lazy_fv, pairs') = step True (zapIdStrictness orig_pairs) 520 -- Note [Lazy and unleashable free variables] 521 non_lazy_fvs = plusVarEnvList $ map (strictSigDmdEnv . idStrictness . fst) pairs' 522 lazy_fv' = lazy_fv `plusVarEnv` mapVarEnv (const topDmd) non_lazy_fvs 523 zapped_pairs = zapIdStrictness pairs' 524 525 -- The fixed-point varies the idStrictness field of the binders, and terminates if that 526 -- annotation does not change any more. 527 loop :: Int -> [(Id,CoreExpr)] -> (AnalEnv, DmdEnv, [(Id,CoreExpr)]) 528 loop n pairs 529 | found_fixpoint = (final_anal_env, lazy_fv, pairs') 530 | n == 10 = abort 531 | otherwise = loop (n+1) pairs' 532 where 533 found_fixpoint = map (idStrictness . fst) pairs' == map (idStrictness . fst) pairs 534 first_round = n == 1 535 (lazy_fv, pairs') = step first_round pairs 536 final_anal_env = extendAnalEnvs top_lvl env (map fst pairs') 537 538 step :: Bool -> [(Id, CoreExpr)] -> (DmdEnv, [(Id, CoreExpr)]) 539 step first_round pairs = (lazy_fv, pairs') 540 where 541 -- In all but the first iteration, delete the virgin flag 542 start_env | first_round = env 543 | otherwise = nonVirgin env 544 545 start = (extendAnalEnvs top_lvl start_env (map fst pairs), emptyDmdEnv) 546 547 ((_,lazy_fv), pairs') = mapAccumL my_downRhs start pairs 548 -- mapAccumL: Use the new signature to do the next pair 549 -- The occurrence analyser has arranged them in a good order 550 -- so this can significantly reduce the number of iterations needed 551 552 my_downRhs (env, lazy_fv) (id,rhs) 553 = ((env', lazy_fv'), (id', rhs')) 554 where 555 (lazy_fv1, id', rhs') = dmdAnalRhsLetDown top_lvl (Just bndrs) env let_dmd id rhs 556 lazy_fv' = plusVarEnv_C bothDmd lazy_fv lazy_fv1 557 env' = extendAnalEnv top_lvl env id (idStrictness id') 558 559 560 zapIdStrictness :: [(Id, CoreExpr)] -> [(Id, CoreExpr)] 561 zapIdStrictness pairs = [(setIdStrictness id nopSig, rhs) | (id, rhs) <- pairs ] 562 563{- 564Note [Safe abortion in the fixed-point iteration] 565~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 566 567Fixed-point iteration may fail to terminate. But we cannot simply give up and 568return the environment and code unchanged! We still need to do one additional 569round, for two reasons: 570 571 * To get information on used free variables (both lazy and strict!) 572 (see Note [Lazy and unleashable free variables]) 573 * To ensure that all expressions have been traversed at least once, and any left-over 574 strictness annotations have been updated. 575 576This final iteration does not add the variables to the strictness signature 577environment, which effectively assigns them 'nopSig' (see "getStrictness") 578 579-} 580 581-- Let bindings can be processed in two ways: 582-- Down (RHS before body) or Up (body before RHS). 583-- dmdAnalRhsLetDown implements the Down variant: 584-- * assuming a demand of <L,U> 585-- * looking at the definition 586-- * determining a strictness signature 587-- 588-- It is used for toplevel definition, recursive definitions and local 589-- non-recursive definitions that have manifest lambdas. 590-- Local non-recursive definitions without a lambda are handled with LetUp. 591-- 592-- This is the LetDown rule in the paper “Higher-Order Cardinality Analysis”. 593dmdAnalRhsLetDown :: TopLevelFlag 594 -> Maybe [Id] -- Just bs <=> recursive, Nothing <=> non-recursive 595 -> AnalEnv -> CleanDemand 596 -> Id -> CoreExpr 597 -> (DmdEnv, Id, CoreExpr) 598-- Process the RHS of the binding, add the strictness signature 599-- to the Id, and augment the environment with the signature as well. 600dmdAnalRhsLetDown top_lvl rec_flag env let_dmd id rhs 601 = (lazy_fv, id', rhs') 602 where 603 rhs_arity = idArity id 604 rhs_dmd 605 -- See Note [Demand analysis for join points] 606 -- See Note [Invariants on join points] invariant 2b, in CoreSyn 607 -- rhs_arity matches the join arity of the join point 608 | isJoinId id 609 = mkCallDmds rhs_arity let_dmd 610 | otherwise 611 -- NB: rhs_arity 612 -- See Note [Demand signatures are computed for a threshold demand based on idArity] 613 = mkRhsDmd env rhs_arity rhs 614 (DmdType rhs_fv rhs_dmds rhs_res, rhs') 615 = dmdAnal env rhs_dmd rhs 616 sig = mkStrictSigForArity rhs_arity (mkDmdType sig_fv rhs_dmds rhs_res') 617 id' = set_idStrictness env id sig 618 -- See Note [NOINLINE and strictness] 619 620 621 -- See Note [Aggregated demand for cardinality] 622 rhs_fv1 = case rec_flag of 623 Just bs -> reuseEnv (delVarEnvList rhs_fv bs) 624 Nothing -> rhs_fv 625 626 -- See Note [Lazy and unleashable free variables] 627 (lazy_fv, sig_fv) = splitFVs is_thunk rhs_fv1 628 629 rhs_res' = trimCPRInfo trim_all trim_sums rhs_res 630 trim_all = is_thunk && not_strict 631 trim_sums = not (isTopLevel top_lvl) -- See Note [CPR for sum types] 632 633 -- See Note [CPR for thunks] 634 is_thunk = not (exprIsHNF rhs) && not (isJoinId id) 635 not_strict 636 = isTopLevel top_lvl -- Top level and recursive things don't 637 || isJust rec_flag -- get their demandInfo set at all 638 || not (isStrictDmd (idDemandInfo id) || ae_virgin env) 639 -- See Note [Optimistic CPR in the "virgin" case] 640 641-- | @mkRhsDmd env rhs_arity rhs@ creates a 'CleanDemand' for 642-- unleashing on the given function's @rhs@, by creating a call demand of 643-- @rhs_arity@ with a body demand appropriate for possible product types. 644-- See Note [Product demands for function body]. 645-- For example, a call of the form @mkRhsDmd _ 2 (\x y -> (x, y))@ returns a 646-- clean usage demand of @C1(C1(U(U,U)))@. 647mkRhsDmd :: AnalEnv -> Arity -> CoreExpr -> CleanDemand 648mkRhsDmd env rhs_arity rhs = 649 case peelTsFuns rhs_arity (findTypeShape (ae_fam_envs env) (exprType rhs)) of 650 Just (TsProd tss) -> mkCallDmds rhs_arity (cleanEvalProdDmd (length tss)) 651 _ -> mkCallDmds rhs_arity cleanEvalDmd 652 653-- | If given the let-bound 'Id', 'useLetUp' determines whether we should 654-- process the binding up (body before rhs) or down (rhs before body). 655-- 656-- We use LetDown if there is a chance to get a useful strictness signature to 657-- unleash at call sites. LetDown is generally more precise than LetUp if we can 658-- correctly guess how it will be used in the body, that is, for which incoming 659-- demand the strictness signature should be computed, which allows us to 660-- unleash higher-order demands on arguments at call sites. This is mostly the 661-- case when 662-- 663-- * The binding takes any arguments before performing meaningful work (cf. 664-- 'idArity'), in which case we are interested to see how it uses them. 665-- * The binding is a join point, hence acting like a function, not a value. 666-- As a big plus, we know *precisely* how it will be used in the body; since 667-- it's always tail-called, we can directly unleash the incoming demand of 668-- the let binding on its RHS when computing a strictness signature. See 669-- [Demand analysis for join points]. 670-- 671-- Thus, if the binding is not a join point and its arity is 0, we have a thunk 672-- and use LetUp, implying that we have no usable demand signature available 673-- when we analyse the let body. 674-- 675-- Since thunk evaluation is memoised, we want to unleash its 'DmdEnv' of free 676-- vars at most once, regardless of how many times it was forced in the body. 677-- This makes a real difference wrt. usage demands. The other reason is being 678-- able to unleash a more precise product demand on its RHS once we know how the 679-- thunk was used in the let body. 680-- 681-- Characteristic examples, always assuming a single evaluation: 682-- 683-- * @let x = 2*y in x + x@ => LetUp. Compared to LetDown, we find out that 684-- the expression uses @y@ at most once. 685-- * @let x = (a,b) in fst x@ => LetUp. Compared to LetDown, we find out that 686-- @b@ is absent. 687-- * @let f x = x*2 in f y@ => LetDown. Compared to LetUp, we find out that 688-- the expression uses @y@ strictly, because we have @f@'s demand signature 689-- available at the call site. 690-- * @join exit = 2*y in if a then exit else if b then exit else 3*y@ => 691-- LetDown. Compared to LetUp, we find out that the expression uses @y@ 692-- strictly, because we can unleash @exit@'s signature at each call site. 693-- * For a more convincing example with join points, see Note [Demand analysis 694-- for join points]. 695-- 696useLetUp :: Var -> Bool 697useLetUp f = idArity f == 0 && not (isJoinId f) 698 699{- Note [Demand analysis for join points] 700~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 701Consider 702 g :: (Int,Int) -> Int 703 g (p,q) = p+q 704 705 f :: T -> Int -> Int 706 f x p = g (join j y = (p,y) 707 in case x of 708 A -> j 3 709 B -> j 4 710 C -> (p,7)) 711 712If j was a vanilla function definition, we'd analyse its body with 713evalDmd, and think that it was lazy in p. But for join points we can 714do better! We know that j's body will (if called at all) be evaluated 715with the demand that consumes the entire join-binding, in this case 716the argument demand from g. Whizzo! g evaluates both components of 717its argument pair, so p will certainly be evaluated if j is called. 718 719For f to be strict in p, we need /all/ paths to evaluate p; in this 720case the C branch does so too, so we are fine. So, as usual, we need 721to transport demands on free variables to the call site(s). Compare 722Note [Lazy and unleashable free variables]. 723 724The implementation is easy. When analysing a join point, we can 725analyse its body with the demand from the entire join-binding (written 726let_dmd here). 727 728Another win for join points! #13543. 729 730However, note that the strictness signature for a join point can 731look a little puzzling. E.g. 732 733 (join j x = \y. error "urk") 734 (in case v of ) 735 ( A -> j 3 ) x 736 ( B -> j 4 ) 737 ( C -> \y. blah ) 738 739The entire thing is in a C(S) context, so j's strictness signature 740will be [A]b 741meaning one absent argument, returns bottom. That seems odd because 742there's a \y inside. But it's right because when consumed in a C(1) 743context the RHS of the join point is indeed bottom. 744 745Note [Demand signatures are computed for a threshold demand based on idArity] 746~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 747We compute demand signatures assuming idArity incoming arguments to approximate 748behavior for when we have a call site with at least that many arguments. idArity 749is /at least/ the number of manifest lambdas, but might be higher for PAPs and 750trivial RHS (see Note [Demand analysis for trivial right-hand sides]). 751 752Because idArity of a function varies independently of its cardinality properties 753(cf. Note [idArity varies independently of dmdTypeDepth]), we implicitly encode 754the arity for when a demand signature is sound to unleash in its 'dmdTypeDepth' 755(cf. Note [Understanding DmdType and StrictSig] in Demand). It is unsound to 756unleash a demand signature when the incoming number of arguments is less than 757that. See Note [What are demand signatures?] for more details on soundness. 758 759Why idArity arguments? Because that's a conservative estimate of how many 760arguments we must feed a function before it does anything interesting with them. 761Also it elegantly subsumes the trivial RHS and PAP case. 762 763There might be functions for which we might want to analyse for more incoming 764arguments than idArity. Example: 765 766 f x = 767 if expensive 768 then \y -> ... y ... 769 else \y -> ... y ... 770 771We'd analyse `f` under a unary call demand C(S), corresponding to idArity 772being 1. That's enough to look under the manifest lambda and find out how a 773unary call would use `x`, but not enough to look into the lambdas in the if 774branches. 775 776On the other hand, if we analysed for call demand C(C(S)), we'd get useful 777strictness info for `y` (and more precise info on `x`) and possibly CPR 778information, but 779 780 * We would no longer be able to unleash the signature at unary call sites 781 * Performing the worker/wrapper split based on this information would be 782 implicitly eta-expanding `f`, playing fast and loose with divergence and 783 even being unsound in the presence of newtypes, so we refrain from doing so. 784 Also see Note [Don't eta expand in w/w] in WorkWrap. 785 786Since we only compute one signature, we do so for arity 1. Computing multiple 787signatures for different arities (i.e., polyvariance) would be entirely 788possible, if it weren't for the additional runtime and implementation 789complexity. 790 791Note [idArity varies independently of dmdTypeDepth] 792~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 793We used to check in CoreLint that dmdTypeDepth <= idArity for a let-bound 794identifier. But that means we would have to zap demand signatures every time we 795reset or decrease arity. That's an unnecessary dependency, because 796 797 * The demand signature captures a semantic property that is independent of 798 what the binding's current arity is 799 * idArity is analysis information itself, thus volatile 800 * We already *have* dmdTypeDepth, wo why not just use it to encode the 801 threshold for when to unleash the signature 802 (cf. Note [Understanding DmdType and StrictSig] in Demand) 803 804Consider the following expression, for example: 805 806 (let go x y = `x` seq ... in go) |> co 807 808`go` might have a strictness signature of `<S><L>`. The simplifier will identify 809`go` as a nullary join point through `joinPointBinding_maybe` and float the 810coercion into the binding, leading to an arity decrease: 811 812 join go = (\x y -> `x` seq ...) |> co in go 813 814With the CoreLint check, we would have to zap `go`'s perfectly viable strictness 815signature. 816 817Note [What are demand signatures?] 818~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 819Demand analysis interprets expressions in the abstract domain of demand 820transformers. Given an incoming demand we put an expression under, its abstract 821transformer gives us back a demand type denoting how other things (like 822arguments and free vars) were used when the expression was evaluated. 823Here's an example: 824 825 f x y = 826 if x + expensive 827 then \z -> z + y * ... 828 else \z -> z * ... 829 830The abstract transformer (let's call it F_e) of the if expression (let's call it 831e) would transform an incoming head demand <S,HU> into a demand type like 832{x-><S,1*U>,y-><L,U>}<L,U>. In pictures: 833 834 Demand ---F_e---> DmdType 835 <S,HU> {x-><S,1*U>,y-><L,U>}<L,U> 836 837Let's assume that the demand transformers we compute for an expression are 838correct wrt. to some concrete semantics for Core. How do demand signatures fit 839in? They are strange beasts, given that they come with strict rules when to 840it's sound to unleash them. 841 842Fortunately, we can formalise the rules with Galois connections. Consider 843f's strictness signature, {}<S,1*U><L,U>. It's a single-point approximation of 844the actual abstract transformer of f's RHS for arity 2. So, what happens is that 845we abstract *once more* from the abstract domain we already are in, replacing 846the incoming Demand by a simple lattice with two elements denoting incoming 847arity: A_2 = {<2, >=2} (where '<2' is the top element and >=2 the bottom 848element). Here's the diagram: 849 850 A_2 -----f_f----> DmdType 851 ^ | 852 | α γ | 853 | v 854 Demand ---F_f---> DmdType 855 856With 857 α(C1(C1(_))) = >=2 -- example for usage demands, but similar for strictness 858 α(_) = <2 859 γ(ty) = ty 860and F_f being the abstract transformer of f's RHS and f_f being the abstracted 861abstract transformer computable from our demand signature simply by 862 863 f_f(>=2) = {}<S,1*U><L,U> 864 f_f(<2) = postProcessUnsat {}<S,1*U><L,U> 865 866where postProcessUnsat makes a proper top element out of the given demand type. 867 868Note [Demand analysis for trivial right-hand sides] 869~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 870Consider 871 foo = plusInt |> co 872where plusInt is an arity-2 function with known strictness. Clearly 873we want plusInt's strictness to propagate to foo! But because it has 874no manifest lambdas, it won't do so automatically, and indeed 'co' might 875have type (Int->Int->Int) ~ T. 876 877Fortunately, CoreArity gives 'foo' arity 2, which is enough for LetDown to 878forward plusInt's demand signature, and all is well (see Note [Newtype arity] in 879CoreArity)! A small example is the test case NewtypeArity. 880 881 882Note [Product demands for function body] 883~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 884This example comes from shootout/binary_trees: 885 886 Main.check' = \ b z ds. case z of z' { I# ip -> 887 case ds_d13s of 888 Main.Nil -> z' 889 Main.Node s14k s14l s14m -> 890 Main.check' (not b) 891 (Main.check' b 892 (case b { 893 False -> I# (-# s14h s14k); 894 True -> I# (+# s14h s14k) 895 }) 896 s14l) 897 s14m } } } 898 899Here we *really* want to unbox z, even though it appears to be used boxed in 900the Nil case. Partly the Nil case is not a hot path. But more specifically, 901the whole function gets the CPR property if we do. 902 903So for the demand on the body of a RHS we use a product demand if it's 904a product type. 905 906************************************************************************ 907* * 908\subsection{Strictness signatures and types} 909* * 910************************************************************************ 911-} 912 913unitDmdType :: DmdEnv -> DmdType 914unitDmdType dmd_env = DmdType dmd_env [] topRes 915 916coercionDmdEnv :: Coercion -> DmdEnv 917coercionDmdEnv co = mapVarEnv (const topDmd) (getUniqSet $ coVarsOfCo co) 918 -- The VarSet from coVarsOfCo is really a VarEnv Var 919 920addVarDmd :: DmdType -> Var -> Demand -> DmdType 921addVarDmd (DmdType fv ds res) var dmd 922 = DmdType (extendVarEnv_C bothDmd fv var dmd) ds res 923 924addLazyFVs :: DmdType -> DmdEnv -> DmdType 925addLazyFVs dmd_ty lazy_fvs 926 = dmd_ty `bothDmdType` mkBothDmdArg lazy_fvs 927 -- Using bothDmdType (rather than just both'ing the envs) 928 -- is vital. Consider 929 -- let f = \x -> (x,y) 930 -- in error (f 3) 931 -- Here, y is treated as a lazy-fv of f, but we must `bothDmd` that L 932 -- demand with the bottom coming up from 'error' 933 -- 934 -- I got a loop in the fixpointer without this, due to an interaction 935 -- with the lazy_fv filtering in dmdAnalRhsLetDown. Roughly, it was 936 -- letrec f n x 937 -- = letrec g y = x `fatbar` 938 -- letrec h z = z + ...g... 939 -- in h (f (n-1) x) 940 -- in ... 941 -- In the initial iteration for f, f=Bot 942 -- Suppose h is found to be strict in z, but the occurrence of g in its RHS 943 -- is lazy. Now consider the fixpoint iteration for g, esp the demands it 944 -- places on its free variables. Suppose it places none. Then the 945 -- x `fatbar` ...call to h... 946 -- will give a x->V demand for x. That turns into a L demand for x, 947 -- which floats out of the defn for h. Without the modifyEnv, that 948 -- L demand doesn't get both'd with the Bot coming up from the inner 949 -- call to f. So we just get an L demand for x for g. 950 951{- 952Note [Do not strictify the argument dictionaries of a dfun] 953~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 954The typechecker can tie recursive knots involving dfuns, so we do the 955conservative thing and refrain from strictifying a dfun's argument 956dictionaries. 957-} 958 959setBndrsDemandInfo :: [Var] -> [Demand] -> [Var] 960setBndrsDemandInfo (b:bs) (d:ds) 961 | isTyVar b = b : setBndrsDemandInfo bs (d:ds) 962 | otherwise = setIdDemandInfo b d : setBndrsDemandInfo bs ds 963setBndrsDemandInfo [] ds = ASSERT( null ds ) [] 964setBndrsDemandInfo bs _ = pprPanic "setBndrsDemandInfo" (ppr bs) 965 966annotateBndr :: AnalEnv -> DmdType -> Var -> (DmdType, Var) 967-- The returned env has the var deleted 968-- The returned var is annotated with demand info 969-- according to the result demand of the provided demand type 970-- No effect on the argument demands 971annotateBndr env dmd_ty var 972 | isId var = (dmd_ty', setIdDemandInfo var dmd) 973 | otherwise = (dmd_ty, var) 974 where 975 (dmd_ty', dmd) = findBndrDmd env False dmd_ty var 976 977annotateLamIdBndr :: AnalEnv 978 -> DFunFlag -- is this lambda at the top of the RHS of a dfun? 979 -> DmdType -- Demand type of body 980 -> Id -- Lambda binder 981 -> (DmdType, -- Demand type of lambda 982 Id) -- and binder annotated with demand 983 984annotateLamIdBndr env arg_of_dfun dmd_ty id 985-- For lambdas we add the demand to the argument demands 986-- Only called for Ids 987 = ASSERT( isId id ) 988 -- pprTrace "annLamBndr" (vcat [ppr id, ppr _dmd_ty]) $ 989 (final_ty, setIdDemandInfo id dmd) 990 where 991 -- Watch out! See note [Lambda-bound unfoldings] 992 final_ty = case maybeUnfoldingTemplate (idUnfolding id) of 993 Nothing -> main_ty 994 Just unf -> main_ty `bothDmdType` unf_ty 995 where 996 (unf_ty, _) = dmdAnalStar env dmd unf 997 998 main_ty = addDemand dmd dmd_ty' 999 (dmd_ty', dmd) = findBndrDmd env arg_of_dfun dmd_ty id 1000 1001deleteFVs :: DmdType -> [Var] -> DmdType 1002deleteFVs (DmdType fvs dmds res) bndrs 1003 = DmdType (delVarEnvList fvs bndrs) dmds res 1004 1005{- 1006Note [CPR for sum types] 1007~~~~~~~~~~~~~~~~~~~~~~~~ 1008At the moment we do not do CPR for let-bindings that 1009 * non-top level 1010 * bind a sum type 1011Reason: I found that in some benchmarks we were losing let-no-escapes, 1012which messed it all up. Example 1013 let j = \x. .... 1014 in case y of 1015 True -> j False 1016 False -> j True 1017If we w/w this we get 1018 let j' = \x. .... 1019 in case y of 1020 True -> case j' False of { (# a #) -> Just a } 1021 False -> case j' True of { (# a #) -> Just a } 1022Notice that j' is not a let-no-escape any more. 1023 1024However this means in turn that the *enclosing* function 1025may be CPR'd (via the returned Justs). But in the case of 1026sums, there may be Nothing alternatives; and that messes 1027up the sum-type CPR. 1028 1029Conclusion: only do this for products. It's still not 1030guaranteed OK for products, but sums definitely lose sometimes. 1031 1032Note [CPR for thunks] 1033~~~~~~~~~~~~~~~~~~~~~ 1034If the rhs is a thunk, we usually forget the CPR info, because 1035it is presumably shared (else it would have been inlined, and 1036so we'd lose sharing if w/w'd it into a function). E.g. 1037 1038 let r = case expensive of 1039 (a,b) -> (b,a) 1040 in ... 1041 1042If we marked r as having the CPR property, then we'd w/w into 1043 1044 let $wr = \() -> case expensive of 1045 (a,b) -> (# b, a #) 1046 r = case $wr () of 1047 (# b,a #) -> (b,a) 1048 in ... 1049 1050But now r is a thunk, which won't be inlined, so we are no further ahead. 1051But consider 1052 1053 f x = let r = case expensive of (a,b) -> (b,a) 1054 in if foo r then r else (x,x) 1055 1056Does f have the CPR property? Well, no. 1057 1058However, if the strictness analyser has figured out (in a previous 1059iteration) that it's strict, then we DON'T need to forget the CPR info. 1060Instead we can retain the CPR info and do the thunk-splitting transform 1061(see WorkWrap.splitThunk). 1062 1063This made a big difference to PrelBase.modInt, which had something like 1064 modInt = \ x -> let r = ... -> I# v in 1065 ...body strict in r... 1066r's RHS isn't a value yet; but modInt returns r in various branches, so 1067if r doesn't have the CPR property then neither does modInt 1068Another case I found in practice (in Complex.magnitude), looks like this: 1069 let k = if ... then I# a else I# b 1070 in ... body strict in k .... 1071(For this example, it doesn't matter whether k is returned as part of 1072the overall result; but it does matter that k's RHS has the CPR property.) 1073Left to itself, the simplifier will make a join point thus: 1074 let $j k = ...body strict in k... 1075 if ... then $j (I# a) else $j (I# b) 1076With thunk-splitting, we get instead 1077 let $j x = let k = I#x in ...body strict in k... 1078 in if ... then $j a else $j b 1079This is much better; there's a good chance the I# won't get allocated. 1080 1081The difficulty with this is that we need the strictness type to 1082look at the body... but we now need the body to calculate the demand 1083on the variable, so we can decide whether its strictness type should 1084have a CPR in it or not. Simple solution: 1085 a) use strictness info from the previous iteration 1086 b) make sure we do at least 2 iterations, by doing a second 1087 round for top-level non-recs. Top level recs will get at 1088 least 2 iterations except for totally-bottom functions 1089 which aren't very interesting anyway. 1090 1091NB: strictly_demanded is never true of a top-level Id, or of a recursive Id. 1092 1093Note [Optimistic CPR in the "virgin" case] 1094~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 1095Demand and strictness info are initialized by top elements. However, 1096this prevents from inferring a CPR property in the first pass of the 1097analyser, so we keep an explicit flag ae_virgin in the AnalEnv 1098datatype. 1099 1100We can't start with 'not-demanded' (i.e., top) because then consider 1101 f x = let 1102 t = ... I# x 1103 in 1104 if ... then t else I# y else f x' 1105 1106In the first iteration we'd have no demand info for x, so assume 1107not-demanded; then we'd get TopRes for f's CPR info. Next iteration 1108we'd see that t was demanded, and so give it the CPR property, but by 1109now f has TopRes, so it will stay TopRes. Instead, by checking the 1110ae_virgin flag at the first time round, we say 'yes t is demanded' the 1111first time. 1112 1113However, this does mean that for non-recursive bindings we must 1114iterate twice to be sure of not getting over-optimistic CPR info, 1115in the case where t turns out to be not-demanded. This is handled 1116by dmdAnalTopBind. 1117 1118 1119Note [NOINLINE and strictness] 1120~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 1121The strictness analyser used to have a HACK which ensured that NOINLNE 1122things were not strictness-analysed. The reason was unsafePerformIO. 1123Left to itself, the strictness analyser would discover this strictness 1124for unsafePerformIO: 1125 unsafePerformIO: C(U(AV)) 1126But then consider this sub-expression 1127 unsafePerformIO (\s -> let r = f x in 1128 case writeIORef v r s of (# s1, _ #) -> 1129 (# s1, r #) 1130The strictness analyser will now find that r is sure to be eval'd, 1131and may then hoist it out. This makes tests/lib/should_run/memo002 1132deadlock. 1133 1134Solving this by making all NOINLINE things have no strictness info is overkill. 1135In particular, it's overkill for runST, which is perfectly respectable. 1136Consider 1137 f x = runST (return x) 1138This should be strict in x. 1139 1140So the new plan is to define unsafePerformIO using the 'lazy' combinator: 1141 1142 unsafePerformIO (IO m) = lazy (case m realWorld# of (# _, r #) -> r) 1143 1144Remember, 'lazy' is a wired-in identity-function Id, of type a->a, which is 1145magically NON-STRICT, and is inlined after strictness analysis. So 1146unsafePerformIO will look non-strict, and that's what we want. 1147 1148Now we don't need the hack in the strictness analyser. HOWEVER, this 1149decision does mean that even a NOINLINE function is not entirely 1150opaque: some aspect of its implementation leaks out, notably its 1151strictness. For example, if you have a function implemented by an 1152error stub, but which has RULES, you may want it not to be eliminated 1153in favour of error! 1154 1155Note [Lazy and unleashable free variables] 1156~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 1157We put the strict and once-used FVs in the DmdType of the Id, so 1158that at its call sites we unleash demands on its strict fvs. 1159An example is 'roll' in imaginary/wheel-sieve2 1160Something like this: 1161 roll x = letrec 1162 go y = if ... then roll (x-1) else x+1 1163 in 1164 go ms 1165We want to see that roll is strict in x, which is because 1166go is called. So we put the DmdEnv for x in go's DmdType. 1167 1168Another example: 1169 1170 f :: Int -> Int -> Int 1171 f x y = let t = x+1 1172 h z = if z==0 then t else 1173 if z==1 then x+1 else 1174 x + h (z-1) 1175 in h y 1176 1177Calling h does indeed evaluate x, but we can only see 1178that if we unleash a demand on x at the call site for t. 1179 1180Incidentally, here's a place where lambda-lifting h would 1181lose the cigar --- we couldn't see the joint strictness in t/x 1182 1183 ON THE OTHER HAND 1184 1185We don't want to put *all* the fv's from the RHS into the 1186DmdType. Because 1187 1188 * it makes the strictness signatures larger, and hence slows down fixpointing 1189 1190and 1191 1192 * it is useless information at the call site anyways: 1193 For lazy, used-many times fv's we will never get any better result than 1194 that, no matter how good the actual demand on the function at the call site 1195 is (unless it is always absent, but then the whole binder is useless). 1196 1197Therefore we exclude lazy multiple-used fv's from the environment in the 1198DmdType. 1199 1200But now the signature lies! (Missing variables are assumed to be absent.) To 1201make up for this, the code that analyses the binding keeps the demand on those 1202variable separate (usually called "lazy_fv") and adds it to the demand of the 1203whole binding later. 1204 1205What if we decide _not_ to store a strictness signature for a binding at all, as 1206we do when aborting a fixed-point iteration? The we risk losing the information 1207that the strict variables are being used. In that case, we take all free variables 1208mentioned in the (unsound) strictness signature, conservatively approximate the 1209demand put on them (topDmd), and add that to the "lazy_fv" returned by "dmdFix". 1210 1211 1212Note [Lambda-bound unfoldings] 1213~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 1214We allow a lambda-bound variable to carry an unfolding, a facility that is used 1215exclusively for join points; see Note [Case binders and join points]. If so, 1216we must be careful to demand-analyse the RHS of the unfolding! Example 1217 \x. \y{=Just x}. <body> 1218Then if <body> uses 'y', then transitively it uses 'x', and we must not 1219forget that fact, otherwise we might make 'x' absent when it isn't. 1220 1221 1222************************************************************************ 1223* * 1224\subsection{Strictness signatures} 1225* * 1226************************************************************************ 1227-} 1228 1229type DFunFlag = Bool -- indicates if the lambda being considered is in the 1230 -- sequence of lambdas at the top of the RHS of a dfun 1231notArgOfDfun :: DFunFlag 1232notArgOfDfun = False 1233 1234data AnalEnv 1235 = AE { ae_dflags :: DynFlags 1236 , ae_sigs :: SigEnv 1237 , ae_virgin :: Bool -- True on first iteration only 1238 -- See Note [Initialising strictness] 1239 , ae_rec_tc :: RecTcChecker 1240 , ae_fam_envs :: FamInstEnvs 1241 } 1242 1243 -- We use the se_env to tell us whether to 1244 -- record info about a variable in the DmdEnv 1245 -- We do so if it's a LocalId, but not top-level 1246 -- 1247 -- The DmdEnv gives the demand on the free vars of the function 1248 -- when it is given enough args to satisfy the strictness signature 1249 1250type SigEnv = VarEnv (StrictSig, TopLevelFlag) 1251 1252instance Outputable AnalEnv where 1253 ppr (AE { ae_sigs = env, ae_virgin = virgin }) 1254 = text "AE" <+> braces (vcat 1255 [ text "ae_virgin =" <+> ppr virgin 1256 , text "ae_sigs =" <+> ppr env ]) 1257 1258emptyAnalEnv :: DynFlags -> FamInstEnvs -> AnalEnv 1259emptyAnalEnv dflags fam_envs 1260 = AE { ae_dflags = dflags 1261 , ae_sigs = emptySigEnv 1262 , ae_virgin = True 1263 , ae_rec_tc = initRecTc 1264 , ae_fam_envs = fam_envs 1265 } 1266 1267emptySigEnv :: SigEnv 1268emptySigEnv = emptyVarEnv 1269 1270-- | Extend an environment with the strictness IDs attached to the id 1271extendAnalEnvs :: TopLevelFlag -> AnalEnv -> [Id] -> AnalEnv 1272extendAnalEnvs top_lvl env vars 1273 = env { ae_sigs = extendSigEnvs top_lvl (ae_sigs env) vars } 1274 1275extendSigEnvs :: TopLevelFlag -> SigEnv -> [Id] -> SigEnv 1276extendSigEnvs top_lvl sigs vars 1277 = extendVarEnvList sigs [ (var, (idStrictness var, top_lvl)) | var <- vars] 1278 1279extendAnalEnv :: TopLevelFlag -> AnalEnv -> Id -> StrictSig -> AnalEnv 1280extendAnalEnv top_lvl env var sig 1281 = env { ae_sigs = extendSigEnv top_lvl (ae_sigs env) var sig } 1282 1283extendSigEnv :: TopLevelFlag -> SigEnv -> Id -> StrictSig -> SigEnv 1284extendSigEnv top_lvl sigs var sig = extendVarEnv sigs var (sig, top_lvl) 1285 1286lookupSigEnv :: AnalEnv -> Id -> Maybe (StrictSig, TopLevelFlag) 1287lookupSigEnv env id = lookupVarEnv (ae_sigs env) id 1288 1289nonVirgin :: AnalEnv -> AnalEnv 1290nonVirgin env = env { ae_virgin = False } 1291 1292extendSigsWithLam :: AnalEnv -> Id -> AnalEnv 1293-- Extend the AnalEnv when we meet a lambda binder 1294extendSigsWithLam env id 1295 | isId id 1296 , isStrictDmd (idDemandInfo id) || ae_virgin env 1297 -- See Note [Optimistic CPR in the "virgin" case] 1298 -- See Note [Initial CPR for strict binders] 1299 , Just (dc,_,_,_) <- deepSplitProductType_maybe (ae_fam_envs env) $ idType id 1300 = extendAnalEnv NotTopLevel env id (cprProdSig (dataConRepArity dc)) 1301 1302 | otherwise 1303 = env 1304 1305extendEnvForProdAlt :: AnalEnv -> CoreExpr -> Id -> DataCon -> [Var] -> AnalEnv 1306-- See Note [CPR in a product case alternative] 1307extendEnvForProdAlt env scrut case_bndr dc bndrs 1308 = foldl' do_con_arg env1 ids_w_strs 1309 where 1310 env1 = extendAnalEnv NotTopLevel env case_bndr case_bndr_sig 1311 1312 ids_w_strs = filter isId bndrs `zip` dataConRepStrictness dc 1313 case_bndr_sig = cprProdSig (dataConRepArity dc) 1314 fam_envs = ae_fam_envs env 1315 1316 do_con_arg env (id, str) 1317 | let is_strict = isStrictDmd (idDemandInfo id) || isMarkedStrict str 1318 , ae_virgin env || (is_var_scrut && is_strict) -- See Note [CPR in a product case alternative] 1319 , Just (dc,_,_,_) <- deepSplitProductType_maybe fam_envs $ idType id 1320 = extendAnalEnv NotTopLevel env id (cprProdSig (dataConRepArity dc)) 1321 | otherwise 1322 = env 1323 1324 is_var_scrut = is_var scrut 1325 is_var (Cast e _) = is_var e 1326 is_var (Var v) = isLocalId v 1327 is_var _ = False 1328 1329findBndrsDmds :: AnalEnv -> DmdType -> [Var] -> (DmdType, [Demand]) 1330-- Return the demands on the Ids in the [Var] 1331findBndrsDmds env dmd_ty bndrs 1332 = go dmd_ty bndrs 1333 where 1334 go dmd_ty [] = (dmd_ty, []) 1335 go dmd_ty (b:bs) 1336 | isId b = let (dmd_ty1, dmds) = go dmd_ty bs 1337 (dmd_ty2, dmd) = findBndrDmd env False dmd_ty1 b 1338 in (dmd_ty2, dmd : dmds) 1339 | otherwise = go dmd_ty bs 1340 1341findBndrDmd :: AnalEnv -> Bool -> DmdType -> Id -> (DmdType, Demand) 1342-- See Note [Trimming a demand to a type] in Demand.hs 1343findBndrDmd env arg_of_dfun dmd_ty id 1344 = (dmd_ty', dmd') 1345 where 1346 dmd' = killUsageDemand (ae_dflags env) $ 1347 strictify $ 1348 trimToType starting_dmd (findTypeShape fam_envs id_ty) 1349 1350 (dmd_ty', starting_dmd) = peelFV dmd_ty id 1351 1352 id_ty = idType id 1353 1354 strictify dmd 1355 | gopt Opt_DictsStrict (ae_dflags env) 1356 -- We never want to strictify a recursive let. At the moment 1357 -- annotateBndr is only call for non-recursive lets; if that 1358 -- changes, we need a RecFlag parameter and another guard here. 1359 , not arg_of_dfun -- See Note [Do not strictify the argument dictionaries of a dfun] 1360 = strictifyDictDmd id_ty dmd 1361 | otherwise 1362 = dmd 1363 1364 fam_envs = ae_fam_envs env 1365 1366set_idStrictness :: AnalEnv -> Id -> StrictSig -> Id 1367set_idStrictness env id sig 1368 = setIdStrictness id (killUsageSig (ae_dflags env) sig) 1369 1370dumpStrSig :: CoreProgram -> SDoc 1371dumpStrSig binds = vcat (map printId ids) 1372 where 1373 ids = sortBy (stableNameCmp `on` getName) (concatMap getIds binds) 1374 getIds (NonRec i _) = [ i ] 1375 getIds (Rec bs) = map fst bs 1376 printId id | isExportedId id = ppr id <> colon <+> pprIfaceStrictSig (idStrictness id) 1377 | otherwise = empty 1378 1379{- Note [CPR in a product case alternative] 1380~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 1381In a case alternative for a product type, we want to give some of the 1382binders the CPR property. Specifically 1383 1384 * The case binder; inside the alternative, the case binder always has 1385 the CPR property, meaning that a case on it will successfully cancel. 1386 Example: 1387 f True x = case x of y { I# x' -> if x' ==# 3 1388 then y 1389 else I# 8 } 1390 f False x = I# 3 1391 1392 By giving 'y' the CPR property, we ensure that 'f' does too, so we get 1393 f b x = case fw b x of { r -> I# r } 1394 fw True x = case x of y { I# x' -> if x' ==# 3 then x' else 8 } 1395 fw False x = 3 1396 1397 Of course there is the usual risk of re-boxing: we have 'x' available 1398 boxed and unboxed, but we return the unboxed version for the wrapper to 1399 box. If the wrapper doesn't cancel with its caller, we'll end up 1400 re-boxing something that we did have available in boxed form. 1401 1402 * Any strict binders with product type, can use 1403 Note [Initial CPR for strict binders]. But we can go a little 1404 further. Consider 1405 1406 data T = MkT !Int Int 1407 1408 f2 (MkT x y) | y>0 = f2 (MkT x (y-1)) 1409 | otherwise = x 1410 1411 For $wf2 we are going to unbox the MkT *and*, since it is strict, the 1412 first argument of the MkT; see Note [Add demands for strict constructors] 1413 in WwLib. But then we don't want box it up again when returning it! We want 1414 'f2' to have the CPR property, so we give 'x' the CPR property. 1415 1416 * It's a bit delicate because if this case is scrutinising something other 1417 than an argument the original function, we really don't have the unboxed 1418 version available. E.g 1419 g v = case foo v of 1420 MkT x y | y>0 -> ... 1421 | otherwise -> x 1422 Here we don't have the unboxed 'x' available. Hence the 1423 is_var_scrut test when making use of the strictness annotation. 1424 Slightly ad-hoc, because even if the scrutinee *is* a variable it 1425 might not be a onre of the arguments to the original function, or a 1426 sub-component thereof. But it's simple, and nothing terrible 1427 happens if we get it wrong. e.g. #10694. 1428 1429 1430Note [Initial CPR for strict binders] 1431~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 1432CPR is initialized for a lambda binder in an optimistic manner, i.e, 1433if the binder is used strictly and at least some of its components as 1434a product are used, which is checked by the value of the absence 1435demand. 1436 1437If the binder is marked demanded with a strict demand, then give it a 1438CPR signature. Here's a concrete example ('f1' in test T10482a), 1439assuming h is strict: 1440 1441 f1 :: Int -> Int 1442 f1 x = case h x of 1443 A -> x 1444 B -> f1 (x-1) 1445 C -> x+1 1446 1447If we notice that 'x' is used strictly, we can give it the CPR 1448property; and hence f1 gets the CPR property too. It's sound (doesn't 1449change strictness) to give it the CPR property because by the time 'x' 1450is returned (case A above), it'll have been evaluated (by the wrapper 1451of 'h' in the example). 1452 1453Moreover, if f itself is strict in x, then we'll pass x unboxed to 1454f1, and so the boxed version *won't* be available; in that case it's 1455very helpful to give 'x' the CPR property. 1456 1457Note that 1458 1459 * We only want to do this for something that definitely 1460 has product type, else we may get over-optimistic CPR results 1461 (e.g. from \x -> x!). 1462 1463 * See Note [CPR examples] 1464 1465Note [CPR examples] 1466~~~~~~~~~~~~~~~~~~~~ 1467Here are some examples (stranal/should_compile/T10482a) of the 1468usefulness of Note [CPR in a product case alternative]. The main 1469point: all of these functions can have the CPR property. 1470 1471 ------- f1 ----------- 1472 -- x is used strictly by h, so it'll be available 1473 -- unboxed before it is returned in the True branch 1474 1475 f1 :: Int -> Int 1476 f1 x = case h x x of 1477 True -> x 1478 False -> f1 (x-1) 1479 1480 1481 ------- f2 ----------- 1482 -- x is a strict field of MkT2, so we'll pass it unboxed 1483 -- to $wf2, so it's available unboxed. This depends on 1484 -- the case expression analysing (a subcomponent of) one 1485 -- of the original arguments to the function, so it's 1486 -- a bit more delicate. 1487 1488 data T2 = MkT2 !Int Int 1489 1490 f2 :: T2 -> Int 1491 f2 (MkT2 x y) | y>0 = f2 (MkT2 x (y-1)) 1492 | otherwise = x 1493 1494 1495 ------- f3 ----------- 1496 -- h is strict in x, so x will be unboxed before it 1497 -- is rerturned in the otherwise case. 1498 1499 data T3 = MkT3 Int Int 1500 1501 f1 :: T3 -> Int 1502 f1 (MkT3 x y) | h x y = f3 (MkT3 x (y-1)) 1503 | otherwise = x 1504 1505 1506 ------- f4 ----------- 1507 -- Just like f2, but MkT4 can't unbox its strict 1508 -- argument automatically, as f2 can 1509 1510 data family Foo a 1511 newtype instance Foo Int = Foo Int 1512 1513 data T4 a = MkT4 !(Foo a) Int 1514 1515 f4 :: T4 Int -> Int 1516 f4 (MkT4 x@(Foo v) y) | y>0 = f4 (MkT4 x (y-1)) 1517 | otherwise = v 1518 1519 1520Note [Initialising strictness] 1521~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 1522See section 9.2 (Finding fixpoints) of the paper. 1523 1524Our basic plan is to initialise the strictness of each Id in a 1525recursive group to "bottom", and find a fixpoint from there. However, 1526this group B might be inside an *enclosing* recursive group A, in 1527which case we'll do the entire fixpoint shebang on for each iteration 1528of A. This can be illustrated by the following example: 1529 1530Example: 1531 1532 f [] = [] 1533 f (x:xs) = let g [] = f xs 1534 g (y:ys) = y+1 : g ys 1535 in g (h x) 1536 1537At each iteration of the fixpoint for f, the analyser has to find a 1538fixpoint for the enclosed function g. In the meantime, the demand 1539values for g at each iteration for f are *greater* than those we 1540encountered in the previous iteration for f. Therefore, we can begin 1541the fixpoint for g not with the bottom value but rather with the 1542result of the previous analysis. I.e., when beginning the fixpoint 1543process for g, we can start from the demand signature computed for g 1544previously and attached to the binding occurrence of g. 1545 1546To speed things up, we initialise each iteration of A (the enclosing 1547one) from the result of the last one, which is neatly recorded in each 1548binder. That way we make use of earlier iterations of the fixpoint 1549algorithm. (Cunning plan.) 1550 1551But on the *first* iteration we want to *ignore* the current strictness 1552of the Id, and start from "bottom". Nowadays the Id can have a current 1553strictness, because interface files record strictness for nested bindings. 1554To know when we are in the first iteration, we look at the ae_virgin 1555field of the AnalEnv. 1556 1557 1558Note [Final Demand Analyser run] 1559~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 1560Some of the information that the demand analyser determines is not always 1561preserved by the simplifier. For example, the simplifier will happily rewrite 1562 \y [Demand=1*U] let x = y in x + x 1563to 1564 \y [Demand=1*U] y + y 1565which is quite a lie. 1566 1567The once-used information is (currently) only used by the code 1568generator, though. So: 1569 1570 * We zap the used-once info in the worker-wrapper; 1571 see Note [Zapping Used Once info in WorkWrap] in WorkWrap. If it's 1572 not reliable, it's better not to have it at all. 1573 1574 * Just before TidyCore, we add a pass of the demand analyser, 1575 but WITHOUT subsequent worker/wrapper and simplifier, 1576 right before TidyCore. See SimplCore.getCoreToDo. 1577 1578 This way, correct information finds its way into the module interface 1579 (strictness signatures!) and the code generator (single-entry thunks!) 1580 1581Note that, in contrast, the single-call information (C1(..)) /can/ be 1582relied upon, as the simplifier tends to be very careful about not 1583duplicating actual function calls. 1584 1585Also see #11731. 1586-} 1587