1{-
2(c) The GRASP/AQUA Project, Glasgow University, 1993-1998
3
4
5                        -----------------
6                        A demand analysis
7                        -----------------
8-}
9
10{-# LANGUAGE CPP #-}
11
12module DmdAnal ( dmdAnalProgram ) where
13
14#include "HsVersions.h"
15
16import GhcPrelude
17
18import DynFlags
19import WwLib            ( findTypeShape, deepSplitProductType_maybe )
20import Demand   -- All of it
21import CoreSyn
22import CoreSeq          ( seqBinds )
23import Outputable
24import VarEnv
25import BasicTypes
26import Data.List        ( mapAccumL, sortBy )
27import DataCon
28import Id
29import CoreUtils        ( exprIsHNF, exprType, exprIsTrivial, exprOkForSpeculation )
30import TyCon
31import Type
32import Coercion         ( Coercion, coVarsOfCo )
33import FamInstEnv
34import Util
35import Maybes           ( isJust )
36import TysWiredIn
37import TysPrim          ( realWorldStatePrimTy )
38import ErrUtils         ( dumpIfSet_dyn )
39import Name             ( getName, stableNameCmp )
40import Data.Function    ( on )
41import UniqSet
42
43{-
44************************************************************************
45*                                                                      *
46\subsection{Top level stuff}
47*                                                                      *
48************************************************************************
49-}
50
51dmdAnalProgram :: DynFlags -> FamInstEnvs -> CoreProgram -> IO CoreProgram
52dmdAnalProgram dflags fam_envs binds
53  = do {
54        let { binds_plus_dmds = do_prog binds } ;
55        dumpIfSet_dyn dflags Opt_D_dump_str_signatures
56                      "Strictness signatures" $
57            dumpStrSig binds_plus_dmds ;
58        -- See Note [Stamp out space leaks in demand analysis]
59        seqBinds binds_plus_dmds `seq` return binds_plus_dmds
60    }
61  where
62    do_prog :: CoreProgram -> CoreProgram
63    do_prog binds = snd $ mapAccumL dmdAnalTopBind (emptyAnalEnv dflags fam_envs) binds
64
65-- Analyse a (group of) top-level binding(s)
66dmdAnalTopBind :: AnalEnv
67               -> CoreBind
68               -> (AnalEnv, CoreBind)
69dmdAnalTopBind env (NonRec id rhs)
70  = (extendAnalEnv TopLevel env id2 (idStrictness id2), NonRec id2 rhs2)
71  where
72    ( _, _,   rhs1) = dmdAnalRhsLetDown TopLevel Nothing env             cleanEvalDmd id rhs
73    ( _, id2, rhs2) = dmdAnalRhsLetDown TopLevel Nothing (nonVirgin env) cleanEvalDmd id rhs1
74        -- Do two passes to improve CPR information
75        -- See Note [CPR for thunks]
76        -- See Note [Optimistic CPR in the "virgin" case]
77        -- See Note [Initial CPR for strict binders]
78
79dmdAnalTopBind env (Rec pairs)
80  = (env', Rec pairs')
81  where
82    (env', _, pairs')  = dmdFix TopLevel env cleanEvalDmd pairs
83                -- We get two iterations automatically
84                -- c.f. the NonRec case above
85
86{- Note [Stamp out space leaks in demand analysis]
87~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
88The demand analysis pass outputs a new copy of the Core program in
89which binders have been annotated with demand and strictness
90information. It's tiresome to ensure that this information is fully
91evaluated everywhere that we produce it, so we just run a single
92seqBinds over the output before returning it, to ensure that there are
93no references holding on to the input Core program.
94
95This makes a ~30% reduction in peak memory usage when compiling
96DynFlags (cf #9675 and #13426).
97
98This is particularly important when we are doing late demand analysis,
99since we don't do a seqBinds at any point thereafter. Hence code
100generation would hold on to an extra copy of the Core program, via
101unforced thunks in demand or strictness information; and it is the
102most memory-intensive part of the compilation process, so this added
103seqBinds makes a big difference in peak memory usage.
104-}
105
106
107{-
108************************************************************************
109*                                                                      *
110\subsection{The analyser itself}
111*                                                                      *
112************************************************************************
113
114Note [Ensure demand is strict]
115~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
116It's important not to analyse e with a lazy demand because
117a) When we encounter   case s of (a,b) ->
118        we demand s with U(d1d2)... but if the overall demand is lazy
119        that is wrong, and we'd need to reduce the demand on s,
120        which is inconvenient
121b) More important, consider
122        f (let x = R in x+x), where f is lazy
123   We still want to mark x as demanded, because it will be when we
124   enter the let.  If we analyse f's arg with a Lazy demand, we'll
125   just mark x as Lazy
126c) The application rule wouldn't be right either
127   Evaluating (f x) in a L demand does *not* cause
128   evaluation of f in a C(L) demand!
129-}
130
131-- If e is complicated enough to become a thunk, its contents will be evaluated
132-- at most once, so oneify it.
133dmdTransformThunkDmd :: CoreExpr -> Demand -> Demand
134dmdTransformThunkDmd e
135  | exprIsTrivial e = id
136  | otherwise       = oneifyDmd
137
138-- Do not process absent demands
139-- Otherwise act like in a normal demand analysis
140-- See ↦* relation in the Cardinality Analysis paper
141dmdAnalStar :: AnalEnv
142            -> Demand   -- This one takes a *Demand*
143            -> CoreExpr -- Should obey the let/app invariatn
144            -> (BothDmdArg, CoreExpr)
145dmdAnalStar env dmd e
146  | (dmd_shell, cd) <- toCleanDmd dmd
147  , (dmd_ty, e')    <- dmdAnal env cd e
148  = ASSERT2( not (isUnliftedType (exprType e)) || exprOkForSpeculation e, ppr e )
149    -- The argument 'e' should satisfy the let/app invariant
150    -- See Note [Analysing with absent demand] in Demand.hs
151    (postProcessDmdType dmd_shell dmd_ty, e')
152
153-- Main Demand Analsysis machinery
154dmdAnal, dmdAnal' :: AnalEnv
155        -> CleanDemand         -- The main one takes a *CleanDemand*
156        -> CoreExpr -> (DmdType, CoreExpr)
157
158-- The CleanDemand is always strict and not absent
159--    See Note [Ensure demand is strict]
160
161dmdAnal env d e = -- pprTrace "dmdAnal" (ppr d <+> ppr e) $
162                  dmdAnal' env d e
163
164dmdAnal' _ _ (Lit lit)     = (nopDmdType, Lit lit)
165dmdAnal' _ _ (Type ty)     = (nopDmdType, Type ty)      -- Doesn't happen, in fact
166dmdAnal' _ _ (Coercion co)
167  = (unitDmdType (coercionDmdEnv co), Coercion co)
168
169dmdAnal' env dmd (Var var)
170  = (dmdTransform env var dmd, Var var)
171
172dmdAnal' env dmd (Cast e co)
173  = (dmd_ty `bothDmdType` mkBothDmdArg (coercionDmdEnv co), Cast e' co)
174  where
175    (dmd_ty, e') = dmdAnal env dmd e
176
177dmdAnal' env dmd (Tick t e)
178  = (dmd_ty, Tick t e')
179  where
180    (dmd_ty, e') = dmdAnal env dmd e
181
182dmdAnal' env dmd (App fun (Type ty))
183  = (fun_ty, App fun' (Type ty))
184  where
185    (fun_ty, fun') = dmdAnal env dmd fun
186
187-- Lots of the other code is there to make this
188-- beautiful, compositional, application rule :-)
189dmdAnal' env dmd (App fun arg)
190  = -- This case handles value arguments (type args handled above)
191    -- Crucially, coercions /are/ handled here, because they are
192    -- value arguments (#10288)
193    let
194        call_dmd          = mkCallDmd dmd
195        (fun_ty, fun')    = dmdAnal env call_dmd fun
196        (arg_dmd, res_ty) = splitDmdTy fun_ty
197        (arg_ty, arg')    = dmdAnalStar env (dmdTransformThunkDmd arg arg_dmd) arg
198    in
199--    pprTrace "dmdAnal:app" (vcat
200--         [ text "dmd =" <+> ppr dmd
201--         , text "expr =" <+> ppr (App fun arg)
202--         , text "fun dmd_ty =" <+> ppr fun_ty
203--         , text "arg dmd =" <+> ppr arg_dmd
204--         , text "arg dmd_ty =" <+> ppr arg_ty
205--         , text "res dmd_ty =" <+> ppr res_ty
206--         , text "overall res dmd_ty =" <+> ppr (res_ty `bothDmdType` arg_ty) ])
207    (res_ty `bothDmdType` arg_ty, App fun' arg')
208
209dmdAnal' env dmd (Lam var body)
210  | isTyVar var
211  = let
212        (body_ty, body') = dmdAnal env dmd body
213    in
214    (body_ty, Lam var body')
215
216  | otherwise
217  = let (body_dmd, defer_and_use) = peelCallDmd dmd
218          -- body_dmd: a demand to analyze the body
219
220        env'             = extendSigsWithLam env var
221        (body_ty, body') = dmdAnal env' body_dmd body
222        (lam_ty, var')   = annotateLamIdBndr env notArgOfDfun body_ty var
223    in
224    (postProcessUnsat defer_and_use lam_ty, Lam var' body')
225
226dmdAnal' env dmd (Case scrut case_bndr ty [(DataAlt dc, bndrs, rhs)])
227  -- Only one alternative with a product constructor
228  | let tycon = dataConTyCon dc
229  , isJust (isDataProductTyCon_maybe tycon)
230  , Just rec_tc' <- checkRecTc (ae_rec_tc env) tycon
231  = let
232        env_w_tc                 = env { ae_rec_tc = rec_tc' }
233        env_alt                  = extendEnvForProdAlt env_w_tc scrut case_bndr dc bndrs
234        (rhs_ty, rhs')           = dmdAnal env_alt dmd rhs
235        (alt_ty1, dmds)          = findBndrsDmds env rhs_ty bndrs
236        (alt_ty2, case_bndr_dmd) = findBndrDmd env False alt_ty1 case_bndr
237        id_dmds                  = addCaseBndrDmd case_bndr_dmd dmds
238        alt_ty3 | io_hack_reqd scrut dc bndrs = deferAfterIO alt_ty2
239                | otherwise                   = alt_ty2
240
241        -- Compute demand on the scrutinee
242        -- See Note [Demand on scrutinee of a product case]
243        scrut_dmd          = mkProdDmd id_dmds
244        (scrut_ty, scrut') = dmdAnal env scrut_dmd scrut
245        res_ty             = alt_ty3 `bothDmdType` toBothDmdArg scrut_ty
246        case_bndr'         = setIdDemandInfo case_bndr case_bndr_dmd
247        bndrs'             = setBndrsDemandInfo bndrs id_dmds
248    in
249--    pprTrace "dmdAnal:Case1" (vcat [ text "scrut" <+> ppr scrut
250--                                   , text "dmd" <+> ppr dmd
251--                                   , text "case_bndr_dmd" <+> ppr (idDemandInfo case_bndr')
252--                                   , text "id_dmds" <+> ppr id_dmds
253--                                   , text "scrut_dmd" <+> ppr scrut_dmd
254--                                   , text "scrut_ty" <+> ppr scrut_ty
255--                                   , text "alt_ty" <+> ppr alt_ty2
256--                                   , text "res_ty" <+> ppr res_ty ]) $
257    (res_ty, Case scrut' case_bndr' ty [(DataAlt dc, bndrs', rhs')])
258
259dmdAnal' env dmd (Case scrut case_bndr ty alts)
260  = let      -- Case expression with multiple alternatives
261        (alt_tys, alts')     = mapAndUnzip (dmdAnalAlt env dmd case_bndr) alts
262        (scrut_ty, scrut')   = dmdAnal env cleanEvalDmd scrut
263        (alt_ty, case_bndr') = annotateBndr env (foldr lubDmdType botDmdType alt_tys) case_bndr
264                               -- NB: Base case is botDmdType, for empty case alternatives
265                               --     This is a unit for lubDmdType, and the right result
266                               --     when there really are no alternatives
267        res_ty               = alt_ty `bothDmdType` toBothDmdArg scrut_ty
268    in
269--    pprTrace "dmdAnal:Case2" (vcat [ text "scrut" <+> ppr scrut
270--                                   , text "scrut_ty" <+> ppr scrut_ty
271--                                   , text "alt_tys" <+> ppr alt_tys
272--                                   , text "alt_ty" <+> ppr alt_ty
273--                                   , text "res_ty" <+> ppr res_ty ]) $
274    (res_ty, Case scrut' case_bndr' ty alts')
275
276-- Let bindings can be processed in two ways:
277-- Down (RHS before body) or Up (body before RHS).
278-- The following case handle the up variant.
279--
280-- It is very simple. For  let x = rhs in body
281--   * Demand-analyse 'body' in the current environment
282--   * Find the demand, 'rhs_dmd' placed on 'x' by 'body'
283--   * Demand-analyse 'rhs' in 'rhs_dmd'
284--
285-- This is used for a non-recursive local let without manifest lambdas.
286-- This is the LetUp rule in the paper “Higher-Order Cardinality Analysis”.
287dmdAnal' env dmd (Let (NonRec id rhs) body)
288  | useLetUp id
289  = (final_ty, Let (NonRec id' rhs') body')
290  where
291    (body_ty, body')   = dmdAnal env dmd body
292    (body_ty', id_dmd) = findBndrDmd env notArgOfDfun body_ty id
293    id'                = setIdDemandInfo id id_dmd
294
295    (rhs_ty, rhs')     = dmdAnalStar env (dmdTransformThunkDmd rhs id_dmd) rhs
296    final_ty           = body_ty' `bothDmdType` rhs_ty
297
298dmdAnal' env dmd (Let (NonRec id rhs) body)
299  = (body_ty2, Let (NonRec id2 rhs') body')
300  where
301    (lazy_fv, id1, rhs') = dmdAnalRhsLetDown NotTopLevel Nothing env dmd id rhs
302    env1                 = extendAnalEnv NotTopLevel env id1 (idStrictness id1)
303    (body_ty, body')     = dmdAnal env1 dmd body
304    (body_ty1, id2)      = annotateBndr env body_ty id1
305    body_ty2             = addLazyFVs body_ty1 lazy_fv -- see Note [Lazy and unleashable free variables]
306
307        -- If the actual demand is better than the vanilla call
308        -- demand, you might think that we might do better to re-analyse
309        -- the RHS with the stronger demand.
310        -- But (a) That seldom happens, because it means that *every* path in
311        --         the body of the let has to use that stronger demand
312        -- (b) It often happens temporarily in when fixpointing, because
313        --     the recursive function at first seems to place a massive demand.
314        --     But we don't want to go to extra work when the function will
315        --     probably iterate to something less demanding.
316        -- In practice, all the times the actual demand on id2 is more than
317        -- the vanilla call demand seem to be due to (b).  So we don't
318        -- bother to re-analyse the RHS.
319
320dmdAnal' env dmd (Let (Rec pairs) body)
321  = let
322        (env', lazy_fv, pairs') = dmdFix NotTopLevel env dmd pairs
323        (body_ty, body')        = dmdAnal env' dmd body
324        body_ty1                = deleteFVs body_ty (map fst pairs)
325        body_ty2                = addLazyFVs body_ty1 lazy_fv -- see Note [Lazy and unleashable free variables]
326    in
327    body_ty2 `seq`
328    (body_ty2,  Let (Rec pairs') body')
329
330io_hack_reqd :: CoreExpr -> DataCon -> [Var] -> Bool
331-- See Note [IO hack in the demand analyser]
332io_hack_reqd scrut con bndrs
333  | (bndr:_) <- bndrs
334  , con == tupleDataCon Unboxed 2
335  , idType bndr `eqType` realWorldStatePrimTy
336  , (fun, _) <- collectArgs scrut
337  = case fun of
338      Var f -> not (isPrimOpId f)
339      _     -> True
340  | otherwise
341  = False
342
343dmdAnalAlt :: AnalEnv -> CleanDemand -> Id -> Alt Var -> (DmdType, Alt Var)
344dmdAnalAlt env dmd case_bndr (con,bndrs,rhs)
345  | null bndrs    -- Literals, DEFAULT, and nullary constructors
346  , (rhs_ty, rhs') <- dmdAnal env dmd rhs
347  = (rhs_ty, (con, [], rhs'))
348
349  | otherwise     -- Non-nullary data constructors
350  , (rhs_ty, rhs') <- dmdAnal env dmd rhs
351  , (alt_ty, dmds) <- findBndrsDmds env rhs_ty bndrs
352  , let case_bndr_dmd = findIdDemand alt_ty case_bndr
353        id_dmds       = addCaseBndrDmd case_bndr_dmd dmds
354  = (alt_ty, (con, setBndrsDemandInfo bndrs id_dmds, rhs'))
355
356
357{- Note [IO hack in the demand analyser]
358~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
359There's a hack here for I/O operations.  Consider
360
361     case foo x s of { (# s', r #) -> y }
362
363Is this strict in 'y'? Often not! If foo x s performs some observable action
364(including raising an exception with raiseIO#, modifying a mutable variable, or
365even ending the program normally), then we must not force 'y' (which may fail
366to terminate) until we have performed foo x s.
367
368Hackish solution: spot the IO-like situation and add a virtual branch,
369as if we had
370     case foo x s of
371        (# s, r #) -> y
372        other      -> return ()
373So the 'y' isn't necessarily going to be evaluated
374
375A more complete example (#148, #1592) where this shows up is:
376     do { let len = <expensive> ;
377        ; when (...) (exitWith ExitSuccess)
378        ; print len }
379
380However, consider
381  f x s = case getMaskingState# s of
382            (# s, r #) ->
383          case x of I# x2 -> ...
384
385Here it is terribly sad to make 'f' lazy in 's'.  After all,
386getMaskingState# is not going to diverge or throw an exception!  This
387situation actually arises in GHC.IO.Handle.Internals.wantReadableHandle
388(on an MVar not an Int), and made a material difference.
389
390So if the scrutinee is a primop call, we *don't* apply the
391state hack:
392  - If it is a simple, terminating one like getMaskingState,
393    applying the hack is over-conservative.
394  - If the primop is raise# then it returns bottom, so
395    the case alternatives are already discarded.
396  - If the primop can raise a non-IO exception, like
397    divide by zero or seg-fault (eg writing an array
398    out of bounds) then we don't mind evaluating 'x' first.
399
400Note [Demand on the scrutinee of a product case]
401~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
402When figuring out the demand on the scrutinee of a product case,
403we use the demands of the case alternative, i.e. id_dmds.
404But note that these include the demand on the case binder;
405see Note [Demand on case-alternative binders] in Demand.hs.
406This is crucial. Example:
407   f x = case x of y { (a,b) -> k y a }
408If we just take scrut_demand = U(L,A), then we won't pass x to the
409worker, so the worker will rebuild
410     x = (a, absent-error)
411and that'll crash.
412
413Note [Aggregated demand for cardinality]
414~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
415We use different strategies for strictness and usage/cardinality to
416"unleash" demands captured on free variables by bindings. Let us
417consider the example:
418
419f1 y = let {-# NOINLINE h #-}
420           h = y
421       in  (h, h)
422
423We are interested in obtaining cardinality demand U1 on |y|, as it is
424used only in a thunk, and, therefore, is not going to be updated any
425more. Therefore, the demand on |y|, captured and unleashed by usage of
426|h| is U1. However, if we unleash this demand every time |h| is used,
427and then sum up the effects, the ultimate demand on |y| will be U1 +
428U1 = U. In order to avoid it, we *first* collect the aggregate demand
429on |h| in the body of let-expression, and only then apply the demand
430transformer:
431
432transf[x](U) = {y |-> U1}
433
434so the resulting demand on |y| is U1.
435
436The situation is, however, different for strictness, where this
437aggregating approach exhibits worse results because of the nature of
438|both| operation for strictness. Consider the example:
439
440f y c =
441  let h x = y |seq| x
442   in case of
443        True  -> h True
444        False -> y
445
446It is clear that |f| is strict in |y|, however, the suggested analysis
447will infer from the body of |let| that |h| is used lazily (as it is
448used in one branch only), therefore lazy demand will be put on its
449free variable |y|. Conversely, if the demand on |h| is unleashed right
450on the spot, we will get the desired result, namely, that |f| is
451strict in |y|.
452
453
454************************************************************************
455*                                                                      *
456                    Demand transformer
457*                                                                      *
458************************************************************************
459-}
460
461dmdTransform :: AnalEnv         -- The strictness environment
462             -> Id              -- The function
463             -> CleanDemand     -- The demand on the function
464             -> DmdType         -- The demand type of the function in this context
465        -- Returned DmdEnv includes the demand on
466        -- this function plus demand on its free variables
467
468dmdTransform env var dmd
469  | isDataConWorkId var                          -- Data constructor
470  = dmdTransformDataConSig (idArity var) (idStrictness var) dmd
471
472  | gopt Opt_DmdTxDictSel (ae_dflags env),
473    Just _ <- isClassOpId_maybe var -- Dictionary component selector
474  = dmdTransformDictSelSig (idStrictness var) dmd
475
476  | isGlobalId var                               -- Imported function
477  = let res = dmdTransformSig (idStrictness var) dmd in
478--    pprTrace "dmdTransform" (vcat [ppr var, ppr (idStrictness var), ppr dmd, ppr res])
479    res
480
481  | Just (sig, top_lvl) <- lookupSigEnv env var  -- Local letrec bound thing
482  , let fn_ty = dmdTransformSig sig dmd
483  = -- pprTrace "dmdTransform" (vcat [ppr var, ppr sig, ppr dmd, ppr fn_ty]) $
484    if isTopLevel top_lvl
485    then fn_ty   -- Don't record top level things
486    else addVarDmd fn_ty var (mkOnceUsedDmd dmd)
487
488  | otherwise                                    -- Local non-letrec-bound thing
489  = unitDmdType (unitVarEnv var (mkOnceUsedDmd dmd))
490
491{-
492************************************************************************
493*                                                                      *
494\subsection{Bindings}
495*                                                                      *
496************************************************************************
497-}
498
499-- Recursive bindings
500dmdFix :: TopLevelFlag
501       -> AnalEnv                            -- Does not include bindings for this binding
502       -> CleanDemand
503       -> [(Id,CoreExpr)]
504       -> (AnalEnv, DmdEnv, [(Id,CoreExpr)]) -- Binders annotated with stricness info
505
506dmdFix top_lvl env let_dmd orig_pairs
507  = loop 1 initial_pairs
508  where
509    bndrs = map fst orig_pairs
510
511    -- See Note [Initialising strictness]
512    initial_pairs | ae_virgin env = [(setIdStrictness id botSig, rhs) | (id, rhs) <- orig_pairs ]
513                  | otherwise     = orig_pairs
514
515    -- If fixed-point iteration does not yield a result we use this instead
516    -- See Note [Safe abortion in the fixed-point iteration]
517    abort :: (AnalEnv, DmdEnv, [(Id,CoreExpr)])
518    abort = (env, lazy_fv', zapped_pairs)
519      where (lazy_fv, pairs') = step True (zapIdStrictness orig_pairs)
520            -- Note [Lazy and unleashable free variables]
521            non_lazy_fvs = plusVarEnvList $ map (strictSigDmdEnv . idStrictness . fst) pairs'
522            lazy_fv'     = lazy_fv `plusVarEnv` mapVarEnv (const topDmd) non_lazy_fvs
523            zapped_pairs = zapIdStrictness pairs'
524
525    -- The fixed-point varies the idStrictness field of the binders, and terminates if that
526    -- annotation does not change any more.
527    loop :: Int -> [(Id,CoreExpr)] -> (AnalEnv, DmdEnv, [(Id,CoreExpr)])
528    loop n pairs
529      | found_fixpoint = (final_anal_env, lazy_fv, pairs')
530      | n == 10        = abort
531      | otherwise      = loop (n+1) pairs'
532      where
533        found_fixpoint    = map (idStrictness . fst) pairs' == map (idStrictness . fst) pairs
534        first_round       = n == 1
535        (lazy_fv, pairs') = step first_round pairs
536        final_anal_env    = extendAnalEnvs top_lvl env (map fst pairs')
537
538    step :: Bool -> [(Id, CoreExpr)] -> (DmdEnv, [(Id, CoreExpr)])
539    step first_round pairs = (lazy_fv, pairs')
540      where
541        -- In all but the first iteration, delete the virgin flag
542        start_env | first_round = env
543                  | otherwise   = nonVirgin env
544
545        start = (extendAnalEnvs top_lvl start_env (map fst pairs), emptyDmdEnv)
546
547        ((_,lazy_fv), pairs') = mapAccumL my_downRhs start pairs
548                -- mapAccumL: Use the new signature to do the next pair
549                -- The occurrence analyser has arranged them in a good order
550                -- so this can significantly reduce the number of iterations needed
551
552        my_downRhs (env, lazy_fv) (id,rhs)
553          = ((env', lazy_fv'), (id', rhs'))
554          where
555            (lazy_fv1, id', rhs') = dmdAnalRhsLetDown top_lvl (Just bndrs) env let_dmd id rhs
556            lazy_fv'              = plusVarEnv_C bothDmd lazy_fv lazy_fv1
557            env'                  = extendAnalEnv top_lvl env id (idStrictness id')
558
559
560    zapIdStrictness :: [(Id, CoreExpr)] -> [(Id, CoreExpr)]
561    zapIdStrictness pairs = [(setIdStrictness id nopSig, rhs) | (id, rhs) <- pairs ]
562
563{-
564Note [Safe abortion in the fixed-point iteration]
565~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
566
567Fixed-point iteration may fail to terminate. But we cannot simply give up and
568return the environment and code unchanged! We still need to do one additional
569round, for two reasons:
570
571 * To get information on used free variables (both lazy and strict!)
572   (see Note [Lazy and unleashable free variables])
573 * To ensure that all expressions have been traversed at least once, and any left-over
574   strictness annotations have been updated.
575
576This final iteration does not add the variables to the strictness signature
577environment, which effectively assigns them 'nopSig' (see "getStrictness")
578
579-}
580
581-- Let bindings can be processed in two ways:
582-- Down (RHS before body) or Up (body before RHS).
583-- dmdAnalRhsLetDown implements the Down variant:
584--  * assuming a demand of <L,U>
585--  * looking at the definition
586--  * determining a strictness signature
587--
588-- It is used for toplevel definition, recursive definitions and local
589-- non-recursive definitions that have manifest lambdas.
590-- Local non-recursive definitions without a lambda are handled with LetUp.
591--
592-- This is the LetDown rule in the paper “Higher-Order Cardinality Analysis”.
593dmdAnalRhsLetDown :: TopLevelFlag
594           -> Maybe [Id]   -- Just bs <=> recursive, Nothing <=> non-recursive
595           -> AnalEnv -> CleanDemand
596           -> Id -> CoreExpr
597           -> (DmdEnv, Id, CoreExpr)
598-- Process the RHS of the binding, add the strictness signature
599-- to the Id, and augment the environment with the signature as well.
600dmdAnalRhsLetDown top_lvl rec_flag env let_dmd id rhs
601  = (lazy_fv, id', rhs')
602  where
603    rhs_arity      = idArity id
604    rhs_dmd
605      -- See Note [Demand analysis for join points]
606      -- See Note [Invariants on join points] invariant 2b, in CoreSyn
607      --     rhs_arity matches the join arity of the join point
608      | isJoinId id
609      = mkCallDmds rhs_arity let_dmd
610      | otherwise
611      -- NB: rhs_arity
612      -- See Note [Demand signatures are computed for a threshold demand based on idArity]
613      = mkRhsDmd env rhs_arity rhs
614    (DmdType rhs_fv rhs_dmds rhs_res, rhs')
615                   = dmdAnal env rhs_dmd rhs
616    sig            = mkStrictSigForArity rhs_arity (mkDmdType sig_fv rhs_dmds rhs_res')
617    id'            = set_idStrictness env id sig
618        -- See Note [NOINLINE and strictness]
619
620
621    -- See Note [Aggregated demand for cardinality]
622    rhs_fv1 = case rec_flag of
623                Just bs -> reuseEnv (delVarEnvList rhs_fv bs)
624                Nothing -> rhs_fv
625
626    -- See Note [Lazy and unleashable free variables]
627    (lazy_fv, sig_fv) = splitFVs is_thunk rhs_fv1
628
629    rhs_res'  = trimCPRInfo trim_all trim_sums rhs_res
630    trim_all  = is_thunk && not_strict
631    trim_sums = not (isTopLevel top_lvl) -- See Note [CPR for sum types]
632
633    -- See Note [CPR for thunks]
634    is_thunk = not (exprIsHNF rhs) && not (isJoinId id)
635    not_strict
636       =  isTopLevel top_lvl  -- Top level and recursive things don't
637       || isJust rec_flag     -- get their demandInfo set at all
638       || not (isStrictDmd (idDemandInfo id) || ae_virgin env)
639          -- See Note [Optimistic CPR in the "virgin" case]
640
641-- | @mkRhsDmd env rhs_arity rhs@ creates a 'CleanDemand' for
642-- unleashing on the given function's @rhs@, by creating a call demand of
643-- @rhs_arity@ with a body demand appropriate for possible product types.
644-- See Note [Product demands for function body].
645-- For example, a call of the form @mkRhsDmd _ 2 (\x y -> (x, y))@ returns a
646-- clean usage demand of @C1(C1(U(U,U)))@.
647mkRhsDmd :: AnalEnv -> Arity -> CoreExpr -> CleanDemand
648mkRhsDmd env rhs_arity rhs =
649  case peelTsFuns rhs_arity (findTypeShape (ae_fam_envs env) (exprType rhs)) of
650    Just (TsProd tss) -> mkCallDmds rhs_arity (cleanEvalProdDmd (length tss))
651    _                 -> mkCallDmds rhs_arity cleanEvalDmd
652
653-- | If given the let-bound 'Id', 'useLetUp' determines whether we should
654-- process the binding up (body before rhs) or down (rhs before body).
655--
656-- We use LetDown if there is a chance to get a useful strictness signature to
657-- unleash at call sites. LetDown is generally more precise than LetUp if we can
658-- correctly guess how it will be used in the body, that is, for which incoming
659-- demand the strictness signature should be computed, which allows us to
660-- unleash higher-order demands on arguments at call sites. This is mostly the
661-- case when
662--
663--   * The binding takes any arguments before performing meaningful work (cf.
664--     'idArity'), in which case we are interested to see how it uses them.
665--   * The binding is a join point, hence acting like a function, not a value.
666--     As a big plus, we know *precisely* how it will be used in the body; since
667--     it's always tail-called, we can directly unleash the incoming demand of
668--     the let binding on its RHS when computing a strictness signature. See
669--     [Demand analysis for join points].
670--
671-- Thus, if the binding is not a join point and its arity is 0, we have a thunk
672-- and use LetUp, implying that we have no usable demand signature available
673-- when we analyse the let body.
674--
675-- Since thunk evaluation is memoised, we want to unleash its 'DmdEnv' of free
676-- vars at most once, regardless of how many times it was forced in the body.
677-- This makes a real difference wrt. usage demands. The other reason is being
678-- able to unleash a more precise product demand on its RHS once we know how the
679-- thunk was used in the let body.
680--
681-- Characteristic examples, always assuming a single evaluation:
682--
683--   * @let x = 2*y in x + x@ => LetUp. Compared to LetDown, we find out that
684--     the expression uses @y@ at most once.
685--   * @let x = (a,b) in fst x@ => LetUp. Compared to LetDown, we find out that
686--     @b@ is absent.
687--   * @let f x = x*2 in f y@ => LetDown. Compared to LetUp, we find out that
688--     the expression uses @y@ strictly, because we have @f@'s demand signature
689--     available at the call site.
690--   * @join exit = 2*y in if a then exit else if b then exit else 3*y@ =>
691--     LetDown. Compared to LetUp, we find out that the expression uses @y@
692--     strictly, because we can unleash @exit@'s signature at each call site.
693--   * For a more convincing example with join points, see Note [Demand analysis
694--     for join points].
695--
696useLetUp :: Var -> Bool
697useLetUp f = idArity f == 0 && not (isJoinId f)
698
699{- Note [Demand analysis for join points]
700~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
701Consider
702   g :: (Int,Int) -> Int
703   g (p,q) = p+q
704
705   f :: T -> Int -> Int
706   f x p = g (join j y = (p,y)
707              in case x of
708                   A -> j 3
709                   B -> j 4
710                   C -> (p,7))
711
712If j was a vanilla function definition, we'd analyse its body with
713evalDmd, and think that it was lazy in p.  But for join points we can
714do better!  We know that j's body will (if called at all) be evaluated
715with the demand that consumes the entire join-binding, in this case
716the argument demand from g.  Whizzo!  g evaluates both components of
717its argument pair, so p will certainly be evaluated if j is called.
718
719For f to be strict in p, we need /all/ paths to evaluate p; in this
720case the C branch does so too, so we are fine.  So, as usual, we need
721to transport demands on free variables to the call site(s).  Compare
722Note [Lazy and unleashable free variables].
723
724The implementation is easy.  When analysing a join point, we can
725analyse its body with the demand from the entire join-binding (written
726let_dmd here).
727
728Another win for join points!  #13543.
729
730However, note that the strictness signature for a join point can
731look a little puzzling.  E.g.
732
733    (join j x = \y. error "urk")
734    (in case v of              )
735    (     A -> j 3             )  x
736    (     B -> j 4             )
737    (     C -> \y. blah        )
738
739The entire thing is in a C(S) context, so j's strictness signature
740will be    [A]b
741meaning one absent argument, returns bottom.  That seems odd because
742there's a \y inside.  But it's right because when consumed in a C(1)
743context the RHS of the join point is indeed bottom.
744
745Note [Demand signatures are computed for a threshold demand based on idArity]
746~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
747We compute demand signatures assuming idArity incoming arguments to approximate
748behavior for when we have a call site with at least that many arguments. idArity
749is /at least/ the number of manifest lambdas, but might be higher for PAPs and
750trivial RHS (see Note [Demand analysis for trivial right-hand sides]).
751
752Because idArity of a function varies independently of its cardinality properties
753(cf. Note [idArity varies independently of dmdTypeDepth]), we implicitly encode
754the arity for when a demand signature is sound to unleash in its 'dmdTypeDepth'
755(cf. Note [Understanding DmdType and StrictSig] in Demand). It is unsound to
756unleash a demand signature when the incoming number of arguments is less than
757that. See Note [What are demand signatures?] for more details on soundness.
758
759Why idArity arguments? Because that's a conservative estimate of how many
760arguments we must feed a function before it does anything interesting with them.
761Also it elegantly subsumes the trivial RHS and PAP case.
762
763There might be functions for which we might want to analyse for more incoming
764arguments than idArity. Example:
765
766  f x =
767    if expensive
768      then \y -> ... y ...
769      else \y -> ... y ...
770
771We'd analyse `f` under a unary call demand C(S), corresponding to idArity
772being 1. That's enough to look under the manifest lambda and find out how a
773unary call would use `x`, but not enough to look into the lambdas in the if
774branches.
775
776On the other hand, if we analysed for call demand C(C(S)), we'd get useful
777strictness info for `y` (and more precise info on `x`) and possibly CPR
778information, but
779
780  * We would no longer be able to unleash the signature at unary call sites
781  * Performing the worker/wrapper split based on this information would be
782    implicitly eta-expanding `f`, playing fast and loose with divergence and
783    even being unsound in the presence of newtypes, so we refrain from doing so.
784    Also see Note [Don't eta expand in w/w] in WorkWrap.
785
786Since we only compute one signature, we do so for arity 1. Computing multiple
787signatures for different arities (i.e., polyvariance) would be entirely
788possible, if it weren't for the additional runtime and implementation
789complexity.
790
791Note [idArity varies independently of dmdTypeDepth]
792~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
793We used to check in CoreLint that dmdTypeDepth <= idArity for a let-bound
794identifier. But that means we would have to zap demand signatures every time we
795reset or decrease arity. That's an unnecessary dependency, because
796
797  * The demand signature captures a semantic property that is independent of
798    what the binding's current arity is
799  * idArity is analysis information itself, thus volatile
800  * We already *have* dmdTypeDepth, wo why not just use it to encode the
801    threshold for when to unleash the signature
802    (cf. Note [Understanding DmdType and StrictSig] in Demand)
803
804Consider the following expression, for example:
805
806    (let go x y = `x` seq ... in go) |> co
807
808`go` might have a strictness signature of `<S><L>`. The simplifier will identify
809`go` as a nullary join point through `joinPointBinding_maybe` and float the
810coercion into the binding, leading to an arity decrease:
811
812    join go = (\x y -> `x` seq ...) |> co in go
813
814With the CoreLint check, we would have to zap `go`'s perfectly viable strictness
815signature.
816
817Note [What are demand signatures?]
818~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
819Demand analysis interprets expressions in the abstract domain of demand
820transformers. Given an incoming demand we put an expression under, its abstract
821transformer gives us back a demand type denoting how other things (like
822arguments and free vars) were used when the expression was evaluated.
823Here's an example:
824
825  f x y =
826    if x + expensive
827      then \z -> z + y * ...
828      else \z -> z * ...
829
830The abstract transformer (let's call it F_e) of the if expression (let's call it
831e) would transform an incoming head demand <S,HU> into a demand type like
832{x-><S,1*U>,y-><L,U>}<L,U>. In pictures:
833
834     Demand ---F_e---> DmdType
835     <S,HU>            {x-><S,1*U>,y-><L,U>}<L,U>
836
837Let's assume that the demand transformers we compute for an expression are
838correct wrt. to some concrete semantics for Core. How do demand signatures fit
839in? They are strange beasts, given that they come with strict rules when to
840it's sound to unleash them.
841
842Fortunately, we can formalise the rules with Galois connections. Consider
843f's strictness signature, {}<S,1*U><L,U>. It's a single-point approximation of
844the actual abstract transformer of f's RHS for arity 2. So, what happens is that
845we abstract *once more* from the abstract domain we already are in, replacing
846the incoming Demand by a simple lattice with two elements denoting incoming
847arity: A_2 = {<2, >=2} (where '<2' is the top element and >=2 the bottom
848element). Here's the diagram:
849
850     A_2 -----f_f----> DmdType
851      ^                   |
852      | α               γ |
853      |                   v
854     Demand ---F_f---> DmdType
855
856With
857  α(C1(C1(_))) = >=2 -- example for usage demands, but similar for strictness
858  α(_)         =  <2
859  γ(ty)        =  ty
860and F_f being the abstract transformer of f's RHS and f_f being the abstracted
861abstract transformer computable from our demand signature simply by
862
863  f_f(>=2) = {}<S,1*U><L,U>
864  f_f(<2)  = postProcessUnsat {}<S,1*U><L,U>
865
866where postProcessUnsat makes a proper top element out of the given demand type.
867
868Note [Demand analysis for trivial right-hand sides]
869~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
870Consider
871    foo = plusInt |> co
872where plusInt is an arity-2 function with known strictness.  Clearly
873we want plusInt's strictness to propagate to foo!  But because it has
874no manifest lambdas, it won't do so automatically, and indeed 'co' might
875have type (Int->Int->Int) ~ T.
876
877Fortunately, CoreArity gives 'foo' arity 2, which is enough for LetDown to
878forward plusInt's demand signature, and all is well (see Note [Newtype arity] in
879CoreArity)! A small example is the test case NewtypeArity.
880
881
882Note [Product demands for function body]
883~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
884This example comes from shootout/binary_trees:
885
886    Main.check' = \ b z ds. case z of z' { I# ip ->
887                                case ds_d13s of
888                                  Main.Nil -> z'
889                                  Main.Node s14k s14l s14m ->
890                                    Main.check' (not b)
891                                      (Main.check' b
892                                         (case b {
893                                            False -> I# (-# s14h s14k);
894                                            True  -> I# (+# s14h s14k)
895                                          })
896                                         s14l)
897                                     s14m   }   }   }
898
899Here we *really* want to unbox z, even though it appears to be used boxed in
900the Nil case.  Partly the Nil case is not a hot path.  But more specifically,
901the whole function gets the CPR property if we do.
902
903So for the demand on the body of a RHS we use a product demand if it's
904a product type.
905
906************************************************************************
907*                                                                      *
908\subsection{Strictness signatures and types}
909*                                                                      *
910************************************************************************
911-}
912
913unitDmdType :: DmdEnv -> DmdType
914unitDmdType dmd_env = DmdType dmd_env [] topRes
915
916coercionDmdEnv :: Coercion -> DmdEnv
917coercionDmdEnv co = mapVarEnv (const topDmd) (getUniqSet $ coVarsOfCo co)
918                    -- The VarSet from coVarsOfCo is really a VarEnv Var
919
920addVarDmd :: DmdType -> Var -> Demand -> DmdType
921addVarDmd (DmdType fv ds res) var dmd
922  = DmdType (extendVarEnv_C bothDmd fv var dmd) ds res
923
924addLazyFVs :: DmdType -> DmdEnv -> DmdType
925addLazyFVs dmd_ty lazy_fvs
926  = dmd_ty `bothDmdType` mkBothDmdArg lazy_fvs
927        -- Using bothDmdType (rather than just both'ing the envs)
928        -- is vital.  Consider
929        --      let f = \x -> (x,y)
930        --      in  error (f 3)
931        -- Here, y is treated as a lazy-fv of f, but we must `bothDmd` that L
932        -- demand with the bottom coming up from 'error'
933        --
934        -- I got a loop in the fixpointer without this, due to an interaction
935        -- with the lazy_fv filtering in dmdAnalRhsLetDown.  Roughly, it was
936        --      letrec f n x
937        --          = letrec g y = x `fatbar`
938        --                         letrec h z = z + ...g...
939        --                         in h (f (n-1) x)
940        --      in ...
941        -- In the initial iteration for f, f=Bot
942        -- Suppose h is found to be strict in z, but the occurrence of g in its RHS
943        -- is lazy.  Now consider the fixpoint iteration for g, esp the demands it
944        -- places on its free variables.  Suppose it places none.  Then the
945        --      x `fatbar` ...call to h...
946        -- will give a x->V demand for x.  That turns into a L demand for x,
947        -- which floats out of the defn for h.  Without the modifyEnv, that
948        -- L demand doesn't get both'd with the Bot coming up from the inner
949        -- call to f.  So we just get an L demand for x for g.
950
951{-
952Note [Do not strictify the argument dictionaries of a dfun]
953~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
954The typechecker can tie recursive knots involving dfuns, so we do the
955conservative thing and refrain from strictifying a dfun's argument
956dictionaries.
957-}
958
959setBndrsDemandInfo :: [Var] -> [Demand] -> [Var]
960setBndrsDemandInfo (b:bs) (d:ds)
961  | isTyVar b = b : setBndrsDemandInfo bs (d:ds)
962  | otherwise = setIdDemandInfo b d : setBndrsDemandInfo bs ds
963setBndrsDemandInfo [] ds = ASSERT( null ds ) []
964setBndrsDemandInfo bs _  = pprPanic "setBndrsDemandInfo" (ppr bs)
965
966annotateBndr :: AnalEnv -> DmdType -> Var -> (DmdType, Var)
967-- The returned env has the var deleted
968-- The returned var is annotated with demand info
969-- according to the result demand of the provided demand type
970-- No effect on the argument demands
971annotateBndr env dmd_ty var
972  | isId var  = (dmd_ty', setIdDemandInfo var dmd)
973  | otherwise = (dmd_ty, var)
974  where
975    (dmd_ty', dmd) = findBndrDmd env False dmd_ty var
976
977annotateLamIdBndr :: AnalEnv
978                  -> DFunFlag   -- is this lambda at the top of the RHS of a dfun?
979                  -> DmdType    -- Demand type of body
980                  -> Id         -- Lambda binder
981                  -> (DmdType,  -- Demand type of lambda
982                      Id)       -- and binder annotated with demand
983
984annotateLamIdBndr env arg_of_dfun dmd_ty id
985-- For lambdas we add the demand to the argument demands
986-- Only called for Ids
987  = ASSERT( isId id )
988    -- pprTrace "annLamBndr" (vcat [ppr id, ppr _dmd_ty]) $
989    (final_ty, setIdDemandInfo id dmd)
990  where
991      -- Watch out!  See note [Lambda-bound unfoldings]
992    final_ty = case maybeUnfoldingTemplate (idUnfolding id) of
993                 Nothing  -> main_ty
994                 Just unf -> main_ty `bothDmdType` unf_ty
995                          where
996                             (unf_ty, _) = dmdAnalStar env dmd unf
997
998    main_ty = addDemand dmd dmd_ty'
999    (dmd_ty', dmd) = findBndrDmd env arg_of_dfun dmd_ty id
1000
1001deleteFVs :: DmdType -> [Var] -> DmdType
1002deleteFVs (DmdType fvs dmds res) bndrs
1003  = DmdType (delVarEnvList fvs bndrs) dmds res
1004
1005{-
1006Note [CPR for sum types]
1007~~~~~~~~~~~~~~~~~~~~~~~~
1008At the moment we do not do CPR for let-bindings that
1009   * non-top level
1010   * bind a sum type
1011Reason: I found that in some benchmarks we were losing let-no-escapes,
1012which messed it all up.  Example
1013   let j = \x. ....
1014   in case y of
1015        True  -> j False
1016        False -> j True
1017If we w/w this we get
1018   let j' = \x. ....
1019   in case y of
1020        True  -> case j' False of { (# a #) -> Just a }
1021        False -> case j' True of { (# a #) -> Just a }
1022Notice that j' is not a let-no-escape any more.
1023
1024However this means in turn that the *enclosing* function
1025may be CPR'd (via the returned Justs).  But in the case of
1026sums, there may be Nothing alternatives; and that messes
1027up the sum-type CPR.
1028
1029Conclusion: only do this for products.  It's still not
1030guaranteed OK for products, but sums definitely lose sometimes.
1031
1032Note [CPR for thunks]
1033~~~~~~~~~~~~~~~~~~~~~
1034If the rhs is a thunk, we usually forget the CPR info, because
1035it is presumably shared (else it would have been inlined, and
1036so we'd lose sharing if w/w'd it into a function).  E.g.
1037
1038        let r = case expensive of
1039                  (a,b) -> (b,a)
1040        in ...
1041
1042If we marked r as having the CPR property, then we'd w/w into
1043
1044        let $wr = \() -> case expensive of
1045                            (a,b) -> (# b, a #)
1046            r = case $wr () of
1047                  (# b,a #) -> (b,a)
1048        in ...
1049
1050But now r is a thunk, which won't be inlined, so we are no further ahead.
1051But consider
1052
1053        f x = let r = case expensive of (a,b) -> (b,a)
1054              in if foo r then r else (x,x)
1055
1056Does f have the CPR property?  Well, no.
1057
1058However, if the strictness analyser has figured out (in a previous
1059iteration) that it's strict, then we DON'T need to forget the CPR info.
1060Instead we can retain the CPR info and do the thunk-splitting transform
1061(see WorkWrap.splitThunk).
1062
1063This made a big difference to PrelBase.modInt, which had something like
1064        modInt = \ x -> let r = ... -> I# v in
1065                        ...body strict in r...
1066r's RHS isn't a value yet; but modInt returns r in various branches, so
1067if r doesn't have the CPR property then neither does modInt
1068Another case I found in practice (in Complex.magnitude), looks like this:
1069                let k = if ... then I# a else I# b
1070                in ... body strict in k ....
1071(For this example, it doesn't matter whether k is returned as part of
1072the overall result; but it does matter that k's RHS has the CPR property.)
1073Left to itself, the simplifier will make a join point thus:
1074                let $j k = ...body strict in k...
1075                if ... then $j (I# a) else $j (I# b)
1076With thunk-splitting, we get instead
1077                let $j x = let k = I#x in ...body strict in k...
1078                in if ... then $j a else $j b
1079This is much better; there's a good chance the I# won't get allocated.
1080
1081The difficulty with this is that we need the strictness type to
1082look at the body... but we now need the body to calculate the demand
1083on the variable, so we can decide whether its strictness type should
1084have a CPR in it or not.  Simple solution:
1085        a) use strictness info from the previous iteration
1086        b) make sure we do at least 2 iterations, by doing a second
1087           round for top-level non-recs.  Top level recs will get at
1088           least 2 iterations except for totally-bottom functions
1089           which aren't very interesting anyway.
1090
1091NB: strictly_demanded is never true of a top-level Id, or of a recursive Id.
1092
1093Note [Optimistic CPR in the "virgin" case]
1094~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1095Demand and strictness info are initialized by top elements. However,
1096this prevents from inferring a CPR property in the first pass of the
1097analyser, so we keep an explicit flag ae_virgin in the AnalEnv
1098datatype.
1099
1100We can't start with 'not-demanded' (i.e., top) because then consider
1101        f x = let
1102                  t = ... I# x
1103              in
1104              if ... then t else I# y else f x'
1105
1106In the first iteration we'd have no demand info for x, so assume
1107not-demanded; then we'd get TopRes for f's CPR info.  Next iteration
1108we'd see that t was demanded, and so give it the CPR property, but by
1109now f has TopRes, so it will stay TopRes.  Instead, by checking the
1110ae_virgin flag at the first time round, we say 'yes t is demanded' the
1111first time.
1112
1113However, this does mean that for non-recursive bindings we must
1114iterate twice to be sure of not getting over-optimistic CPR info,
1115in the case where t turns out to be not-demanded.  This is handled
1116by dmdAnalTopBind.
1117
1118
1119Note [NOINLINE and strictness]
1120~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1121The strictness analyser used to have a HACK which ensured that NOINLNE
1122things were not strictness-analysed.  The reason was unsafePerformIO.
1123Left to itself, the strictness analyser would discover this strictness
1124for unsafePerformIO:
1125        unsafePerformIO:  C(U(AV))
1126But then consider this sub-expression
1127        unsafePerformIO (\s -> let r = f x in
1128                               case writeIORef v r s of (# s1, _ #) ->
1129                               (# s1, r #)
1130The strictness analyser will now find that r is sure to be eval'd,
1131and may then hoist it out.  This makes tests/lib/should_run/memo002
1132deadlock.
1133
1134Solving this by making all NOINLINE things have no strictness info is overkill.
1135In particular, it's overkill for runST, which is perfectly respectable.
1136Consider
1137        f x = runST (return x)
1138This should be strict in x.
1139
1140So the new plan is to define unsafePerformIO using the 'lazy' combinator:
1141
1142        unsafePerformIO (IO m) = lazy (case m realWorld# of (# _, r #) -> r)
1143
1144Remember, 'lazy' is a wired-in identity-function Id, of type a->a, which is
1145magically NON-STRICT, and is inlined after strictness analysis.  So
1146unsafePerformIO will look non-strict, and that's what we want.
1147
1148Now we don't need the hack in the strictness analyser.  HOWEVER, this
1149decision does mean that even a NOINLINE function is not entirely
1150opaque: some aspect of its implementation leaks out, notably its
1151strictness.  For example, if you have a function implemented by an
1152error stub, but which has RULES, you may want it not to be eliminated
1153in favour of error!
1154
1155Note [Lazy and unleashable free variables]
1156~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1157We put the strict and once-used FVs in the DmdType of the Id, so
1158that at its call sites we unleash demands on its strict fvs.
1159An example is 'roll' in imaginary/wheel-sieve2
1160Something like this:
1161        roll x = letrec
1162                     go y = if ... then roll (x-1) else x+1
1163                 in
1164                 go ms
1165We want to see that roll is strict in x, which is because
1166go is called.   So we put the DmdEnv for x in go's DmdType.
1167
1168Another example:
1169
1170        f :: Int -> Int -> Int
1171        f x y = let t = x+1
1172            h z = if z==0 then t else
1173                  if z==1 then x+1 else
1174                  x + h (z-1)
1175        in h y
1176
1177Calling h does indeed evaluate x, but we can only see
1178that if we unleash a demand on x at the call site for t.
1179
1180Incidentally, here's a place where lambda-lifting h would
1181lose the cigar --- we couldn't see the joint strictness in t/x
1182
1183        ON THE OTHER HAND
1184
1185We don't want to put *all* the fv's from the RHS into the
1186DmdType. Because
1187
1188 * it makes the strictness signatures larger, and hence slows down fixpointing
1189
1190and
1191
1192 * it is useless information at the call site anyways:
1193   For lazy, used-many times fv's we will never get any better result than
1194   that, no matter how good the actual demand on the function at the call site
1195   is (unless it is always absent, but then the whole binder is useless).
1196
1197Therefore we exclude lazy multiple-used fv's from the environment in the
1198DmdType.
1199
1200But now the signature lies! (Missing variables are assumed to be absent.) To
1201make up for this, the code that analyses the binding keeps the demand on those
1202variable separate (usually called "lazy_fv") and adds it to the demand of the
1203whole binding later.
1204
1205What if we decide _not_ to store a strictness signature for a binding at all, as
1206we do when aborting a fixed-point iteration? The we risk losing the information
1207that the strict variables are being used. In that case, we take all free variables
1208mentioned in the (unsound) strictness signature, conservatively approximate the
1209demand put on them (topDmd), and add that to the "lazy_fv" returned by "dmdFix".
1210
1211
1212Note [Lambda-bound unfoldings]
1213~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1214We allow a lambda-bound variable to carry an unfolding, a facility that is used
1215exclusively for join points; see Note [Case binders and join points].  If so,
1216we must be careful to demand-analyse the RHS of the unfolding!  Example
1217   \x. \y{=Just x}. <body>
1218Then if <body> uses 'y', then transitively it uses 'x', and we must not
1219forget that fact, otherwise we might make 'x' absent when it isn't.
1220
1221
1222************************************************************************
1223*                                                                      *
1224\subsection{Strictness signatures}
1225*                                                                      *
1226************************************************************************
1227-}
1228
1229type DFunFlag = Bool  -- indicates if the lambda being considered is in the
1230                      -- sequence of lambdas at the top of the RHS of a dfun
1231notArgOfDfun :: DFunFlag
1232notArgOfDfun = False
1233
1234data AnalEnv
1235  = AE { ae_dflags :: DynFlags
1236       , ae_sigs   :: SigEnv
1237       , ae_virgin :: Bool    -- True on first iteration only
1238                              -- See Note [Initialising strictness]
1239       , ae_rec_tc :: RecTcChecker
1240       , ae_fam_envs :: FamInstEnvs
1241 }
1242
1243        -- We use the se_env to tell us whether to
1244        -- record info about a variable in the DmdEnv
1245        -- We do so if it's a LocalId, but not top-level
1246        --
1247        -- The DmdEnv gives the demand on the free vars of the function
1248        -- when it is given enough args to satisfy the strictness signature
1249
1250type SigEnv = VarEnv (StrictSig, TopLevelFlag)
1251
1252instance Outputable AnalEnv where
1253  ppr (AE { ae_sigs = env, ae_virgin = virgin })
1254    = text "AE" <+> braces (vcat
1255         [ text "ae_virgin =" <+> ppr virgin
1256         , text "ae_sigs =" <+> ppr env ])
1257
1258emptyAnalEnv :: DynFlags -> FamInstEnvs -> AnalEnv
1259emptyAnalEnv dflags fam_envs
1260    = AE { ae_dflags = dflags
1261         , ae_sigs = emptySigEnv
1262         , ae_virgin = True
1263         , ae_rec_tc = initRecTc
1264         , ae_fam_envs = fam_envs
1265         }
1266
1267emptySigEnv :: SigEnv
1268emptySigEnv = emptyVarEnv
1269
1270-- | Extend an environment with the strictness IDs attached to the id
1271extendAnalEnvs :: TopLevelFlag -> AnalEnv -> [Id] -> AnalEnv
1272extendAnalEnvs top_lvl env vars
1273  = env { ae_sigs = extendSigEnvs top_lvl (ae_sigs env) vars }
1274
1275extendSigEnvs :: TopLevelFlag -> SigEnv -> [Id] -> SigEnv
1276extendSigEnvs top_lvl sigs vars
1277  = extendVarEnvList sigs [ (var, (idStrictness var, top_lvl)) | var <- vars]
1278
1279extendAnalEnv :: TopLevelFlag -> AnalEnv -> Id -> StrictSig -> AnalEnv
1280extendAnalEnv top_lvl env var sig
1281  = env { ae_sigs = extendSigEnv top_lvl (ae_sigs env) var sig }
1282
1283extendSigEnv :: TopLevelFlag -> SigEnv -> Id -> StrictSig -> SigEnv
1284extendSigEnv top_lvl sigs var sig = extendVarEnv sigs var (sig, top_lvl)
1285
1286lookupSigEnv :: AnalEnv -> Id -> Maybe (StrictSig, TopLevelFlag)
1287lookupSigEnv env id = lookupVarEnv (ae_sigs env) id
1288
1289nonVirgin :: AnalEnv -> AnalEnv
1290nonVirgin env = env { ae_virgin = False }
1291
1292extendSigsWithLam :: AnalEnv -> Id -> AnalEnv
1293-- Extend the AnalEnv when we meet a lambda binder
1294extendSigsWithLam env id
1295  | isId id
1296  , isStrictDmd (idDemandInfo id) || ae_virgin env
1297       -- See Note [Optimistic CPR in the "virgin" case]
1298       -- See Note [Initial CPR for strict binders]
1299  , Just (dc,_,_,_) <- deepSplitProductType_maybe (ae_fam_envs env) $ idType id
1300  = extendAnalEnv NotTopLevel env id (cprProdSig (dataConRepArity dc))
1301
1302  | otherwise
1303  = env
1304
1305extendEnvForProdAlt :: AnalEnv -> CoreExpr -> Id -> DataCon -> [Var] -> AnalEnv
1306-- See Note [CPR in a product case alternative]
1307extendEnvForProdAlt env scrut case_bndr dc bndrs
1308  = foldl' do_con_arg env1 ids_w_strs
1309  where
1310    env1 = extendAnalEnv NotTopLevel env case_bndr case_bndr_sig
1311
1312    ids_w_strs    = filter isId bndrs `zip` dataConRepStrictness dc
1313    case_bndr_sig = cprProdSig (dataConRepArity dc)
1314    fam_envs      = ae_fam_envs env
1315
1316    do_con_arg env (id, str)
1317       | let is_strict = isStrictDmd (idDemandInfo id) || isMarkedStrict str
1318       , ae_virgin env || (is_var_scrut && is_strict)  -- See Note [CPR in a product case alternative]
1319       , Just (dc,_,_,_) <- deepSplitProductType_maybe fam_envs $ idType id
1320       = extendAnalEnv NotTopLevel env id (cprProdSig (dataConRepArity dc))
1321       | otherwise
1322       = env
1323
1324    is_var_scrut = is_var scrut
1325    is_var (Cast e _) = is_var e
1326    is_var (Var v)    = isLocalId v
1327    is_var _          = False
1328
1329findBndrsDmds :: AnalEnv -> DmdType -> [Var] -> (DmdType, [Demand])
1330-- Return the demands on the Ids in the [Var]
1331findBndrsDmds env dmd_ty bndrs
1332  = go dmd_ty bndrs
1333  where
1334    go dmd_ty []  = (dmd_ty, [])
1335    go dmd_ty (b:bs)
1336      | isId b    = let (dmd_ty1, dmds) = go dmd_ty bs
1337                        (dmd_ty2, dmd)  = findBndrDmd env False dmd_ty1 b
1338                    in (dmd_ty2, dmd : dmds)
1339      | otherwise = go dmd_ty bs
1340
1341findBndrDmd :: AnalEnv -> Bool -> DmdType -> Id -> (DmdType, Demand)
1342-- See Note [Trimming a demand to a type] in Demand.hs
1343findBndrDmd env arg_of_dfun dmd_ty id
1344  = (dmd_ty', dmd')
1345  where
1346    dmd' = killUsageDemand (ae_dflags env) $
1347           strictify $
1348           trimToType starting_dmd (findTypeShape fam_envs id_ty)
1349
1350    (dmd_ty', starting_dmd) = peelFV dmd_ty id
1351
1352    id_ty = idType id
1353
1354    strictify dmd
1355      | gopt Opt_DictsStrict (ae_dflags env)
1356             -- We never want to strictify a recursive let. At the moment
1357             -- annotateBndr is only call for non-recursive lets; if that
1358             -- changes, we need a RecFlag parameter and another guard here.
1359      , not arg_of_dfun -- See Note [Do not strictify the argument dictionaries of a dfun]
1360      = strictifyDictDmd id_ty dmd
1361      | otherwise
1362      = dmd
1363
1364    fam_envs = ae_fam_envs env
1365
1366set_idStrictness :: AnalEnv -> Id -> StrictSig -> Id
1367set_idStrictness env id sig
1368  = setIdStrictness id (killUsageSig (ae_dflags env) sig)
1369
1370dumpStrSig :: CoreProgram -> SDoc
1371dumpStrSig binds = vcat (map printId ids)
1372  where
1373  ids = sortBy (stableNameCmp `on` getName) (concatMap getIds binds)
1374  getIds (NonRec i _) = [ i ]
1375  getIds (Rec bs)     = map fst bs
1376  printId id | isExportedId id = ppr id <> colon <+> pprIfaceStrictSig (idStrictness id)
1377             | otherwise       = empty
1378
1379{- Note [CPR in a product case alternative]
1380~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1381In a case alternative for a product type, we want to give some of the
1382binders the CPR property.  Specifically
1383
1384 * The case binder; inside the alternative, the case binder always has
1385   the CPR property, meaning that a case on it will successfully cancel.
1386   Example:
1387        f True  x = case x of y { I# x' -> if x' ==# 3
1388                                           then y
1389                                           else I# 8 }
1390        f False x = I# 3
1391
1392   By giving 'y' the CPR property, we ensure that 'f' does too, so we get
1393        f b x = case fw b x of { r -> I# r }
1394        fw True  x = case x of y { I# x' -> if x' ==# 3 then x' else 8 }
1395        fw False x = 3
1396
1397   Of course there is the usual risk of re-boxing: we have 'x' available
1398   boxed and unboxed, but we return the unboxed version for the wrapper to
1399   box.  If the wrapper doesn't cancel with its caller, we'll end up
1400   re-boxing something that we did have available in boxed form.
1401
1402 * Any strict binders with product type, can use
1403   Note [Initial CPR for strict binders].  But we can go a little
1404   further. Consider
1405
1406      data T = MkT !Int Int
1407
1408      f2 (MkT x y) | y>0       = f2 (MkT x (y-1))
1409                   | otherwise = x
1410
1411   For $wf2 we are going to unbox the MkT *and*, since it is strict, the
1412   first argument of the MkT; see Note [Add demands for strict constructors]
1413   in WwLib. But then we don't want box it up again when returning it! We want
1414   'f2' to have the CPR property, so we give 'x' the CPR property.
1415
1416 * It's a bit delicate because if this case is scrutinising something other
1417   than an argument the original function, we really don't have the unboxed
1418   version available.  E.g
1419      g v = case foo v of
1420              MkT x y | y>0       -> ...
1421                      | otherwise -> x
1422   Here we don't have the unboxed 'x' available.  Hence the
1423   is_var_scrut test when making use of the strictness annotation.
1424   Slightly ad-hoc, because even if the scrutinee *is* a variable it
1425   might not be a onre of the arguments to the original function, or a
1426   sub-component thereof.  But it's simple, and nothing terrible
1427   happens if we get it wrong.  e.g. #10694.
1428
1429
1430Note [Initial CPR for strict binders]
1431~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1432CPR is initialized for a lambda binder in an optimistic manner, i.e,
1433if the binder is used strictly and at least some of its components as
1434a product are used, which is checked by the value of the absence
1435demand.
1436
1437If the binder is marked demanded with a strict demand, then give it a
1438CPR signature. Here's a concrete example ('f1' in test T10482a),
1439assuming h is strict:
1440
1441  f1 :: Int -> Int
1442  f1 x = case h x of
1443          A -> x
1444          B -> f1 (x-1)
1445          C -> x+1
1446
1447If we notice that 'x' is used strictly, we can give it the CPR
1448property; and hence f1 gets the CPR property too.  It's sound (doesn't
1449change strictness) to give it the CPR property because by the time 'x'
1450is returned (case A above), it'll have been evaluated (by the wrapper
1451of 'h' in the example).
1452
1453Moreover, if f itself is strict in x, then we'll pass x unboxed to
1454f1, and so the boxed version *won't* be available; in that case it's
1455very helpful to give 'x' the CPR property.
1456
1457Note that
1458
1459  * We only want to do this for something that definitely
1460    has product type, else we may get over-optimistic CPR results
1461    (e.g. from \x -> x!).
1462
1463  * See Note [CPR examples]
1464
1465Note [CPR examples]
1466~~~~~~~~~~~~~~~~~~~~
1467Here are some examples (stranal/should_compile/T10482a) of the
1468usefulness of Note [CPR in a product case alternative].  The main
1469point: all of these functions can have the CPR property.
1470
1471    ------- f1 -----------
1472    -- x is used strictly by h, so it'll be available
1473    -- unboxed before it is returned in the True branch
1474
1475    f1 :: Int -> Int
1476    f1 x = case h x x of
1477            True  -> x
1478            False -> f1 (x-1)
1479
1480
1481    ------- f2 -----------
1482    -- x is a strict field of MkT2, so we'll pass it unboxed
1483    -- to $wf2, so it's available unboxed.  This depends on
1484    -- the case expression analysing (a subcomponent of) one
1485    -- of the original arguments to the function, so it's
1486    -- a bit more delicate.
1487
1488    data T2 = MkT2 !Int Int
1489
1490    f2 :: T2 -> Int
1491    f2 (MkT2 x y) | y>0       = f2 (MkT2 x (y-1))
1492                  | otherwise = x
1493
1494
1495    ------- f3 -----------
1496    -- h is strict in x, so x will be unboxed before it
1497    -- is rerturned in the otherwise case.
1498
1499    data T3 = MkT3 Int Int
1500
1501    f1 :: T3 -> Int
1502    f1 (MkT3 x y) | h x y     = f3 (MkT3 x (y-1))
1503                  | otherwise = x
1504
1505
1506    ------- f4 -----------
1507    -- Just like f2, but MkT4 can't unbox its strict
1508    -- argument automatically, as f2 can
1509
1510    data family Foo a
1511    newtype instance Foo Int = Foo Int
1512
1513    data T4 a = MkT4 !(Foo a) Int
1514
1515    f4 :: T4 Int -> Int
1516    f4 (MkT4 x@(Foo v) y) | y>0       = f4 (MkT4 x (y-1))
1517                          | otherwise = v
1518
1519
1520Note [Initialising strictness]
1521~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1522See section 9.2 (Finding fixpoints) of the paper.
1523
1524Our basic plan is to initialise the strictness of each Id in a
1525recursive group to "bottom", and find a fixpoint from there.  However,
1526this group B might be inside an *enclosing* recursive group A, in
1527which case we'll do the entire fixpoint shebang on for each iteration
1528of A. This can be illustrated by the following example:
1529
1530Example:
1531
1532  f [] = []
1533  f (x:xs) = let g []     = f xs
1534                 g (y:ys) = y+1 : g ys
1535              in g (h x)
1536
1537At each iteration of the fixpoint for f, the analyser has to find a
1538fixpoint for the enclosed function g. In the meantime, the demand
1539values for g at each iteration for f are *greater* than those we
1540encountered in the previous iteration for f. Therefore, we can begin
1541the fixpoint for g not with the bottom value but rather with the
1542result of the previous analysis. I.e., when beginning the fixpoint
1543process for g, we can start from the demand signature computed for g
1544previously and attached to the binding occurrence of g.
1545
1546To speed things up, we initialise each iteration of A (the enclosing
1547one) from the result of the last one, which is neatly recorded in each
1548binder.  That way we make use of earlier iterations of the fixpoint
1549algorithm. (Cunning plan.)
1550
1551But on the *first* iteration we want to *ignore* the current strictness
1552of the Id, and start from "bottom".  Nowadays the Id can have a current
1553strictness, because interface files record strictness for nested bindings.
1554To know when we are in the first iteration, we look at the ae_virgin
1555field of the AnalEnv.
1556
1557
1558Note [Final Demand Analyser run]
1559~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1560Some of the information that the demand analyser determines is not always
1561preserved by the simplifier.  For example, the simplifier will happily rewrite
1562  \y [Demand=1*U] let x = y in x + x
1563to
1564  \y [Demand=1*U] y + y
1565which is quite a lie.
1566
1567The once-used information is (currently) only used by the code
1568generator, though.  So:
1569
1570 * We zap the used-once info in the worker-wrapper;
1571   see Note [Zapping Used Once info in WorkWrap] in WorkWrap. If it's
1572   not reliable, it's better not to have it at all.
1573
1574 * Just before TidyCore, we add a pass of the demand analyser,
1575      but WITHOUT subsequent worker/wrapper and simplifier,
1576   right before TidyCore.  See SimplCore.getCoreToDo.
1577
1578   This way, correct information finds its way into the module interface
1579   (strictness signatures!) and the code generator (single-entry thunks!)
1580
1581Note that, in contrast, the single-call information (C1(..)) /can/ be
1582relied upon, as the simplifier tends to be very careful about not
1583duplicating actual function calls.
1584
1585Also see #11731.
1586-}
1587