1{-# LANGUAGE DeriveDataTypeable #-}
2
3module ApiAnnotation (
4  getAnnotation, getAndRemoveAnnotation,
5  getAnnotationComments,getAndRemoveAnnotationComments,
6  ApiAnns,
7  ApiAnnKey,
8  AnnKeywordId(..),
9  AnnotationComment(..),
10  IsUnicodeSyntax(..),
11  unicodeAnn,
12  HasE(..),
13  LRdrName -- Exists for haddocks only
14  ) where
15
16import GhcPrelude
17
18import RdrName
19import Outputable
20import SrcLoc
21import qualified Data.Map as Map
22import Data.Data
23
24
25{-
26Note [Api annotations]
27~~~~~~~~~~~~~~~~~~~~~~
28Given a parse tree of a Haskell module, how can we reconstruct
29the original Haskell source code, retaining all whitespace and
30source code comments?  We need to track the locations of all
31elements from the original source: this includes keywords such as
32'let' / 'in' / 'do' etc as well as punctuation such as commas and
33braces, and also comments.  We collectively refer to this
34metadata as the "API annotations".
35
36Rather than annotate the resulting parse tree with these locations
37directly (this would be a major change to some fairly core data
38structures in GHC), we instead capture locations for these elements in a
39structure separate from the parse tree, and returned in the
40pm_annotations field of the ParsedModule type.
41
42The full ApiAnns type is
43
44> type ApiAnns = ( Map.Map ApiAnnKey [SrcSpan]                  -- non-comments
45>                , Map.Map SrcSpan [Located AnnotationComment]) -- comments
46
47NON-COMMENT ELEMENTS
48
49Intuitively, every AST element directly contains a bag of keywords
50(keywords can show up more than once in a node: a semicolon i.e. newline
51can show up multiple times before the next AST element), each of which
52needs to be associated with its location in the original source code.
53
54Consequently, the structure that records non-comment elements is logically
55a two level map, from the SrcSpan of the AST element containing it, to
56a map from keywords ('AnnKeyWord') to all locations of the keyword directly
57in the AST element:
58
59> type ApiAnnKey = (SrcSpan,AnnKeywordId)
60>
61> Map.Map ApiAnnKey [SrcSpan]
62
63So
64
65> let x = 1 in 2 *x
66
67would result in the AST element
68
69  L span (HsLet (binds for x = 1) (2 * x))
70
71and the annotations
72
73  (span,AnnLet) having the location of the 'let' keyword
74  (span,AnnEqual) having the location of the '=' sign
75  (span,AnnIn)  having the location of the 'in' keyword
76
77For any given element in the AST, there is only a set number of
78keywords that are applicable for it (e.g., you'll never see an
79'import' keyword associated with a let-binding.)  The set of allowed
80keywords is documented in a comment associated with the constructor
81of a given AST element, although the ground truth is in Parser
82and RdrHsSyn (which actually add the annotations; see #13012).
83
84COMMENT ELEMENTS
85
86Every comment is associated with a *located* AnnotationComment.
87We associate comments with the lowest (most specific) AST element
88enclosing them:
89
90> Map.Map SrcSpan [Located AnnotationComment]
91
92PARSER STATE
93
94There are three fields in PState (the parser state) which play a role
95with annotations.
96
97>  annotations :: [(ApiAnnKey,[SrcSpan])],
98>  comment_q :: [Located AnnotationComment],
99>  annotations_comments :: [(SrcSpan,[Located AnnotationComment])]
100
101The 'annotations' and 'annotations_comments' fields are simple: they simply
102accumulate annotations that will end up in 'ApiAnns' at the end
103(after they are passed to Map.fromList).
104
105The 'comment_q' field captures comments as they are seen in the token stream,
106so that when they are ready to be allocated via the parser they are
107available (at the time we lex a comment, we don't know what the enclosing
108AST node of it is, so we can't associate it with a SrcSpan in
109annotations_comments).
110
111PARSER EMISSION OF ANNOTATIONS
112
113The parser interacts with the lexer using the function
114
115> addAnnotation :: SrcSpan -> AnnKeywordId -> SrcSpan -> P ()
116
117which takes the AST element SrcSpan, the annotation keyword and the
118target SrcSpan.
119
120This adds the annotation to the `annotations` field of `PState` and
121transfers any comments in `comment_q` WHICH ARE ENCLOSED by
122the SrcSpan of this element to the `annotations_comments`
123field.  (Comments which are outside of this annotation are deferred
124until later. 'allocateComments' in 'Lexer' is responsible for
125making sure we only attach comments that actually fit in the 'SrcSpan'.)
126
127The wiki page describing this feature is
128https://gitlab.haskell.org/ghc/ghc/wikis/api-annotations
129
130-}
131-- ---------------------------------------------------------------------
132
133-- If you update this, update the Note [Api annotations] above
134type ApiAnns = ( Map.Map ApiAnnKey [SrcSpan]
135               , Map.Map SrcSpan [Located AnnotationComment])
136
137-- If you update this, update the Note [Api annotations] above
138type ApiAnnKey = (SrcSpan,AnnKeywordId)
139
140
141-- | Retrieve a list of annotation 'SrcSpan's based on the 'SrcSpan'
142-- of the annotated AST element, and the known type of the annotation.
143getAnnotation :: ApiAnns -> SrcSpan -> AnnKeywordId -> [SrcSpan]
144getAnnotation (anns,_) span ann
145   = case Map.lookup (span,ann) anns of
146       Nothing -> []
147       Just ss -> ss
148
149-- | Retrieve a list of annotation 'SrcSpan's based on the 'SrcSpan'
150-- of the annotated AST element, and the known type of the annotation.
151-- The list is removed from the annotations.
152getAndRemoveAnnotation :: ApiAnns -> SrcSpan -> AnnKeywordId
153                       -> ([SrcSpan],ApiAnns)
154getAndRemoveAnnotation (anns,cs) span ann
155   = case Map.lookup (span,ann) anns of
156       Nothing -> ([],(anns,cs))
157       Just ss -> (ss,(Map.delete (span,ann) anns,cs))
158
159-- |Retrieve the comments allocated to the current 'SrcSpan'
160--
161--  Note: A given 'SrcSpan' may appear in multiple AST elements,
162--  beware of duplicates
163getAnnotationComments :: ApiAnns -> SrcSpan -> [Located AnnotationComment]
164getAnnotationComments (_,anns) span =
165  case Map.lookup span anns of
166    Just cs -> cs
167    Nothing -> []
168
169-- |Retrieve the comments allocated to the current 'SrcSpan', and
170-- remove them from the annotations
171getAndRemoveAnnotationComments :: ApiAnns -> SrcSpan
172                               -> ([Located AnnotationComment],ApiAnns)
173getAndRemoveAnnotationComments (anns,canns) span =
174  case Map.lookup span canns of
175    Just cs -> (cs,(anns,Map.delete span canns))
176    Nothing -> ([],(anns,canns))
177
178-- --------------------------------------------------------------------
179
180-- | API Annotations exist so that tools can perform source to source
181-- conversions of Haskell code. They are used to keep track of the
182-- various syntactic keywords that are not captured in the existing
183-- AST.
184--
185-- The annotations, together with original source comments are made
186-- available in the @'pm_annotations'@ field of @'GHC.ParsedModule'@.
187-- Comments are only retained if @'Opt_KeepRawTokenStream'@ is set in
188-- @'DynFlags.DynFlags'@ before parsing.
189--
190-- The wiki page describing this feature is
191-- https://gitlab.haskell.org/ghc/ghc/wikis/api-annotations
192--
193-- Note: in general the names of these are taken from the
194-- corresponding token, unless otherwise noted
195-- See note [Api annotations] above for details of the usage
196data AnnKeywordId
197    = AnnAnyclass
198    | AnnAs
199    | AnnAt
200    | AnnBang  -- ^ '!'
201    | AnnBackquote -- ^ '`'
202    | AnnBy
203    | AnnCase -- ^ case or lambda case
204    | AnnClass
205    | AnnClose -- ^  '\#)' or '\#-}'  etc
206    | AnnCloseB -- ^ '|)'
207    | AnnCloseBU -- ^ '|)', unicode variant
208    | AnnCloseC -- ^ '}'
209    | AnnCloseQ  -- ^ '|]'
210    | AnnCloseQU -- ^ '|]', unicode variant
211    | AnnCloseP -- ^ ')'
212    | AnnCloseS -- ^ ']'
213    | AnnColon
214    | AnnComma -- ^ as a list separator
215    | AnnCommaTuple -- ^ in a RdrName for a tuple
216    | AnnDarrow -- ^ '=>'
217    | AnnDarrowU -- ^ '=>', unicode variant
218    | AnnData
219    | AnnDcolon -- ^ '::'
220    | AnnDcolonU -- ^ '::', unicode variant
221    | AnnDefault
222    | AnnDeriving
223    | AnnDo
224    | AnnDot    -- ^ '.'
225    | AnnDotdot -- ^ '..'
226    | AnnElse
227    | AnnEqual
228    | AnnExport
229    | AnnFamily
230    | AnnForall
231    | AnnForallU -- ^ Unicode variant
232    | AnnForeign
233    | AnnFunId -- ^ for function name in matches where there are
234               -- multiple equations for the function.
235    | AnnGroup
236    | AnnHeader -- ^ for CType
237    | AnnHiding
238    | AnnIf
239    | AnnImport
240    | AnnIn
241    | AnnInfix -- ^ 'infix' or 'infixl' or 'infixr'
242    | AnnInstance
243    | AnnLam
244    | AnnLarrow     -- ^ '<-'
245    | AnnLarrowU    -- ^ '<-', unicode variant
246    | AnnLet
247    | AnnMdo
248    | AnnMinus -- ^ '-'
249    | AnnModule
250    | AnnNewtype
251    | AnnName -- ^ where a name loses its location in the AST, this carries it
252    | AnnOf
253    | AnnOpen    -- ^ '(\#' or '{-\# LANGUAGE' etc
254    | AnnOpenB   -- ^ '(|'
255    | AnnOpenBU  -- ^ '(|', unicode variant
256    | AnnOpenC   -- ^ '{'
257    | AnnOpenE   -- ^ '[e|' or '[e||'
258    | AnnOpenEQ  -- ^ '[|'
259    | AnnOpenEQU -- ^ '[|', unicode variant
260    | AnnOpenP   -- ^ '('
261    | AnnOpenPE  -- ^ '$('
262    | AnnOpenPTE -- ^ '$$('
263    | AnnOpenS   -- ^ '['
264    | AnnPackageName
265    | AnnPattern
266    | AnnProc
267    | AnnQualified
268    | AnnRarrow -- ^ '->'
269    | AnnRarrowU -- ^ '->', unicode variant
270    | AnnRec
271    | AnnRole
272    | AnnSafe
273    | AnnSemi -- ^ ';'
274    | AnnSimpleQuote -- ^ '''
275    | AnnSignature
276    | AnnStatic -- ^ 'static'
277    | AnnStock
278    | AnnThen
279    | AnnThIdSplice -- ^ '$'
280    | AnnThIdTySplice -- ^ '$$'
281    | AnnThTyQuote -- ^ double '''
282    | AnnTilde -- ^ '~'
283    | AnnType
284    | AnnUnit -- ^ '()' for types
285    | AnnUsing
286    | AnnVal  -- ^ e.g. INTEGER
287    | AnnValStr  -- ^ String value, will need quotes when output
288    | AnnVbar -- ^ '|'
289    | AnnVia -- ^ 'via'
290    | AnnWhere
291    | Annlarrowtail -- ^ '-<'
292    | AnnlarrowtailU -- ^ '-<', unicode variant
293    | Annrarrowtail -- ^ '->'
294    | AnnrarrowtailU -- ^ '->', unicode variant
295    | AnnLarrowtail -- ^ '-<<'
296    | AnnLarrowtailU -- ^ '-<<', unicode variant
297    | AnnRarrowtail -- ^ '>>-'
298    | AnnRarrowtailU -- ^ '>>-', unicode variant
299    | AnnEofPos
300    deriving (Eq, Ord, Data, Show)
301
302instance Outputable AnnKeywordId where
303  ppr x = text (show x)
304
305-- ---------------------------------------------------------------------
306
307data AnnotationComment =
308  -- Documentation annotations
309    AnnDocCommentNext  String     -- ^ something beginning '-- |'
310  | AnnDocCommentPrev  String     -- ^ something beginning '-- ^'
311  | AnnDocCommentNamed String     -- ^ something beginning '-- $'
312  | AnnDocSection      Int String -- ^ a section heading
313  | AnnDocOptions      String     -- ^ doc options (prune, ignore-exports, etc)
314  | AnnLineComment     String     -- ^ comment starting by "--"
315  | AnnBlockComment    String     -- ^ comment in {- -}
316    deriving (Eq, Ord, Data, Show)
317-- Note: these are based on the Token versions, but the Token type is
318-- defined in Lexer.x and bringing it in here would create a loop
319
320instance Outputable AnnotationComment where
321  ppr x = text (show x)
322
323-- | - 'ApiAnnotation.AnnKeywordId' : 'ApiAnnotation.AnnOpen',
324--             'ApiAnnotation.AnnClose','ApiAnnotation.AnnComma',
325--             'ApiAnnotation.AnnRarrow'
326--             'ApiAnnotation.AnnTilde'
327--   - May have 'ApiAnnotation.AnnComma' when in a list
328type LRdrName = Located RdrName
329
330
331-- | Certain tokens can have alternate representations when unicode syntax is
332-- enabled. This flag is attached to those tokens in the lexer so that the
333-- original source representation can be reproduced in the corresponding
334-- 'ApiAnnotation'
335data IsUnicodeSyntax = UnicodeSyntax | NormalSyntax
336    deriving (Eq, Ord, Data, Show)
337
338-- | Convert a normal annotation into its unicode equivalent one
339unicodeAnn :: AnnKeywordId -> AnnKeywordId
340unicodeAnn AnnForall     = AnnForallU
341unicodeAnn AnnDcolon     = AnnDcolonU
342unicodeAnn AnnLarrow     = AnnLarrowU
343unicodeAnn AnnRarrow     = AnnRarrowU
344unicodeAnn AnnDarrow     = AnnDarrowU
345unicodeAnn Annlarrowtail = AnnlarrowtailU
346unicodeAnn Annrarrowtail = AnnrarrowtailU
347unicodeAnn AnnLarrowtail = AnnLarrowtailU
348unicodeAnn AnnRarrowtail = AnnRarrowtailU
349unicodeAnn AnnOpenB      = AnnOpenBU
350unicodeAnn AnnCloseB     = AnnCloseBU
351unicodeAnn AnnOpenEQ     = AnnOpenEQU
352unicodeAnn AnnCloseQ     = AnnCloseQU
353unicodeAnn ann           = ann
354
355
356-- | Some template haskell tokens have two variants, one with an `e` the other
357-- not:
358--
359-- >  [| or [e|
360-- >  [|| or [e||
361--
362-- This type indicates whether the 'e' is present or not.
363data HasE = HasE | NoE
364     deriving (Eq, Ord, Data, Show)
365