1{-# LANGUAGE DeriveDataTypeable #-} 2 3module ApiAnnotation ( 4 getAnnotation, getAndRemoveAnnotation, 5 getAnnotationComments,getAndRemoveAnnotationComments, 6 ApiAnns, 7 ApiAnnKey, 8 AnnKeywordId(..), 9 AnnotationComment(..), 10 IsUnicodeSyntax(..), 11 unicodeAnn, 12 HasE(..), 13 LRdrName -- Exists for haddocks only 14 ) where 15 16import GhcPrelude 17 18import RdrName 19import Outputable 20import SrcLoc 21import qualified Data.Map as Map 22import Data.Data 23 24 25{- 26Note [Api annotations] 27~~~~~~~~~~~~~~~~~~~~~~ 28Given a parse tree of a Haskell module, how can we reconstruct 29the original Haskell source code, retaining all whitespace and 30source code comments? We need to track the locations of all 31elements from the original source: this includes keywords such as 32'let' / 'in' / 'do' etc as well as punctuation such as commas and 33braces, and also comments. We collectively refer to this 34metadata as the "API annotations". 35 36Rather than annotate the resulting parse tree with these locations 37directly (this would be a major change to some fairly core data 38structures in GHC), we instead capture locations for these elements in a 39structure separate from the parse tree, and returned in the 40pm_annotations field of the ParsedModule type. 41 42The full ApiAnns type is 43 44> type ApiAnns = ( Map.Map ApiAnnKey [SrcSpan] -- non-comments 45> , Map.Map SrcSpan [Located AnnotationComment]) -- comments 46 47NON-COMMENT ELEMENTS 48 49Intuitively, every AST element directly contains a bag of keywords 50(keywords can show up more than once in a node: a semicolon i.e. newline 51can show up multiple times before the next AST element), each of which 52needs to be associated with its location in the original source code. 53 54Consequently, the structure that records non-comment elements is logically 55a two level map, from the SrcSpan of the AST element containing it, to 56a map from keywords ('AnnKeyWord') to all locations of the keyword directly 57in the AST element: 58 59> type ApiAnnKey = (SrcSpan,AnnKeywordId) 60> 61> Map.Map ApiAnnKey [SrcSpan] 62 63So 64 65> let x = 1 in 2 *x 66 67would result in the AST element 68 69 L span (HsLet (binds for x = 1) (2 * x)) 70 71and the annotations 72 73 (span,AnnLet) having the location of the 'let' keyword 74 (span,AnnEqual) having the location of the '=' sign 75 (span,AnnIn) having the location of the 'in' keyword 76 77For any given element in the AST, there is only a set number of 78keywords that are applicable for it (e.g., you'll never see an 79'import' keyword associated with a let-binding.) The set of allowed 80keywords is documented in a comment associated with the constructor 81of a given AST element, although the ground truth is in Parser 82and RdrHsSyn (which actually add the annotations; see #13012). 83 84COMMENT ELEMENTS 85 86Every comment is associated with a *located* AnnotationComment. 87We associate comments with the lowest (most specific) AST element 88enclosing them: 89 90> Map.Map SrcSpan [Located AnnotationComment] 91 92PARSER STATE 93 94There are three fields in PState (the parser state) which play a role 95with annotations. 96 97> annotations :: [(ApiAnnKey,[SrcSpan])], 98> comment_q :: [Located AnnotationComment], 99> annotations_comments :: [(SrcSpan,[Located AnnotationComment])] 100 101The 'annotations' and 'annotations_comments' fields are simple: they simply 102accumulate annotations that will end up in 'ApiAnns' at the end 103(after they are passed to Map.fromList). 104 105The 'comment_q' field captures comments as they are seen in the token stream, 106so that when they are ready to be allocated via the parser they are 107available (at the time we lex a comment, we don't know what the enclosing 108AST node of it is, so we can't associate it with a SrcSpan in 109annotations_comments). 110 111PARSER EMISSION OF ANNOTATIONS 112 113The parser interacts with the lexer using the function 114 115> addAnnotation :: SrcSpan -> AnnKeywordId -> SrcSpan -> P () 116 117which takes the AST element SrcSpan, the annotation keyword and the 118target SrcSpan. 119 120This adds the annotation to the `annotations` field of `PState` and 121transfers any comments in `comment_q` WHICH ARE ENCLOSED by 122the SrcSpan of this element to the `annotations_comments` 123field. (Comments which are outside of this annotation are deferred 124until later. 'allocateComments' in 'Lexer' is responsible for 125making sure we only attach comments that actually fit in the 'SrcSpan'.) 126 127The wiki page describing this feature is 128https://gitlab.haskell.org/ghc/ghc/wikis/api-annotations 129 130-} 131-- --------------------------------------------------------------------- 132 133-- If you update this, update the Note [Api annotations] above 134type ApiAnns = ( Map.Map ApiAnnKey [SrcSpan] 135 , Map.Map SrcSpan [Located AnnotationComment]) 136 137-- If you update this, update the Note [Api annotations] above 138type ApiAnnKey = (SrcSpan,AnnKeywordId) 139 140 141-- | Retrieve a list of annotation 'SrcSpan's based on the 'SrcSpan' 142-- of the annotated AST element, and the known type of the annotation. 143getAnnotation :: ApiAnns -> SrcSpan -> AnnKeywordId -> [SrcSpan] 144getAnnotation (anns,_) span ann 145 = case Map.lookup (span,ann) anns of 146 Nothing -> [] 147 Just ss -> ss 148 149-- | Retrieve a list of annotation 'SrcSpan's based on the 'SrcSpan' 150-- of the annotated AST element, and the known type of the annotation. 151-- The list is removed from the annotations. 152getAndRemoveAnnotation :: ApiAnns -> SrcSpan -> AnnKeywordId 153 -> ([SrcSpan],ApiAnns) 154getAndRemoveAnnotation (anns,cs) span ann 155 = case Map.lookup (span,ann) anns of 156 Nothing -> ([],(anns,cs)) 157 Just ss -> (ss,(Map.delete (span,ann) anns,cs)) 158 159-- |Retrieve the comments allocated to the current 'SrcSpan' 160-- 161-- Note: A given 'SrcSpan' may appear in multiple AST elements, 162-- beware of duplicates 163getAnnotationComments :: ApiAnns -> SrcSpan -> [Located AnnotationComment] 164getAnnotationComments (_,anns) span = 165 case Map.lookup span anns of 166 Just cs -> cs 167 Nothing -> [] 168 169-- |Retrieve the comments allocated to the current 'SrcSpan', and 170-- remove them from the annotations 171getAndRemoveAnnotationComments :: ApiAnns -> SrcSpan 172 -> ([Located AnnotationComment],ApiAnns) 173getAndRemoveAnnotationComments (anns,canns) span = 174 case Map.lookup span canns of 175 Just cs -> (cs,(anns,Map.delete span canns)) 176 Nothing -> ([],(anns,canns)) 177 178-- -------------------------------------------------------------------- 179 180-- | API Annotations exist so that tools can perform source to source 181-- conversions of Haskell code. They are used to keep track of the 182-- various syntactic keywords that are not captured in the existing 183-- AST. 184-- 185-- The annotations, together with original source comments are made 186-- available in the @'pm_annotations'@ field of @'GHC.ParsedModule'@. 187-- Comments are only retained if @'Opt_KeepRawTokenStream'@ is set in 188-- @'DynFlags.DynFlags'@ before parsing. 189-- 190-- The wiki page describing this feature is 191-- https://gitlab.haskell.org/ghc/ghc/wikis/api-annotations 192-- 193-- Note: in general the names of these are taken from the 194-- corresponding token, unless otherwise noted 195-- See note [Api annotations] above for details of the usage 196data AnnKeywordId 197 = AnnAnyclass 198 | AnnAs 199 | AnnAt 200 | AnnBang -- ^ '!' 201 | AnnBackquote -- ^ '`' 202 | AnnBy 203 | AnnCase -- ^ case or lambda case 204 | AnnClass 205 | AnnClose -- ^ '\#)' or '\#-}' etc 206 | AnnCloseB -- ^ '|)' 207 | AnnCloseBU -- ^ '|)', unicode variant 208 | AnnCloseC -- ^ '}' 209 | AnnCloseQ -- ^ '|]' 210 | AnnCloseQU -- ^ '|]', unicode variant 211 | AnnCloseP -- ^ ')' 212 | AnnCloseS -- ^ ']' 213 | AnnColon 214 | AnnComma -- ^ as a list separator 215 | AnnCommaTuple -- ^ in a RdrName for a tuple 216 | AnnDarrow -- ^ '=>' 217 | AnnDarrowU -- ^ '=>', unicode variant 218 | AnnData 219 | AnnDcolon -- ^ '::' 220 | AnnDcolonU -- ^ '::', unicode variant 221 | AnnDefault 222 | AnnDeriving 223 | AnnDo 224 | AnnDot -- ^ '.' 225 | AnnDotdot -- ^ '..' 226 | AnnElse 227 | AnnEqual 228 | AnnExport 229 | AnnFamily 230 | AnnForall 231 | AnnForallU -- ^ Unicode variant 232 | AnnForeign 233 | AnnFunId -- ^ for function name in matches where there are 234 -- multiple equations for the function. 235 | AnnGroup 236 | AnnHeader -- ^ for CType 237 | AnnHiding 238 | AnnIf 239 | AnnImport 240 | AnnIn 241 | AnnInfix -- ^ 'infix' or 'infixl' or 'infixr' 242 | AnnInstance 243 | AnnLam 244 | AnnLarrow -- ^ '<-' 245 | AnnLarrowU -- ^ '<-', unicode variant 246 | AnnLet 247 | AnnMdo 248 | AnnMinus -- ^ '-' 249 | AnnModule 250 | AnnNewtype 251 | AnnName -- ^ where a name loses its location in the AST, this carries it 252 | AnnOf 253 | AnnOpen -- ^ '(\#' or '{-\# LANGUAGE' etc 254 | AnnOpenB -- ^ '(|' 255 | AnnOpenBU -- ^ '(|', unicode variant 256 | AnnOpenC -- ^ '{' 257 | AnnOpenE -- ^ '[e|' or '[e||' 258 | AnnOpenEQ -- ^ '[|' 259 | AnnOpenEQU -- ^ '[|', unicode variant 260 | AnnOpenP -- ^ '(' 261 | AnnOpenPE -- ^ '$(' 262 | AnnOpenPTE -- ^ '$$(' 263 | AnnOpenS -- ^ '[' 264 | AnnPackageName 265 | AnnPattern 266 | AnnProc 267 | AnnQualified 268 | AnnRarrow -- ^ '->' 269 | AnnRarrowU -- ^ '->', unicode variant 270 | AnnRec 271 | AnnRole 272 | AnnSafe 273 | AnnSemi -- ^ ';' 274 | AnnSimpleQuote -- ^ ''' 275 | AnnSignature 276 | AnnStatic -- ^ 'static' 277 | AnnStock 278 | AnnThen 279 | AnnThIdSplice -- ^ '$' 280 | AnnThIdTySplice -- ^ '$$' 281 | AnnThTyQuote -- ^ double ''' 282 | AnnTilde -- ^ '~' 283 | AnnType 284 | AnnUnit -- ^ '()' for types 285 | AnnUsing 286 | AnnVal -- ^ e.g. INTEGER 287 | AnnValStr -- ^ String value, will need quotes when output 288 | AnnVbar -- ^ '|' 289 | AnnVia -- ^ 'via' 290 | AnnWhere 291 | Annlarrowtail -- ^ '-<' 292 | AnnlarrowtailU -- ^ '-<', unicode variant 293 | Annrarrowtail -- ^ '->' 294 | AnnrarrowtailU -- ^ '->', unicode variant 295 | AnnLarrowtail -- ^ '-<<' 296 | AnnLarrowtailU -- ^ '-<<', unicode variant 297 | AnnRarrowtail -- ^ '>>-' 298 | AnnRarrowtailU -- ^ '>>-', unicode variant 299 | AnnEofPos 300 deriving (Eq, Ord, Data, Show) 301 302instance Outputable AnnKeywordId where 303 ppr x = text (show x) 304 305-- --------------------------------------------------------------------- 306 307data AnnotationComment = 308 -- Documentation annotations 309 AnnDocCommentNext String -- ^ something beginning '-- |' 310 | AnnDocCommentPrev String -- ^ something beginning '-- ^' 311 | AnnDocCommentNamed String -- ^ something beginning '-- $' 312 | AnnDocSection Int String -- ^ a section heading 313 | AnnDocOptions String -- ^ doc options (prune, ignore-exports, etc) 314 | AnnLineComment String -- ^ comment starting by "--" 315 | AnnBlockComment String -- ^ comment in {- -} 316 deriving (Eq, Ord, Data, Show) 317-- Note: these are based on the Token versions, but the Token type is 318-- defined in Lexer.x and bringing it in here would create a loop 319 320instance Outputable AnnotationComment where 321 ppr x = text (show x) 322 323-- | - 'ApiAnnotation.AnnKeywordId' : 'ApiAnnotation.AnnOpen', 324-- 'ApiAnnotation.AnnClose','ApiAnnotation.AnnComma', 325-- 'ApiAnnotation.AnnRarrow' 326-- 'ApiAnnotation.AnnTilde' 327-- - May have 'ApiAnnotation.AnnComma' when in a list 328type LRdrName = Located RdrName 329 330 331-- | Certain tokens can have alternate representations when unicode syntax is 332-- enabled. This flag is attached to those tokens in the lexer so that the 333-- original source representation can be reproduced in the corresponding 334-- 'ApiAnnotation' 335data IsUnicodeSyntax = UnicodeSyntax | NormalSyntax 336 deriving (Eq, Ord, Data, Show) 337 338-- | Convert a normal annotation into its unicode equivalent one 339unicodeAnn :: AnnKeywordId -> AnnKeywordId 340unicodeAnn AnnForall = AnnForallU 341unicodeAnn AnnDcolon = AnnDcolonU 342unicodeAnn AnnLarrow = AnnLarrowU 343unicodeAnn AnnRarrow = AnnRarrowU 344unicodeAnn AnnDarrow = AnnDarrowU 345unicodeAnn Annlarrowtail = AnnlarrowtailU 346unicodeAnn Annrarrowtail = AnnrarrowtailU 347unicodeAnn AnnLarrowtail = AnnLarrowtailU 348unicodeAnn AnnRarrowtail = AnnRarrowtailU 349unicodeAnn AnnOpenB = AnnOpenBU 350unicodeAnn AnnCloseB = AnnCloseBU 351unicodeAnn AnnOpenEQ = AnnOpenEQU 352unicodeAnn AnnCloseQ = AnnCloseQU 353unicodeAnn ann = ann 354 355 356-- | Some template haskell tokens have two variants, one with an `e` the other 357-- not: 358-- 359-- > [| or [e| 360-- > [|| or [e|| 361-- 362-- This type indicates whether the 'e' is present or not. 363data HasE = HasE | NoE 364 deriving (Eq, Ord, Data, Show) 365