\name{xmlHashTree}
\alias{xmlHashTree}
\title{Constructors for trees stored as a flat list of nodes with
  information about parents and children.}
\description{

  These (and related internal) functions allow us to represent trees as
  a simple, non-hierarchical collection of nodes along with
  corresponding tables that identify the parent and child relationships.
  This is different from representing a tree as a list of lists of lists
  ... in which each node has a list of its own children. In a
  functional language like R, children stored in such nested lists
  cannot identify their parents.

  We use an environment to represent these flat trees. Since environments
  are mutable without requiring the change to be reassigned, we can modify a
  part of the tree locally without having to reassign the top-level
  object.

  We can use either a list (with names) to store the nodes or a hash
  table/associative array that uses names. There is a non-trivial
  performance difference.
}
\usage{
xmlHashTree(nodes = list(), parents = character(), children = list(),
             env = new.env(TRUE, parent = emptyenv()))
}
\arguments{
  \item{nodes}{a collection of existing nodes that are to be added to
    the tree. These are used to initialize the tree. If this is
    specified, you must also specify \code{children} and \code{parents}.
  }
  \item{parents}{the parent relationships for the nodes given by \code{nodes}.}
  \item{children}{the children relationships for the nodes given by \code{nodes}.}
  \item{env}{an environment in which the information for the tree will
    be stored. This is essentially the tree object as it allows us to
    modify parts of the tree without having to reassign the top-level
    object. Unlike most R data types, environments are mutable.
  }
}

\value{
  An \code{xmlHashTree} object has an accessor method via
  \code{$} for accessing individual nodes within the tree.
  One can use the node name/identifier in an expression such as
  \code{tt$myNode} to obtain the element.
  The name of a node is either its XML node name or, if that is already
  present in the tree, a machine-generated name.

  One can find the names of all the nodes using the
  \code{objects} function since these trees are regular
  environments in R.
  Using the \code{all = TRUE} argument, one can also find the
  \dQuote{hidden} elements that define the tree's structure.
  These are \code{.children} and \code{.parents}.
  The former is a (hashed) environment. Each element is identified by
  the identifier of a node in the tree (corresponding to the
  name of the node in the tree's environment).
  The value of that element is simply a character vector giving the
  identifiers of all of the children of that node.

  The \code{.parents} element is also an environment.
  Each element in this gives the pair of node and parent identifiers
  with the parent identifier being the value of the variable in the
  environment. In other words, we look up the parent of a node
  named 'kid' by retrieving the value of the variable 'kid' in the
  \code{.parents} environment of this hash tree.

  The function \code{.addNode} is used to insert a new node into the
  tree.

  The structure of this tree allows one to easily traverse all nodes and
  navigate up the tree from a node via its parent. Certain tasks are
  more complex as the hierarchy is not implicit within a node.
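
  The layout described above can be illustrated with a minimal base R
  sketch. It is not the package's implementation: the helper names
  \code{newFlatTree} and \code{addNode} are hypothetical, and plain
  character strings stand in for XML nodes. It only shows how a node
  environment plus \code{.children} and \code{.parents} tables can
  record the hierarchy.

\preformatted{
# Hypothetical helpers for illustration only; not part of the XML package.
newFlatTree <- function() {
  tree <- new.env(hash = TRUE, parent = emptyenv())
  # Hidden tables: node id -> child ids, and node id -> parent id.
  assign(".children", new.env(hash = TRUE, parent = emptyenv()), envir = tree)
  assign(".parents", new.env(hash = TRUE, parent = emptyenv()), envir = tree)
  tree
}

addNode <- function(tree, id, value, parent = NULL) {
  # Store the node itself under its identifier.
  assign(id, value, envir = tree)
  if (!is.null(parent)) {
    # Record the parent of this node.
    assign(id, parent, envir = tree$.parents)
    # Append this node to the parent's vector of child identifiers.
    kids <- character()
    if (exists(parent, envir = tree$.children, inherits = FALSE)) {
      kids <- get(parent, envir = tree$.children, inherits = FALSE)
    }
    assign(parent, c(kids, id), envir = tree$.children)
  }
  invisible(id)
}

tree <- newFlatTree()
addNode(tree, "doc", "root node")
addNode(tree, "kid", "first child", parent = "doc")
addNode(tree, "kid2", "second child", parent = "doc")

get("kid", envir = tree$.parents)    # parent identifier of 'kid'
get("doc", envir = tree$.children)   # identifiers of the children of 'doc'
ls(tree)                             # node names only
ls(tree, all.names = TRUE)           # also lists .children and .parents
}

  Because \code{tree} is an environment, each call to the hypothetical
  \code{addNode} changes it in place, so the result never has to be
  assigned back to \code{tree}.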
}
\references{\url{http://www.w3.org/XML}}
\author{ Duncan Temple Lang }

\seealso{
  \code{\link{xmlTreeParse}}
  \code{\link{xmlTree}}
  \code{\link{xmlOutputBuffer}}
  \code{\link{xmlOutputDOM}}
}
\examples{
 f = system.file("exampleData", "dataframe.xml", package = "XML")
 tr = xmlHashTree()
 xmlTreeParse(f, handlers = list(.startElement = tr[[".addNode"]]))

 tr # print the tree on the screen

 # Get the two child nodes of the dataframe node.
 xmlChildren(tr$dataframe)

 # Find the names of all the nodes.
 objects(tr)
 # Which nodes have children
 objects(tr$.children)

 # Which nodes are leaves, i.e. do not have children
 setdiff(objects(tr), objects(tr$.children))

 # Find the class of each of these leaf nodes.
 sapply(setdiff(objects(tr), objects(tr$.children)),
         function(id) class(tr[[id]]))

 # Distribution of number of children
 sapply(tr$.children, length)


 # Get the first A node
 tr$A

 # Get its parent node.
 xmlParent(tr$A)


 f = system.file("exampleData", "allNodeTypes.xml", package = "XML")

 # Convert the document
 r = xmlInternalTreeParse(f, xinclude = TRUE)
 ht = as(r, "XMLHashTree")
 ht

 # Work on the root node, or any node actually
 as(xmlRoot(r), "XMLHashTree")

 # Example of making copies of an XMLHashTreeNode object to create a separate tree.
 f = system.file("exampleData", "simple.xml", package = "XML")
 tt = as(xmlParse(f), "XMLHashTree")

 xmlRoot(tt)[[1]]
 xmlRoot(tt)[[1, copy = TRUE]]

 table(unlist(eapply(tt, xmlName)))
 # if any of the nodes had any attributes
 # table(unlist(eapply(tt, xmlAttrs)))
}
\keyword{IO}
\concept{XML}