# Fossil and the CAP Theorem

[The CAP theorem][cap] is a fundamental mathematical proof about
distributed systems. A software system can no more get around it than a
physical system can get past *c*, the [speed of light][sol] constant.

Fossil is a distributed system, so it can be useful to think about it in
terms of the CAP theorem. We won’t discuss the theorem itself or how you
reason using its results here. For that, we recommend [this article][tut].

[cap]: https://en.wikipedia.org/wiki/CAP_theorem
[sol]: https://en.wikipedia.org/wiki/Speed_of_light
[tut]: https://www.ibm.com/cloud/learn/cap-theorem


<a id="ap"></a>
## Fossil Is an AP-Mode System

As with all common [DVCSes][dvcs], Fossil is an AP-mode system, meaning
that your local clone isn’t necessarily consistent with all other clones
(C), but the system is always available for use (A) and
partition-tolerant (P). This is what allows you to turn off Fossil’s
autosync mode, go off-network, and continue working with Fossil, even
though only a single node (your local repo clone) is accessible at the
time.

You may consider that going back online restores “C”, because upon sync,
you’re now consistent with the repo you cloned from. But, if another
user has gone offline in the meantime, and they’ve made commits to their
disconnected repo, *you* aren’t consistent with *them.* Besides which,
if another user commits to the central repo, that doesn’t push the
change down to you automatically: even if all users of a Fossil system
are online at the same instant, and they’re all using autosync, Fossil
doesn’t guarantee consistency across the network.

There’s no getting around the CAP theorem!

[dvcs]: https://en.wikipedia.org/wiki/Distributed_version_control


<a id="ca"></a>
## CA-Mode Fossil

What would it mean to redesign Fossil to be CA-mode?

It means we get a system that is always consistent (C) and available (A)
as long as there are no partitions (P).

That’s basically [CVS] and [Subversion][svn]: you can only continue
working with the repository itself as long as your connection to the
central repo server functions.

It’s rather trivial to talk about single-point-of-failure systems like
CVS or Subversion as
CA-mode. Another common example used this way is a classical RDBMS, but
aren’t we here to talk about distributed systems? What’s a good example
of a *distributed* CA-mode system?

A better example is [Kafka], which in its default configuration assumes
that it is being run on a corporate LAN in a single data center, so network
partitions are exceedingly rare. It therefore sacrifices partition
tolerance to get the advantages of CA-mode operation. In its particular
application of this mode, a
message isn’t “committed” until all running brokers have a copy of it,
at which point the message becomes visible to the client(s). In that
way, all clients always see the same message store as long as all of the
Kafka servers are up and communicating.

How would that work in Fossil terms?

If there is only one central server and I clone it on my local laptop,
then CA mode means I can only commit if the remote Fossil is available,
so in that sense, it devolves to the old CVS model.

What if there are three clones? Perhaps there is a central server *A*,
the clone *B* on my laptop, and the clone *C* on your laptop. Doesn’t CA
mode now mean that my commit on *B* doesn’t exist after I commit it to
the central repo *A* until you, my coworker, *also* pull down the copy
of that commit to your laptop *C*, validating the commit through the
network?

That’s one way to design the system, but another way would be to scope
the system to only talk about proper *servers*, not about the clients.
In that model, a CA-mode Fossil alternative might require 2+ servers to
be running for proper replication. When I make a commit, if all of the
configured servers aren’t online, I can’t commit. This is basically CVS
with replication, but without any useful amount of failover.

[CVS]: https://en.wikipedia.org/wiki/Concurrent_Versions_System
[Kafka]: https://engineering.linkedin.com/kafka/intra-cluster-replication-apache-kafka
[svn]: https://en.wikipedia.org/wiki/Apache_Subversion


<a id="cp"></a>
## CP-Mode Fossil

What if we modify our CA-mode system above with “warm spares”? We can
say that commits must go to all of the spares as well as the active
servers, but a loss of one active server requires that one warm spare
come into active state, and all of the clients learn that the spare is
now considered “active.” At this point, you have a CP-mode system, not a
CA-mode system, because it’s now partition-tolerant (P) but it becomes
unavailable when there aren’t enough active servers or warm
spares to promote to active status.

CP is your classical [BFT] style distributed consensus system, where the
system is available only if the client can contact a *majority* of the
servers. This is a formalization of the warm spare concept above: with
*N* server nodes, you need at least ⌊*N* / 2⌋ + 1 of them to be online
for a commit to succeed.

Many distributed database systems run in CP mode because consistency (C) and
partition-tolerance (P) is a useful combination. What you lose is
always-available (A) operation: with a suitably bad partition, the
system goes down for users on the small side of that partition.
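
As a rough illustration of the majority formula given above, the
following Python sketch computes ⌊*N* / 2⌋ + 1 and checks whether one
side of a partition could still accept a commit. The names `quorum` and
`can_commit` are hypothetical, invented for this example only: Fossil
has no CP mode and no such functions.

```python
def quorum(n: int) -> int:
    """Smallest number of nodes that forms a majority of n: floor(n / 2) + 1."""
    if n < 1:
        raise ValueError("need at least one node")
    return n // 2 + 1


def can_commit(total_nodes: int, reachable_nodes: int) -> bool:
    """True if a client that can reach `reachable_nodes` of `total_nodes`
    has a majority, which is the condition for a commit to succeed in
    this hypothetical CP-mode scheme."""
    return reachable_nodes >= quorum(total_nodes)


if __name__ == "__main__":
    # Show the quorum size for a few cluster sizes.
    for n in (1, 2, 3, 5):
        print(f"N={n}: need {quorum(n)} node(s) online")
```

Under these assumptions, a three-node system split 2/1 by a partition
stays available only on the two-node side, since
`can_commit(3, 2)` holds while `can_commit(3, 1)` does not.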

An optional CP mode for Fossil would be attractive in some ways since in
some sense Fossil is a distributed DBMS, but in practical terms, it
means Fossil would then not be a [DVCS] in the most useful sense: that
you can work while your client is disconnected from the remote
Fossil it cloned from.

A fraught question is whether the non-server Fossil clones count as
“nodes” in this sense.

If they do count, then if there are only two systems, the central server
and the clone on my laptop, then it stands to reason from the formula
above that I can only commit if the central server is available. In that
scheme, a CP-mode Fossil is basically like CVS.

But what happens if my company hires a coworker to help me with the
project, and this person makes their own clone of the central repo? The
equation says I still need 2 nodes to be available for a commit, so if
my new coworker goes off-network, that doesn’t affect whether I can make
commits. Likewise, if I go off-network, my coworker can make commits to
the central server.

But what happens if the central server goes down? The equation says we
still have 2 nodes, so we should be able to commit, right? Sure, but
only if my laptop can communicate directly with my coworker’s laptop! If
it can’t, that’s also a network partition, so *N=1* on both sides in
that case. The implication is that for a true CP-mode Fossil, we’d need
some kind of peer-to-peer networking layer so that our laptops can
accept commits from the other, so that when the central server comes
online, one of us can send the results up to it to get it caught up.

But doesn’t that then mean there is no security? How does [Fossil’s RBAC
system][caps] work if peer-to-peer commits are allowed?

You can instead reconceptualize the system as “node” meaning only server
nodes, so that client-only systems don’t count.
This allows you to have
an RBAC system again.

With just one central server, ⌊1/2⌋+1=1, so you get CVS-like behavior:
if the server’s up, you can commit.

If you set up 2 servers for redundancy, both must be up for commits to
be allowed, since otherwise you could end up with half the commits going
to the server on one side of a network partition, half going to the
other, and no way to arbitrate between the two once the partition is
lifted.

(Today’s AP-mode Fossil has this capability, but the necessary cost is
“C”, consistency! Once again, you can’t get around the CAP theorem.)

3 servers is more sensible: any client that can see at least 2 of them
can commit.

Will there ever be a CP-mode Fossil? This author doubts it, but as I’ve
shown, it would be useful in contexts where you’d rather have a
guarantee of consistency than availability.

[BFT]: https://en.wikipedia.org/wiki/Byzantine_fault
[caps]: ./caps/