---
title: "Case study: converting a Shiny app to async"
author: Joe Cheng (joe@rstudio.com)
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{8. Case study}
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{knitr::rmarkdown}
---

In this case study, we'll work through an application of reasonable complexity, turning its slowest operations into futures/promises and modifying all the downstream reactive expressions and outputs to deal with promises.

## Motivation

> As a web service increases in popularity, so does the number of rogue scripts that abuse it for no apparent reason.
>
> _—Cheng's Law of Why We Can't Have Nice Things_

I first noticed this in 2011, when the then-new RStudio IDE was starting to gather steam. We had a dashboard that tracked how often RStudio was being downloaded, and the numbers were generally tracking smoothly upward. But once every few months, we'd have huge spikes in the download counts, ten times greater than normal—and invariably, we'd find that all of the unexpected increase could be traced to one or two IP addresses.

For hours or days we'd be inundated with thousands of downloads per hour; then, just as suddenly, they'd cease. I didn't know what was happening then, and I still don't know today. Was it the world's least competent denial-of-service attempt? Did someone write a download script with an accidental `while (TRUE)` around it?

Our application will let us examine downloads from CRAN for this kind of behavior. For any given day on CRAN, we'll see who the top downloaders are and how they're behaving.

## Our source data

RStudio maintains the popular `0-Cloud` CRAN mirror, and the log files it generates are freely available at http://cran-logs.rstudio.com/. Each day is a separate gzipped CSV file, and each row is a single package download. For privacy, IP addresses are anonymized by substituting each day's IP addresses with unique integer IDs.
Here are the first few lines of http://cran-logs.rstudio.com/2018/2018-05-26.csv.gz :

```
"date","time","size","r_version","r_arch","r_os","package","version","country","ip_id"
"2018-05-26","20:42:23",450377,"3.4.4","x86_64","linux-gnu","lubridate","1.7.4","NL",1
"2018-05-26","20:42:30",484348,NA,NA,NA,"homals","0.9-7","GB",2
"2018-05-26","20:42:21",98484,"3.3.1","x86_64","darwin13.4.0","miniUI","0.1.1.1","NL",1
"2018-05-26","20:42:27",518,"3.4.4","x86_64","linux-gnu","RCurl","1.95-4.10","US",3
```

Fortunately for our purposes, there's no need to analyze these logs at a high level to figure out which days are affected by badly behaved download scripts. This CRAN mirror is popular enough that, according to Cheng's Law, there should be plenty of rogue scripts hitting it every day of the year.

## A tour of the app

The app I built to explore this data, **cranwhales**, lets us examine the behavior of the top downloaders ("whales") for any given day, at varying levels of detail. You can view this app live at https://gallery.shinyapps.io/cranwhales/, or download and run the code yourself from https://github.com/rstudio/cranwhales.

When the app starts, the "All traffic" tab shows you the number of package downloads per hour for all users vs. whales. In this screenshot, you can see the proportion of files downloaded by the top six downloaders on May 28, 2018. It may not look like a huge fraction at first, but keep in mind, we are only talking about six downloaders out of 52,815 total!

![](case-study-tab1.png)

The "Biggest whales" tab simply shows the most prolific downloaders, with the number of downloads each performed. Each anonymized IP address has been assigned an easier-to-remember name, and you can also see the country code of the original IP address.

![](case-study-tab2.png)

The "Whales by hour" tab shows the hourly download counts for each whale individually.
In this screenshot, you can see that the Netherlands' `relieved_snake` downloaded at an extremely consistent rate during the whole day, while the American `curly_capabara` was active only during business hours in Eastern Standard Time. Still others, like `colossal_chicken` out of Hong Kong, were busy all day but at varying rates.

![](case-study-tab3.png)

The "Detail view" tab has perhaps the most illuminating information. It lets you view every download made by a given whale on the day in question. The x dimension is time and the y dimension is what package they downloaded, so you can see at a glance exactly how many packages were downloaded, and how their various package downloads relate to each other. In this case, `relieved_snake` downloaded 104 different packages, in the same order, continuously, for the entire day.

![](case-study-tab4.png)

Others behave very differently, like `freezing_tapir`, who downloaded `devtools`--and _only_ `devtools`--for the whole day, racking up 19,180 downloads totalling 7.9 gigabytes for that one package alone!

![](case-study-tab5.png)

Sadly, the app can't tell us any more than that--it can't explain _why_ these downloaders are behaving this way, nor can it tell us their street addresses so that we can send ninjas in black RStudio helicopters to make them stop.

## The implementation

Now that you've seen what the app does, let's talk about how it was implemented, then convert it from sync to async.

### User interface

The user interface is a pretty typical shinydashboard. It's important to note that the UI part of the app is entirely agnostic to whether the server is written in the sync or async style; when we port the app to async, we won't touch the UI at all.

There are two major pieces of input we need from users: what **date** to examine (this app only lets us look at one day at a time) and **how many** of the most prolific downloaders to look at. We'll put these two controls in the dashboard sidebar.
We'll put these two controls in the dashboard sidebar. 76 77```r 78dashboardSidebar( 79 dateInput("date", "Date", value = Sys.Date() - 2), 80 numericInput("count", "Show top N downloaders:", 6) 81) 82``` 83 84(We set `date` to two days ago by default, because there's some lag between when a day ends and when its logs are published.) 85 86The rest of the UI code is just typical shinydashboard scaffolding, plus some `shinydashboard::valueBoxOutput`s and `plotOutputs`. These are so trivial that they're hardly worth talking about, but I'll include the code here for completeness. Finally, there's `detailViewUI`, a [Shiny module](https://shiny.rstudio.com/articles/modules.html) that just contains more of the same (value boxes and plots). 87 88```r 89 dashboardBody( 90 fluidRow( 91 tabBox(width = 12, 92 tabPanel("All traffic", 93 fluidRow( 94 valueBoxOutput("total_size", width = 4), 95 valueBoxOutput("total_count", width = 4), 96 valueBoxOutput("total_downloaders", width = 4) 97 ), 98 plotOutput("all_hour") 99 ), 100 tabPanel("Biggest whales", 101 plotOutput("downloaders", height = 500) 102 ), 103 tabPanel("Whales by hour", 104 plotOutput("downloaders_hour", height = 500) 105 ), 106 tabPanel("Detail view", 107 detailViewUI("details") 108 ) 109 ) 110 ) 111 ) 112``` 113 114### Server logic 115 116Based on these inputs and outputs, we'll write a variety of reactive expressions and output renderers to download, manipulate, and visualize the relevant log data. 117 118The reactive expressions: 119 120* `data` (`eventReactive`): Whenever `input$date` changes, the `data` reactive downloads the full log for that day from http://cran-logs.rstudio.com, and parses it. 121* `whales` (`reactive`): Reads from `data()`, tallies the number of downloads performed by each unique IP, and returns a data frame of the top `input$count` most prolific downloaders, along with their download counts. 
* `whale_downloads` (`reactive`): Joins the `data()` and `whales()` data frames to return all of the details of the cetacean downloads.

The `whales` reactive expression depends on `data`, and `whale_downloads` depends on `data` and `whales`.

![](case-study-react.png)

The outputs in this app are mostly either `renderPlot`s that we populate with `ggplot2`, or `shinydashboard::renderValueBox`es. They all rely on one or more of the reactive expressions we just described. We won't catalog them all here, as they're not individually interesting, but we will look at some archetypes below.

## Improving performance and scalability

While this article is specifically about async, this is a good time to remind you that there are lots of ways to improve the performance of a Shiny app. Async is just one tool in the toolbox, and before reaching for that hammer, take a moment to consider your other options:

1. Have I used [profvis](https://rstudio.github.io/profvis/) to **profile my code** and determine what's actually taking so long? (Human intuition is a notoriously bad profiler!)
2. Can I perform any **calculations, summarizations, and aggregations offline**, when my Shiny app isn't even running, and save the results to .rds files to be read by the app?
3. Are there any opportunities to **cache**--that is, save the results of my calculations and use them if I get the same request later? (See [memoise](https://cran.r-project.org/package=memoise), or roll your own.)
4. Am I effectively leveraging [reactive programming](https://rstudio.com/resources/shiny-dev-con/reactivity-pt-1-joe-cheng/) to make sure my reactives are doing as little work as possible?
5. When deploying my app, am I load balancing across multiple R processes and/or multiple servers?
   ([Shiny Server Pro](https://docs.rstudio.com/shiny-server/#utilization-scheduler), [RStudio Connect](https://docs.rstudio.com/connect/admin/appendix/configuration/#Scheduler), [ShinyApps.io](https://shiny.rstudio.com/articles/scaling-and-tuning.html))

These options are more generally useful than async techniques because they can dramatically speed up the performance of an app even if only a single user is using it. While it obviously depends on the particulars of the app itself, a few lines of precomputation or caching logic can often lead to 10X-100X better performance. Async, on the other hand, generally doesn't help make a single session faster. Instead, it helps a single Shiny process support more concurrent sessions without getting bogged down.

Async can be an essential tool when there is no way around performing expensive tasks (i.e. those taking multiple seconds) while the user waits. For example, an app that analyzes any user-specified Twitter profile may get too many unique queries (assuming most people specify their own Twitter handle) for caching to be much help. And applications that invite users to upload their own datasets won't have an opportunity to do any offline summarizing in advance. If you need to run apps like that and support lots of concurrent users, async can be a huge help.

In that sense, the cranwhales app isn't a perfect example, because it has lots of opportunities for precomputation and caching that we'll willfully ignore today so that I can better illustrate the points I want to make about async. When you're working on your own app, though, please think carefully about _all_ of the different techniques you have for improving performance.

## Converting to async

To quote the article [*Using promises with Shiny*](https://rstudio.github.io/promises/articles/shiny.html), async programming with Shiny boils down to a few steps:

1. Identify slow operations in your app.
2. Convert the slow operations into futures.
3. Any code that relies on the result of those operations (if any), whether directly or indirectly, must now be converted to promise handlers that operate on the future object.

In this case, the slow operations are easy to identify: the downloading and parsing that take place in the `data` reactive expression can each take several long seconds.

Converting the download and parsing operations into futures turns out to be the most complicated part of the process, for reasons we'll get into later.

Assuming we do that successfully, the `data` reactive expression will no longer return a data frame, but a `promise` object (one that resolves to a data frame). Since the `whales` and `whale_downloads` reactive expressions both rely on `data`, they will both also need to be converted to read and return `promise` objects. And because the outputs all rely on one or more reactive expressions, they will all need to know how to deal with `promise` objects too.

Async code is infectious like that; once you turn the heart of your app into a promise, everything downstream must become promise-aware as well, all the way through to the observers and outputs.

With that overview out of the way, let's dive into the code.

In the sections below, we'll take a look at the code behind some outputs and reactive expressions. For each element, we'll look first at the sync version, then the async version.

In some cases, these code snippets may be slightly abridged. See the [GitHub repository](https://github.com/rstudio/cranwhales) for the full code.
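Before we look at the app's code, here's a tiny, self-contained sketch of steps 2 and 3 in isolation (a toy example, not cranwhales code; the slow `sum` stands in for downloading and parsing):

```r
library(promises)
library(future)
plan(multisession)

# Step 2: wrap the slow operation in a future, via future_promise().
# Step 3: downstream code becomes a promise handler, attached with %...>%.
p <- future_promise({
  Sys.sleep(1)   # stand-in for downloading and parsing a log file
  sum(1:10)
}) %...>%
  print()        # runs later, back in the main process, once the result is ready
```

`future_promise()` returns immediately; at the console, the result prints a moment later, once the worker finishes, while the main R process stays free in the meantime.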
Until you've received an introduction to the `%...>%` operator, the async code below will make no sense, so if you haven't read [*An informal intro to async programming*](https://rstudio.github.io/promises/articles/intro.html) and/or [*Working with promises in R*](https://rstudio.github.io/promises/articles/overview.html), I highly recommend doing so before continuing!

### Loading `promises` and `future`

The first thing we'll do is load the basic libraries of async programming.

```r
library(promises)
library(future)
plan(multisession)
```

I originally used `multiprocess`, but file downloading inside a future seemed to fail on Mac. (I've found that it's usually not worth spending a lot of time trying to figure out why `multiprocess` doesn't work for some specific code; instead, just use `multisession`, since that's probably going to be the solution anyway.)

### The `data` reactive: future_promise() all the things

The next thing we'll do is convert the `data` event reactive to use `future` for the expensive bits. The original code looks like this:

```r
# SYNCHRONOUS version

data <- eventReactive(input$date, {
  date <- input$date             # Example: 2018-05-28
  year <- lubridate::year(date)  # Example: "2018"

  url <- glue("http://cran-logs.rstudio.com/{year}/{date}.csv.gz")
  path <- file.path("data_cache", paste0(date, ".csv.gz"))

  withProgress(value = NULL, {

    if (!file.exists(path)) {
      setProgress(message = "Downloading data...")
      download.file(url, path)
    }

    setProgress(message = "Parsing data...")
    read_csv(path, col_types = "Dti---c-ci", progress = FALSE)

  })
})
```

(Earlier, I said we wouldn't take advantage of precomputation or caching. That wasn't entirely true; in the code above, we cache the log files we download in a `data_cache` directory.
I couldn't bring myself to put my internet connection through that level of abuse, as I knew I'd be running this code thousands of times as I load tested it.)

For now, we'll lose the `withProgress`/`setProgress` reporting, since doing that correctly requires some more advanced techniques; we'll come back and restore it later. For now:

```r
# ASYNCHRONOUS version

data <- eventReactive(input$date, {
  date <- input$date
  year <- lubridate::year(date)

  url <- glue("http://cran-logs.rstudio.com/{year}/{date}.csv.gz")
  path <- file.path("data_cache", paste0(date, ".csv.gz"))

  future_promise({
    if (!file.exists(path)) {
      download.file(url, path)
    }
    read_csv(path, col_types = "Dti---c-ci", progress = FALSE)
  })
})
```

Pretty straightforward. This reactive now returns a future (which counts as a promise), not a data frame.

Remember that we **must** read any reactive values (including `input`) and reactive expressions [from **outside** the future](https://rstudio.github.io/promises/articles/shiny.html#shiny-specific-caveats-and-limitations). (You will get an error if you attempt to read one from inside the future.)

At this point, since there are no other long-running operations we want to make asynchronous, we're actually done interacting directly with the `future` package. The rest of the reactive expressions will deal with the future returned by `data` using general async functions and operators from `promises`.

### The `whales` reactive: simple pipelines are simple

The `whales` reactive takes the data frame from `data` and uses dplyr to find the top `input$count` most prolific downloaders.
```r
# SYNCHRONOUS version

whales <- reactive({
  data() %>%
    count(ip_id) %>%
    arrange(desc(n)) %>%
    head(input$count)
})
```

Since `data()` now returns a promise, the whole function needs to be modified to deal with promises.

This is basically a best-case scenario for working with `promises`. The whole expression consists of a single magrittr pipeline. There's only one object (`data()`) that's been converted to a promise. The promise object only appears once, at the head of the pipeline.

When the stars align like this, converting this code to async is literally as easy as replacing each `%>%` with `%...>%`:

```r
# ASYNCHRONOUS version

whales <- reactive({
  data() %...>%
    count(ip_id) %...>%
    arrange(desc(n)) %...>%
    head(input$count)
})
```

The input (`data()`) is a promise, the resulting output object is a promise, and each stage of the pipeline returns a promise; but we can read and write this code almost as easily as the synchronous version!

An example this simple may seem reductive, but this best-case scenario happens surprisingly often if your coding style is influenced by the tidyverse. In this example app, **59%** of the reactives, observers, and outputs were converted using nothing more than replacing `%>%` with `%...>%`.

One last thing before we move on. In the last section, I emphasized that reactive values cannot be read from inside a future. Here, we're using `head(input$count)` inside a promise-pipeline; since `data()` is written using a future, doesn't that mean… well… isn't this wrong?

Nope—this code is just fine. The prohibition is against reading reactive values/expressions from *inside* a future, because code inside a future is executed in a totally different R process. The steps in a promise-pipeline aren't futures, but promise handlers.
These aren't executed in a different process; rather, they're executed back in the original R process after a promise is resolved. We're allowed and expected to access reactive values and expressions from these handlers.

### The `whale_downloads` reactive: reading from multiple promises

The `whale_downloads` reactive is a bit more complicated.

```r
# SYNCHRONOUS version

whale_downloads <- reactive({
  data() %>%
    inner_join(whales(), "ip_id") %>%
    select(-n)
})
```

Looks simple, but we can't just do a simple replacement this time. Can you see why?

```r
# BAD VERSION DOESN'T WORK

whale_downloads <- reactive({
  data() %...>%
    inner_join(whales(), "ip_id") %...>%
    select(-n)
})
```

Remember, both `data()` and `whales()` now return a promise object, not a data frame. None of the dplyr verbs know how to deal with promises natively (and the same is true for almost every other R function, anywhere in the R universe).

We're able to use `%...>%` with promises on the left-hand side and regular dplyr calls on the right-hand side only because the `%...>%` operator "unwraps" the promise object for us, yielding a regular object (data frame or whatever) to be passed to dplyr. But in this case, we're passing `whales()`, which is a promise object, directly to `inner_join`, and `inner_join` has no idea what to do with it.

The fundamental thing to pattern-match on here is that **we have a block of code that relies on more than one promise object**, and that means `%...>%` won't be enough. This is a pretty common situation as well, occurring in **12%** of the reactives and outputs in this example app.
Here's what the real solution looks like:

```r
# ASYNCHRONOUS version

whale_downloads <- reactive({
  promise_all(data_df = data(), whales_df = whales()) %...>% with({
    data_df %>%
      inner_join(whales_df, "ip_id") %>%
      select(-n)
  })
})
```

#### Promises: the Gathering

This solution uses the [promise gathering](https://rstudio.github.io/promises/articles/combining.html#gathering) pattern, which combines `promise_all`, `%...>%`, and `with`.

* The `promise_all` function gathers multiple promise objects together and returns a single promise object. This new promise object doesn't resolve until all the input promise objects are resolved, and it yields a list of those results.

```r
> promise_all(a = future_promise("Hello"), b = future_promise("World")) %...>% print()
$a
[1] "Hello"

$b
[1] "World"
```

* The `%...>%`, as before, "unwraps" the promise object and passes the result to its right-hand side.
* The `with` function (from base R) takes a named list and makes it into a sort of virtual parent environment while evaluating a code block you pass it.

```r
> x + y
Error: object 'x' not found

> with(list(x = 1, y = 2), {
+   x + y
+ })
[1] 3
```

Let's once again combine the three, with the simplest possible example of the gathering pattern:

```r
> promise_all(x = future_promise("Hello"), y = future_promise("World")) %...>%
+   with({ paste(x, y) }) %...>%
+   print()
[1] "Hello World"
```

You can make use of this pattern without remembering exactly how these pieces combine.
Just remember that the arguments to `promise_all` provide the promise objects (`future_promise("Hello")` and `future_promise("World")`), along with the names you want to use to refer to their yielded values (`x` and `y`); and the code block you put in `with()` can refer to those names without worrying about the fact that they were ever promises to begin with.

### The `total_downloaders` value box: simple pipelines are for output, too

![](case-study-downloaders.png)

All of the value boxes in this app ended up looking a lot like this:

```r
# SYNCHRONOUS version

output$total_downloaders <- renderValueBox({
  data() %>%
    pull(ip_id) %>%
    unique() %>%
    length() %>%
    format(big.mark = ",") %>%
    valueBox("unique downloaders")
})
```

This is structurally no different than the `whales` best-case scenario reactive. One thing worth pointing out is that an async `renderValueBox` means you return a promise that yields a `valueBox`; you *don't* return a `valueBox` to which you've passed a promise.

Meaning, you *don't* do this:

```r
# BAD VERSION DOESN'T WORK

output$total_downloaders <- renderValueBox({
  valueBox(
    data() %...>%
      pull(ip_id) %...>%
      unique() %...>%
      length() %...>%
      format(big.mark = ","),
    "unique downloaders"
  )
})
```

Instead, you do this:

```r
# ASYNCHRONOUS version

output$total_downloaders <- renderValueBox({
  data() %...>%
    pull(ip_id) %...>%
    unique() %...>%
    length() %...>%
    format(big.mark = ",") %...>%
    valueBox("unique downloaders")
})
```

The other trick worth noting is the `pull` verb, which is used to retrieve a specific column of a data frame as a vector (similar to `$` or `[[`). In this case, `pull(data, ip_id)` is equivalent to `data[["ip_id"]]`. Note that `pull` is part of dplyr and isn't specific to promises.
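If `pull` is new to you, here's a quick standalone illustration (toy data, not the app's):

```r
library(dplyr)

df <- data.frame(ip_id = c(1, 1, 2), package = c("lubridate", "miniUI", "homals"))

# pull() extracts one column as a plain vector, pipe-style:
df %>% pull(ip_id)   # same as df$ip_id or df[["ip_id"]]
```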
### The `biggest_whales` plot: getting untidy

In a cruel twist of API design fate, one of the cornerstone packages of the tidyverse lacks a tidy API. I'm referring, of course, to `ggplot2`:

```r
# SYNCHRONOUS version

output$downloaders <- renderPlot({
  whales() %>%
    ggplot(aes(ip_name, n)) +
    geom_bar(stat = "identity") +
    ylab("Downloads on this day")
})
```

While `dplyr` and other tidyverse packages are designed to link calls together with `%>%`, the older `ggplot2` package uses the `+` operator. This is mostly a small aesthetic wart when writing synchronous code, but it's a real problem with async, because the `promises` package doesn't currently have a promise-aware replacement for `+` like it does for `%>%`.

Fortunately, there's a pretty good escape hatch for `%>%`, and `%...>%` inherited it too. Instead of a pipeline stage being a simple function call, you can put a `{` and `}` delimited code block, and inside of that code block, you can access the "it" value using a period (`.`).

```r
# ASYNCHRONOUS version

output$downloaders <- renderPlot({
  whales() %...>% {
    whale_df <- .
    ggplot(whale_df, aes(ip_name, n)) +
      geom_bar(stat = "identity") +
      ylab("Downloads on this day")
  }
})
```

**The importance of this pattern cannot be overstated!** Using `%...>%` and simple calls alone, you're restricted to doing pipeline-compatible operations. But `%...>%` together with a curly-brace code block means your handler code can be any shape or size. Once inside that code block, you have a regular, non-promise value in `.` (if you even want to use it—sometimes you don't, as we'll see later). You can have zero, one, or more statements. You can use the `.` multiple times, in nested expressions, whatever.

Tip: if you have extensive or complex code to put in a code block, start the block by creating a properly named variable to store the value of `.`.
The reason for this is that `.` may acquire a different meaning than you intend as you add code to the code block. For example, if a magrittr pipeline starts with `.`, instead of evaluating the pipeline and returning a value, it creates a function that takes a single argument. So the following code wouldn't filter the resolved value of `whales()`, but would instead create an anonymous function that calls `filter(n > 1000)` on whatever you pass it.

```r
whales() %...>% {
  . %>% filter(n > 1000)
}
```

This fixes it:

```r
whales() %...>% {
  whales_df <- .
  whales_df %>% filter(n > 1000)
}
```

There are other ways to work around the above problem as well, but I like this fix because it doesn't require any thought or care. Just give the `.` value a new name, and forget the `.` exists.

For untidy code with a single promise object, just remember: pair a single `%...>%` with a code block, and you should be able to do almost anything.

### Revisiting the `data` reactive: progress support

Now that we have discussed a few techniques for writing async code, let's come back to our original `data` event reactive, and this time do a more faithful async conversion that preserves the progress reporting functionality of the original.
Again, here's the original sync code:

```r
# SYNCHRONOUS version

data <- eventReactive(input$date, {
  date <- input$date             # Example: 2018-05-28
  year <- lubridate::year(date)  # Example: "2018"

  url <- glue("http://cran-logs.rstudio.com/{year}/{date}.csv.gz")
  path <- file.path("data_cache", paste0(date, ".csv.gz"))

  withProgress(value = NULL, {

    if (!file.exists(path)) {
      setProgress(message = "Downloading data...")
      download.file(url, path)
    }

    setProgress(message = "Parsing data...")
    read_csv(path, col_types = "Dti---c-ci", progress = FALSE)

  })
})
```

Progress reporting currently presents two challenges for `future`.

First, the `withProgress({...})` function cannot be used with async. `withProgress` is designed to wrap a slow synchronous action, and dismisses its progress dialog when the block of code it wraps is done executing. Since the call to `future_promise()` will return immediately even though the actual task is far from done, using `withProgress` won't work; the progress dialog would be dismissed before the download even got going.

It's conceivable that `withProgress` could gain promise compatibility someday, but it's not in Shiny v1.1.0. In the meantime, we can work around this by using the alternative, [object-oriented progress API](https://shiny.rstudio.com/reference/shiny/1.1.0/Progress.html) that Shiny offers. It's a bit more verbose and fiddly than `withProgress`/`setProgress`, but it is flexible enough to work with futures/promises.

Second, progress messages can't be sent from futures. This is simply because futures are executed in child processes, which don't have direct access to the browser like the main Shiny process does.

It's conceivable that `future` could gain the ability for child processes to communicate back to their parents, but no good solution exists at the time of this writing.
In the meantime, we can work around this by taking the one future that does both downloading and parsing, and splitting it into two separate futures. After the download future has completed, we can send a progress message that parsing is beginning, and then start the parsing future.

The regrettably complicated solution is below.

```r
# ASYNCHRONOUS version

data <- eventReactive(input$date, {
  date <- input$date
  year <- lubridate::year(date)

  url <- glue("http://cran-logs.rstudio.com/{year}/{date}.csv.gz")
  path <- file.path("data_cache", paste0(date, ".csv.gz"))

  p <- Progress$new()
  p$set(value = NULL, message = "Downloading data...")
  future_promise({
    if (!file.exists(path)) {
      download.file(url, path)
    }
  }) %...>%
    { p$set(message = "Parsing data...") } %...>%
    { future_promise(read_csv(path, col_types = "Dti---c-ci", progress = FALSE)) } %>%
    finally(~p$close())
})
```

The single future we wrote earlier has now become a pipeline of promises:

1. future (download)
2. send progress message
3. future (parse)
4. dismiss progress dialog

Note that neither the R6 call `p$set(message = ...)` nor the second `future_promise()` call is tidy, so they use curly-brace blocks, as discussed in the above section about `biggest_whales`.

The final step of dismissing the progress dialog doesn't use `%...>%` at all; because we want the progress dialog to dismiss whether the download and parse operations succeed or fail, we use the regular pipe `%>%` and the `finally()` function instead. See the relevant section in [*Working with promises in R*](https://rstudio.github.io/promises/articles/overview.html#cleaning-up-with-finally) to learn more.

With these changes in place, we've now covered all of the changes to the application. You can see the full changes side-by-side via [this GitHub diff](https://github.com/rstudio/cranwhales/compare/sync...async?diff=split).
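Before moving on, here's a standalone illustration of that `finally()` behavior (a toy example, not app code). The callback runs whether the promise succeeds or fails, which is exactly why it's the right place to dismiss the progress dialog:

```r
library(promises)

cleanup_ran <- c(ok = FALSE, err = FALSE)

# finally() fires on success...
promise_resolve("parsed data") %>%
  finally(~ cleanup_ran["ok"] <<- TRUE)

# ...and on failure too (the %...!% handler catches the error afterward).
promise_reject(simpleError("download failed")) %>%
  finally(~ cleanup_ran["err"] <<- TRUE) %...!%
  { conditionMessage(.) }
```

Here `promise_resolve` and `promise_reject` are just convenient ways to make already-settled promises for demonstration purposes.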
## Measuring scalability

It was a fair amount of work to do the sync-to-async conversion. Now we'd like to know if the conversion to async had the desired effect: improved responsiveness (i.e. lower latency) when the number of simultaneous visitors increases.

### Load testing with Shiny (coming soon)

At the time of this writing, we are working on a suite of load testing tools for Shiny that is not publicly available yet, but was previewed by Sean Lopp during his [epic rstudio::conf 2018 talk](https://rstudio.com/resources/) about running a Shiny load test with 10,000 simulated concurrent users.

You use these tools to easily **record** yourself using your Shiny app, which creates a test script; then **play back** that test script, but multiplied by dozens/hundreds/thousands of simulated concurrent users; and finally, **analyze** the timing data generated during the playback step to see what kind of latency the simulated users experienced.

To examine the effects of my async refactor, I recorded a simple test script by loading up the app, waiting for the first tab to appear, then clicking through each of the other tabs, pausing for several seconds each time before moving on to the next. When using the app without any other visitors, the homepage fully loads in less than a second, and the initial loading of data and rendering of the plot on the default tab takes about 7 seconds. After that, each tab takes no more than a couple of seconds to load. Overall, the entire test script, including time where the user is thinking, takes about 40 seconds under ideal conditions (i.e. only a single concurrent user).

I then used this test script to generate load against the Shiny app running in my local RStudio. With the settings I chose, the playback tool introduced one new "user" session every 5 seconds, until 50 sessions total had been launched; then it waited until all the sessions were complete.
I ran this test on both the sync and async versions in turn, which generated the following results.

### Sync vs. async performance

![](case-study-gantt-async.png)

In this plot, each row represents a single session, and the x dimension represents time. Each of the rectangles represents a single "step" in the test script, be it downloading the HTML for the homepage, fetching one of the two dozen JavaScript/CSS files, or waiting for the server to update outputs. So the wider a rectangle is, the longer the user had to wait. (The empty gaps between rectangles represent time the app spends waiting for the user to click an input; their widths are hard-coded into the test script.)

Of particular importance are the red and pink rectangles, as these represent the initial page load. While these are taking place, the user is staring at a blank page, probably wondering if the server is down. Long waits during this stage are not only undesirable, but surprising and incomprehensible to the user; whereas the same user is probably prepared to wait a little while for a complicated visualization to be rendered in response to an input change.

And as you can see from this plot, the behavior of the async app is much improved on the critical metric of homepage/JS/CSS loading time. The sync version of the app starts displaying unacceptably long red/pink loading times as early as session #15, and by session #44 the maximum page load time has exceeded one minute. The async version at that point is showing 25-second load times, which is far from great, but still a significant step in the right direction.

### Further optimizations

I was surprised that the async version's page load times weren't even faster, and even more surprised to see that the blue rectangles were just as wide as in the sync version. Why isn't the async version way faster?
The sync version does all of its work on a single thread, and I specifically designed this app to be a nightmare for scalability: each session kicks off by parsing hundreds of megabytes of CSV, an operation that is quite expensive. The async version gets to spread these jobs across several worker processes. Why aren't we seeing a greater time savings?

Mostly, it's because calling `future_promise(read_csv("big_file.csv"))` is almost a worst-case scenario for future and async. `read_csv` is generally fast, but because the CRAN log files are so big, `read_csv("big_file.csv")` is slow. The value it returns is a very large data frame, which has now been loaded not into the Shiny process, but into a `future` worker process. To return that data frame to the Shiny process, the data must first be serialized (I believe `future` essentially uses `saveRDS` for this), transmitted to the Shiny process, and then deserialized; to make matters worse, the transmission and deserialization steps happen on the main R thread that we're working so hard to keep idle. **The larger the data we send back and forth to the future, the more performance suffers,** and in this case we're sending back quite a lot of data.

We can make our code significantly faster by doing more summarizing, aggregation, and filtering _inside_ the future; not only does this make more of the work happen in parallel, but by returning the data in already-processed form, we have much less data to transfer from the worker process back to the Shiny process. (For example, the data for May 31, 2018 weighs 75MB before this optimization, and 8.8MB afterwards.)

Compare all three runs in the image below (the newly optimized version is labelled "async2"). The homepage load times have dropped further, and the calculation times are now dramatically faster than with the sync code.
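The payload effect is easy to see directly: serializing an aggregated summary costs a tiny fraction of serializing the raw rows. Here is a base-R sketch using made-up data as a stand-in for one day of CRAN logs:

```r
# Simulated stand-in for one day of raw download logs (one million rows).
logs <- data.frame(
  hour  = sample(0:23, 1e6, replace = TRUE),
  ip_id = sample(5e4, 1e6, replace = TRUE)
)

# What crosses the process boundary if the future returns the raw rows:
raw_bytes <- length(serialize(logs, NULL))

# What crosses if the future aggregates to per-hour counts first
# (the same idea as the "async2" optimization):
counts <- aggregate(list(n = logs$ip_id),
                    by = list(hour = logs$hour),
                    FUN = length)
agg_bytes <- length(serialize(counts, NULL))

raw_bytes / agg_bytes  # the aggregated payload is orders of magnitude smaller
```

The real app's savings (75MB down to 8.8MB) are less extreme than this toy example because its outputs need more than a single count per hour, but the principle is the same: aggregate in the worker, transfer only the result.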
![](case-study-gantt-async2.png)

Looking at the "async2" graph, the leading (bottom-left) edge has the same shape as before, as that's simply the rate at which the load testing tool launches new sessions. But notice how much more closely the trailing (upper-right) edge matches the leading edge! It means that even as the number of active sessions ramped up, latency didn't get dramatically worse, unlike with the "sync" and "async" versions. And each of the individual blue rectangles in the "async2" plot is comparatively tiny, meaning that users never have to wait more than a dozen seconds for plots to update.

This last plot shows the same data as above, but with the sessions aligned by start time. You can clearly see how the sessions are both shorter and less variable in "async2" compared to the others. I've added a yellow vertical line at the 10-second mark; if the page load (red/pink) has not completed by this point, it's likely that your visitor has left in disgust. While "async" does better than "sync", they both break through the 10-second mark early and often. In contrast, the "async2" version just barely peeks over the line three times.

![](case-study-gantt-aligned.png)

To get a visceral sense of what it feels like to use the app under load, here's a video that shows what it's like to browse the app while the load test is running at its peak. The left side of the screen shows "sync", the right shows "async2". In both cases, I navigated to the app when session #40 was started.

<p class="embed-responsive embed-responsive-16by9">
<iframe class="embed-responsive-item" src="https://www.youtube-nocookie.com/embed/HsjdEZMnb0w?rel=0&showinfo=0&ecver=1" frameborder="0" allow="autoplay; encrypted-media" allowfullscreen></iframe>
</p>

Take a look at the [code diff for async vs. async2](https://github.com/rstudio/cranwhales/compare/async...async2?diff=split).
While the code has not changed very dramatically, it has lost a little elegance and maintainability: the code for each of the affected outputs now has one foot in the render function and one foot in the future. If your app's total audience is a team of a hundred analysts and execs, you may choose to forgo the extra performance and stick with the original async (or even sync) code. But if you have serious scaling needs, the refactoring is probably a small price to pay.

Let's get real for a second, though. If this weren't an example app written for exposition purposes, but a real production app intended to scale to thousands of concurrent users across dozens of R processes, we wouldn't download and parse CSV files on the fly. Instead, we'd establish a proper [ETL procedure](https://solutions.rstudio.com/examples/apps/twitter-etl/) to run every night and put the results into a properly indexed database table, or into RDS files with just the data we need. As I said [earlier](#improving-performance-and-scalability), a little precomputation and caching can make a huge difference!

Much of the remaining latency in the async2 branch comes from ggplot2 plotting. [Sean's talk](https://rstudio.com/resources/) alluded to some upcoming plot caching features we're adding to Shiny, and I imagine they will have as dramatic an effect on this test as they did for Sean.

## Summing up

With async programming, expensive computations and tasks no longer need to be the scalability killers that they once were for Shiny. Armed with this and other common techniques like precomputation, caching, and load balancing, it's possible to write responsive and scalable Shiny applications that can safely be deployed to thousands of concurrent users.