# Top Tweet Benchmark

The top_tweet benchmark finds the most-retweeted tweet in a Twitter API response.

## Purpose

This scenario tends to measure an implementation's laziness: its ability to avoid parsing unneeded
values, without knowing beforehand which values are needed.

To find the top tweet, an implementation needs to iterate through all tweets, remembering which one
had the highest retweet count. While it scans, it will find many "candidate" tweets with the highest
retweet count *up to that point*; in effect, it has to keep track of the "top tweet so far" while it
searches. However, only the text and screen_name of the *final* top tweet need to be parsed.
Therefore, JSON parsers that can only parse values on the first pass (such as DOM or streaming
parsers) are forced to parse the text and screen_name of every candidate (if not of every single
tweet). Parsers that can defer parsing a value until it is actually needed shine in scenarios like
this.

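To make that cost concrete, here is a small C++ sketch that counts how many tweets become the "top tweet so far" during one pass, given their retweet counts in document order. The helper name `count_candidates` is hypothetical and not part of the benchmark; it applies the same tie-breaking and exclusion rules as the benchmark. Under the reasoning above, a one-pass parser would decode text and screen_name roughly that many times, while a lazy parser decodes them once.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of the benchmark): counts how many tweets
// replace the "top tweet so far" in a single left-to-right pass. Ties replace
// the current candidate because the benchmark wants the *last* tweet with the
// top retweet_count.
std::size_t count_candidates(const std::vector<int64_t> &retweet_counts,
                             int64_t max_retweet_count) {
  std::size_t candidates = 0;
  int64_t best = -1;  // retweet counts are non-negative, so any kept tweet beats -1
  for (int64_t count : retweet_counts) {
    if (count > max_retweet_count) { continue; }  // excluded by the rules
    if (count >= best) {
      best = count;
      ++candidates;  // a one-pass parser would decode text/screen_name here
    }
  }
  return candidates;
}
```

For example, with retweet counts `{0, 3, 1, 3, 5, 2}` and a `max_retweet_count` of 4, three tweets become candidates (the 0, the first 3 and the second 3), but only the last of them is the answer.
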
## Rules

The benchmark will be called with `run(padded_string &json, int64_t max_retweet_count, top_tweet_result &result)`.
The benchmark must:
- Find the tweet with the highest `retweet_count` at the top level of the "statuses" array.
- Find the *last* such tweet: if multiple tweets have the same top `retweet_count`, the last one
  should be returned.
- Exclude tweets with `retweet_count` above `max_retweet_count`. This restriction exists solely because
  the default twitter.json has a rather high retweet count in the third tweet, and to test laziness
  the matching tweet needs to be further down in the file.
- Fill in `top_tweet_result` with the corresponding fields from the matching tweet.

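The rules above can be sketched in C++ over tweets that have already been extracted into plain structs. The `tweet` and `top_tweet_result` types and the `find_top_tweet` function below are hypothetical simplifications for illustration, not the benchmark's own definitions. The loop only remembers *which* tweet is on top and copies its strings once, after the scan.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins for one element of "statuses" and for the
// benchmark's result type; field names follow the abridged schema.
struct tweet {
  std::string text;
  std::string screen_name;  // from "user": { "screen_name": ... }
  int64_t retweet_count;
};

struct top_tweet_result {
  int64_t retweet_count = -1;  // -1 means "no qualifying tweet found"
  std::string screen_name;
  std::string text;
};

// Applies the rules: highest retweet_count not above max_retweet_count,
// with ties going to the *last* such tweet.
void find_top_tweet(const std::vector<tweet> &statuses, int64_t max_retweet_count,
                    top_tweet_result &result) {
  const tweet *top = nullptr;  // remember the candidate; don't copy strings yet
  for (const tweet &t : statuses) {
    if (t.retweet_count > max_retweet_count) { continue; }
    // ">=" (not ">") lets a later tweet with an equal count win.
    if (top == nullptr || t.retweet_count >= top->retweet_count) { top = &t; }
  }
  if (top == nullptr) {
    result = top_tweet_result{};
    return;
  }
  // Only the final winner's strings are copied.
  result.retweet_count = top->retweet_count;
  result.screen_name = top->screen_name;
  result.text = top->text;
}
```

Using `>=` rather than `>` in the comparison is what makes the last of several equally retweeted tweets win.
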
### Abridged Schema

The abridged schema (objects contain more fields than listed here):

```json
{
  "statuses": [
    {
      "text": "i like to tweet", // text containing UTF-8 and escape characters
      "user": {
        "screen_name": "AlexanderHamilton" // string containing UTF-8 (and escape characters?)
      },
      "retweet_count": 2 // uint32
    },
    ...
  ]
}
```
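
Since `text` (and possibly `screen_name`) may contain escape sequences, decoding a string is real work that a lazy parser can skip for every non-winning candidate. As a rough illustration, the hypothetical `decode_json_string` helper below (C++17) resolves JSON escapes, including `\uXXXX` and surrogate pairs, in the raw bytes between a string's quotes. A lazy implementation could keep only each candidate's raw span and call something like it once, for the final top tweet. It is a simplified sketch, not the decoder used by any of the benchmarked libraries, and it neither validates UTF-8 nor rejects unescaped control characters.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>

// Parses exactly four hex digits starting at raw[i]; nullopt on error.
static std::optional<uint32_t> parse_hex4(std::string_view raw, std::size_t i) {
  if (i + 4 > raw.size()) { return std::nullopt; }
  uint32_t value = 0;
  for (std::size_t k = i; k < i + 4; ++k) {
    char c = raw[k];
    value <<= 4;
    if (c >= '0' && c <= '9') { value |= uint32_t(c - '0'); }
    else if (c >= 'a' && c <= 'f') { value |= uint32_t(c - 'a' + 10); }
    else if (c >= 'A' && c <= 'F') { value |= uint32_t(c - 'A' + 10); }
    else { return std::nullopt; }
  }
  return value;
}

// Appends the UTF-8 encoding of a Unicode code point.
static void append_utf8(std::string &out, uint32_t cp) {
  if (cp < 0x80) {
    out += char(cp);
  } else if (cp < 0x800) {
    out += char(0xC0 | (cp >> 6));
    out += char(0x80 | (cp & 0x3F));
  } else if (cp < 0x10000) {
    out += char(0xE0 | (cp >> 12));
    out += char(0x80 | ((cp >> 6) & 0x3F));
    out += char(0x80 | (cp & 0x3F));
  } else {
    out += char(0xF0 | (cp >> 18));
    out += char(0x80 | ((cp >> 12) & 0x3F));
    out += char(0x80 | ((cp >> 6) & 0x3F));
    out += char(0x80 | (cp & 0x3F));
  }
}

// Hypothetical helper: decodes the raw bytes between a JSON string's quotes.
// Returns nullopt on a malformed escape.
std::optional<std::string> decode_json_string(std::string_view raw) {
  std::string out;
  out.reserve(raw.size());
  for (std::size_t i = 0; i < raw.size(); ++i) {
    char c = raw[i];
    if (c != '\\') { out += c; continue; }  // UTF-8 bytes are copied unchanged
    if (++i >= raw.size()) { return std::nullopt; }
    switch (raw[i]) {
      case '"': out += '"'; break;
      case '\\': out += '\\'; break;
      case '/': out += '/'; break;
      case 'b': out += '\b'; break;
      case 'f': out += '\f'; break;
      case 'n': out += '\n'; break;
      case 'r': out += '\r'; break;
      case 't': out += '\t'; break;
      case 'u': {
        auto cp = parse_hex4(raw, i + 1);
        if (!cp) { return std::nullopt; }
        i += 4;  // i now points at the last hex digit
        if (*cp >= 0xD800 && *cp <= 0xDBFF) {
          // High surrogate: a \uDC00-\uDFFF low surrogate must follow.
          if (i + 2 >= raw.size() || raw[i + 1] != '\\' || raw[i + 2] != 'u') { return std::nullopt; }
          auto low = parse_hex4(raw, i + 3);
          if (!low || *low < 0xDC00 || *low > 0xDFFF) { return std::nullopt; }
          cp = 0x10000 + ((*cp - 0xD800) << 10) + (*low - 0xDC00);
          i += 6;
        } else if (*cp >= 0xDC00 && *cp <= 0xDFFF) {
          return std::nullopt;  // lone low surrogate
        }
        append_utf8(out, *cp);
        break;
      }
      default: return std::nullopt;  // unknown escape
    }
  }
  return out;
}
```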