namespace tf {

/** @page release-3-2-0 Release 3.2.0 (2021/07/29)

%Taskflow 3.2.0 is the 3rd release in the 3.x line!
This release includes several new changes, such as CPU-GPU tasking, an algorithm collection,
an enhanced web-based profiler, documentation, and unit tests.

@tableofcontents

@section release-3-2-0_download Download

%Taskflow 3.2.0 can be downloaded from <a href="https://github.com/taskflow/taskflow/releases/tag/v3.2.0">here</a>.

@section release-3-2-0_system_requirements System Requirements

To use %Taskflow v3.2.0, you need a compiler that supports C++17:

@li GNU C++ Compiler at least v8.4 with -std=c++17
@li Clang C++ Compiler at least v6.0 with -std=c++17
@li Microsoft Visual Studio at least v19.27 with /std:c++17
@li AppleClang Xcode Version at least v12.0 with -std=c++17
@li Nvidia CUDA Toolkit and Compiler (nvcc) at least v11.1 with -std=c++17
@li Intel C++ Compiler at least v19.0.1 with -std=c++17
@li Intel DPC++ Clang Compiler at least v13.0.0 with -std=c++17 and SYCL20

%Taskflow works on Linux, Windows, and Mac OS X.
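The C++17 requirement above can be checked before building anything larger. The following is a minimal sketch and not part of %Taskflow: the names `kCppStandard` and `toolchain_label` are hypothetical, and the snippet only assumes a standard library. It refuses to compile unless the translation unit is built as C++17 or newer, and it exercises a few C++17 features along the way.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <tuple>

// MSVC reports 199711L in __cplusplus unless /Zc:__cplusplus is given,
// so prefer _MSVC_LANG when the compiler defines it.
#if defined(_MSVC_LANG)
constexpr long kCppStandard = _MSVC_LANG;
#else
constexpr long kCppStandard = __cplusplus;
#endif

// Stops the build early when the C++17 flag is missing.
static_assert(kCppStandard >= 201703L,
              "compile with -std=c++17 (or /std:c++17 on MSVC)");

// Hypothetical helper: builds a label using C++17 language and library features.
inline std::string toolchain_label() {
  auto [major, minor, patch] = std::make_tuple(3, 2, 0);  // structured bindings
  std::optional<std::string> label;                        // std::optional
  if constexpr (kCppStandard >= 201703L) {                 // if constexpr
    label = "C++17 toolchain ready for Taskflow " + std::to_string(major) +
            "." + std::to_string(minor) + "." + std::to_string(patch);
  }
  assert(label.has_value());
  return *label;
}
```

Calling `toolchain_label()` from a small `main` and compiling it with the flag listed for your compiler is usually enough to confirm the setup. The check says nothing about the minimum compiler versions listed above, which still have to be verified separately.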

@section release-3-2-0_working_items Working Items

@li enhancing support for SYCL with Intel DPC++
@li enhancing parallel CPU and GPU algorithms
@li designing pipeline interface and its scheduling algorithms

@section release-3-2-0_new_features New Features

@subsection release-3-2-0_taskflow_core Taskflow Core

@li added tf::SmallVector to optimize the dependency storage in a graph
@li added move constructor and move assignment operator for tf::Taskflow
  + tf::Taskflow::Taskflow(Taskflow&&)
  + tf::Taskflow::operator=(Taskflow&&)
@li added move-based run methods in tf::Executor that automatically manage a taskflow's lifetime
  + tf::Executor::run(Taskflow&&)
  + tf::Executor::run(Taskflow&&, C&&)
  + tf::Executor::run_n(Taskflow&&, size_t)
  + tf::Executor::run_n(Taskflow&&, size_t, C&&)
  + tf::Executor::run_until(Taskflow&&, P&&)
  + tf::Executor::run_until(Taskflow&&, P&&, C&&)

@subsection release-3-2-0_cudaflow cudaFlow

@li improved the execution flow of tf::cudaFlowCapturer when updates involve

New algorithms in tf::cudaFlow and tf::cudaFlowCapturer:

@li added tf::cudaFlow::reduce
@li added tf::cudaFlow::transform_reduce
@li added tf::cudaFlow::uninitialized_reduce
@li added tf::cudaFlow::transform_uninitialized_reduce
@li added tf::cudaFlow::inclusive_scan
@li added tf::cudaFlow::exclusive_scan
@li added tf::cudaFlow::transform_inclusive_scan
@li added tf::cudaFlow::transform_exclusive_scan
@li added tf::cudaFlow::merge
@li added tf::cudaFlow::merge_by_key
@li added tf::cudaFlow::sort
@li added tf::cudaFlow::sort_by_key
@li added tf::cudaFlow::find_if
@li added tf::cudaFlow::min_element
@li added tf::cudaFlow::max_element
@li added tf::cudaFlowCapturer::reduce
@li added tf::cudaFlowCapturer::transform_reduce
@li added tf::cudaFlowCapturer::uninitialized_reduce
@li added tf::cudaFlowCapturer::transform_uninitialized_reduce
@li added tf::cudaFlowCapturer::inclusive_scan
@li added tf::cudaFlowCapturer::exclusive_scan
@li added tf::cudaFlowCapturer::transform_inclusive_scan
@li added tf::cudaFlowCapturer::transform_exclusive_scan
@li added tf::cudaFlowCapturer::merge
@li added tf::cudaFlowCapturer::merge_by_key
@li added tf::cudaFlowCapturer::sort
@li added tf::cudaFlowCapturer::sort_by_key
@li added tf::cudaFlowCapturer::find_if
@li added tf::cudaFlowCapturer::min_element
@li added tf::cudaFlowCapturer::max_element
@li added tf::cudaLinearCapturing

@subsection release-3-2-0_syclflow syclFlow

@subsection release-3-2-0_cuda_std_algorithms CUDA Standard Parallel Algorithms

@li added tf::cuda_for_each
@li added tf::cuda_for_each_index
@li added tf::cuda_transform
@li added tf::cuda_reduce
@li added tf::cuda_uninitialized_reduce
@li added tf::cuda_transform_reduce
@li added tf::cuda_transform_uninitialized_reduce
@li added tf::cuda_inclusive_scan
@li added tf::cuda_exclusive_scan
@li added tf::cuda_transform_inclusive_scan
@li added tf::cuda_transform_exclusive_scan
@li added tf::cuda_merge
@li added tf::cuda_merge_by_key
@li added tf::cuda_sort
@li added tf::cuda_sort_by_key
@li added tf::cuda_find_if
@li added tf::cuda_min_element
@li added tf::cuda_max_element

@subsection release-3-2-0_utilities Utilities

@li added CUDA meta programming
@li added SYCL meta programming

@subsection release-3-2-0_profiler Taskflow Profiler (TFProf)

@section release-3-2-0_bug_fixes Bug Fixes

@li fixed compilation errors in constructing tf::cudaRoundRobinCapturing
@li fixed compilation errors of TLS worker pointer in tf::Executor
@li fixed compilation errors with nvcc v11.3 in auto template deduction
  + std::scoped_lock
  + tf::Serializer and tf::Deserializer
@li fixed memory leak when moving a tf::Taskflow

@section release-3-2-0_breaking_changes Breaking Changes

There are no breaking changes in this release.

@section release-3-2-0_deprecated_items Deprecated and Removed Items

@li removed tf::cudaFlow::kernel_on method
@li removed explicit partitions in parallel iterations and reductions
@li removed tf::cudaFlowCapturerBase
@li removed tf::cublasFlowCapturer
@li renamed update and rebind methods in tf::cudaFlow and tf::cudaFlowCapturer to overloads

@section release-3-2-0_documentation Documentation

@li revised @ref StaticTasking
  + @ref MoveATaskflow
@li revised @ref ExecuteTaskflow
  + @ref ExecuteATaskflowWithTransferredOwnership
@li revised @ref cudaFlowReduce
@li added @ref cudaFlowAlgorithms
  + @ref cudaFlowReduce
  + @ref cudaFlowScan
  + @ref cudaFlowMerge
  + @ref cudaFlowSort
@li added @ref cudaStandardAlgorithms
  + @ref CUDASTDExecutionPolicy
  + @ref CUDASTDReduce
  + @ref CUDASTDScan
  + @ref CUDASTDMerge
  + @ref CUDASTDSort
  + @ref CUDASTDFind

@section release-3-2-0_miscellaneous_items Miscellaneous Items

We have published tf::cudaFlow at the following conference:
  + Dian-Lun Lin and Tsung-Wei Huang, "Efficient GPU Computation using %Task Graph Parallelism," <i>European Conference on Parallel and Distributed Computing (EuroPar)</i>, 2021

*/

}