.. _scipy-roadmap-detailed:

Detailed SciPy Roadmap
======================

Most of this roadmap is intended to provide a high-level view of what is
most needed per SciPy submodule in terms of new functionality, bug fixes, etc.
Besides important "business as usual" changes, it contains ideas for major new
features; those are marked as such and are expected to take significant
dedicated effort. Things not mentioned in this roadmap are not necessarily
unimportant or out of scope; however, we (the SciPy developers) want to give
our users and contributors a clear picture of where SciPy is going and where
help is needed most.

.. note:: This is the detailed roadmap. A very high-level overview with only
   the most important ideas is :ref:`scipy-roadmap`.


General
-------
This roadmap will evolve together with SciPy. Updates can be submitted as
pull requests. For large or disruptive changes, you may want to discuss
them first on the scipy-dev mailing list.


API changes
```````````
In general, we want to evolve the API to remove known warts as much as possible,
*while avoiding backwards-compatibility breaks wherever we can*.

Also, it should be made (even) clearer what is public and what is private in
SciPy. Everything private should, where possible, be named with a leading
underscore.


Test coverage
`````````````
Test coverage of code added in the last few years is quite good, and we aim for
high coverage of all new code that is added. However, there is still a
significant amount of old code for which coverage is poor. Bringing that up to
the current standard is probably not realistic, but we should plug the biggest
holes.

Besides coverage, there is also the issue of correctness: older code may have a
few tests that provide decent statement coverage, but that doesn't necessarily
say much about whether the code does what it says on the box.
Therefore, code review of some parts of the code (``stats``, ``signal`` and
``ndimage`` in particular) is necessary.


Documentation
`````````````
The main website, scipy.org, needs to be rewritten. As discussed on the
`mailing list
<https://mail.python.org/pipermail/scipy-dev/2019-September/023731.html>`__,
the SciPy stack is no longer relevant, so this website should be about SciPy
only, following the example of numpy.org. There is a lot of new content to
write.

Otherwise, the documentation is in good shape. Expanding the current docstrings
and putting them in the standard NumPy format should continue, so that the
number of reST errors and glitches in the HTML docs decreases. Most modules also
have a tutorial in the reference guide that serves as a good introduction;
however, a few tutorials are missing or incomplete, and this should be fixed.


Benchmarks
``````````
The ``asv``-based benchmark system is in reasonable shape. It is quite easy to
add new benchmarks; however, running the benchmarks is not very intuitive.
Making this easier is a priority. In addition, we should run them in our CI
(gh-8779 is an ongoing attempt at this).


Use of Cython
`````````````
Regarding Cython code:

- It's not clear how much functionality can be Cythonized without making the
  ``.so`` files too large. This needs measuring.
- Cython's old syntax for using NumPy arrays should be removed and replaced
  with Cython memoryviews.


Windows build issues
````````````````````
SciPy critically relies on Fortran code. This is still problematic on Windows.
There are currently only two options: using Intel Fortran, or using MSVC +
gfortran. The former is expensive, while the latter works (it's what we use
for releases) but is quite hard to do correctly. To allow contributors and
end users to build SciPy reliably on Windows, using the Flang compiler looks
like the best long-term way forward.
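
For reference alongside the *Benchmarks* subsection above, the following is a
minimal sketch of what an ``asv`` benchmark looks like. It relies only on
``asv``'s documented conventions (a ``setup`` method, ``params`` and
``param_names`` class attributes, and timing methods whose names start with
``time_``); the class name ``SolveBench`` and the choice of
``scipy.linalg.solve`` are hypothetical illustrations, not an existing SciPy
benchmark.

.. code-block:: python

    import numpy as np
    from scipy import linalg


    class SolveBench:
        # Hypothetical benchmark: asv runs it once per value listed in params.
        params = [[10, 100, 500]]
        param_names = ["n"]

        def setup(self, n):
            # Runs before timing, so building the inputs is not measured.
            rng = np.random.default_rng(1234)
            # Adding n * I shifts the eigenvalues away from zero, keeping the
            # random matrix well conditioned.
            self.a = rng.standard_normal((n, n)) + n * np.eye(n)
            self.b = rng.standard_normal(n)

        def time_solve(self, n):
            # asv reports the run time of methods prefixed with ``time_``.
            linalg.solve(self.a, self.b)

Because the class is plain Python, it can also be smoke-tested without
``asv`` by calling ``setup`` and ``time_solve`` directly.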

Continuous integration
``````````````````````
Continuous integration is in good shape. It currently covers the Windows, macOS
and Linux platforms as well as the ARM64 and ppc64le architectures, tests
against a range of versions of our dependencies, and builds release-quality
wheels.


Size of binaries
````````````````
SciPy binaries are quite large (e.g. an unzipped manylinux wheel for 1.4.1 is
91 MB), and this can be problematic, for example for use in AWS Lambda, which
has a 250 MB size limit. We aim to keep binary size as low as possible; the
size impact needs checking whenever new compiled extensions are added.
Stripping of debug symbols in ``multibuild`` can likely be improved (see `this
issue <https://github.com/matthew-brett/multibuild/issues/162>`__).


Modules
-------

cluster
```````
This module is in good shape.


constants
`````````
This module is basically done, low-maintenance and without open issues.


fft
```
This module is in good shape.


integrate
`````````
Needed for ODE solvers:

- Documentation is pretty bad and needs fixing.
- A new ODE solver interface (``solve_ivp``) was added in SciPy 1.0.0.
  In the future, we can consider (soft-)deprecating the older API.

The numerical integration functions are in good shape. Support for integrating
complex-valued functions and for integrating over multiple intervals (see
`gh-3325 <https://github.com/scipy/scipy/issues/3325>`__) could be added.


interpolate
```````````

Ideas for new features:

- Spline fitting routines with better user control.
- Transparent tensor-product splines.
- NURBS support.
- Mesh refinement and coarsening of B-splines and corresponding tensor products.


io
``
wavfile:

- PCM float will be supported; for anything else, use ``audiolab`` or other
  specialized libraries.
- Raise errors instead of warnings if the data are not understood.
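
As a minimal sketch of the float case, the round trip below writes a 32-bit
float WAV file with ``scipy.io.wavfile.write`` and reads it back with
``scipy.io.wavfile.read``. It assumes a SciPy version whose ``write`` accepts
``float32`` arrays, as recent releases do; the file name, sample rate and test
tone are arbitrary choices.

.. code-block:: python

    import os
    import tempfile

    import numpy as np
    from scipy.io import wavfile

    # One second of a 440 Hz tone as float32 samples in [-0.5, 0.5].
    rate = 8000
    t = np.arange(rate) / rate
    data = (0.5 * np.sin(2 * np.pi * 440 * t)).astype(np.float32)

    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "tone.wav")
        # A float32 array is written in the IEEE-float WAV format.
        wavfile.write(path, rate, data)
        rate_read, data_read = wavfile.read(path)

    print(rate_read, data_read.dtype)

Float samples are conventionally kept in [-1.0, 1.0]; the round trip preserves
both the dtype and the sample values.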

Other sub-modules (matlab, netcdf, idl, harwell-boeing, arff, matrix market)
are in good shape.


linalg
``````
``scipy.linalg`` is in good shape.

Needed:

- Reduce duplication of functions with ``numpy.linalg`` and make the APIs
  consistent.
- ``get_lapack_funcs`` should always use ``flapack``.
- Wrap more LAPACK functions.
- There is one function too many for LU decomposition; remove one.

Ideas for new features:

- Add type-generic wrappers in the Cython BLAS and LAPACK.
- Make many of the linear algebra routines into gufuncs.

**BLAS and LAPACK**

The Python and Cython interfaces to BLAS and LAPACK in ``scipy.linalg`` are one
of the most important things that SciPy provides. In general, ``scipy.linalg``
is in good shape; however, we can make a number of improvements:

1. Library support. Our released wheels now ship with OpenBLAS, which is
   currently the only feasible performant option (ATLAS is too slow, MKL cannot
   be the default due to licensing issues, and Accelerate support was dropped
   because Apple doesn't update Accelerate anymore). OpenBLAS isn't very
   stable, though: its releases sometimes break things, and it has issues with
   threading (currently the only issue for using SciPy with PyPy3). At the very
   least, we need better support for debugging OpenBLAS issues and better
   documentation on how to build SciPy with it. One option is to use BLIS for
   the BLAS interface (see
   `numpy gh-7372 <https://github.com/numpy/numpy/issues/7372>`__).

2. Support for newer LAPACK features. In SciPy 1.2.0 we increased the minimum
   supported version of LAPACK to 3.4.0. Now that we have dropped Python 2.7,
   we can increase that version further (MKL + Python 2.7 was previously the
   blocker for >3.4.0) and start adding support for new features in LAPACK.


misc
````
``scipy.misc`` will be removed as a public module.
Most functions in it have been moved to another submodule or deprecated. The
few that are left:

- ``info``, ``who``: these are NumPy functions.
- ``derivative``, ``central_diff_weight``: remove, possibly replacing them
  with more extensive functionality for numerical differentiation.


ndimage
```````
Underlying ``ndimage`` is a powerful interpolation engine. Users come with one
of two models in mind: a pixel model with ``(1, 1)`` elements having centers
``(0.5, 0.5)``, or a data point model, where values are defined at points on a
grid. Over time, we've become convinced that the data point model is better
defined and easier to implement, but this should be clearly communicated in
the documentation.

More importantly still, SciPy implements one *variant* of this data point
model, in which the data points at the two extremes of an axis share a spatial
location under *periodic wrapping* mode. For example, in a 1-D array, ``x[0]``
and ``x[-1]`` would be co-located. A very common use case, however, is for
signals to be periodic, with equal spacing between the first and last element
along an axis (instead of zero spacing). Wrapping modes for this use case were
added in `gh-8537 <https://github.com/scipy/scipy/pull/8537>`__; next, the
interpolation routines should be updated to use those modes. This should
address several issues, including gh-1323, gh-1903, gh-2045 and gh-2640.

The morphology interface needs to be standardized:

- Binary dilation/erosion/opening/closing take a "structure" argument,
  whereas their grey equivalents take size (which has to be a tuple, not a
  scalar), footprint, or structure.
- A scalar should be acceptable for size, equivalent to providing that same
  value for each axis.
- For binary dilation/erosion/opening/closing, the structuring element is
  optional, whereas it's mandatory for grey. Grey morphology operations
  should get the same default.
- Other filters should also take that default value where possible.


odr
```
This module is in reasonable shape, although it could use a bit more
maintenance. No major plans or wishes here.


optimize
````````
Overall, this module is in good shape. Two good global optimizers were added
in 1.2.0; large-scale optimizers are still a gap that could be filled. Other
things that are needed:

- Many ideas for additional functionality (e.g. integer constraints, sparse
  matrix support, performance improvements) in ``linprog``; see
  `gh-9269 <https://github.com/scipy/scipy/issues/9269>`__.
- Add functionality to the benchmark suite to compare results more easily
  (e.g. with summary plots).
- Deprecate the ``fmin_*`` functions in the documentation; ``minimize`` is
  preferred.
- ``scipy.optimize`` has an extensive set of benchmarks for accuracy and speed
  of the global optimizers. That has allowed adding new optimizers (``shgo``
  and ``dual_annealing``) with significantly better performance than the
  existing ones. The ``optimize`` benchmark system itself is slow and hard to
  use, however; we need to make it faster and make it easier to compare the
  performance of optimizers by plotting performance profiles.


signal
``````
*Convolution and correlation*: (Relevant functions are convolve, correlate,
fftconvolve, convolve2d, correlate2d, and sepfir2d.) Eliminate the overlap with
`ndimage` (and elsewhere). From ``numpy``, ``scipy.signal`` and
``scipy.ndimage`` (and anywhere else we find them), pick the "best of class"
for 1-D, 2-D and n-D convolution and correlation, put the implementation in
one place, and use it consistently throughout SciPy.

*B-splines*: (Relevant functions are bspline, cubic, quadratic, gauss_spline,
cspline1d, qspline1d, cspline2d, qspline2d, cspline1d_eval, and spline_filter.)
Move the good stuff to `interpolate` (with appropriate API changes to match how
things are done in `interpolate`), and eliminate any duplication.

*Filter design*: merge `firwin` and `firwin2` so that `firwin2` can be removed.

*Continuous-Time Linear Systems*: remove `lsim2`, `impulse2` and `step2`. The
`lsim`, `impulse` and `step` functions now "just work" for any input system.
Further improve the performance of ``ltisys`` (fewer internal transformations
between different representations). Fill gaps in the LTI system conversion
functions.

*Second Order Sections*: Make SOS filtering as capable as the existing
methods. This includes ltisys objects, an `lfiltic` equivalent, and numerically
stable conversions to and from other filter representations. Because of their
numerical stability, SOS filters could be considered as the default filtering
method for ltisys objects.

*Wavelets*: what's there now doesn't make much sense. Only continuous wavelets
are available at the moment; decide whether to completely rewrite or remove
them. Discrete wavelet transforms are out of scope (PyWavelets does a good job
for those).


sparse
``````
The sparse matrix formats are mostly feature-complete; however, the main issue
is that they act like ``numpy.matrix`` (which will be deprecated in NumPy at
some point). What we want are sparse arrays that act like ``numpy.ndarray``.
This is being worked on in https://github.com/pydata/sparse, which is quite far
along. The tentative plan is:

- Start depending on ``pydata/sparse`` once it's feature-complete enough (it
  still needs a CSC/CSR equivalent) and performs acceptably.
- Add support for ``pydata/sparse`` to ``scipy.sparse.linalg`` (and perhaps to
  ``scipy.sparse.csgraph`` after that).
- Indicate in the documentation that, for new code, users should prefer
  ``pydata/sparse`` over sparse matrices.
- When NumPy deprecates ``numpy.matrix``, vendor that or maintain it as a
  stand-alone package.

Regarding the different sparse matrix formats: there are a lot of them. These
should be kept, but improvements/optimizations should go into CSR/CSC, which
are the preferred formats. LIL may be the exception, as it's inherently
inefficient. It could be dropped if DOK is extended to support all the
operations LIL currently provides.


sparse.csgraph
``````````````
This module is in good shape.


sparse.linalg
`````````````
ARPACK is in good shape.

isolve:

- The ``callback`` keyword is inconsistent.
- The ``tol`` keyword is broken; it should be a relative tolerance.
- The Fortran code is not re-entrant (this is not solved yet; maybe reuse
  code from PyKrylov).

dsolve:

- Add sparse Cholesky or incomplete Cholesky.
- Add sparse QR.
- Improve the interface to SuiteSparse UMFPACK.
- Add interfaces to SuiteSparse CHOLMOD and SPQR.

Ideas for new features:

- Wrappers for PROPACK for faster sparse SVD computation.


spatial
```````
QHull wrappers are in good shape, as is ``cKDTree``.

Needed:

- ``KDTree`` will be removed, and ``cKDTree`` will be renamed to ``KDTree``
  in a backwards-compatible way.
- ``distance_wrap.c`` needs to be cleaned up (maybe rewritten in Cython).


special
```````
Though there are still a lot of functions that need improvements in precision,
probably the only show-stoppers are hypergeometric functions, parabolic
cylinder functions, and spheroidal wave functions. Three possible ways to
handle this:

1. Get good double-precision implementations. This is doable for parabolic
   cylinder functions (in progress). It may also be possible for
   hypergeometric functions, though maybe not in time. For spheroidal wave
   functions, this is not possible with current theory.

2. Port Boost's arbitrary precision library and use it under the hood to get
   double-precision accuracy. This might be necessary as a stopgap measure
   for hypergeometric functions; the idea of using arbitrary precision has
   been suggested before by @nmayorov and in
   `gh-5349 <https://github.com/scipy/scipy/issues/5349>`__. It is likely
   necessary for spheroidal wave functions, for which the code at
   https://github.com/radelman/scattering could be reused.

3. Add clear warnings to the documentation about the limits of the existing
   implementations.


stats
`````

The ``scipy.stats`` subpackage aims to provide fundamental statistical
methods as might be covered in standard statistics texts such as Johnson's
"Miller & Freund's Probability and Statistics for Engineers", Sokal & Rohlf's
"Biometry", or Zar's "Biostatistical Analysis". It does not seek to duplicate
the advanced functionality of downstream packages (e.g. StatsModels,
LinearModels, PyMC3); instead, it can provide a solid foundation on which
they can build. (Note that these are rough guidelines, not strict rules.
"Advanced" is an ill-defined and subjective term, and "advanced" methods
may also be included in SciPy, especially if no other widely used and
well-supported package covers the topic. Also note that *some* duplication
with downstream projects is inevitable and not necessarily a bad thing.)

The following improvements will help SciPy better serve this role:

- Add fundamental and widely used hypothesis tests:

  - Tukey-Kramer test
  - Dunnett's test
  - the various types of analysis of variance (ANOVA):

    - two-way ANOVA (single replicate, uniform number of replicates, variable
      number of replicates)
    - multiway ANOVA (i.e. generalize two-way ANOVA)
    - nested ANOVA
    - analysis of covariance (ANCOVA)

- Add additional tools for meta-analysis; currently we have just
  `combine_pvalues`.
- Enhance the `fit` method of the continuous probability distributions:

  - Expand the options for fitting to include:

    - maximal product spacings
    - method of L-moments / probability weighted moments

  - Include measures of goodness-of-fit in the results
  - Handle censored data (e.g. merge
    `gh-13699 <https://github.com/scipy/scipy/pull/13699>`__)

- Implement additional widely used continuous and discrete probability
  distributions:

  - multivariate t distribution
  - mixture distributions

- Improve the core calculations provided by SciPy's probability distributions
  so they can robustly handle wide ranges of parameter values. Specifically,
  replace many of the PDF and CDF methods from the Fortran library CDFLIB
  used in ``scipy.special`` with Boost implementations as in
  `gh-13328 <https://github.com/scipy/scipy/pull/13328>`__.

In addition, we should:

- Continue work on making the function signatures of ``stats`` and
  ``stats.mstats`` more consistent, and add tests to ensure that they remain
  consistent.
- Improve statistical tests: consistently provide options for one- and
  two-sided alternative hypotheses where applicable, return confidence
  intervals for the test statistic, and implement exact p-value calculations,
  taking the possibility of ties into account, where computationally feasible.
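
To illustrate the last point, below is a minimal sketch of an exact two-sample
permutation test on the difference in means, with a selectable
``alternative``. Because it enumerates every split of the pooled sample, tied
values are handled without any correction. The function name
``exact_permutation_test`` is hypothetical and this is not how any existing
SciPy test is implemented; exhaustive enumeration is only computationally
feasible for small samples.

.. code-block:: python

    from itertools import combinations

    import numpy as np


    def exact_permutation_test(x, y, alternative="two-sided"):
        """Hypothetical exact permutation test on mean(x) - mean(y).

        Every split of the pooled data into groups of the original sizes is
        enumerated, so ties need no special treatment. The cost grows as
        C(len(x) + len(y), len(x)).
        """
        x = np.asarray(x, dtype=float)
        y = np.asarray(y, dtype=float)
        pooled = np.concatenate([x, y])
        n, n_x = pooled.size, x.size
        observed = x.mean() - y.mean()

        # Null distribution: the statistic for every equally likely split.
        null_stats = []
        for idx in combinations(range(n), n_x):
            mask = np.zeros(n, dtype=bool)
            mask[list(idx)] = True
            null_stats.append(pooled[mask].mean() - pooled[~mask].mean())
        null_stats = np.asarray(null_stats)

        # Small tolerance so that round-off does not exclude splits whose
        # statistic equals the observed one.
        tol = 1e-9 * max(1.0, abs(observed))
        if alternative == "less":
            extreme = null_stats <= observed + tol
        elif alternative == "greater":
            extreme = null_stats >= observed - tol
        elif alternative == "two-sided":
            extreme = np.abs(null_stats) >= abs(observed) - tol
        else:
            raise ValueError("alternative must be 'less', 'greater' or 'two-sided'")
        # p-value: fraction of splits at least as extreme as the observed one.
        return observed, extreme.mean()


    stat, p = exact_permutation_test([1, 2, 3], [4, 5, 6], alternative="less")
    print(stat, p)

For ``x = [1, 2, 3]`` and ``y = [4, 5, 6]`` there are 20 equally likely
splits, and only the observed one is at least as extreme in the ``"less"``
direction, so the p-value is 1/20 = 0.05 (and 2/20 = 0.1 for
``"two-sided"``).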