1.. _scipy-roadmap-detailed:
2
3Detailed SciPy Roadmap
4======================
5
Most of this roadmap is intended to provide a high-level view of what is
most needed per SciPy submodule in terms of new functionality, bug fixes, etc.
Besides important "business as usual" changes, it contains ideas for major new
features. Those are marked as such, and are expected to take significant
dedicated effort.  Things not mentioned in this roadmap are
not necessarily unimportant or out of scope; however, we (the SciPy developers)
want to give our users and contributors a clear picture of where SciPy is
going and where help is needed most.
14
15.. note:: This is the detailed roadmap.  A very high-level overview with only
16   the most important ideas is :ref:`scipy-roadmap`.
17
18
19General
20-------
21This roadmap will be evolving together with SciPy.  Updates can be submitted as
22pull requests.  For large or disruptive changes you may want to discuss
23those first on the scipy-dev mailing list.
24
25
26API changes
27```````````
In general, we want to evolve the API to remove known warts as much as possible,
*but without breaking backwards compatibility wherever that can be avoided*.
30
31Also, it should be made (even) more clear what is public and what is private in
32SciPy.  Everything private should be named starting with an underscore as much
33as possible.
34
35
36Test coverage
37`````````````
38Test coverage of code added in the last few years is quite good, and we aim for
39a high coverage for all new code that is added.  However, there is still a
40significant amount of old code for which coverage is poor.  Bringing that up to
41the current standard is probably not realistic, but we should plug the biggest
42holes.
43
44Besides coverage there is also the issue of correctness - older code may have a
45few tests that provide decent statement coverage, but that doesn't necessarily
46say much about whether the code does what it says on the box.  Therefore code
47review of some parts of the code (``stats``, ``signal`` and ``ndimage`` in
48particular) is necessary.
49
50
51Documentation
52`````````````
The main website, scipy.org, needs to be rewritten. As discussed on the `mailing
list
<https://mail.python.org/pipermail/scipy-dev/2019-September/023731.html>`_, the
SciPy stack is not relevant anymore and this website should be about
SciPy only, following the example of numpy.org. There is a lot of new content
to write.
59
Otherwise, the documentation is in good shape. Expanding current docstrings
and putting them in the standard NumPy format should continue, so the number of
reST errors and glitches in the html docs decreases.  Most modules also have a
tutorial in the reference guide that is a good introduction; however, a few
tutorials are missing or incomplete, and this should be fixed.
65
66
67Benchmarks
68``````````
The ``asv``-based benchmark system is in reasonable shape.  It is quite easy to
add new benchmarks; however, running the benchmarks is not very intuitive.
Making this easier is a priority.  In addition, we should run them in our CI
(gh-8779 is an ongoing attempt at this).
73
74
75Use of Cython
76`````````````
77Regarding Cython code:
78
79- It's not clear how much functionality can be Cythonized without making the
80  .so files too large.  This needs measuring.
81- Cython's old syntax for using NumPy arrays should be removed and replaced
82  with Cython memoryviews.
83
84
85Windows build issues
86````````````````````
87SciPy critically relies on Fortran code. This is still problematic on Windows.
88There are currently only two options: using Intel Fortran, or using MSVC +
89gfortran.  The former is expensive, while the latter works (it's what we use
90for releases) but is quite hard to do correctly.  For allowing contributors and
91end users to reliably build SciPy on Windows, using the Flang compiler looks
92like the best way forward long-term.
93
94
95Continuous integration
96``````````````````````
Continuous integration is in good shape. It currently covers the Windows, macOS
and Linux platforms, the ARM64 and ppc64le architectures, as well as a range of
versions of our dependencies and building release quality wheels.
100
101
102Size of binaries
103````````````````
104SciPy binaries are quite large (e.g. an unzipped manylinux wheel for 1.4.1 is
10591 MB), and this can be problematic - for example for use in AWS Lambda, which
106has a 250 MB size limit. We aim to keep binary size as low as possible; when
107adding new compiled extensions, this needs checking. Stripping of debug symbols
108in ``multibuild`` can likely be improved (see `this issue
109<https://github.com/matthew-brett/multibuild/issues/162>`__).
110
111
112Modules
113-------
114
115cluster
116```````
117This module is in good shape.
118
119
120constants
121`````````
122This module is basically done, low-maintenance and without open issues.
123
124
125fft
126````
127This module is in good shape.
128
129
130integrate
131`````````
132Needed for ODE solvers:
133
- Documentation is pretty bad and needs fixing.
135- A new ODE solver interface  (``solve_ivp``) was added in SciPy 1.0.0.
136  In the future we can consider (soft-)deprecating the older API.
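
As a minimal sketch of the newer interface, the example below solves the decay
equation ``dy/dt = -y`` with ``solve_ivp``.  The test problem, the tolerances
and the variable names are illustrative assumptions, not part of the roadmap.

.. code-block:: python

   import numpy as np
   from scipy.integrate import solve_ivp

   def decay(t, y):
       # Right-hand side of dy/dt = -y.
       return -y

   # Integrate from t = 0 to t = 1, starting at y(0) = 1.
   sol = solve_ivp(decay, (0.0, 1.0), [1.0], rtol=1e-8, atol=1e-10)

   # sol.t holds the time points chosen by the solver, sol.y the solution values.
   y_end = sol.y[0, -1]
   exact = np.exp(-1.0)
   print(abs(y_end - exact))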
137
138The numerical integration functions are in good shape.  Support for integrating
139complex-valued functions and integrating multiple intervals (see `gh-3325
140<https://github.com/scipy/scipy/issues/3325>`__) could be added.
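
Until such support exists, one possible workaround is to integrate the real and
imaginary parts separately with ``quad``.  This is only a sketch that assumes
the integrand is smooth on the interval; the helper name ``quad_complex`` is
hypothetical and not part of SciPy.

.. code-block:: python

   import numpy as np
   from scipy.integrate import quad

   def quad_complex(func, a, b):
       # Integrate real and imaginary parts separately and recombine them.
       real_part, _ = quad(lambda x: np.real(func(x)), a, b)
       imag_part, _ = quad(lambda x: np.imag(func(x)), a, b)
       return real_part + 1j * imag_part

   # The integral of exp(i*x) over [0, pi] is 2i.
   result = quad_complex(lambda x: np.exp(1j * x), 0.0, np.pi)
   print(result)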
141
142
143interpolate
144```````````
145
146Ideas for new features:
147
148- Spline fitting routines with better user control.
149- Transparent tensor-product splines.
150- NURBS support.
151- Mesh refinement and coarsening of B-splines and corresponding tensor products.
152
153io
154``
155wavfile:
156
- PCM float will be supported; for anything else, use ``audiolab`` or other
  specialized libraries.
- Raise errors instead of warnings if data is not understood.
160
161Other sub-modules (matlab, netcdf, idl, harwell-boeing, arff, matrix market)
162are in good shape.
163
164
165linalg
166``````
167``scipy.linalg`` is in good shape.
168
169Needed:
170
171- Reduce duplication of functions with ``numpy.linalg``, make APIs consistent.
172- ``get_lapack_funcs`` should always use ``flapack``
173- Wrap more LAPACK functions
174- One too many funcs for LU decomposition, remove one
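
For context, the sketch below shows how ``get_lapack_funcs`` is typically used
to obtain a LAPACK routine for given input arrays.  The choice of ``gesv`` and
the small linear system are assumptions made for illustration.

.. code-block:: python

   import numpy as np
   from scipy.linalg import get_lapack_funcs

   a = np.array([[3.0, 1.0], [1.0, 2.0]])
   b = np.array([[9.0], [8.0]])

   # Pick the LAPACK ``gesv`` routine matching the dtype of ``a`` and ``b``.
   (gesv,) = get_lapack_funcs(("gesv",), (a, b))

   # gesv returns the LU factors, pivot indices, the solution and a status flag.
   lu, piv, x, info = gesv(a, b)
   print(x.ravel(), info)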
175
176Ideas for new features:
177
178- Add type-generic wrappers in the Cython BLAS and LAPACK
179- Make many of the linear algebra routines into gufuncs
180
181**BLAS and LAPACK**
182
The Python and Cython interfaces to BLAS and LAPACK in ``scipy.linalg`` are one
of the most important things that SciPy provides. In general ``scipy.linalg``
is in good shape; however, we can make a number of improvements:
186
1. Library support. Our released wheels now ship with OpenBLAS, which is
   currently the only feasible performant option (ATLAS is too slow, MKL cannot
   be the default due to licensing issues, and Accelerate support was dropped
   because Apple doesn't update Accelerate anymore). OpenBLAS isn't very stable
   though: its releases sometimes break things, and it has issues with threading
   (currently the only issue for using SciPy with PyPy3).  We need at the very
   least better support for debugging OpenBLAS issues, and better documentation
   on how to build SciPy with it.  An option is to use BLIS for a BLAS
   interface (see `numpy gh-7372 <https://github.com/numpy/numpy/issues/7372>`__).
196
1972. Support for newer LAPACK features.  In SciPy 1.2.0 we increased the minimum
198   supported version of LAPACK to 3.4.0.  Now that we dropped Python 2.7, we
199   can increase that version further (MKL + Python 2.7 was the blocker for
200   >3.4.0 previously) and start adding support for new features in LAPACK.
201
202
203misc
204````
205``scipy.misc`` will be removed as a public module.  Most functions in it have
206been moved to another submodule or deprecated.  The few that are left:
207
208- ``info``, ``who`` : these are NumPy functions
209- ``derivative``, ``central_diff_weight`` : remove, possibly replacing them
210  with more extensive functionality for numerical differentiation.
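
As a rough sketch of what such differentiation functionality computes, the
snippet below implements a second-order central difference in plain Python.
The function name ``central_difference`` and the default step size are
assumptions made for this example, not an existing or planned SciPy API.

.. code-block:: python

   def central_difference(func, x0, dx=1e-5):
       # Second-order accurate estimate: (f(x0 + dx) - f(x0 - dx)) / (2 * dx).
       return (func(x0 + dx) - func(x0 - dx)) / (2.0 * dx)

   # The derivative of x**3 at x = 2 is 12.
   estimate = central_difference(lambda x: x**3, 2.0)
   print(estimate)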
211
212
213ndimage
214```````
215Underlying ``ndimage`` is a powerful interpolation engine.  Users come
216with an expectation of one of two models: a pixel model with ``(1,
2171)`` elements having centers ``(0.5, 0.5)``, or a data point model,
218where values are defined at points on a grid.  Over time, we've become
219convinced that the data point model is better defined and easier to
220implement, but this should be clearly communicated in the documentation.
221
More importantly still, SciPy implements one *variant* of this data
223point model, where datapoints at any two extremes of an axis share a
224spatial location under *periodic wrapping* mode.  E.g., in a 1D array,
225you would have ``x[0]`` and ``x[-1]`` co-located.  A very common
226use-case, however, is for signals to be periodic, with equal spacing
227between the first and last element along an axis (instead of zero
spacing).  Wrapping modes for this use-case were added in
`gh-8537 <https://github.com/scipy/scipy/pull/8537>`__; next, the
interpolation routines should be updated to use those modes.
231This should address several issues, including gh-1323, gh-1903, gh-2045
232and gh-2640.
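
The difference between the two periodic models can be pictured with plain
index arithmetic.  The sketch below only illustrates the two conventions; it
does not show how ``ndimage`` implements them, and the helper names are
hypothetical.

.. code-block:: python

   import numpy as np

   x = np.array([10, 20, 30, 40])
   n = len(x)

   def wrap_colocated(i):
       # x[0] and x[-1] share a location, so the period is n - 1 samples.
       # At the shared location this sketch arbitrarily picks x[0].
       return x[i % (n - 1)]

   def wrap_equal_spacing(i):
       # The last and first samples are one grid step apart, so the period is n.
       return x[i % n]

   indices = np.arange(n, 2 * n)
   print(wrap_colocated(indices))      # [20 30 10 20]
   print(wrap_equal_spacing(indices))  # [10 20 30 40]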
233
234The morphology interface needs to be standardized:
235
- binary dilation/erosion/opening/closing take a "structure" argument,
  whereas their grey equivalents take size (which has to be a tuple, not a
  scalar), footprint, or structure.
239- a scalar should be acceptable for size, equivalent to providing that same
240  value for each axis.
241- for binary dilation/erosion/opening/closing, the structuring element is
242  optional, whereas it's mandatory for grey.  Grey morphology operations
243  should get the same default.
244- other filters should also take that default value where possible.
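
The scalar handling proposed above could look roughly like the following.
This is a sketch of the intended behavior only; ``normalize_size`` is a
hypothetical helper, not an existing ``ndimage`` function.

.. code-block:: python

   import numbers

   def normalize_size(size, ndim):
       """Return ``size`` as a tuple with one entry per axis."""
       if isinstance(size, numbers.Integral):
           # A scalar means "the same extent along every axis".
           return (int(size),) * ndim
       size = tuple(int(s) for s in size)
       if len(size) != ndim:
           raise ValueError("size must have one entry per axis")
       return size

   print(normalize_size(3, 2))       # (3, 3)
   print(normalize_size((3, 5), 2))  # (3, 5)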
245
246
247odr
248```
249This module is in reasonable shape, although it could use a bit more
250maintenance.  No major plans or wishes here.
251
252
253optimize
254````````
Overall this module is in good shape. Two good global optimizers were added in
1.2.0; large-scale optimizers are still a gap that could be filled.  Other
things that are needed:
258
259- Many ideas for additional functionality (e.g. integer constraints, sparse
260  matrix support, performance improvements) in ``linprog``, see
261  `gh-9269 <https://github.com/scipy/scipy/issues/9269>`__.
262- Add functionality to the benchmark suite to compare results more easily
263  (e.g. with summary plots).
- Deprecate the ``fmin_*`` functions in the documentation; ``minimize`` is
  preferred.
266- ``scipy.optimize`` has an extensive set of benchmarks for accuracy and speed of
267  the global optimizers. That has allowed adding new optimizers (``shgo`` and
268  ``dual_annealing``) with significantly better performance than the existing
  ones.  However, the ``optimize`` benchmark system itself is slow and hard to
  use; we need to make it faster and make it easier to compare performance of
271  optimizers via plotting performance profiles.
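
To illustrate the preferred interface, the sketch below minimizes the
Rosenbrock function with ``minimize`` using the Nelder-Mead simplex method,
which is the algorithm behind ``fmin``.  The starting point and the tolerance
options are arbitrary choices for this example.

.. code-block:: python

   import numpy as np
   from scipy.optimize import minimize, rosen

   x0 = np.array([1.3, 0.7])

   # ``method="Nelder-Mead"`` selects the simplex algorithm that ``fmin`` uses.
   result = minimize(rosen, x0, method="Nelder-Mead",
                     options={"xatol": 1e-8, "fatol": 1e-8, "maxiter": 2000})

   # The Rosenbrock function has its minimum at (1, 1).
   print(result.x)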
272
273
274signal
275``````
276*Convolution and correlation*: (Relevant functions are convolve, correlate,
277fftconvolve, convolve2d, correlate2d, and sepfir2d.) Eliminate the overlap with
278`ndimage` (and elsewhere).  From ``numpy``, ``scipy.signal`` and ``scipy.ndimage``
279(and anywhere else we find them), pick the "best of class" for 1-D, 2-D and n-d
280convolution and correlation, put the implementation somewhere, and use that
281consistently throughout SciPy.
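
The overlap is easy to see for 1-D inputs, where several functions compute the
same full convolution.  The sketch below uses arbitrary example arrays and only
compares results; it says nothing about which implementation is "best of
class".

.. code-block:: python

   import numpy as np
   from scipy import signal

   a = np.array([1.0, 2.0, 3.0, 4.0])
   b = np.array([0.5, 0.25, 0.25])

   full_numpy = np.convolve(a, b)        # NumPy, 1-D only
   full_auto = signal.convolve(a, b)     # n-D, chooses direct or FFT method
   full_fft = signal.fftconvolve(a, b)   # n-D, always FFT-based

   print(np.allclose(full_numpy, full_auto), np.allclose(full_numpy, full_fft))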
282
283*B-splines*: (Relevant functions are bspline, cubic, quadratic, gauss_spline,
284cspline1d, qspline1d, cspline2d, qspline2d, cspline1d_eval, and spline_filter.)
285Move the good stuff to `interpolate` (with appropriate API changes to match how
286things are done in `interpolate`), and eliminate any duplication.
287
288*Filter design*: merge `firwin` and `firwin2` so `firwin2` can be removed.
289
290*Continuous-Time Linear Systems*: remove `lsim2`, `impulse2`, `step2`.  The
291`lsim`, `impulse` and `step` functions now "just work" for any input system.
292Further improve the performance of ``ltisys`` (fewer internal transformations
293between different representations). Fill gaps in lti system conversion functions.
294
295*Second Order Sections*: Make SOS filtering equally capable as existing
296methods. This includes ltisys objects, an `lfiltic` equivalent, and numerically
297stable conversions to and from other filter representations. SOS filters could
298be considered as the default filtering method for ltisys objects, for their
299numerical stability.
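
For reference, this is what SOS filtering looks like next to the transfer
function form today.  The filter order, cutoff and random input are assumptions
made for this sketch; for such a low-order filter both forms give essentially
the same output.

.. code-block:: python

   import numpy as np
   from scipy import signal

   rng = np.random.default_rng(0)
   x = rng.standard_normal(256)

   # Design the same 4th-order low-pass Butterworth filter in both forms.
   b, a = signal.butter(4, 0.2)
   sos = signal.butter(4, 0.2, output="sos")

   y_ba = signal.lfilter(b, a, x)     # transfer function (b, a) form
   y_sos = signal.sosfilt(sos, x)     # cascaded second-order sections
   print(np.max(np.abs(y_ba - y_sos)))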
300
301*Wavelets*: what's there now doesn't make much sense.  Continuous wavelets
302only at the moment - decide whether to completely rewrite or remove them.
303Discrete wavelet transforms are out of scope (PyWavelets does a good job
304for those).
305
306
307sparse
308``````
The sparse matrix formats are mostly feature-complete; however, the main issue
is that they act like ``numpy.matrix`` (which will be deprecated in NumPy at
some point).  What we want is sparse arrays that act like ``numpy.ndarray``.
This is being worked on in https://github.com/pydata/sparse, which is quite far
along.  The tentative plan is:
314
315- Start depending on ``pydata/sparse`` once it's feature-complete enough (it
316  still needs a CSC/CSR equivalent) and okay performance-wise.
317- Add support for ``pydata/sparse`` to ``scipy.sparse.linalg`` (and perhaps to
318  ``scipy.sparse.csgraph`` after that).
319- Indicate in the documentation that for new code users should prefer
320  ``pydata/sparse`` over sparse matrices.
321- When NumPy deprecates ``numpy.matrix``, vendor that or maintain it as a
322  stand-alone package.
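
The matrix-like behavior is most visible in the ``*`` operator.  The short
sketch below, using an arbitrary 2x2 example, contrasts it with the elementwise
semantics of ``numpy.ndarray``.

.. code-block:: python

   import numpy as np
   from scipy.sparse import csr_matrix

   dense = np.array([[1, 2], [3, 4]])
   sparse = csr_matrix(dense)

   # For ndarrays, ``*`` is elementwise.
   elementwise = dense * dense
   # For sparse matrices, ``*`` is matrix multiplication, as for numpy.matrix.
   matrix_product = (sparse * sparse).toarray()

   print(elementwise)
   print(matrix_product)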
323
Regarding the different sparse matrix formats: there are a lot of them.  These
should be kept, but improvements/optimizations should go into CSR/CSC, which
are the preferred formats.  LIL may be the exception, because it's inherently
inefficient.  It could be dropped if DOK is extended to support all the
operations LIL currently provides.
329
330
331sparse.csgraph
332``````````````
333This module is in good shape.
334
335
336sparse.linalg
337`````````````
338Arpack is in good shape.
339
340isolve:
341
342- callback keyword is inconsistent
343- tol keyword is broken, should be relative tol
344- Fortran code not re-entrant (but we don't solve, maybe re-use from
345  PyKrilov)
346
347dsolve:
348
349- add sparse Cholesky or incomplete Cholesky
350- add sparse QR
351- improve interface to SuiteSparse UMFPACK
352- add interfaces to SuiteSparse CHOLMOD and SPQR
353
354Ideas for new features:
355
356- Wrappers for PROPACK for faster sparse SVD computation.
357
358
359spatial
360```````
361QHull wrappers are in good shape, as is ``cKDTree``.
362
363Needed:
364
365- ``KDTree`` will be removed, and ``cKDTree`` will be renamed to ``KDTree``
366  in a backwards-compatible way.
367- ``distance_wrap.c`` needs to be cleaned up (maybe rewrite in Cython).
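
As a small illustration of the interface that is being kept, the sketch below
runs a nearest-neighbor query with ``cKDTree``.  The points and the query
location are made up for this example.

.. code-block:: python

   import numpy as np
   from scipy.spatial import cKDTree

   points = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
   tree = cKDTree(points)

   # query returns the distance to the nearest point and that point's index.
   distance, index = tree.query([0.9, 0.1])
   print(distance, index)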
368
369
370special
371```````
372Though there are still a lot of functions that need improvements in precision,
373probably the only show-stoppers are hypergeometric functions, parabolic cylinder
374functions, and spheroidal wave functions. Three possible ways to handle this:
375
1. Get good double-precision implementations. This is doable for parabolic
   cylinder functions (in progress). I think it's possible for hypergeometric
   functions, though maybe not in time. For spheroidal wave functions this is
   not possible with current theory.
380
2. Port Boost's arbitrary precision library and use it under the hood to get
   double precision accuracy. This might be necessary as a stopgap measure
   for hypergeometric functions; the idea of using arbitrary precision has
   been suggested before by @nmayorov and in
   `gh-5349 <https://github.com/scipy/scipy/issues/5349>`__.  This is likely
   necessary for spheroidal wave functions, for which this code could be reused:
   https://github.com/radelman/scattering.
388
3893. Add clear warnings to the documentation about the limits of the existing
390   implementations.
391
392
393stats
394`````
395
396The ``scipy.stats`` subpackage aims to provide fundamental statistical
397methods as might be covered in standard statistics texts such as Johnson's
398"Miller & Freund's Probability and Statistics for Engineers", Sokal & Rohlf's
399"Biometry", or Zar's "Biostatistical Analysis".  It does not seek to duplicate
400the advanced functionality of downstream packages (e.g. StatsModels,
401LinearModels, PyMC3); instead, it can provide a solid foundation on which
402they can build.  (Note that these are rough guidelines, not strict rules.
403"Advanced" is an ill-defined and subjective term, and "advanced" methods
404may also be included in SciPy, especially if no other widely used and
405well-supported package covers the topic.  Also note that *some* duplication
406with downstream projects is inevitable and not necessarily a bad thing.)
407
408The following improvements will help SciPy better serve this role.
409
410- Add fundamental and widely used hypothesis tests:
411
412  - Tukey-Kramer test
413  - Dunnett's test
414  - the various types of analysis of variance (ANOVA):
415
416    - two-way ANOVA (single replicate, uniform number of replicates, variable
417      number of replicates)
418    - multiway ANOVA (i.e. generalize two-way ANOVA)
419    - nested ANOVA
420    - analysis of covariance (ANCOVA)
421
422- Add additional tools for meta-analysis; currently we have just `combine_pvalues`.
423- Enhance the `fit` method of the continuous probability distributions:
424
425  - Expand the options for fitting to include:
426
427    - maximal product spacings
428    - method of L-moments / probability weighted moments
429
430  - Include measures of goodness-of-fit in the results
431  - Handle censored data (e.g. merge `gh-13699 <https://github.com/scipy/scipy/pull/13699>`__)
432
433- Implement additional widely used continuous and discrete probability
434  distributions:
435
436  - multivariate t distribution
437  - mixture distributions
438
439- Improve the core calculations provided by SciPy's probability distributions
440  so they can robustly handle wide ranges of parameter values.  Specifically,
441  replace many of the PDF and CDF methods from the Fortran library CDFLIB
442  used in scipy.special with Boost implementations as in
443  `gh-13328 <https://github.com/scipy/scipy/pull/13328>`__.
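
For reference, the ``fit`` method mentioned in the list above currently returns
only the estimated parameters.  The sketch below fits a normal distribution to
synthetic data; the seed, sample size and true parameters are assumptions made
for illustration.

.. code-block:: python

   import numpy as np
   from scipy import stats

   rng = np.random.default_rng(12345)
   data = rng.normal(loc=5.0, scale=2.0, size=2000)

   # ``fit`` returns the estimated parameters; for ``norm`` these are
   # (loc, scale).  No goodness-of-fit measure is part of the result.
   loc_hat, scale_hat = stats.norm.fit(data)
   print(loc_hat, scale_hat)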
444
445In addition, we should:
446
447- Continue work on making the function signatures of ``stats`` and
448  ``stats.mstats`` more consistent, and add tests to ensure that that
449  remains the case.
450- Improve statistical tests: consistently provide options for one- and
451  two-sided alternative hypotheses where applicable, return confidence
452  intervals for the test statistic, and implement exact p-value calculations -
453  considering the possibility of ties - where computationally feasible.
454