| Name | Date | Size | #Lines | LOC |
|------|------|------|--------|-----|
| .github/ | 25-Apr-2017 | - | 11 | 6 |
| R/ | 25-Apr-2017 | - | 27,953 | 13,582 |
| assembly/ | 25-Apr-2017 | - | 337 | 278 |
| bin/ | 03-May-2022 | - | 945 | 551 |
| build/ | 25-Apr-2017 | - | 550 | 359 |
| common/ | 25-Apr-2017 | - | 20,916 | 12,952 |
| conf/ | 25-Apr-2017 | - | 379 | 256 |
| core/ | 25-Apr-2017 | - | 167,316 | 110,089 |
| data/ | 25-Apr-2017 | - | 85 | 83 |
| dev/ | 03-May-2022 | - | 6,440 | 4,422 |
| docs/ | 25-Apr-2017 | - | 34,169 | 27,442 |
| examples/ | 25-Apr-2017 | - | 35,620 | 17,287 |
| external/ | 25-Apr-2017 | - | 23,133 | 15,102 |
| graphx/ | 25-Apr-2017 | - | 8,890 | 4,718 |
| launcher/ | 25-Apr-2017 | - | 4,394 | 2,721 |
| licenses/ | 03-May-2022 | - | - | - |
| mesos/ | 25-Apr-2017 | - | 5,741 | 3,933 |
| mllib/ | 25-Apr-2017 | - | 105,020 | 62,098 |
| mllib-local/ | 25-Apr-2017 | - | 5,138 | 3,425 |
| project/ | 25-Apr-2017 | - | 1,987 | 1,592 |
| python/ | 25-Apr-2017 | - | 53,707 | 42,108 |
| repl/ | 25-Apr-2017 | - | 6,158 | 4,066 |
| sbin/ | 03-May-2022 | - | 1,162 | 458 |
| sql/ | 25-Apr-2017 | - | 608,597 | 486,510 |
| streaming/ | 25-Apr-2017 | - | 33,716 | 21,121 |
| tools/ | 25-Apr-2017 | - | 235 | 146 |
| yarn/ | 25-Apr-2017 | - | 10,011 | 6,448 |
| .gitattributes | 25-Apr-2017 | 40 B | 3 | 2 |
| .travis.yml | 25-Apr-2017 | 1.7 KiB | 52 | 44 |
| CONTRIBUTING.md | 25-Apr-2017 | 995 B | 17 | 13 |
| LICENSE | 25-Apr-2017 | 17.4 KiB | 300 | 251 |
| NOTICE | 25-Apr-2017 | 24.1 KiB | 662 | 490 |
| README.md | 25-Apr-2017 | 3.7 KiB | 105 | 65 |
| appveyor.yml | 25-Apr-2017 | 1.8 KiB | 57 | 45 |
| pom.xml | 25-Apr-2017 | 98.4 KiB | 2,818 | 2,614 |
| scalastyle-config.xml | 25-Apr-2017 | 16.7 KiB | 337 | 208 |

README.md

# Apache Spark

Spark is a fast and general cluster computing system for Big Data. It provides
high-level APIs in Scala, Java, Python, and R, and an optimized engine that
supports general computation graphs for data analysis. It also supports a
rich set of higher-level tools, including Spark SQL for SQL and DataFrames,
MLlib for machine learning, GraphX for graph processing,
and Spark Streaming for stream processing.

<http://spark.apache.org/>

## Online Documentation

You can find the latest Spark documentation, including a programming
guide, on the [project web page](http://spark.apache.org/documentation.html).
This README file only contains basic setup instructions.

## Building Spark

Spark is built using [Apache Maven](http://maven.apache.org/).
To build Spark and its example programs, run:

    build/mvn -DskipTests clean package

(You do not need to do this if you downloaded a pre-built package.)

You can build Spark using more than one thread by passing the `-T` option to Maven; see ["Parallel builds in Maven 3"](https://cwiki.apache.org/confluence/display/MAVEN/Parallel+builds+in+Maven+3).
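
For example, a four-thread parallel build (adjust the thread count to the cores available on your machine):

    build/mvn -T 4 -DskipTests clean package
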
More detailed documentation is available from the project site, at
["Building Spark"](http://spark.apache.org/docs/latest/building-spark.html).

For general development tips, including info on developing Spark using an IDE, see
the [Useful Developer Tools page](http://spark.apache.org/developer-tools.html).

## Interactive Scala Shell

The easiest way to start using Spark is through the Scala shell:

    ./bin/spark-shell
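
By default the shell runs locally; to run it locally with two worker threads instead, you can pass a master URL with the standard `--master` option:

    ./bin/spark-shell --master local[2]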

Try the following command, which should return 1000:

    scala> sc.parallelize(1 to 1000).count()

## Interactive Python Shell

Alternatively, if you prefer Python, you can use the Python shell:

    ./bin/pyspark
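
If several Python versions are installed, the documented `PYSPARK_PYTHON` environment variable selects the interpreter the shell uses (this sketch assumes `python3` is on your `PATH`):

    PYSPARK_PYTHON=python3 ./bin/pyspark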

And run the following command, which should also return 1000:

    >>> sc.parallelize(range(1000)).count()

## Example Programs

Spark also comes with several sample programs in the `examples` directory.
To run one of them, use `./bin/run-example <class> [params]`. For example:

    ./bin/run-example SparkPi

will run the Pi example locally.

You can set the MASTER environment variable when running examples to submit
examples to a cluster. This can be a mesos:// or spark:// URL, "yarn" to run
on YARN, "local" to run locally with one thread, or "local[N]" to run locally
with N threads. You can also use an abbreviated class name if the class is in
the `examples` package. For instance:

    MASTER=spark://host:7077 ./bin/run-example SparkPi

Many of the example programs print usage help if no params are given.
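
For instance, SparkPi accepts an optional number of partitions as its parameter; the value below is arbitrary:

    ./bin/run-example SparkPi 100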

## Running Tests

Testing first requires [building Spark](#building-spark). Once Spark is built, tests
can be run using:

    ./dev/run-tests

Please see the guidance on how to
[run tests for a module, or individual tests](http://spark.apache.org/developer-tools.html#individual-tests).
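
As a rough sketch of scoping a run down to one module (assuming the required Spark artifacts were already built, and deferring to the page linked above for the authoritative per-module and per-suite commands), Maven's standard `-pl` flag restricts the build to a single module:

    build/mvn -pl core test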

## A Note About Hadoop Versions

Spark uses the Hadoop core library to talk to HDFS and other Hadoop-supported
storage systems. Because the protocols have changed in different versions of
Hadoop, you must build Spark against the same version that your cluster runs.

Please refer to the build documentation at
["Specifying the Hadoop Version"](http://spark.apache.org/docs/latest/building-spark.html#specifying-the-hadoop-version)
for detailed guidance on building for a particular distribution of Hadoop, including
building for particular Hive and Hive Thriftserver distributions.
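
For example, assuming a cluster running Hadoop 2.7.3, a YARN-enabled build would look roughly like this (using the `yarn` profile and `hadoop.version` property described in that documentation):

    build/mvn -Pyarn -Dhadoop.version=2.7.3 -DskipTests clean package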

## Configuration

Please refer to the [Configuration Guide](http://spark.apache.org/docs/latest/configuration.html)
in the online documentation for an overview on how to configure Spark.
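
As a quick illustration, individual settings from that guide can also be supplied at launch time through the standard `--conf` option (`spark.executor.memory` is one such documented setting):

    ./bin/spark-shell --conf spark.executor.memory=2g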

## Contributing

Please review the [Contribution to Spark guide](http://spark.apache.org/contributing.html)
for information on how to get started contributing to the project.