
How it works...
Our goal in this section is to integrate Spark directly into a Jupyter notebook so that we are not doing our development at the terminal, but instead taking advantage of working within a notebook. This section explains how that Spark integration with Jupyter notebooks takes place.
- We will create a command function, sparknotebook, that we can call from the terminal to open a Spark session through Jupyter notebooks from the Anaconda installation. This requires two environment variables to be set in the .bashrc file (a minimal sketch follows this list):
- The PySpark Python version, set to python3
- The PySpark driver for Python, set to jupyter
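The following is a minimal sketch of what those .bashrc additions might look like. PYSPARK_PYTHON, PYSPARK_DRIVER_PYTHON, and PYSPARK_DRIVER_PYTHON_OPTS are standard PySpark environment variables; placing the exports inside the function, and the assumption that the pyspark launcher is already on the PATH, are illustrative choices and should be adapted to your own installation:

# Minimal .bashrc sketch (assumes the pyspark launcher is on the PATH)
function sparknotebook() {
    export PYSPARK_PYTHON=python3                  # run PySpark with the Python 3 interpreter
    export PYSPARK_DRIVER_PYTHON=jupyter           # use Jupyter as the PySpark driver
    export PYSPARK_DRIVER_PYTHON_OPTS="notebook"   # start the driver in notebook mode
    pyspark                                        # launch the Spark session
}

After editing .bashrc, the file needs to be re-sourced (or a new terminal opened) before the function becomes available.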
- The sparknotebook function can now be accessed directly from the terminal by executing the following command:
$ sparknotebook
- The function should then launch a new Jupyter notebook session in the default web browser. A new Python notebook with a .ipynb extension can be created by clicking the New button on the right-hand side and selecting Python 3 under Notebook:, as seen in the following screenshot:

- Once again, just as was done at the terminal level for Spark, a simple command, sc, is executed within the notebook to confirm that Spark is up and running through Jupyter:

- Ideally, the Version, Master, and AppName should be identical to the earlier output when sc was executed at the terminal. If this is the case, then PySpark has been successfully installed and configured to work with Jupyter notebooks.
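As an illustration only, the following cell contents could be used for this check. sc is the SparkContext that PySpark creates automatically, and version, master, and appName are standard SparkContext properties; the example values in the comments are assumptions and will depend on your setup:

sc                   # displays the SparkContext summary (Version, Master, AppName)
print(sc.version)    # the Spark version, for example 2.x.x
print(sc.master)     # the master URL, for example local[*]
print(sc.appName)    # the application name, for example PySparkShell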