Using the user guide

[1]:

import pandas as pd

import holoviews as hv
hv.extension('bokeh')

import bokeh.io
bokeh.io.output_notebook()


In the following pages, we demonstrate usage of four of the five modules of the bebi103 package. The image module is omitted for now; it will eventually be added to the documentation, but image processing is not featured in the 2020 edition of the BE/Bi 103 a course.

We will demonstrate usage of the hv, bootstrap, and stan modules in the following pages. The viz module is always used in the context of statistical analysis, so its contents are demonstrated within the bootstrap and stan sections.

In all graphics displays that use HoloViews, the plot is first converted to a Bokeh figure and then shown with Bokeh, using a call like bokeh.io.show(hv.render(holoviews_graphic)). We do this because of some issues with how Sphinx (which generates the documentation) renders HoloViews plots.

Sample data

In all of the demonstrations, we will use the same sample data set of \(x,y\) values taken from three different trials. The sample data were numerically generated using the following hierarchical generative model.

\begin{align}
&n = (20, 25, 18)^\mathsf{T},\\[1em]
&\theta = (3, 7)^\mathsf{T},\\[1em]
&\mathsf{T} = \begin{pmatrix}1 & 0 \\ 0 & 16\end{pmatrix}, \\[1em]
&\sigma = (2, 3)^\mathsf{T},\\[1em]
&\rho = 0.6,\\[1em]
&\mathsf{\Sigma} = \begin{pmatrix}\sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2\end{pmatrix}, \\[1em]
&\theta_i \sim \text{Norm}(\theta, \mathsf{T}) \;\forall\,i\in\{1, 2, 3\},\\[1em]
&\begin{pmatrix} x_{i, j} \\ y_{i,j} \end{pmatrix} \sim \text{Norm}(\theta_i, \mathsf{\Sigma})\;\forall\,i \in \{1, 2, 3\}, \; j \in \{1, \ldots, n_i \}.
\end{align}

Except when doing Bayesian modeling with Stan, we will ignore the hierarchical nature of the data set for simplicity.
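To make the generative model above concrete, here is a minimal sketch of how data of this form could be simulated with NumPy. The function name `simulate_sample_data` and the seed are assumptions made for illustration. The code and seed actually used to produce sample_data.csv are not given here, so the simulated values will not match that file.

```python
import numpy as np
import pandas as pd


def simulate_sample_data(seed=None):
    """Draw one data set from the hierarchical model (illustrative sketch)."""
    rng = np.random.default_rng(seed)

    # Fixed parameters of the generative model
    n = np.array([20, 25, 18])                # points per trial
    theta = np.array([3.0, 7.0])              # hyper-mean
    T = np.array([[1.0, 0.0], [0.0, 16.0]])   # covariance of trial means
    sigma = np.array([2.0, 3.0])
    rho = 0.6
    Sigma = np.array(
        [
            [sigma[0] ** 2, rho * sigma[0] * sigma[1]],
            [rho * sigma[0] * sigma[1], sigma[1] ** 2],
        ]
    )

    frames = []
    for i, n_i in enumerate(n, start=1):
        # Trial-level mean: theta_i ~ Norm(theta, T)
        theta_i = rng.multivariate_normal(theta, T)
        # Measurements within the trial: (x, y) ~ Norm(theta_i, Sigma)
        xy = rng.multivariate_normal(theta_i, Sigma, size=n_i)
        frames.append(pd.DataFrame({"x": xy[:, 0], "y": xy[:, 1], "trial": i}))

    return pd.concat(frames, ignore_index=True)


sim_df = simulate_sample_data(seed=0)
```

The result has the same layout as the sample data: columns x, y, and trial, with 20 + 25 + 18 = 63 rows.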

To get familiar with the data set, let’s load it in and take a look.

[2]:

df = pd.read_csv('sample_data.csv')

df

[2]:

	x	y	trial
0	1.762886	11.955616	1
1	4.364957	11.136633	1
2	3.457626	12.301267	1
3	-0.839319	10.401899	1
4	4.694602	11.925334	1
...	...	...	...
58	0.799388	3.618826	3
59	-0.011088	4.408958	3
60	3.195386	11.213607	3
61	0.720947	6.987121	3
62	4.811848	8.647437	3

63 rows × 3 columns

We will also take a look at a scatter plot of the \(x, y\) values, colored by trial number.

[3]:

plot = (
    hv.Points(data=df, kdims=["x", "y"], vdims="trial")
    .groupby("trial")
    .overlay()
    .opts(
        frame_height=200,
        frame_width=200,
        legend_offset=(10, 60),
        legend_position="right",
        show_grid=True,
        toolbar="above",
    )
)

bokeh.io.show(hv.render(plot))

For much of the analysis, we will consider only the univariate x-data. It is useful to visualize the x-data as an ECDF (empirical cumulative distribution function).

[4]:

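# ECDF values for the x-data pooled across all trials;
# method="first" breaks ties by order of appearance
df["ECDF"] = df["x"].rank(method="first") / len(df)

ecdf_plot = (
    hv.Scatter(df, kdims="x", vdims=["ECDF", "trial"])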
    .groupby("trial")
    .overlay()
    .opts(
        frame_height=200,
        frame_width=300,
        legend_offset=(10, 60),
        legend_position="right",
        show_grid=True,
        toolbar="above",
    )
)

bokeh.io.show(hv.render(ecdf_plot))
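To see what the rank-based ECDF computation above does, the following sketch applies the same expression to a small made-up series (the values in `x` are hypothetical and chosen to include a tie) and cross-checks it against an explicit sort-based calculation in NumPy.

```python
import numpy as np
import pandas as pd

# Hypothetical values, including a tie at 1.0
x = pd.Series([3.0, 1.0, 2.0, 1.0])

# Same expression as used for the sample data
ecdf = x.rank(method="first") / len(x)

# Cross-check: a stable sort gives each point its position in sorted order
order = np.argsort(x.to_numpy(), kind="stable")
ranks = np.empty(len(x), dtype=int)
ranks[order] = np.arange(1, len(x) + 1)
ecdf_np = ranks / len(x)

print(ecdf.tolist())  # [1.0, 0.25, 0.75, 0.5]
```

Each point's ECDF value is the fraction of data points that are less than or equal to it, with tied values assigned consecutive ranks in order of appearance so that every point gets a distinct height on the plot.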