Using the user guide


[1]:
import pandas as pd

import holoviews as hv
hv.extension('bokeh')

import bokeh.io
bokeh.io.output_notebook()

In the following pages, we demonstrate usage of four of the five modules of the bebi103 package. We omit the image module, which will eventually be added to the documentation; image processing is not featured in the 2020 edition of the BE/Bi 103 a course.

We will demonstrate usage of the hv, bootstrap, and stan modules in the following pages. The viz module is always used in the context of statistical analysis, so its contents are demonstrated within the bootstrap and stan sections.

In all graphics displays that use HoloViews, the plot is first converted to a Bokeh figure and then shown explicitly, as in bokeh.io.show(hv.render(holoviews_graphic)). This works around some issues with how Sphinx (which generates the documentation) renders HoloViews plots.

Sample data

In all of the demonstrations, we will use the same sample data set of \(x, y\) values taken from three different trials. The sample data were generated numerically from the following hierarchical generative model.

\begin{align}
&n = (20, 25, 18)^\mathsf{T},\\[1em]
&\theta = (3, 7)^\mathsf{T},\\[1em]
&\mathsf{T} = \begin{pmatrix}1 & 0 \\ 0 & 16\end{pmatrix}, \\[1em]
&\sigma = (2, 3)^\mathsf{T},\\[1em]
&\rho = 0.6,\\[1em]
&\mathsf{\Sigma} = \begin{pmatrix}\sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2\end{pmatrix}, \\[1em]
&\theta_i \sim \text{Norm}(\theta, \mathsf{T}) \;\forall\,i\in\{1, 2, 3\},\\[1em]
&\begin{pmatrix} x_{i, j} \\ y_{i, j} \end{pmatrix} \sim \text{Norm}(\theta_i, \mathsf{\Sigma})\;\forall\,i \in \{1, 2, 3\}, \; j \in \{1, \ldots, n_i \}.
\end{align}
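To make the generative model concrete, here is a minimal NumPy sketch of how data of this form could be drawn. The function name generate_sample_data and the seed argument are hypothetical and not part of bebi103, and we do not know the random number generator or seed used to produce sample_data.csv, so this will not reproduce that file.

```python
import numpy as np
import pandas as pd


def generate_sample_data(seed=None):
    """Draw one data set from the hierarchical model (illustrative only)."""
    rng = np.random.default_rng(seed)

    # Fixed quantities from the model specification
    n = np.array([20, 25, 18])               # measurements per trial
    theta = np.array([3.0, 7.0])             # hyper-mean
    T = np.array([[1.0, 0.0], [0.0, 16.0]])  # covariance of the trial means
    sigma = np.array([2.0, 3.0])
    rho = 0.6
    Sigma = np.array(
        [
            [sigma[0] ** 2, rho * sigma[0] * sigma[1]],
            [rho * sigma[0] * sigma[1], sigma[1] ** 2],
        ]
    )

    frames = []
    for trial, n_i in enumerate(n, start=1):
        # Trial-level mean: theta_i ~ Norm(theta, T)
        theta_i = rng.multivariate_normal(theta, T)
        # Measurements: (x_ij, y_ij) ~ Norm(theta_i, Sigma)
        xy = rng.multivariate_normal(theta_i, Sigma, size=n_i)
        frames.append(pd.DataFrame({"x": xy[:, 0], "y": xy[:, 1], "trial": trial}))

    return pd.concat(frames, ignore_index=True)


simulated = generate_sample_data(seed=0)
```

The result has the same columns as the sample data (x, y, trial) and 20 + 25 + 18 = 63 rows.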

For simplicity, we will ignore the hierarchical nature of the data set except when doing Bayesian modeling with Stan.

To get familiar with the data set, let’s load it in and take a look.

[2]:
df = pd.read_csv('sample_data.csv')

df
[2]:
x y trial
0 1.762886 11.955616 1
1 4.364957 11.136633 1
2 3.457626 12.301267 1
3 -0.839319 10.401899 1
4 4.694602 11.925334 1
... ... ... ...
58 0.799388 3.618826 3
59 -0.011088 4.408958 3
60 3.195386 11.213607 3
61 0.720947 6.987121 3
62 4.811848 8.647437 3

63 rows × 3 columns
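A quick way to get oriented in a data frame like this is to count the rows in each trial. The sketch below uses a small made-up data frame (toy, with invented values) in place of sample_data.csv, and rows_per_trial is a hypothetical helper, not a bebi103 function.

```python
import pandas as pd


def rows_per_trial(df):
    """Return the number of measurements in each trial, indexed by trial."""
    return df.groupby("trial").size()


# Made-up stand-in with the same columns as the sample data
toy = pd.DataFrame(
    {"x": [0.1, 0.2, 0.3], "y": [1.0, 2.0, 3.0], "trial": [1, 1, 2]}
)
counts = rows_per_trial(toy)
```

Calling rows_per_trial(df) on the loaded sample data gives the per-trial counts, which can be compared against \(n\) in the generative model.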

We will also look at a scatter plot of the \(x, y\) values, colored by trial number.

[3]:
plot = (
    hv.Points(data=df, kdims=["x", "y"], vdims="trial")
    .groupby("trial")
    .overlay()
    .opts(
        frame_height=200,
        frame_width=200,
        legend_offset=(10, 60),
        legend_position="right",
        show_grid=True,
        toolbar="above",
    )
)

bokeh.io.show(hv.render(plot))

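For much of the analysis, we will consider univariate x-data. It is useful to visualize the x-data as an empirical cumulative distribution function (ECDF). Here the ECDF is computed from all of the x-values pooled together, and each point is colored by its trial.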

[4]:
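# ECDF value of each point: its rank among all x-values (pooled across
# trials) divided by the total number of points
df["ECDF"] = df["x"].rank(method="first") / len(df)

ecdf_plot = (
    hv.Scatter(df, kdims="x", vdims=["ECDF", "trial"])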
    .groupby("trial")
    .overlay()
    .opts(
        frame_height=200,
        frame_width=300,
        legend_offset=(10, 60),
        legend_position="right",
        show_grid=True,
        toolbar="above",
    )
)

bokeh.io.show(hv.render(ecdf_plot))
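The ECDF column above is each point's rank divided by the number of points. The small sketch below shows that calculation on a toy series; ecdf_values is a hypothetical helper written for illustration, not a bebi103 function.

```python
import numpy as np
import pandas as pd


def ecdf_values(values):
    """ECDF value for each entry: (rank of the entry) / (number of entries)."""
    s = pd.Series(values)
    # method="first" breaks ties by order of appearance, so the ranks are
    # exactly 1, 2, ..., n with no repeats
    return s.rank(method="first") / len(s)


x = pd.Series([3.0, 1.0, 2.0, 1.0])
ecdf = ecdf_values(x)

# Read off in order of increasing x, the ECDF climbs in equal steps of 1/n
order = x.sort_values(kind="stable").index
steps = ecdf.loc[order].to_numpy()
```

Using method="first" gives tied values distinct ranks, so every point gets its own step of height 1/n and none of the points overlap in the plot.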