Using the user guide
[1]:
import pandas as pd
import holoviews as hv
hv.extension('bokeh')
import bokeh.io
bokeh.io.output_notebook()
In the following pages, we demonstrate usage of four of the five modules of the bebi103 package. The image module is omitted because image processing is not featured in the 2020 edition of the BE/Bi 103 a course; it will eventually be added to the documentation.
We will demonstrate usage of the hv, bootstrap, and stan modules in the following pages. The viz module is always used in the context of statistical analysis, so its contents will be demonstrated within the bootstrap and stan sections.
All graphics in this guide that use HoloViews are rendered through Bokeh with a call like bokeh.io.show(hv.render(holoviews_graphic)). This works around some issues with how Sphinx (which generates the documentation) renders HoloViews plots.
Sample data
In all of the demonstrations, we will use the same sample data set of \(x,y\) values taken from three different trials. The sample data were numerically generated using the following hierarchical generative model.
\begin{align} &n = (20, 25, 18)^\mathsf{T},\\[1em] &\theta = (3, 7)^\mathsf{T},\\[1em] &\mathsf{T} = \begin{pmatrix}1 & 0 \\ 0 & 16\end{pmatrix}, \\[1em] &\sigma = (2, 3)^\mathsf{T},\\[1em] &\rho = 0.6,\\[1em] &\mathsf{\Sigma} = \begin{pmatrix}\sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2\end{pmatrix}, \\[1em] &\theta_i \sim \text{Norm}(\theta, \mathsf{T}) \;\forall\,i\in\{1, 2, 3\},\\[1em] & \begin{pmatrix} x_{i, j} \\ y_{i,j} \end{pmatrix} \sim \text{Norm}(\theta_i, \mathsf{\Sigma})\;\forall\,i \in \{1, 2, 3\}, \; j \in \{1, \ldots, n_i \}. \end{align}
Unless doing Bayesian modeling with Stan, for simplicity we will ignore the hierarchical nature of the data set.
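To make the generative model concrete, here is a minimal sketch of how data of this form could be drawn with NumPy. The function name generate_sample_data and the seed argument are hypothetical. The script that actually produced sample_data.csv is not shown here, so this sketch will not reproduce the values in that file.

```python
import numpy as np
import pandas as pd


def generate_sample_data(seed=None):
    """Draw (x, y) data from the hierarchical model (illustrative sketch only)."""
    rng = np.random.default_rng(seed)

    # Parameters as given in the generative model
    n = [20, 25, 18]                              # points per trial
    theta = np.array([3.0, 7.0])                  # hyper-mean
    T = np.array([[1.0, 0.0], [0.0, 16.0]])       # covariance of the trial means
    sigma = np.array([2.0, 3.0])
    rho = 0.6
    Sigma = np.array(
        [
            [sigma[0] ** 2, rho * sigma[0] * sigma[1]],
            [rho * sigma[0] * sigma[1], sigma[1] ** 2],
        ]
    )                                             # [[4, 3.6], [3.6, 9]]

    frames = []
    for trial, n_i in enumerate(n, start=1):
        # Trial-level mean theta_i ~ Norm(theta, T)
        theta_i = rng.multivariate_normal(theta, T)
        # Measurements (x_ij, y_ij) ~ Norm(theta_i, Sigma)
        xy = rng.multivariate_normal(theta_i, Sigma, size=n_i)
        frames.append(pd.DataFrame({"x": xy[:, 0], "y": xy[:, 1], "trial": trial}))

    return pd.concat(frames, ignore_index=True)
```

Calling generate_sample_data(seed=0) returns a data frame with 63 rows and the columns x, y, and trial, matching the layout of the sample data set.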
To get familiar with the data set, let’s load it in and take a look.
[2]:
df = pd.read_csv('sample_data.csv')
df
[2]:
| | x | y | trial |
|---|---|---|---|
| 0 | 1.762886 | 11.955616 | 1 |
| 1 | 4.364957 | 11.136633 | 1 |
| 2 | 3.457626 | 12.301267 | 1 |
| 3 | -0.839319 | 10.401899 | 1 |
| 4 | 4.694602 | 11.925334 | 1 |
| ... | ... | ... | ... |
| 58 | 0.799388 | 3.618826 | 3 |
| 59 | -0.011088 | 4.408958 | 3 |
| 60 | 3.195386 | 11.213607 | 3 |
| 61 | 0.720947 | 6.987121 | 3 |
| 62 | 4.811848 | 8.647437 | 3 |
63 rows × 3 columns
We will also take a look at a scatter plot of the \(x, y\) values, colored by trial number.
[3]:
plot = (
    hv.Points(data=df, kdims=["x", "y"], vdims="trial")
    .groupby("trial")
    .overlay()
    .opts(
        frame_height=200,
        frame_width=200,
        legend_offset=(10, 60),
        legend_position="right",
        show_grid=True,
        toolbar="above",
    )
)
bokeh.io.show(hv.render(plot))
For much of the analysis, we will consider only the univariate \(x\)-data. It is useful to visualize these data as an empirical cumulative distribution function (ECDF).
[4]:
# ECDF value of each point: its rank among all x-values
# (pooled across trials) divided by the total number of points
df["ECDF"] = df["x"].rank(method="first") / len(df)
ecdf_plot = (
    hv.Scatter(df, kdims="x", vdims=["ECDF", "trial"])
    .groupby("trial")
    .overlay()
    .opts(
        frame_height=200,
        frame_width=300,
        legend_offset=(10, 60),
        legend_position="right",
        show_grid=True,
        toolbar="above",
    )
)
bokeh.io.show(hv.render(ecdf_plot))
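The ECDF column above is computed from the pooled x-values, so the points are colored by trial but all lie on a single curve. If a separate ECDF for each trial is wanted instead, one possible approach is to rank within each trial. The following is a sketch only; the function name add_ecdf_columns and the column name ECDF_trial are hypothetical and not part of the bebi103 package.

```python
import pandas as pd


def add_ecdf_columns(df, value="x", group="trial"):
    """Return a copy of df with pooled and per-group ECDF values (sketch)."""
    out = df.copy()

    # Pooled ECDF: rank among all values divided by the total count
    out["ECDF"] = out[value].rank(method="first") / len(out)

    # Per-group ECDF: rank within each group divided by that group's size
    grouped = out.groupby(group)[value]
    out["ECDF_trial"] = grouped.rank(method="first") / grouped.transform("count")

    return out
```

Replacing "ECDF" with "ECDF_trial" in the vdims of the hv.Scatter call would then plot one ECDF per trial, each rising to one.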