Exploration of Student Data with Andromeda

The dataset contains 25 dimensions that describe students in a class. Andromeda is a V2PI version of Weighted Multidimensional Scaling.
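To make the idea of weighting dimensions concrete, the following is a minimal sketch of the quantity that a generic weighted multidimensional scaling layout tries to keep small: the mismatch ("stress") between weighted high-dimensional distances and distances in the two-dimensional layout. The function names, the toy data, and the exact form of the stress are assumptions for illustration; the objective that Andromeda itself optimizes is not specified here.

```python
import math

def weighted_distance(x, y, weights):
    """Weighted Euclidean distance between two high-dimensional points."""
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, y)))

def stress(high_dim, layout, weights):
    """Sum of squared differences between weighted high-dimensional
    distances and plain Euclidean distances in the 2-D layout."""
    n = len(high_dim)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            d_high = weighted_distance(high_dim[i], high_dim[j], weights)
            d_low = math.dist(layout[i], layout[j])
            total += (d_high - d_low) ** 2
    return total

# Hypothetical toy data: three students described by three attributes.
students = [[0.0, 0.0, 5.0], [3.0, 4.0, 5.0], [0.0, 0.0, 0.0]]

# A 2-D layout in which students 0 and 2 sit on the same spot.
layout = [[0.0, 0.0], [5.0, 0.0], [0.0, 0.0]]

# Emphasizing the first two attributes matches this layout well;
# emphasizing only the third attribute does not.
stress_first_two = stress(students, layout, [1.0, 1.0, 0.0])
stress_third_only = stress(students, layout, [0.0, 0.0, 1.0])
print(stress_first_two, stress_third_only)
```

A lower stress means the chosen dimension weights explain the layout better, which is the sense in which a layout can be said to have support in the data.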

Be the Data

“Be the Data” is a physical and immersive approach to visual analytics designed for teaching abstract statistical analysis concepts to students. In particular, it addresses the problem of exploring alternative projections of high-dimensional data points using interactive dimension reduction techniques. In our system, each student embodies a data point in a dataset that is visualized in the room; coordinates in the room correspond to coordinates in the two-dimensional plane onto which the high-dimensional data are projected. Students can explore statistical support for alternative projections by physically moving themselves, and hence their data points, in the space. In this way, students can pose hypotheses and discover hidden structure in the data. Students also gain an understanding of the underlying statistical and algorithmic methods used in data analytics. As an immersive system, it is designed to encourage collaborative and social learning strategies for concepts including data points, variables, dimension weights, and projections. “Be the Data” uses Virginia Tech’s new Cube facility for large-scale tracking of many students in a large physical space and for immersive visualization. We regularly conduct “Be the Data” workshops for educational outreach programs. For example, during an AWC women in computer science outreach day, 62 seventh-grade female students explored data about animals. Videos and survey data from the workshop indicate that “Be the Data” provided an effective and engaging learning experience.

Simulated Data

The simulated data contain three clusters, which classical Principal Component Analysis (PCA) did not resolve. A BaVA-tized Probabilistic-PCA algorithm helps us identify the three clusters.
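For reference, the classical PCA baseline mentioned above can be sketched in a few lines. This is a minimal illustration that finds only the leading principal component by power iteration on the sample covariance matrix; it is not the BaVA-tized Probabilistic-PCA algorithm, and the function names and toy data are assumptions for illustration.

```python
import math

def center(data):
    """Subtract the per-dimension mean from every row."""
    n = len(data)
    dims = len(data[0])
    means = [sum(row[k] for row in data) / n for k in range(dims)]
    return [[row[k] - means[k] for k in range(dims)] for row in data]

def covariance(centered):
    """Sample covariance matrix of already-centered rows."""
    n = len(centered)
    dims = len(centered[0])
    return [
        [sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(dims)]
        for a in range(dims)
    ]

def leading_component(cov, iterations=200):
    """Approximate the top eigenvector of cov by power iteration.

    Assumes the starting vector is not orthogonal to that eigenvector.
    """
    dims = len(cov)
    v = [1.0 / math.sqrt(dims)] * dims
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(dims)) for a in range(dims)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v

# Hypothetical toy data lying exactly on the line y = 2x.
points = [[0.0, 0.0], [1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
centered = center(points)
pc1 = leading_component(covariance(centered))

# Project each centered point onto the leading component.
scores = [sum(a * b for a, b in zip(row, pc1)) for row in centered]
print(pc1, scores)
```

Because classical PCA picks directions of maximum variance without any analyst input, clusters that separate along low-variance directions can overlap in the resulting projection.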

Semi-Supervised Analysis of Gene Data from Yeast

The gene data in this example are high-dimensional, with 79 attributes (expression levels). Part of the data is annotated based on gene function (proteins). A BaVA-tized Probabilistic-PCA approach uses this partial information to identify clusters.

Education Data with Brush Tool

This BaVA-tized Probabilistic-PCA example explores clustering in state-level SAT score data. Eight attributes are used for clustering. The Brush tool was used to help identify data points that are probably different in the high-dimensional feature space.