I am a second year undergraduate at The University of Texas at Austin.
I am a summer REU student at the Clemson University Visualization Lab, working on a project entitled “Visualization as the Interface for Machine Learning”.
Here, I will maintain a brief weekly blog describing my research intern experience. I will document both the events we’ve attended as a group and the progress I’ve made on my individual project.
While Virtual Reality (VR) research has a long history, VR technology has only become commonplace very recently. As a result, little work has been done to describe and classify VR users. Specifically, while some users struggle to adapt to a VR environment, others adapt with relative ease. I will use clustering algorithms to group user data into discrete groups, with the aim of improving understanding of human interaction with VR systems.
Such machine learning techniques are rapidly gaining popularity, but their high dimensionality and black-box nature make their outputs difficult for people to interpret, which diminishes their usefulness. Dimensionality reduction algorithms, such as Sammon Mapping, seek to address this problem in different ways.
In this project, I will apply machine learning algorithms to the previously described clustering and subsequent visualization problems in VR research.
I am working with several phenomenal members of the Clemson faculty: Dr. Wole Oyekoya is our REU mentor and supervisor, and Dr. Jerome McClendon and Dr. Andrew Robb are my faculty mentors for this project.
Week 1: Preparation
This week, we toured the campus and attended lectures focused on preparing us for presenting and publishing research work. I became familiar with Adobe Creative Cloud products, specifically the video-editing software Premiere Pro, as well as with LaTeX typesetting software, bundled through the MiKTeX distribution.
This week, my work was centered on setting up my development environment and getting the hang of the tools I’ll use for the remainder of the research project. I tried out the HTC VIVE VR system, produced a simple game using Unity, started working with C#, and explored various options in Axis Neuron, the software we will use for motion capture.
Next week, I hope to have finished the data pipeline from our raw motion capture data to the data processing environment (Perception Neuron -> Axis Neuron -> Unity -> C# -> CSV (tentatively) -> Python 3), and to begin analysis on preliminary data. I’m also looking forward to checking out the Palmetto Cluster, Clemson’s high-performance computing environment, on Tuesday!
Week 2: Development
Events this week were focused on two areas: research computing and visualization tools. Early in the week, we reviewed Bash and then learned how to access the Palmetto Cluster. Later in the week, we studied several visualization tools, including scientific visualization software, data visualization tools, information analytics tools, and GIS software.
I completed my goal of setting up a data pipeline and formatting. The project repo is also now online, so feel free to git-checkout! I also studied the theory and implementation of several machine learning algorithms for cluster analysis on our newly collected data. Finally, I’ve written most of the introduction for my project’s final report, and have become more comfortable working with LaTeX.
Next week, I hope to begin data collection with real people. I also hope to select a dimensionality reduction method for the final portion of the project.
Week 3: Development Cont.
Our morning events this week were varied in nature. Early in the week, we were invited to attend Python workshops, targeted at students without Python experience. On Wednesday, Joe James, a previous REU student and Clemson Mechanical Engineering graduate who is serving as our logistics coordinator, presented his past work in Unity VR programming. As the week came to a close, we learned to use IPUMS, a powerful census and survey database for social science research, and ArcGIS, a commonly used GIS software package.
While I had initially intended to spend this week on data capture, I spent most of this week working on dimensionality reduction instead.
A quick primer: dimensionality reduction is the problem of mapping data from a high-dimensional space to a lower-dimensional one, and is generally accomplished through feature selection (keeping a subset of the original features) or feature extraction (computing new features from combinations of the original ones). These techniques serve two important purposes: (1) creating data that machine learning algorithms can consume and (2) visualizing processes in high-dimensional space.
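To make the selection/extraction distinction concrete, here is a minimal sketch using NumPy. It is not the project's code: the function names `pca_project` and `select_top_variance` are made up for illustration, the data is random stand-in data rather than motion capture, and picking columns by variance is just one simple way to do feature selection.

```python
import numpy as np

def pca_project(X, n_components=2):
    """Feature extraction: project rows of X onto the top principal components."""
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by explained variance.
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ Vt[:n_components].T

def select_top_variance(X, n_features=2):
    """Feature selection: keep the original columns with the largest variance."""
    keep = np.argsort(X.var(axis=0))[::-1][:n_features]
    return X[:, np.sort(keep)]

# Synthetic stand-in for a dataset: 100 samples ("frames"), 10 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))

Z = pca_project(X)          # new features, each a mix of all 10 originals
S = select_top_variance(X)  # two of the original 10 columns, unchanged
print(Z.shape, S.shape)
```

Both outputs have two columns, but the selected columns are original measurements, while the PCA columns are new, uncorrelated combinations of all of them.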
Specifically, I’ve been implementing several dimensionality reduction algorithms, including Principal Component Analysis (PCA), Sammon Mapping, and t-Distributed Stochastic Neighbor Embedding (t-SNE). You can check out this part of the project here.
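For a sense of what Sammon Mapping optimizes, the sketch below computes Sammon's stress, the error the algorithm tries to minimize: the squared differences between pairwise distances in the original space and in the low-dimensional layout, each weighted by the inverse of the original distance. This only evaluates a given layout; it does not run the optimization itself. The helper names and the random test data are assumptions for illustration, not taken from the project.

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean distance between every pair of rows in X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def sammon_stress(X_high, X_low):
    """Sammon's stress between a dataset and a low-dimensional layout of it.

    Assumes the high-dimensional points are all distinct, so that no
    original distance is zero.
    """
    i, j = np.triu_indices(len(X_high), k=1)  # each pair counted once
    d_high = pairwise_distances(X_high)[i, j]
    d_low = pairwise_distances(X_low)[i, j]
    return ((d_high - d_low) ** 2 / d_high).sum() / d_high.sum()

# Synthetic stand-in data: 50 points in 6 dimensions.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 6))

stress_identity = sammon_stress(X, X)         # perfect layout
stress_truncated = sammon_stress(X, X[:, :2])  # naive layout: first two columns
print(stress_identity, stress_truncated)
```

A lower stress means the layout preserves the original distances better, so the same function could be used to compare layouts produced by different methods on the same data.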
The change in plans was necessary because of the high dimensionality of my motion capture dataset. Working on dimensionality reduction earlier also keeps the scope of the project under control and increases the chance that I have useful results by the end of the summer.
I also spent a few days working on report formatting and contents in LaTeX.
I’ll continue working on studying and implementing dimensionality reduction techniques into next week. By the end of next week, I hope to have implemented the three algorithms described above on motion capture data to see which technique works best for my specific use-case.
Week 4: Gearing up for Presentations
Our morning workshops this week were minimal – I attended a couple of R workshops, but that was about it.
I spent much of this week continuing my work with dimensionality reduction. I learned how to use some of the many machine learning libraries in Python, so I could stop implementing everything from scratch. I also spent some time preparing for our midterm presentation.
My goal for next week is to have working t-SNE, PCA, and Sammon’s Mapping models on real motion capture data – at the moment, my models have only been successful on toy datasets.
Week 5: Midterm Presentation
Because of the July 4th holiday, this was a short week – work only began on Wednesday, when we had our midterm presentations. After that, we didn’t have too many of our normal events, though we did meet as a group to discuss our projects and progress.
While this was a shorter week, I was able to make a lot of progress. I implemented t-SNE on motion capture data, did a lot of data refining and reshaping, and then implemented it a few more times with different results. Here’s my most recent result, a t-distributed stochastic neighbor embedding projection of 700 or so frames (recorded at about 30 fps) of a tai chi master.
You can see how the points located closely together in time (points with the same color) are generally clustered together in the mapping.
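One way to put a number on that visual impression is to ask, for each point in the 2-D mapping, how many of its nearest neighbors come from nearby frames. The sketch below is a hypothetical check, not something from the project: the function name, the choice of `k=5` neighbors, and the `window=15` frames (about half a second at 30 fps) are all assumptions, and it runs on a synthetic spiral rather than the real t-SNE output.

```python
import numpy as np

def temporal_neighbor_fraction(embedding, k=5, window=15):
    """Fraction of each point's k nearest neighbors (in the embedding)
    that lie within `window` frames of it. Rows are assumed to be in
    frame order."""
    n = len(embedding)
    diff = embedding[:, None, :] - embedding[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbor
    nearest = np.argsort(dist, axis=1)[:, :k]
    frame_gap = np.abs(nearest - np.arange(n)[:, None])
    return (frame_gap <= window).mean()

# Synthetic stand-in for an embedding: a smooth spiral traced over 200 frames.
t = np.linspace(0, 4 * np.pi, 200)
smooth = np.column_stack([t * np.cos(t), t * np.sin(t)])

# Same points with the frame order scrambled, as a baseline.
rng = np.random.default_rng(0)
shuffled = smooth[rng.permutation(len(smooth))]

smooth_score = temporal_neighbor_fraction(smooth)
shuffled_score = temporal_neighbor_fraction(shuffled)
print(smooth_score, shuffled_score)
```

A score near 1 means neighbors in the mapping are also neighbors in time; the shuffled baseline shows what the score looks like when there is no such relationship.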
For next week, I need to learn and implement a few more algorithms – specifically, I’m looking at growing neural gas and one of its variants, a self-organizing neural network.
Week 6: The Final Push Begins!
We began this week with a tour of Clemson’s data center and the Palmetto cluster. The sheer scale of the university’s cyber-infrastructure was impressive, and it was cool to see how, at that scale, even relatively simple issues can require creative and robust solutions (substituting wiring for ducts filled with copper plates, for instance).
I made a significant amount of progress in two areas this week.
First, I completed most of the proof-of-concept experiments I’ve been running for analyzing and visualizing motion capture data.
Second, Dr. Robb and I completed the remaining work for the final experimental protocol.
While many of the experiments of this project will be completed after my REU work, I do need to produce a final report and presentation, so next week, I’ll spend most of my time working on that. On a related note, our REU will also be attending several sessions in the coming weeks about presenting with power, etc.
In addition, now that we are ready to conduct trials with participants, the methods and software I’ve used for visualization and analysis need to be updated to work with the new data formats, so I’ll spend a lot of time on that as well.
Finally, the phenomenal Tania Roy, a PhD student in Human Centered Computing, is going to present on her dissertation work, so I’ll be there, too. Tania and the rest of the graduate students I’ve met this summer have really helped make me feel at home here at Clemson.
Week 7: It’s Crunch Time!
This week, our events were focused on providing us with skills to succeed in academia. We worked on elevator speeches, management styles, networking techniques, and understanding the power of presence.
These workshops, while different from the technical training we had received earlier, were still valuable, especially for our group of fairly quiet, unassuming computer science research interns.
Sadly, I fell sick on Sunday, so my progress has really slowed.
However, I did conduct a pilot trial, as well as two full trials with participants! I’ve had to redo large parts of the system I’ve developed over the summer to account for all of the modifications and updates that Dr. Robb and I have put into place while iterating on the original experiment protocol.
After making those changes, I’ve now started working on re-implementing all of my past proof-of-concept experiments on the new production data.
Finally, I’ve also continued to work on the final report, as well as the presentation and poster for the project.
Next week is our last week, so I’ll be working hard to finish up. By the end of next week, I hope to have finished:
- Collecting more production data
- Refining and processing production data
- Visualizing and analyzing that data
- Completing the final report
- Completing the project poster
- Preparing for the final presentation
Week 8: Wrapping Up
This was our last week at Clemson, and it was a rough one!
I completed several more trials for my experiments, analyzed that data, and produced a paper, a poster, and a final presentation. We presented at the Visualization Symposium, organized by Dr. Wole, and the presentations went well. I’m looking forward to extending the work I’ve done while at Clemson this summer, and hope to produce at least one publication.