
KOI

Image credit: Robert O'Dell, Kerry P. Handron (Rice University, Houston, Texas) and NASA

Purpose of the Project

This project was created to demonstrate machine-learning topics and models for UCSB's PSTAT 131 course, Introduction to Statistical Machine Learning, taught by Dr. Katie Coburn.

The project may be useful because NASA's database contains more than 9,000 potential exoplanets. Accurately classifying KOIs (Kepler Objects of Interest) as candidates, confirmed planets, or false positives would let researchers direct attention and resources toward the more promising objects and save them considerable time.

Content

  • Introduction: Explains what KOIs are and why the report is useful, and covers the required data wrangling and cleanup.

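  • Exploratory Data Analysis: Summarizes the data set and explores it through correlations, histograms, QQ plots, and graphs of the relationships between the outcome and the predictors.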

  • Pre-Model Set-up: Creates the recipe, the training/testing data split, and the K-fold cross-validation folds.

  • PCA: Less developed than the other sections, because PCA was neither necessary nor well suited to this problem given the loss of interpretability; it is included for demonstration.

  • Models: The models used were K-Nearest Neighbors, Decision Trees, Boosted Trees, and Random Forests. Parameters were tuned with the help of cross-validation. Most workflows were inspired by the popular tidymodels framework.

  • Best Model: Further exploration of the best model, including additional evaluation metrics.

  • Conclusion.
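
The tuning step described under Pre-Model Set-up and Models can be illustrated with a small sketch. The report itself is built on tidymodels workflows; the plain-Python version below is not the authors' code and only shows the general idea of choosing the number of neighbors for K-Nearest Neighbors by K-fold cross-validation. All function names, the two class labels, and the synthetic two-feature data are assumptions made for illustration, not values from the KOI data set.

```python
import math
import random
from collections import Counter


def k_fold_indices(n_rows, n_folds, seed=0):
    """Shuffle row indices and deal them into n_folds roughly equal folds."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    return [indices[i::n_folds] for i in range(n_folds)]


def knn_predict(train_x, train_y, point, k):
    """Return the majority label among the k training points nearest to point."""
    nearest = sorted(
        range(len(train_x)), key=lambda i: math.dist(train_x[i], point)
    )[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


def cv_accuracy(x, y, k, folds):
    """Mean held-out accuracy of k-NN, using each fold once as the assessment set."""
    scores = []
    for held_out in folds:
        held = set(held_out)
        train_idx = [i for i in range(len(x)) if i not in held]
        train_x = [x[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        correct = sum(
            knn_predict(train_x, train_y, x[i], k) == y[i] for i in held_out
        )
        scores.append(correct / len(held_out))
    return sum(scores) / len(scores)


def tune_k(x, y, candidate_ks, n_folds=5, seed=0):
    """Score every candidate k on the same folds and return the best one."""
    folds = k_fold_indices(len(x), n_folds, seed)
    results = {k: cv_accuracy(x, y, k, folds) for k in candidate_ks}
    best_k = max(results, key=results.get)
    return best_k, results


def make_toy_data(n_per_class=30, seed=1):
    """Synthetic two-feature data with hypothetical labels (not real KOI rows)."""
    rng = random.Random(seed)
    x, y = [], []
    for label, center in (("CANDIDATE", 0.0), ("FALSE POSITIVE", 3.0)):
        for _ in range(n_per_class):
            x.append((rng.gauss(center, 1.0), rng.gauss(center, 1.0)))
            y.append(label)
    return x, y


x, y = make_toy_data()
best_k, results = tune_k(x, y, candidate_ks=[1, 3, 5, 7])
print("best k:", best_k)
print("mean accuracy per k:", results)
```

Every candidate value is scored on the same folds, so differences in mean accuracy reflect the parameter rather than the split. In practice the training/testing split would be made first and the folds drawn from the training portion only.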

Authors and Work Distribution

TJ Sipin:

  • Introduction

  • Exploratory Data Analysis: Correlation

  • Pre-Model Set-up

  • PCA

  • Models: KNN and Random Forests

  • Best Model

  • Conclusion

Preeti Kulkarni:

  • Introduction

  • Exploratory Data Analysis: General summary of the data set; graphs of univariate and multivariate relationships between the outcome and predictor(s) or between predictors; histograms; QQ plot

  • Models: Tree and Boosted Tree

  • Conclusion


Coastal Water Productivity

Purpose of the Project

This project investigates the relationships between nutrient levels and primary production through a broad exploratory analysis of a data set from the EPA's National Aquatic Resource Surveys.

Questions Being Answered

  1. What is the apparent relationship between nutrient availability and productivity?

  2. Are there any notable differences in available nutrients among U.S. coastal regions?

  3. Based on the 2010 data, does productivity seem to vary geographically in some way? 

  4. How did primary productivity in California coastal waters change seasonally in 2010, if at all?

  5. What's the relationship between water depth and productivity?

Skills Assessed

  • Tidying data sets using pandas

  • Creating appropriate and engaging visuals in Altair

  • Drawing conclusions based on statistical evidence and background research
