In general, 'Big Data' and Data Science are already here in many fields of STEM. This link goes to a good summary article and video from Northwestern University, one of the leaders in Data Science.
It is important to realize that this type of research requires computer programming knowledge and skills.
One of the most popular languages at present for data collection and analysis is Python, and it is the language many graduate students in research areas recommend learning. There are countless tutorials, YouTube videos, and other resources online for Python. The most popular free online course for learning Python (as well as some other languages) is through Codecademy. If you are new to programming, open an account and start a self-paced course in Python!
Just click on a topic of interest, and it will take you to a separate post about that specific topic. You will find background information, relevant links to articles, vocabulary, data accessibility methods, videos, and so on. Hopefully you will find enough information to actually be able to do the project!
Online Data Set Topics and Research Questions:
Datasets Galore - Multiple Topics
For datasets on just about any topic you can imagine, check out over 1200 found at https://www.kaggle.com/datasets.
Astrophysics/Astronomy
Examples of student research papers using online astronomical datasets:
- Morphological classification of Post-Starburst Galaxies
- Albedo and Heat Recirculation of Hot Jupiter Exoplanets - Effect of Phase Shifts
- The Effect of the Asteroid Belt on the LISA Mission
- Environment and Variability of XBONGs (type of X-ray galaxy)
NASA Open Data Portal
https://data.nasa.gov/ This is NASA's primary site for accessing numerous datasets from different astrophysical experiments! There are portals for finding code as well. It is an incredibly rich resource for anyone interested in doing any type of data-based astrophysical project! Other specific experimental sites are listed below.
Astronomical Experiments and Datasets:
European Space Agency Planetary Science Archive
http://www.rssd.esa.int/index.php?project=PSA Central data repository for ESA missions: currently Giotto, Huygens, Mars Express, Rosetta, SMART-1, and Venus Express, as well as several ground-based cometary observations.
Exoplanets.org
http://exoplanets.org/ One of the premier sites, with up-to-date data on thousands of extrasolar planets and candidates.
Global Telescope Network
http://gtn.sonoma.edu/data_reduction/index.php Data reduction site
Kepler databases - the search for extrasolar planets; Kepler Planet Candidate Data Explorer, which is called Planetquest; the main Kepler site
NASA Exoplanet Archive
http://exoplanetarchive.ipac.caltech.edu/ Another site with thousands of datasets for extrasolar planets.
NASA Space Science Data Coordinated Archive
Each of the following have numerous links to individual projects/missions:
Sloan Digital Sky Survey (SDSS)
Spitzer Science Center
http://ssc.spitzer.caltech.edu/ space telescope with infrared
Variable Star Data
https://www.aavso.org/zapper
Zooniverse (general)
Over 40 different citizen science projects, where you can help scientists in a variety of fields make sense of enormous data sets!
https://www.zooniverse.org/
Two other citizen scientist sites are:
Long List of Astronomy Programs with Real Data
http://nitarp.ipac.caltech.edu/page/other_epo_programs
Some instructions for accessing certain Astrophysics Datasets:
1. Create Chandra Images from Raw Data
Basic steps:
1. Download .FITS data files for either X-ray or multi-wavelength images, which can be found at http://chandra.harvard.edu/photo/openFITS/xray_data.html and http://chandra.harvard.edu/photo/openFITS/multiwavelength_data.html respectively
2. Download free image editor program from https://www.gimp.org/
3. Create Chandra images from raw data as described in an example here http://chandra.si.edu/photo/openFITS/crab.html
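Raw X-ray images are essentially grids of photon counts, with a few very bright pixels and many faint ones, so part of turning them into a picture is usually a brightness "stretch". The following is only a minimal Python sketch of a logarithmic stretch, not the procedure from the Chandra tutorial itself. The function name `log_stretch` and the small grid of counts are made up for illustration; with real .FITS files you would first read the pixel values with a FITS reader such as `astropy.io.fits`, which is left out here to keep the sketch dependency-free.

```python
import math

def log_stretch(counts, out_max=255):
    """Map raw photon counts to 0..out_max using a logarithmic stretch.

    `counts` is a list of rows of non-negative numbers.
    """
    peak = max(max(row) for row in counts)
    if peak <= 0:
        # An empty exposure: every pixel stays black.
        return [[0 for _ in row] for row in counts]
    # log1p(c) = log(1 + c), so zero counts map to zero brightness.
    scale = out_max / math.log1p(peak)
    return [[round(scale * math.log1p(c)) for c in row] for row in counts]

# Hypothetical 3x3 patch of photon counts (NOT real Chandra data).
raw = [
    [0, 1, 3],
    [2, 50, 8],
    [0, 4, 1000],
]

stretched = log_stretch(raw)
for row in stretched:
    print(row)
```

With a linear scale the pixel with 50 counts would be nearly black next to the pixel with 1000 counts; the logarithmic stretch lifts faint structure so that it becomes visible.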
2. X-ray spectroscopy of supernova remnants
Basic steps:
1. Download, install and open ds9 program
2. Download ds9 image data files of supernova remnants
3. Follow some basic analysis instructions described in detail here http://www.chandra.si.edu/edu/formal/snr/ds9.html
4. Classify a supernova event as type Ia or type II using the spectra, and compare the result with the information in the Photo Album http://chandra.harvard.edu/photo/category/snr.html
3. Galaxy classification and evolution with GalaxyZoo
1. Use the data from https://data.galaxyzoo.org/
2. Galaxy evolution activity http://www.zooteach.org/lessons/3-galaxy-evolution-with-galaxy-zoo
3. Galaxy classification activity http://www.zooteach.org/lessons/62-classifying-galaxies-as-an-early-inquiry-activity-in-an-introductory-honors-physics-class
More on Galaxy Zoo: https://www.galaxyzoo.org/
4. Interpreting data with photometric transits
5. Tracking Jupiter's moons using image processing software to analyze observatory images of Jupiter and its moons
6. Exoplanet transits using telescope images, image processing software, and data from the Internet to determine the size and orbital period of an exoplanet
9. More classroom activities on planet finding
10. More activities based on galaxyzoo
11. More activities on Chandra X-ray observatory
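For the exoplanet transit activity above, the basic arithmetic can be sketched in a few lines of Python. This is a rough illustration under simple assumptions, not the method used by any particular activity: the fractional dip in starlight during a transit is approximately (planet radius / star radius) squared, and the orbital period is the average spacing between consecutive transits. The function names and the measurements below are hypothetical.

```python
import math

R_SUN_KM = 695_700.0     # nominal solar radius
R_JUPITER_KM = 71_492.0  # nominal equatorial radius of Jupiter

def planet_radius_km(depth, star_radius_km):
    """Estimate planet radius from transit depth.

    depth is the fractional drop in brightness, approximately
    (R_planet / R_star)**2, so R_planet = R_star * sqrt(depth).
    """
    if not 0 < depth < 1:
        raise ValueError("depth must be a fraction between 0 and 1")
    return star_radius_km * math.sqrt(depth)

def orbital_period_days(mid_transit_times):
    """Average spacing between mid-transit times, in days.

    Assumes the list holds consecutive transits (none were missed).
    """
    if len(mid_transit_times) < 2:
        raise ValueError("need at least two transits")
    times = sorted(mid_transit_times)
    return (times[-1] - times[0]) / (len(times) - 1)

# Hypothetical measurements, for illustration only.
depth = 0.01                      # a 1% dip in brightness
transits = [2.0, 5.5, 9.0, 12.5]  # mid-transit times in days

radius = planet_radius_km(depth, R_SUN_KM)
period = orbital_period_days(transits)
print(f"Planet radius: {radius:.0f} km ({radius / R_JUPITER_KM:.2f} Jupiter radii)")
print(f"Orbital period: {period:.2f} days")
```

The sketch assumes a Sun-sized host star; for a real target you would substitute the star's catalogued radius.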
Genomics/Medical
HIV Databases: Includes Sequence, Vaccine, Immunology databases for HIV, and data for other viruses (Hepatitis C and Hemorrhagic fever). Includes some tools to look at data.
https://www.hiv.lanl.gov/content/index
Human Genome Resources and Databases
NIH Cancer Institute
http://www.cancer.gov/research/resources/data-catalog data collections from NCI initiatives
Geoscience & Climate Science
Concord Consortium has a user-friendly site for NOAA Data Analysis
Check out a data portal for NOAA data and an analysis platform for it (CODAP). Instructions are here.
Examples of student research papers using online geoscience datasets:
- Antarctic Sea Ice Fractal Analysis (using fractal dimension as measure of change due to warming)
- Scaling law for Strength and Frequency of High-Energy Storms
https://earthdata.nasa.gov/
The NOAA site on paleoclimate data:
https://www.ncdc.noaa.gov/data-access/paleoclimatology-data
There is also one for seismic data:
http://ds.iris.edu/ds/nodes/dmc/data/types/events/
And of course, there is always NASA, which holds a lot of earth science data:
http://science.nasa.gov/earth-science/earth-science-data/ https://earthdata.nasa.gov/
There is a journal that specifically publishes earth system science data sets:
http://www.earth-system-science-data.net/
And Stanford compiled a list of earth science data sources:
http://library.stanford.edu/guides/earth-science-data-repositories
EdGCM Climate Modeling site
http://edgcm.columbia.edu/ This is a professional-level but user-friendly simulation for modeling climate change. It is meant to be something high school students can use to model climate into the future. Click here for one example from a student.
USGS Geomagnetism Program
NOAA Geomagnetism Site
http://www.ngdc.noaa.gov/geomag/
NOAA: Climate interests
NCEI is the world’s largest provider of weather and climate data. Land-based, marine, model, radar, weather balloon, satellite, and paleoclimatic are just a few of the types of datasets available. Detailed descriptions of the available products and platforms are below.
- These links provide quick access to many of NCEI's climate and weather datasets, products, and various web pages and resources.
- Land-based, or surface, observations include temperature, dew point, relative humidity, precipitation, wind speed and direction, visibility, atmospheric pressure, and types of weather occurrences such as hail, fog, and thunder collected for locations on every continent.
- Geostationary and polar-orbiting satellites provide raw radiance data collected by ground stations to help monitor and predict weather and environmental events.
- An acronym for Radio Detection and Ranging, a radar is an object-detection system that uses radio waves to determine the range, altitude, direction of movement, and speed of objects producing raw data as well as generating analysis products.
- Access to near-real-time, high-volume numerical weather prediction and global climate models and data. Looking into the past, present, and future to assist in the analysis of multidisciplinary datasets and promote interoperable data analysis.
- Weather data from the atmosphere, beginning at three meters above the Earth’s surface. These data are obtained from radiosondes, which are instrument packages tethered to balloons that transmit data back to the receiving station.
- Meteorological data transmitted from ships at sea, moored and drifting buoys, coastal stations, rigs, and platforms. The data may include weather as well as ocean state information.
- Past climate and environmental data, derived from natural sources such as tree rings, ice cores, corals, and ocean and lake sediments, extend the archive of weather and climate back hundreds of millions of years.
- Archive of destructive storm or weather data and information, which includes local, intense, and damaging events such as thunderstorms, hailstorms, and tornadoes. It can also describe more widespread events such as tropical systems, blizzards, nor’easters, and derechos.
- Are there correlations or any discernible relationships between weather patterns in the US and/or Canada and changes in Arctic ice loss? One might look at rainfall, wind speeds, wind direction, jet stream flow, temperatures, snow fall, frequency of storms, drought conditions and lengths, and any other parameters for which there are data.
- I will be working this year to develop some help resources for running a real, professional climate model program. It is a simpler, older simulation, but it should be wonderful for curious high school students! More to come...
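As a starting point for the correlation question above (weather patterns versus Arctic ice loss), here is a minimal Python sketch of a Pearson correlation coefficient between two yearly series. The function name `pearson_r` and the numbers are made up for illustration; a real project would load matching yearly values from datasets such as those on the NOAA/NCEI sites.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length (at least 2 points)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant series has no defined correlation")
    return sxy / math.sqrt(sxx * syy)

# Made-up yearly values, for illustration only (NOT real measurements).
ice_extent = [6.0, 5.5, 5.0, 4.5, 4.0]  # hypothetical ice extent per year
storm_count = [10, 12, 13, 15, 18]      # hypothetical storm counts per year

r = pearson_r(ice_extent, storm_count)
print(f"Pearson r = {r:.3f}")
```

A value of r near -1 or +1 indicates a strong linear relationship, and a value near 0 indicates little linear relationship. A strong correlation by itself does not show that one quantity causes the other, so it is a first step rather than a conclusion.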
Particle Physics
Examples of student research papers using online particle datasets:
- Organizational Properties of Baryonic Decays
The CERN CMS experiment has put an incredibly large dataset online, free for anyone to use. Check this article out for more details.
CERN Open Data
Fermilab - for classes to use as lessons
https://ed.fnal.gov/data/
Social Networks
Examples of student research papers using online datasets of social and professional networks:
Hi,
A couple of thoughts:
1. With the wide availability of deep learning technology now available in open source, one area of research is to apply neural networks or other machine learning techniques to find patterns or classifications in these data sets.
2. Here are some more public data sets: https://github.com/caesar0301/awesome-public-datasets
Mark Morris
ETHS 1978
Thank you for this wonderful suggestion, and also a very good listing of datasets! This page is obviously a work in progress. The hope is that a group I am working with at NU will get an NSF grant so we can develop the help and 'how to' resources, along with specific, possible research questions to pursue (starting off with astrophysics), so that anyone who would like to do research on such datasets will be able to. If you are aware of such resources for neural networks and machine learning, I would love to add them. Thanks again, and Go Kits!