Thursday, 1 February 2024: 9:30 AM
336 (The Baltimore Convention Center)
Jessica Cristina dos Santos Souza, Texas Tech
The Earth sciences have become substantially data-driven, with an unprecedented rise in the volume of raw, unprocessed data. This growth coincides with a significant increase in the use of machine learning models in the discipline. This project is an initiative to provide new datasets intended for machine learning applications using Unidata’s THREDDS Data Server (TDS). Machine learning models typically require significant preprocessing of the input data to improve model performance, and this preprocessing usually involves scaling the dataset. Our goal is to perform the preprocessing, in this case re-scaling, on the server before users access the data. We selected two common types of re-scaling, standardization and normalization, for implementation. Through the NetCDF Markup Language (NcML), the TDS applies the specified re-scaling and creates a virtual dataset that is returned to the user without altering the original data or requiring additional disk space. The initial datasets chosen for preprocessing on the THREDDS test server are forecast (GFS) and satellite (GOES 18) data, because both are frequently used with artificial intelligence methods in the Earth sciences. In addition to providing access to the preprocessed datasets on the test server, we include Jupyter notebooks for visualizing them. We also test the difference in performance with and without the re-scaling. Future work will assess extending this preprocessing to more datasets relevant to users targeting machine learning applications, while optimizing performance.
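The two re-scaling types named in the abstract can be illustrated with a minimal sketch. It assumes the usual definitions, which the abstract does not spell out: standardization as the z-score (subtract the mean, divide by the population standard deviation) and normalization as min-max scaling to the range [0, 1]. The function names and sample values are hypothetical, and this is not the TDS or NcML implementation; it only shows the arithmetic a client would otherwise perform after download.

```python
from statistics import fmean, pstdev


def standardize(values):
    """Z-score re-scaling (assumed definition): zero mean, unit standard deviation."""
    mean = fmean(values)
    std = pstdev(values, mu=mean)  # population standard deviation (an assumption)
    if std == 0:
        # A constant field has no spread; return zeros rather than divide by zero.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


def normalize(values):
    """Min-max re-scaling (assumed definition): map the data onto [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant field has no range; return zeros rather than divide by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical sample: four temperature values in kelvin.
temperatures_k = [270.0, 280.0, 290.0, 300.0]
standardized = standardize(temperatures_k)
normalized = normalize(temperatures_k)
```

Neither function modifies its input, which loosely mirrors the virtual-dataset approach described above: the original values stay untouched and the re-scaled values are computed on request.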


See more of: Cloud-Based User Services to Support Data Use in the User Community
See more of: 40th Conference on Environmental Information Processing Technologies