Friday, October 12, 2012: 8:00 PM
6C/6E (WSCC)
Scientific projects often require data that span geographic and disciplinary boundaries and outlast organizational life spans. To reuse such data in their own projects, scientists need to be able to assess them. The framework presented here helps scientists who produce data use formal knowledge representation technologies to document their data collection and transformation processes, and it leverages that documentation to record the provenance of the resulting data products. Provenance provides evidence of how a data product was created, what and who were involved, and when it happened. In science, provenance helps assess the credibility of data and supports decisions about their appropriateness for reuse. In addition, formal knowledge representation technologies support computing tasks that require reasoning capabilities, so scientists who want to reuse data produced by others can use computers to evaluate those data. The framework has been used in multiple projects in the geosciences and environmental sciences, and it is currently being integrated into larger data management efforts for big-data science projects.