Project: Data Architecture Management Services
The Challenge
Within the wildland fire community, data and information support activities that occur before, during, and after a wildfire. Planning, preparedness, mitigation, and education activities target reducing the risk of wildfire. Wildfire response activities address public and firefighter safety and land management objectives. Post-fire activities target rehabilitation of ecosystems and reducing the risk of flooding and erosion. All of these activities cross land and agency boundaries. Sharing data and information during each stage promotes efficiency and effectiveness.
Sharing data and information has traditionally been accomplished through phone calls and meetings, but the community has a growing dependence on computer applications and technology.
The wildland fire community is experiencing the same massive growth in data as other sectors. Data is available from satellites; from sensors on aircraft, vehicles, and firefighters; from models and algorithms that predict weather and fire behavior; from decision support tools; and as a product of program activities. This data must be synthesized to produce dashboards and reports that inform the public, the wildland fire community members, and our local, state, and federal governments who provide funding for the programs.
The wildland fire interagency collective is enormous. Today, wildland fire partners include:
Five federal land management agencies: USDA Forest Service (FS), the Bureau of Land Management (BLM), the National Park Service (NPS), the Fish and Wildlife Service (FWS) and the Bureau of Indian Affairs (BIA).
59 states and territories represented by the National Association of State Foresters
Over 3000 county and local governments, represented by the International Association of Fire Chiefs
Over 600 tribal entities, represented by the Intertribal Timber Council
Other federal agencies like Department of Defense, Department of Commerce, National Aeronautics and Space Administration (NASA), National Oceanic and Atmospheric Administration (NOAA), US Geological Survey (USGS), and others
Public and private sector industry
Non-profit organizations
Within the wildland fire community, there is a need to establish a data architecture for documenting, defining, and managing data to support activities across the landscape and agency programs.
The Solution
The Interagency Data Management Environment (IDME), formerly known as the Data Cache, is the name of the interagency enterprise data architecture (EDA) that will address current and future data management needs mentioned above. To support the wildland fire enterprise, the IDME needs to support the following characteristics:
Built for end-users to consume. Enables end users to determine what data they need for business decisions and data architects to design data access that delivers what they need.
Automated with data pipelines and data flows. Automate data management processes so data flows smoothly and freely when and where it is needed in the organization while maintaining data governance. Data integration design (via the EDA) is key to making sure that every part of the whole connects.
Scalable to meet unpredictable demands. Data needs to scale up and down, automatically and affordably to meet user demands.
Shareable for trusted collaboration. Shared data is critical to ensure that everyone works from the same data source of truth regardless of agency affiliation. This is also critical as we look to implement edge computing and asynchronous data collection.
Secure by design. Modern data architecture ensures data security with controlled data access and authorization, as well as compliance with DOI and USDA OCIO requirements.
Integrates Geospatial Data. Wildland fire is inherently spatial in nature, yet historically, even at departmental levels, our geospatial data has been managed separately from all other data.
Curated by Artificial Intelligence/Machine Learning (AI/ML). Harnessing the power of AI and ML to automate data processing, recognize new data types, cleanse data, fix data quality issues, perform data mining, ensure data standards are maintained, and surface data analytics and insights requires conscious design and implementation of an EDA.
Our Role
In September 2022, the Department of the Interior’s Office of Wildland Fire selected Darkhorse Geospatial as the prime contractor to lead the IDME design and three proofs of concept.
PoC #1, Field Sample (fuel moisture), focused on providing fuel moisture sample data from a standalone server to the wildland fire community, with a series of real-time dashboards comparing current fuel moisture readings to historical values.
PoC #2 focused on delivering reference data from the wildland fire Enterprise Data Governance tool to business users so that multiple applications can reference the same authoritative reference data.
PoC #3, Fire Occurrence Analytics 1.0, focused on:
1. 5-year and 10-year averages compared to multiple date options (potentially selected via a slider). Examples:
Year to date (YTD) count of incidents
Entire month (for example - the entire month of June)
Current month fires so far
2. Incidents by GACC, Dispatch Boundaries, and/or Jurisdiction, using the authoritative services for GACC, Dispatch Boundaries, and Jurisdictional Unit polygons on the NIFC Open Data Site.
3. Fuel moisture compared with fire occurrence, highlighting low-moisture fuels within a to-be-determined proximity of fires at the same time
4. Data with latitude/longitude errors (to support QA/QC of the data)
5. Showcase the new time functionality
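Item 1 above can be sketched in a few lines. This is a minimal illustration of comparing a year-to-date incident count against a 5-year average; the incident dates are fabricated sample data, not real fire-occurrence records, and the function name is hypothetical.

```python
# Hedged sketch of a YTD incident count vs. a 5-year average.
# Sample dates are illustrative only.
from datetime import date

incidents = [date(2018, 3, 4), date(2018, 7, 1), date(2019, 5, 20),
             date(2020, 4, 2), date(2020, 6, 15), date(2021, 5, 9),
             date(2022, 2, 11), date(2022, 5, 30), date(2022, 6, 1)]

def ytd_count(year, as_of):
    """Count incidents in `year` on or before the month/day of `as_of`."""
    return sum(1 for d in incidents
               if d.year == year and (d.month, d.day) <= (as_of.month, as_of.day))

as_of = date(2022, 6, 30)
current = ytd_count(2022, as_of)                               # current-year YTD
avg_5yr = sum(ytd_count(y, as_of) for y in range(2017, 2022)) / 5
```

The same pattern extends to whole-month counts or arbitrary date windows by changing the comparison on the month/day tuple.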
Our Work
The data architecture for the IDME consists of three major components: data areas, data processing zones, and data platforms.
Data Areas. The data architecture is organized into three major layers:
Data ingestion. Connects to source systems, extracts relevant data, and moves it to the analytical environment.
Data transformation. Transforms, aggregates, and integrates source data and models it for easy access and fast queries.
Data access. Tools and services that enable business users or applications to query the transformed data.
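The three layers above can be sketched end to end. This is a minimal illustration using an in-memory SQLite database; the table names, columns, and validation rule are hypothetical and stand in for real IDME sources and models.

```python
# Minimal sketch of the three layers: ingestion -> transformation -> access.
# All names and sample data here are illustrative assumptions.
import sqlite3

def ingest(conn):
    """Data ingestion: pull raw rows from a source system into a landing table."""
    conn.execute("CREATE TABLE raw_incidents (id INTEGER, acres TEXT)")
    conn.executemany("INSERT INTO raw_incidents VALUES (?, ?)",
                     [(1, "120.5"), (2, "bad"), (3, "40")])

def transform(conn):
    """Data transformation: validate, cast, and model for fast queries."""
    conn.execute("""CREATE TABLE incidents AS
                    SELECT id, CAST(acres AS REAL) AS acres
                    FROM raw_incidents
                    WHERE acres GLOB '*[0-9]*' AND NOT acres GLOB '*[a-z]*'""")

def access(conn):
    """Data access: a query service for business users or applications."""
    return conn.execute("SELECT COUNT(*), SUM(acres) FROM incidents").fetchone()

conn = sqlite3.connect(":memory:")
ingest(conn)
transform(conn)
count, total = access(conn)    # the malformed row is filtered out in transform
```

In a production architecture, each layer would be a separate pipeline stage with its own platform, scheduling, and governance rather than three functions against one database.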
Data Processing Zones. A data architecture consists of multiple zones of data processing that transform data from its raw state in a source system to its target state. The number of zones required to process data depends on the complexity and diversity of source data and the business requirements for the consumption of that data.
Typical zones are:
Landing Zone: A place to land and permanently store raw data from various sources
Refined Zone: Validates and cleans raw data; integrates like data into subject areas (e.g., customer). The building blocks of higher-level datasets.
Trusted Zone: Data from the refined zone that is put through an extra layer of validations and checks before it is exposed to developers and business users.
Analytic Zone: Subject-area data from the refined zone is modeled to support specific use cases and applications (e.g., a data mart)
Specialized zones that support niche use cases are:
Integration Zone: Also called the real-time delivery zone, this zone pulls data from the landing zone and other zones in near real time to support operational use cases (e.g., an operational data store)
Formatted Zone: Pulls data from other zones and standardizes formats (e.g., merging y/n and 1/2 encodings into a single standard).
Discovery Zone: A place where authorized users can run queries and models against enterprise data to test hypotheses and run experiments.
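The promotion of a record from landing to refined to trusted can be sketched concretely. The zone names come from the list above; the sample records, field names, and validation rules are illustrative assumptions only.

```python
# Hedged sketch of data moving landing -> refined -> trusted.
# Records, fields, and checks are fabricated for illustration.

landing_zone = [
    {"station": "RAWS-014", "fuel_moisture": "7.2"},
    {"station": "", "fuel_moisture": "not-reported"},  # will fail refinement
]

def refine(record):
    """Refined zone: validate and clean raw data into typed subject-area rows."""
    if not record["station"]:
        return None
    try:
        return {"station": record["station"],
                "fuel_moisture": float(record["fuel_moisture"])}
    except ValueError:
        return None

def promote_to_trusted(record):
    """Trusted zone: an extra layer of checks before exposure to users."""
    return 0.0 <= record["fuel_moisture"] <= 100.0  # plausible percent range

refined_zone = [r for raw in landing_zone if (r := refine(raw))]
trusted_zone = [r for r in refined_zone if promote_to_trusted(r)]
```

Each zone only ever reads from the zone before it, which keeps the raw landing data immutable and makes every downstream dataset reproducible from source.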
Data Platforms. The processing zones run on a data platform, which can reside on-premises or in the cloud. Most data architectures run on a single data platform, but they can run multiple data platforms, and in rare cases, they can span multiple clouds or cloud and on-premises environments. A data platform is a commercial product that typically stores and processes large volumes of analytical data.
The data platforms used in a data architecture depend on the nature of data processing and the availability of systems within a customer’s data environment. Organizations that implement data platforms to support multiple zones of processing refer to them as a “data warehouse”, a “data hub”, a “data lake”, a “data lakehouse”, or a “data refinery”. Most organizations are modernizing their data architectures by migrating them from on-premises data platforms to cloud data platforms, such as Snowflake or Databricks.
A data architecture consists of data models that align with the data consumer’s needs, a target data platform that runs those data models, and the flow of data from one or more source systems to the target environment. This data architecture is easy to work with, fast to update, and quick to enhance.
Our design also includes a data marketplace where consumers can discover, evaluate, and access data assets, products, services, and applications that they need.
Our Results
Our team presented the IDME architecture and the first two proofs of concept during the Annual Spring Data Summit in College Station, Texas in April 2022 to an overwhelmingly supportive National Wildfire Coordinating Group (NWCG) Data Management Committee and Wildland Fire Information & Technology (WFIT) Data Management Program. We plan to continue developing the IDME, working with different subject areas and applications to onboard critical data into the IDME pipeline.