
Data Ingestion Overview

MIRROR ingests the data types needed to build geospatial digital twins. Whether your data is spatial, tabular, or semantic, the platform provides CLI tools that load it into the system’s database.


✅ Supported Data Types

We currently support ingestion of:

| Data Type | Description |
| --- | --- |
| Graph Data (AGE) | Node-edge models representing relationships, ingested into Apache AGE (PostgreSQL-based graph DB) |
| IFC CSVs | Building and infrastructure models exported from BIM tools into structured CSVs |
| GIS Vector Formats | Geospatial vector data formats: Shapefile (.shp), GeoJSON, GPKG, etc. |
| Raster Data | Georeferenced imagery formats such as TIFF/GeoTIFF, used for elevation, satellite imagery, and heatmaps |
| Tabular CSVs | Generic tabular data (e.g., projects, tasks, budgets) with or without date fields |
| Glossary (CSV) | Searchable glossary of domain-specific terms and acronyms |

Folder Structure Convention

All data must be placed under the root-level folder: ./ingestion_data/.
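
A new site's folders can be created up front. The following is a minimal sketch, assuming the subfolder names used in the examples on this page (rasters, spatial, graphs, tables/IFC, tables/Glossary, tables/Non_Spatial_CSV). The helper name create_site_skeleton is hypothetical and is not part of the platform's CLI.

```python
from pathlib import Path

# Subfolders taken from the folder-structure examples on this page.
SITE_SUBFOLDERS = (
    "ingestion/rasters",
    "ingestion/spatial",
    "ingestion/graphs",
    "ingestion/tables/IFC",
    "ingestion/tables/Glossary",
    "ingestion/tables/Non_Spatial_CSV",
)


def create_site_skeleton(root, site_name):
    """Create the per-site ingestion folders under root and return their paths.

    Hypothetical helper: safe to call repeatedly, existing folders are kept.
    """
    site_dir = Path(root) / site_name
    created = []
    for subfolder in SITE_SUBFOLDERS:
        folder = site_dir / subfolder
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created
```

For example, create_site_skeleton("./ingestion_data", "citywest_data") would prepare the folders for a site named citywest_data.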


🚀 Ingestion Commands

Each section below provides copy-paste-ready CLI commands for ingesting data into the platform.


πŸ–ΌοΈ Ingesting Raster Data (GeoTIFF, TIFF)

Raster datasets are pixel-based images where each pixel holds a value β€” such as elevation, temperature, or land classification. Twinspace Mirror supports raster ingestion for overlaying high-resolution datasets on the map (e.g., heatmaps, satellite imagery, terrain).


Folder & ZIP Structure

Zip all raster files into one archive before ingesting:

Example Folder Structure

./ingestion_data/
    └── {SITE-NAME}/
        └── ingestion/
            └── rasters/
                └── rasters.zip
                    ├── elevation.tif
                    ├── flood_zones.tif
                    └── satellite_view.tif

Replace {SITE-NAME} with your own site name (e.g., citywest_data, navy_data) as needed.
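
The zipping step can be scripted. Below is a minimal sketch using only Python's standard library; the function name zip_rasters is hypothetical, and it assumes the rasters should sit at the top level of the archive, as in the structure above.

```python
import zipfile
from pathlib import Path


def zip_rasters(raster_dir, archive_path):
    """Pack every .tif/.tiff file in raster_dir into one flat ZIP archive.

    Returns the names of the files that were added, in sorted order.
    """
    raster_dir = Path(raster_dir)
    archive_path = Path(archive_path)
    rasters = sorted(
        p for p in raster_dir.iterdir()
        if p.is_file() and p.suffix.lower() in {".tif", ".tiff"}
    )
    archive_path.parent.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for raster in rasters:
            # arcname keeps the archive flat: no parent folders inside the ZIP.
            zf.write(raster, arcname=raster.name)
    return [p.name for p in rasters]
```

A call such as zip_rasters("./my_rasters", "./ingestion_data/citywest_data/ingestion/rasters/rasters.zip") would produce the archive in the expected location; both paths here are illustrative.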

🌍 Ingesting GIS Vector Format Data (Shapefile, GeoJSON, GPKG, ZIP)

Mirror supports ingestion of standard spatial vector formats via CLI. These include:

  • Shapefile (.shp)
  • GeoJSON (.geojson, .json)
  • GPKG (.gpkg)
  • ZIP archives containing one or more supported files

Shapefiles are a classic GIS vector format consisting of .shp (geometry), .shx (index), .dbf (attributes), and optionally .prj (projection). All components must be zipped before ingestion.

./ingestion_data/
  └── {SITE-NAME}/
      └── ingestion/
          └── spatial/
              └── buildings_data.zip
                  ├── buildings.shp
                  ├── buildings.shx
                  ├── buildings.dbf
                  └── buildings.prj

Docker command to ingest a vector data ZIP:
docker exec -it citymap-webapp-geodjango-1 python vtp/ingest.py \
-i /ingestion_data/yap_data/ingestion/spatial/buildings_data.zip -nln ata_ --overwrite
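
Because a Shapefile is only usable when its companion files travel with it, it can help to check the archive before ingesting. The sketch below is a hypothetical pre-flight check (missing_shapefile_parts is not a platform function); it treats .shp, .shx, and .dbf as required and .prj as optional, following the description above.

```python
import zipfile
from pathlib import PurePosixPath

# .prj is optional, so it is not listed here.
REQUIRED_EXTENSIONS = {".shp", ".shx", ".dbf"}


def missing_shapefile_parts(archive_path):
    """Return {layer: [missing extensions]} for every .shp found in the ZIP.

    An empty dict means each Shapefile has its .shx and .dbf companions.
    """
    with zipfile.ZipFile(archive_path) as zf:
        members = [PurePosixPath(n) for n in zf.namelist() if not n.endswith("/")]
    extensions_by_layer = {}
    for member in members:
        # Group files by their path without the extension, e.g. "buildings".
        layer = member.with_suffix("").as_posix()
        extensions_by_layer.setdefault(layer, set()).add(member.suffix.lower())
    problems = {}
    for layer, extensions in extensions_by_layer.items():
        if ".shp" in extensions:
            missing = sorted(REQUIRED_EXTENSIONS - extensions)
            if missing:
                problems[layer] = missing
    return problems
```

Archives that contain only GeoJSON or GPKG files produce an empty result, since the check looks only at layers that include a .shp file.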

📦 Ingesting GPKG (GeoPackage) Files

GeoPackage (GPKG) is a modern, open standard format that stores both vector and raster spatial data. It’s more efficient and portable than Shapefiles, and it can hold multiple layers (tables) in a single file.

Twinspace Mirror supports ingestion of GPKG files using the same CLI pattern. Each layer inside the GPKG will be handled individually and converted into corresponding spatial tables.


πŸ“ Example Folder Structure

./ingestion_data/
  └── {SITE-NAME}/
      └── ingestion/
          └── spatial/
              └── buildings_data.zip
                  └── city_buildings.gpkg

Docker command to ingest a GPKG:
docker exec -it citymap-webapp-geodjango-1 python vtp/ingest.py \
-i /ingestion_data/yap_data/ingestion/spatial/buildings_data.zip -nln ata_ --overwrite

🧠 Ingesting Knowledge Graphs (Nodes + Edges)

Twinspace Mirror natively supports graph databases via Apache AGE and provides CLI tooling to ingest nodes and edges from CSV files, building a knowledge graph representation of your environment (e.g., buildings, assets, relationships, flows).

To support multi-site deployments, you can organize data by site name.

Example Folder Structure

./ingestion_data/
    └── {SITE-NAME}/
        └── ingestion/
            └── graphs/
                ├── nodes.csv
                └── edges.csv

Replace {SITE-NAME} with your own site name (e.g., citywest_data, navy_data) as needed.
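
Before ingesting, it can be worth confirming that every edge points at a node that exists. The sketch below is illustrative only: this page does not specify the CSV schema, so the column names id, start_id, and end_id are assumptions and should be replaced with the columns your files actually use. The function name find_dangling_edges is hypothetical.

```python
import csv


def find_dangling_edges(nodes_csv, edges_csv, node_id_col="id",
                        source_col="start_id", target_col="end_id"):
    """Return the edge rows whose source or target is absent from nodes_csv.

    Column names are assumptions; pass your own if the files differ.
    """
    with open(nodes_csv, newline="", encoding="utf-8") as f:
        node_ids = {row[node_id_col] for row in csv.DictReader(f)}
    dangling = []
    with open(edges_csv, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            if row[source_col] not in node_ids or row[target_col] not in node_ids:
                dangling.append(row)
    return dangling
```

An empty list means every edge connects two known nodes.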

Docker command to ingest raster data (the rasters.zip archive described under Ingesting Raster Data):
docker exec -it citymap-webapp-geodjango-1 python vtp/ingest.py \
-i /ingestion_data/yap_data/ingestion/rasters/rasters.zip -nln ata_

| Flag | Description |
| --- | --- |
| -i | Path to the zipped raster files |
| -nln | New layer name prefix (ata_); each file gets its own table/layer |
| --overwrite | (Optional) Use if replacing existing layers |

GeoTIFF Requirement

Make sure your raster files are in GeoTIFF format (with embedded geospatial reference) for proper rendering and querying.

🚀 Graph Ingestion Command

Once your nodes.csv and edges.csv files are placed inside the appropriate site’s graphs/ folder, run the following command:

Docker command to ingest graph data.
docker exec -it citymap-webapp-geodjango-1 python vtp/ingest_age.py \
-i /ingestion_data/yap_data/ingestion/graphs \
-nln zip_alb_ \
--overwrite \
-kgn ntrp

| Flag | Description |
| --- | --- |
| -i | Path to the folder containing your node and edge CSVs |
| -nln | Prefix used when creating new tables (e.g., zip_alb_nodes, zip_alb_edges) |
| --overwrite | Overwrite existing tables if they already exist |
| -kgn | Graph name used internally in the AGE database (e.g., ntrp) |
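
For multi-site deployments, the same command can be assembled per site from a script. The following is a sketch only: graph_ingest_command is a hypothetical wrapper, while the container name, script path, and flags are the ones shown in the command and flag descriptions above.

```python
import shlex


def graph_ingest_command(input_dir, table_prefix, graph_name, overwrite=True,
                         container="citymap-webapp-geodjango-1"):
    """Build the docker exec argument list for vtp/ingest_age.py.

    input_dir is the path, as seen inside the container, to the folder
    holding nodes.csv and edges.csv.
    """
    argv = [
        "docker", "exec", "-it", container,
        "python", "vtp/ingest_age.py",
        "-i", input_dir,
        "-nln", table_prefix,
    ]
    if overwrite:
        argv.append("--overwrite")
    argv += ["-kgn", graph_name]
    return argv


# Path follows the folder structure convention documented for graph data.
example = graph_ingest_command(
    "/ingestion_data/yap_data/ingestion/graphs", "zip_alb_", "ntrp"
)
print(shlex.join(example))
```

The list can be passed to subprocess.run as-is. Note that the -it flags require an attached terminal, so the command is best launched from an interactive shell.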

πŸ—οΈ Ingesting IFC Data (Building Models in CSV Format)

IFC (Industry Foundation Classes) is an open standard for representing building and infrastructure data. It is widely used in Building Information Modeling (BIM) workflows to capture spatial, structural, and semantic aspects of built environments.

While IFC data is typically stored in .ifc or .ifcxml formats, Twinspace Mirror supports direct ingestion of IFC-exported CSV files, enabling lightweight and flexible integration into our spatial database backend.

Folder Structure Convention

🏷️ Replace {SITE-NAME} with your site name (e.g., yap_data, navy_data, citywest_data).

πŸ—οΈ The folder House_11 is a custom name and can be anything β€” you may create separate folders for each building or model (e.g., Building_A, Garage_01, etc.). Inside each folder, place the corresponding IFC component CSVs (e.g., Wall.csv, Slab.csv, Door.csv, etc.).

Example Folder Structure

./ingestion_data/
  └── {SITE-NAME}/
      └── ingestion/
          └── tables/
              └── IFC/
                  └── House_11/
                      ├── Header.csv
                      ├── Wall.csv
                      ├── Slab.csv
                      ├── Door.csv
                      └── Window.csv

🚀 IFC CSV Ingestion Command

Docker command to ingest a single IFC CSV:
docker exec -it citymap-webapp-geodjango-1 python vtp/ingest_ifc_csv.py \
/ingestion_data/yap_data/ingestion/tables/IFC/House_11/Header.csv --overwrite

Docker command to ingest a folder of IFC CSVs:
docker exec -it citymap-webapp-geodjango-1 python vtp/ingest_ifc_csv.py \
/ingestion_data/yap_data/ingestion/tables/IFC/House_11 --folder --overwrite

📘 Ingesting Glossary CSV (Searchable Terms & Acronyms)

The glossary is a special type of CSV that powers a dedicated frontend UI for exploring domain-specific terms and acronyms.

Once ingested, users can:

  • πŸ” Search by term or acronym
  • πŸ“– View full definitions
  • 🧩 Link glossary terms with spatial or tabular data

Example Folder Structure

./ingestion_data/
  └── {SITE-NAME}/
      └── ingestion/
          └── tables/
              └── Glossary/
                  └── Glossary_NTRP_Acronyms_Planning.csv

🚀 Glossary CSV Ingestion Command

Docker command to ingest glossary data:
docker exec -it citymap-webapp-geodjango-1 python vtp/ingest_glossary_csv.py \
/ingestion_data/yap_data/ingestion/tables/Glossary --folder --overwrite

Ingesting Vanilla CSVs (Tabular Data)

Twinspace Mirror supports ingestion of general-purpose tabular CSV files. These may include datasets such as project plans, task assignments, schedules, or any structured flat file.


Example Folder Structure

./ingestion_data/
  └── {SITE-NAME}/
      └── ingestion/
          └── tables/
              └── Non_Spatial_CSV/
                  ├── Project.csv
                  └── Task.csv

Folder Structure Convention

🏷️ Replace {SITE-NAME} with your actual site name (e.g., yap_data, navy_data, citywest_data).

πŸ—‚οΈ You may organize multiple CSVs under Non_Spatial_CSV/ β€” such as Project.csv, Task.csv, Budget.csv, etc. Each file will be mapped to its own table in the database.


🚀 Vanilla CSV Ingestion Command

If you have multiple CSV files inside a folder (e.g., Project.csv, Task.csv, Budget.csv, etc.), you can ingest them all at once using the following command:

Docker command to ingest a CSV folder:
docker exec -it citymap-webapp-geodjango-1 python vtp/ingest_csv_folder.py \
/ingestion_data/yap_data/ingestion/tables/Non_Spatial_CSV --folder --overwrite
Table Name Argument

When a command takes a table name as its second argument (e.g., projects), that name is the database table that will be created or updated.

Use --overwrite

This will replace any existing table with the same name.

CSV Format

Ensure your CSV has a header row and a valid delimiter (usually a comma). Invalid files may cause ingestion to fail.
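
A quick structural check can catch malformed files before ingestion. The sketch below is a hypothetical helper (check_csv is not part of the platform) and assumes a comma delimiter unless told otherwise; it verifies that the header has no empty or duplicate names and that every row has as many fields as the header.

```python
import csv


def check_csv(path, delimiter=","):
    """Return a list of problems found in the CSV; an empty list means none."""
    problems = []
    # utf-8-sig strips a byte-order mark if a spreadsheet tool added one.
    with open(path, newline="", encoding="utf-8-sig") as f:
        reader = csv.reader(f, delimiter=delimiter)
        header = next(reader, None)
        if header is None:
            return ["file is empty"]
        if any(not name.strip() for name in header):
            problems.append("header row contains an empty column name")
        if len(set(header)) != len(header):
            problems.append("header row contains duplicate column names")
        for row in reader:
            # Blank lines come back as empty lists and are skipped.
            if row and len(row) != len(header):
                problems.append(
                    f"line {reader.line_num}: expected {len(header)} fields, "
                    f"found {len(row)}"
                )
    return problems
```

Running it over every file in Non_Spatial_CSV/ before starting the ingestion command gives a short report of the files that need fixing.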

βš™οΈ Important Notes for CLI Ingestion

Overwrite vs Append

If the target table already exists, use the --overwrite flag to drop and recreate it.
If you omit --overwrite, the new data will be appended to the existing table.
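
The difference between the two modes can be illustrated in isolation. This sketch uses an in-memory SQLite database purely as a stand-in; it is not the platform's implementation, and load_rows, the projects table, and its two columns are hypothetical.

```python
import sqlite3


def load_rows(conn, table, rows, overwrite=False):
    """Load (name, budget) rows and return the resulting row count.

    overwrite=True drops and recreates the table; otherwise rows are appended.
    """
    if not table.isidentifier():
        raise ValueError(f"unsafe table name: {table!r}")
    if overwrite:
        conn.execute(f'DROP TABLE IF EXISTS "{table}"')
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" (name TEXT, budget INTEGER)')
    conn.executemany(f'INSERT INTO "{table}" VALUES (?, ?)', rows)
    conn.commit()
    return conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]


conn = sqlite3.connect(":memory:")
first = load_rows(conn, "projects", [("Bridge", 100), ("Road", 200)])
appended = load_rows(conn, "projects", [("Tunnel", 300)])
replaced = load_rows(conn, "projects", [("Harbor", 400)], overwrite=True)
print(first, appended, replaced)  # 2 3 1
```

Re-running an ingestion without --overwrite therefore duplicates rows that were already loaded, which is usually the reason to pass the flag when refreshing a dataset.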

Column Data Types

During CSV ingestion, the CLI will prompt you to choose a data type for each column.

  • You will see a numbered list (e.g., 1: text, 2: bigint, etc.).
  • If your desired data type is not listed, enter 0 to type it in manually (e.g., numeric, uuid, etc.).

EPSG Code for Geometry

You will be prompted to enter an EPSG code for geometry columns.

  • The default is 4326 (WGS 84).
  • Enter another code only if your spatial data uses a different CRS.

Geometry Type Selection

You will be asked to choose a geometry type:

  • 1: Point
  • 2: Line
  • 3: Polygon

Choose the one that matches your spatial data’s structure.


🆘 Get Help

To see available CLI options for any script, you can run:

python ingest_csv.py --help

πŸ—‚οΈ Filebrowser (Web-Based File Manager)

Filebrowser is a lightweight web-based file management interface that allows you to browse, upload, delete, and manage files inside the project folders β€” all from a clean browser UI.


🎯 Purpose

To easily navigate and manage project assets, including:

  • Geospatial files (e.g., Shapefiles, GeoTIFFs) stored in ingestion_data
  • X3D assets, PDFs, and screenshots stored in webapp/media

This is especially helpful for debugging, manual uploads, or reviewing asset structure.


βš™οΈ Setup & Configuration

  • Docker Service Name: citymap-filebrowser
  • Host: localhost
  • Port: 8083
  • Base URL: /files

Accessible at:

http://localhost:8083/files

Connected to the shared citymap-network.


👤 Default Credentials

  • Username: admin
  • Password: admin

⚠️ It is strongly recommended to change the default credentials in a production environment.


πŸ“ Mounted Directories

| Host Path | Container Path | Description |
| --- | --- | --- |
| ./../ingestion_data | /srv/ingestion_data | Stores uploaded geospatial and asset files |
| ./../webapp/media | /srv/media | Stores frontend X3D assets, screenshots, PDFs |

🔒 Notes on Security

  • Filebrowser is protected only by the default credentials. For production, configure secure credentials, or disable the service entirely in public-facing deployments.
  • You can manage users and roles from the Filebrowser UI or a .filebrowser.json config.

πŸ› οΈ Common Use Cases

  • Uploading new data assets
  • Browsing ingested files without needing shell access
  • Debugging issues with missing or malformed files
  • Quick visual review of generated files (e.g., PDF reports, 3D models)