Buildata dataset structure

Each Buildata dataset provides structured BIM building graphs ready to be used in AI workflows. Datasets include semantic elements, relationships, materials and property sets organized for graph learning, machine learning pipelines and BIM reasoning experiments.

Dataset origin

Buildata datasets are synthetically generated using a procedural BIM generation engine. The system creates semantically valid building graphs following IFC schema rules, including spatial hierarchy, materials, property sets and element relationships.

Transparency note

Catalog datasets do not contain models of real buildings. Because every building is synthetic, the data can be shared publicly and reused reproducibly in AI research workflows that need legally shareable training data.

Core dataset structure

buildata-residential-v1/
  buildings/
  dataset_card.md
  edges.json
  metadata.json
  nodes.json
  statistics.json
  tasks.json

Each dataset contains a semantic building graph composed of BIM elements, relationships and associated metadata. The structure allows users to explore building topology, materials and property sets in a consistent machine-readable format.
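
As a rough illustration of how the core files might be read, the sketch below loads whichever of the JSON files in the listing are present. The helper name load_core is hypothetical, and the throwaway directory only mimics the layout above so the example runs on its own. It assumes each file holds plain JSON.

```python
import json
import tempfile
from pathlib import Path

# File stems taken from the directory listing; the loader itself is a hypothetical helper.
CORE_FILES = ("metadata", "nodes", "edges", "statistics", "tasks")

def load_core(dataset_dir):
    """Read whichever core JSON files exist and return them keyed by file stem."""
    root = Path(dataset_dir)
    loaded = {}
    for name in CORE_FILES:
        path = root / f"{name}.json"
        if path.exists():
            loaded[name] = json.loads(path.read_text(encoding="utf-8"))
    return loaded

# Demo against a temporary directory that imitates the documented layout.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "buildata-residential-v1"
    root.mkdir()
    (root / "metadata.json").write_text(
        json.dumps({"typology": "residential"}), encoding="utf-8"
    )
    # Assumption: nodes.json stores a JSON array of node records.
    (root / "nodes.json").write_text(
        json.dumps([{"node_id": "ifcwall-0025", "entity": "IfcWall"}]),
        encoding="utf-8",
    )
    core = load_core(root)

print(sorted(core))
```

Files that are missing are simply skipped, so the same helper can be pointed at a partial download.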

AI-ready dataset layers

buildata-residential-v1/
  graph/
    graph_dataset.json
    graph_index.json
  ml/
    tabular training datasets
  llm/
    instruction and reasoning datasets
  buildings/
    individual building graph samples

Beyond the core BIM graph, Buildata datasets may include further layers designed for AI experimentation and model training. Each layer organizes the same building information for a different learning workflow.

metadata.json

{
  "generator": "Buildata Synthetic Building Engine",
  "generatorVersion": "3.0.0",
  "schemaVersion": "IFC4X3_ADD2",
  "datasetSchemaVersion": "3",
  "typology": "residential",
  "constructionMethod": "Cast-in-place concrete",
  "objectCount": 263,
  "nodeCount": 263,
  "edgeCount": 623
}

What metadata does

Metadata describes the dataset or the building sample, including typology, schema, generation settings, counts and quality metrics used for reproducibility.
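
One way the counts in metadata could be used is as a sanity check after loading. The sketch below is a hypothetical helper, not part of Buildata. It assumes that nodeCount and edgeCount refer to the number of records in nodes.json and edges.json, which the page does not state explicitly.

```python
def check_counts(metadata, nodes, edges):
    """Return human-readable mismatches between declared and actual counts."""
    problems = []
    # Assumption: nodeCount / edgeCount describe the node and edge record lists.
    for key, actual in (("nodeCount", len(nodes)), ("edgeCount", len(edges))):
        declared = metadata.get(key)
        if declared is not None and declared != actual:
            problems.append(f"{key}: metadata says {declared}, found {actual}")
    return problems

# Tiny illustrative sample, not real dataset content.
sample_nodes = [{"node_id": "ifcwall-0027"}, {"node_id": "ifcdoor-0033"}]
sample_edges = [{"source": "ifcdoor-0033", "target": "ifcwall-0027"}]

ok = check_counts({"nodeCount": 2, "edgeCount": 1}, sample_nodes, sample_edges)
bad = check_counts({"nodeCount": 263, "edgeCount": 623}, sample_nodes, sample_edges)
print(ok)
print(bad)
```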

nodes.json

{
  "node_id": "ifcwall-0025",
  "entity": "IfcWall",
  "parent_id": "ifcspace-0024",
  "predefinedType": "NOTDEFINED",
  "features": {
    "syntheticHeight": 2.85,
    "syntheticLength": 5228.84,
    "syntheticThickness": 0.12,
    "syntheticMaterial": "Gypsum",
    "objectType": "Exterior Wall"
  }
}

Nodes as BIM elements

Each node is a BIM element or hierarchy entity, enriched with features and selected property sets. This is the semantic backbone of the dataset.
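
The parent_id field suggests that the hierarchy can be rebuilt directly from the node records. The following sketch groups node ids under their parent. The function name is hypothetical, and every sample record except ifcwall-0025 is invented for illustration.

```python
from collections import defaultdict

def children_by_parent(nodes):
    """Group node ids under the parent_id they point to (None for roots)."""
    tree = defaultdict(list)
    for node in nodes:
        tree[node.get("parent_id")].append(node["node_id"])
    return dict(tree)

sample_nodes = [
    # Hypothetical space record with no parent, so the example stays small.
    {"node_id": "ifcspace-0024", "entity": "IfcSpace", "parent_id": None},
    # Record shaped like the nodes.json example.
    {"node_id": "ifcwall-0025", "entity": "IfcWall", "parent_id": "ifcspace-0024"},
    # Hypothetical second wall in the same space.
    {"node_id": "ifcwall-0026", "entity": "IfcWall", "parent_id": "ifcspace-0024"},
]

hierarchy = children_by_parent(sample_nodes)
print(hierarchy["ifcspace-0024"])
```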

edges.json

{
  "edge_id": "edge-00047",
  "source": "ifcdoor-0033",
  "sourceEntity": "IfcDoor",
  "relation": "RelFillsElement",
  "target": "ifcwall-0027",
  "targetEntity": "IfcWall"
}

Edges as semantic relationships

Edges turn the building into a BIM graph that supports relation prediction, topology learning, graph embeddings and reasoning tasks. Relation types include:

  • RelContainedInSpatialStructure
  • RelFillsElement
  • RelSpaceBoundary
  • RelAggregates
  • hasMaterial
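
To make the edge format concrete, the sketch below counts edges per relation type and builds a simple adjacency map. It assumes edge records shaped like the edges.json example. The second edge and its ids are hypothetical.

```python
from collections import Counter, defaultdict

sample_edges = [
    # Record taken from the edges.json example.
    {"edge_id": "edge-00047", "source": "ifcdoor-0033", "sourceEntity": "IfcDoor",
     "relation": "RelFillsElement", "target": "ifcwall-0027", "targetEntity": "IfcWall"},
    # Hypothetical record, added only so there is more than one relation type.
    {"edge_id": "edge-00048", "source": "ifcwall-0027", "sourceEntity": "IfcWall",
     "relation": "RelSpaceBoundary", "target": "ifcspace-0024", "targetEntity": "IfcSpace"},
]

# How many edges carry each relation type.
relation_counts = Counter(edge["relation"] for edge in sample_edges)

# Outgoing neighbours per node, keeping the relation label with each target.
neighbors = defaultdict(set)
for edge in sample_edges:
    neighbors[edge["source"]].add((edge["relation"], edge["target"]))

print(relation_counts)
print(dict(neighbors))
```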

tasks.json

{
  "classification": [...],
  "metadata_completion": [...],
  "relation_prediction": [...],
  "topology_learning": [...],
  "ifc_reasoning": [...],
  "material_prediction": [...]
}

Training tasks

tasks.json defines benchmark-ready tasks that can be used to train and evaluate AI models on BIM classification, completion, relation prediction and reasoning workflows.
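
A quick way to inspect a task file is to count the entries under each task name. The sketch assumes each key in tasks.json maps to a list. The shape of the individual entries is not shown on this page, so the sample uses empty placeholder objects.

```python
def summarize_tasks(tasks):
    """Return the number of entries per task, ignoring values that are not lists."""
    return {name: len(items) for name, items in tasks.items() if isinstance(items, list)}

# Placeholder content: the entry structure is unknown, so empty dicts stand in.
sample_tasks = {
    "classification": [{}, {}],
    "relation_prediction": [{}],
    "material_prediction": [],
}

summary = summarize_tasks(sample_tasks)
print(summary)
```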

ML layer

Prepared for machine learning workflows that use structured features, labels and benchmark splits for prediction and evaluation tasks.

LLM layer

Provides instruction-style and reasoning-oriented samples that can support BIM language models and question-answering experiments.

Graph layer

Packages building graphs for direct use in graph learning workflows, including node, edge and topology-based experiments.
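
Many graph learning libraries expect integer node indices and an edge list in source/target form. The layout of graph_dataset.json is not described here, so the sketch below derives that form from node and edge records shaped like the core examples instead. The function name to_edge_index is hypothetical.

```python
def to_edge_index(nodes, edges):
    """Map node ids to integer indices and return [sources, targets] index lists."""
    index = {node["node_id"]: i for i, node in enumerate(nodes)}
    sources, targets = [], []
    skipped = 0
    for edge in edges:
        src, dst = edge["source"], edge["target"]
        # Skip edges whose endpoints are not in the node list.
        if src in index and dst in index:
            sources.append(index[src])
            targets.append(index[dst])
        else:
            skipped += 1
    return index, [sources, targets], skipped

sample_nodes = [{"node_id": "ifcwall-0027"}, {"node_id": "ifcdoor-0033"}]
sample_edges = [
    {"source": "ifcdoor-0033", "relation": "RelFillsElement", "target": "ifcwall-0027"},
    # Hypothetical edge pointing at an id that is absent from sample_nodes.
    {"source": "ifcwall-0027", "relation": "hasMaterial", "target": "material-0001"},
]

node_index, edge_index, skipped = to_edge_index(sample_nodes, sample_edges)
print(node_index, edge_index, skipped)
```

The two index lists can then be wrapped in whatever tensor type the chosen framework uses.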

Current dataset card example

name: buildata-residential-batch
version: 3
dataset_type: synthetic
generation_method: procedural BIM generation
source_buildings: synthetic
derived_from_real_buildings: no
generatorVersion: 3.0.0
task_categories: [graph-ml, node-classification, link-prediction, material-prediction]
tags: [BIM, IFC, architecture, building, graph-neural-network, synthetic, material, pset]
license: cc-by-4.0

Dataset card

Each dataset includes a dataset_card.md file describing its contents, statistics and intended AI tasks. The dataset card provides a quick overview for researchers and developers evaluating the dataset before using it in experiments or training workflows.

Datasets designed for AI workflows

Buildata datasets are designed to support multiple AI workflows in the built environment, including graph learning, machine learning experimentation and BIM reasoning models.