Natural Language Processing (NLP)-based framework for Construction Quantity Take Off from Building Information Models
Date of Award
4-2025
Degree Name
Doctor of Philosophy
Department
Civil and Construction Engineering
First Advisor
Hexu Liu, Ph.D.
Second Advisor
Osama Abudayyeh, Ph.D.
Third Advisor
Alvis Fong, Ph.D.
Abstract
Construction cost estimation is a critical task in construction management: it determines the total cost of a project before construction begins and serves as the foundation for cost management and control during the construction stage. The process typically involves five procedures: 1) developing construction methods, 2) establishing a work breakdown structure (WBS), 3) performing quantity take-off (QTO) for the construction work packages in the WBS, 4) calculating direct costs from the quantities and unit prices of each work package, and 5) determining the total construction cost by adding overhead, profit, and contingencies (Tang et al., 2022). However, these steps often require significant manual effort and are difficult to fully automate. This is partly because construction cost estimation is a knowledge-intensive process, and current computer systems often lack the necessary estimation knowledge. For instance, interpreting construction specifications to establish the WBS requires the knowledge and experience of cost estimators. The QTO step likewise demands manual judgment from estimators, who must analyze the work descriptions of WBS cost items to determine their quantities accurately.
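Steps 4 and 5 reduce to simple arithmetic once the quantities are known. The following is a minimal sketch that assumes each overhead, profit, and contingency rate is a plain percentage of direct cost; the `CostItem` class, the item names, quantities, and unit prices are hypothetical and only illustrate the roll-up, not the study's data or any particular estimating convention.

```python
from dataclasses import dataclass


@dataclass
class CostItem:
    """A hypothetical WBS cost item with its taken-off quantity and unit price."""
    name: str
    quantity: float
    unit_price: float


def direct_cost(items):
    """Step 4: sum of quantity x unit price over all cost items."""
    return sum(item.quantity * item.unit_price for item in items)


def total_cost(items, overhead_rate, profit_rate, contingency_rate):
    """Step 5: direct cost plus overhead, profit, and contingency.

    Assumption: every markup is a simple percentage of direct cost;
    real estimates may compound markups or apply them to other bases.
    """
    return direct_cost(items) * (1 + overhead_rate + profit_rate + contingency_rate)


items = [
    CostItem("Cast-in-place concrete walls", quantity=120.0, unit_price=250.0),  # m3
    CostItem("Gypsum board partitions", quantity=800.0, unit_price=35.0),  # m2
]
print(direct_cost(items))  # 58000.0
print(total_cost(items, overhead_rate=0.10, profit_rate=0.08, contingency_rate=0.05))
```

With these made-up figures the direct cost is 58,000 and the total is roughly 71,340, i.e. 23% added on top. The sketch also makes plain why QTO matters: every error in `quantity` propagates linearly into both totals.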
QTO is “a detailed measurement of materials and labor needed to complete a construction project” (Liu et al., 2016). It serves as the foundation for other construction management tasks, such as cost estimation and schedule planning, so its accuracy can directly affect downstream analyses and decision making. The QTO process involves extracting information from design drawings or 3D models to measure the quantities of building elements or features. This includes reviewing the scope of work for a specific work package and identifying the objects or activities that the work package covers. Construction-oriented QTO, in this context, refers to determining the quantities of construction work packages or cost items from a specific database. Traditional QTO is a manual process that is prone to human error (Monteiro & Poças Martins, 2013). Estimators must manually review and interpret the scope of each work package, which is time-consuming and tedious and can produce inconsistent quantity results: different estimators may interpret the same work package or cost item differently, leading to variations in quantity estimates. These manual processes make traditional construction-oriented QTO inefficient and error-prone, highlighting the need for automated and more reliable approaches.
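As a toy illustration of construction-oriented QTO against a model, the sketch below sums one quantity over the elements that fall within a work package's scope. The element dictionaries, the property names (`volume_m3`, `area_m2`), and the `take_off` helper are hypothetical; real BIM models expose quantities through their own schemas, and real work package scopes need more than an exact category and material match.

```python
# Hypothetical BIM elements: each dict mimics an element exported from a model,
# with a category, a material, and precomputed geometric quantities.
ELEMENTS = [
    {"id": "W1", "category": "Wall", "material": "Concrete", "volume_m3": 12.5, "area_m2": 50.0},
    {"id": "W2", "category": "Wall", "material": "Concrete", "volume_m3": 7.5, "area_m2": 30.0},
    {"id": "W3", "category": "Wall", "material": "Gypsum", "volume_m3": 1.2, "area_m2": 40.0},
    {"id": "S1", "category": "Slab", "material": "Concrete", "volume_m3": 30.0, "area_m2": 150.0},
]


def take_off(elements, category, material, quantity_field):
    """Sum one quantity over the elements inside a work package's scope.

    The scope is simplified here to an exact match on category and material.
    Returns the total and the ids of the matched elements for traceability.
    """
    matched = [e for e in elements if e["category"] == category and e["material"] == material]
    total = sum(e[quantity_field] for e in matched)
    return total, [e["id"] for e in matched]


qty, ids = take_off(ELEMENTS, "Wall", "Concrete", "volume_m3")
print(qty, ids)  # 20.0 ['W1', 'W2']
```

The hard part in practice is not the summation but deciding which `category`, `material`, and `quantity_field` a free-text work package description corresponds to, which is the interpretation step that estimators currently perform by hand.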
This study developed ConBERT, a specialized version of BERT (Bidirectional Encoder Representations from Transformers) tailored to the construction industry. ConBERT is designed to extract essential Building Information Model (BIM) query parameters from textual descriptions of work packages; these parameters are critical for reliable cost estimation and efficient resource allocation in construction projects.
The key challenge addressed in this study is the variability and unstructured nature of construction work package descriptions. These descriptions often contain specialized terminology, abbreviations, and inconsistent phrasing, which makes automated extraction of relevant parameters difficult. To overcome this, ConBERT was trained on a construction-specific corpus so that it better captures the vocabulary and syntax of construction documents. The goal of this domain-specific language model is to improve the precision of extracting construction information, such as materials, dimensions, and other parameters required for generating BIM queries.
The proposed method integrates a ConBERT-based Named Entity Recognition (NER) model and a text classification model to automatically interpret textual descriptions. The NER model identifies key construction entities, such as materials, building components, and construction activities, while the text classification model infers the quantity type from each description. Together, the two models generate structured query parameters from unstructured text, facilitating automated retrieval of quantities directly from BIM models. Experiments showed that ConBERT significantly outperformed general-purpose BERT models on these domain-specific tasks, achieving over 90% precision, recall, and F1 scores across various classification tasks.
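To make "structured query parameters" concrete, the following sketch decodes NER tags into entity spans and merges them with a predicted quantity type. It is not ConBERT and assumes a BIO tagging scheme, which the abstract does not specify: the tags and the quantity type below are hand-written stand-ins for model outputs, and the label names, the `decode_bio` and `build_query` helpers, and the output dictionary layout are all hypothetical.

```python
# Hypothetical post-processing of NER and classifier outputs; not the study's implementation.


def decode_bio(tokens, tags):
    """Group BIO-tagged tokens into (label, text) entity spans.

    An "I-X" tag that does not continue an open X entity is treated
    leniently as the start of a new X entity.
    """
    entities = []
    label, span = None, []

    def flush():
        # Emit the currently open entity, if any.
        if label is not None:
            entities.append((label, " ".join(span)))

    for token, tag in zip(tokens, tags):
        if tag.startswith("I-") and tag[2:] == label:
            span.append(token)  # continue the open entity
        elif tag.startswith(("B-", "I-")):
            flush()  # close any open entity, then start a new one
            label, span = tag[2:], [token]
        else:  # "O" closes any open entity
            flush()
            label, span = None, []
    flush()
    return entities


def build_query(entities, quantity_type):
    """Collect entity texts under their labels and attach the quantity type."""
    params = {"quantity_type": quantity_type}
    for label, text in entities:
        params.setdefault(label, []).append(text)
    return params


tokens = ["Supply", "concrete", "for", "exterior", "walls", "200", "mm"]
tags = ["O", "B-MATERIAL", "O", "B-FUNCTION", "B-COMPONENT", "B-SIZE", "I-SIZE"]
quantity_type = "volume"  # stand-in for a text classifier's prediction

query = build_query(decode_bio(tokens, tags), quantity_type)
print(query)
```

Here `query` groups "concrete" under `MATERIAL`, "walls" under `COMPONENT`, "200 mm" under `SIZE`, and carries `quantity_type` alongside them. A downstream step would still have to map these strings onto the property names of a specific BIM model before any quantities can be retrieved.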
In addition to ConBERT, the study proposed a labeling system for classifying different components of the construction process, covering both subject-related information, such as materials and building components, and attributes, such as size and function. The study also introduced sequence labeling rules to further improve the accuracy of entity recognition. The experimental results demonstrated that the developed models, including the ConBERT-based NER model and the text classifier, accurately extract the information needed to generate BIM query statements, thereby advancing the automation of QTO.
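The abstract does not list the study's labels or rules, so the following is only an illustrative sketch: a hypothetical label set split into subject and attribute labels, plus two common post-processing rules for BIO tag sequences (tags with an unknown entity type become `O`, and an `I-` tag that does not continue an entity of the same type is promoted to `B-`). The label names and the `repair_bio` helper are assumptions, and the study's actual sequence labeling rules may differ.

```python
# Hypothetical label set, split the way the abstract describes:
# subject labels (what the work is about) and attribute labels (its properties).
SUBJECT_LABELS = {"MATERIAL", "COMPONENT", "ACTIVITY"}
ATTRIBUTE_LABELS = {"SIZE", "FUNCTION"}
VALID_LABELS = SUBJECT_LABELS | ATTRIBUTE_LABELS


def repair_bio(tags):
    """Apply two illustrative sequence labeling rules to predicted BIO tags.

    Rule 1: a tag with an unknown prefix or entity type becomes "O".
    Rule 2: an "I-X" that does not continue an X entity becomes "B-X".
    """
    fixed = []
    prev_type = None  # entity type of the previous token, or None after "O"
    for tag in tags:
        prefix, _, etype = tag.partition("-")
        if tag == "O" or prefix not in ("B", "I") or etype not in VALID_LABELS:
            fixed.append("O")  # Rule 1 (and plain "O" tags)
            prev_type = None
            continue
        if prefix == "I" and prev_type != etype:
            prefix = "B"  # Rule 2
        fixed.append(f"{prefix}-{etype}")
        prev_type = etype
    return fixed


print(repair_bio(["O", "I-MATERIAL", "I-MATERIAL", "B-SIZE", "I-COMPONENT", "B-XYZ"]))
# ['O', 'B-MATERIAL', 'I-MATERIAL', 'B-SIZE', 'B-COMPONENT', 'O']
```

Rules of this kind are cheap to apply after a token classifier and guarantee that every entity span starts with a `B-` tag, which keeps span decoding unambiguous.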
The study concluded that adapting BERT with construction-specific training data significantly improves the extraction of BIM query parameters. However, it also acknowledged that fully automating QTO, including generating query statements for BIM, requires further research. This study lays a strong foundation for advancing construction management technologies, making cost estimation processes more efficient, consistent, and accurate through the use of NLP and transfer learning.
Access Setting
Dissertation-Abstract Only
Restricted to Campus until
3-27-2035
Recommended Citation
Tang, Shengxian, "Natural Language Processing (NLP)-based framework for Construction Quantity Take Off from Building Information Models" (2025). Dissertations. 4166.
https://scholarworks.wmich.edu/dissertations/4166