Construction regulatory document digitalization with layout knowledge-informed object detection and semantic text recognition
- Authors
- Wang, Shuyi; Moon, Seonghyeon; Fu, Yuguang; Kim, Jinwoo
- Issue Date
- May 2025
- Publisher
- Pergamon Press Ltd.
- Keywords
- Construction documents; Digitalization; Layout knowledge; Optical Character Recognition (OCR); Text recognition
- Citation
- Advanced Engineering Informatics, v.65
- Indexed
- SCIE; SCOPUS
- Journal Title
- Advanced Engineering Informatics
- Volume
- 65
- URI
- https://scholarworks.gnu.ac.kr/handle/sw.gnu/78124
- DOI
- 10.1016/j.aei.2025.103278
- ISSN
- 1474-0346; 1873-5320
- Abstract
- Construction documents, containing extensive project information, are often stored and shared in unstructured paper formats, leading to inefficiencies in retrieval and transfer among stakeholders. There has been a pressing need for digitalizing construction documents by converting Portable Document Format documents into machine-readable, structured texts. However, current optical character recognition technologies struggle with complex layouts commonly found in construction project documents. To address this issue, we propose a construction document digitalization approach integrated with layout knowledge-informed object detection and semantic text recognition, improving recognition accuracy across various layouts and preserving the structural integrity of texts. Results show that our approach can reduce the average word error rate by 5.6 %p with the assistance of layout knowledge and achieve a structural similarity of 78.8 %, while achieving 87.4 % mAP@50 for layout analysis. These findings highlight the positive impacts of layout knowledge on digitalizing construction documents and underscore the practical viability of our approach.
- Appears in Collections
- College of Engineering > Department of Industrial and Systems Engineering > Journal Articles
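
The abstract reports recognition accuracy as a reduction in average word error rate (WER), expressed in percentage points (%p). The sketch below illustrates how such a figure is commonly computed, assuming the standard definition of WER as word-level edit distance divided by the number of reference words; this record does not state the exact formulation used in the paper. The function name `word_error_rate` and the sample sentences are hypothetical and are not taken from the paper's data.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: whitespace tokenization and unit cost for every
    substitution, insertion, and deletion.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words consumed so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return prev[-1] / len(ref)


# Illustrative strings only (not from the paper).
reference = "all concrete shall be cured for seven days"
ocr_without_layout = "all concrete shall he cured seven days"
ocr_with_layout = "all concrete shall be cured for seven day"

wer_without = word_error_rate(reference, ocr_without_layout)
wer_with = word_error_rate(reference, ocr_with_layout)

# A percentage-point difference is the plain difference of two
# percentages, not a relative change.
reduction_pp = (wer_without - wer_with) * 100
print(f"WER without layout: {wer_without:.1%}")
print(f"WER with layout:    {wer_with:.1%}")
print(f"Reduction:          {reduction_pp:.1f} %p")
```

In this toy case the first OCR output contains one substitution and one deletion against eight reference words, and the second contains a single substitution, so the difference is expressed in percentage points in the same way as the 5.6 %p figure in the abstract.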