AUTOMATING PERMIT PARSING FOR AN EDTECH CLIENT AND ACHIEVING 80% FASTER PROCESSING

Background of the study
An Edtech client approached Nexority Infotech with a critical need
to automate their permit parsing process. Their existing manual
system was slow, error-prone, and unable to handle the
complexity of nested hierarchies and unstructured data. The
client’s objectives were:
Challenges
- Handling Nested Hierarchies: Extracting data from deeply nested sections while maintaining context.
- Improving Accuracy: Ensuring high accuracy in data extraction despite unstructured and low-quality documents.
- Balancing Performance and Scalability: Developing a solution that could process large volumes of permits quickly without compromising accuracy.
- Integration with Existing Systems: Providing structured outputs compatible with the client’s data pipelines.
Nexority’s Solution
- Dataset Preparation: Converted permit PDFs into images and annotated them using LabelMe to create a diverse training dataset of sections, subsections, and tables.
- Model Selection & Fine-Tuning: Chose the Faster R-CNN model from Detectron2 and fine-tuned it on the dataset by adjusting hyperparameters like learning rate and batch size for improved accuracy.
- Custom Parsing Logic: Implemented regex-based logic to extract data from nested hierarchies and complex document structures.
- Integration with Permit Parser: Integrated the trained model into the NexoParser app to automate data extraction, outputting structured results in JSON and Excel formats.
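To illustrate the regex-based parsing step, here is a minimal sketch of how numbered headings can be turned into a nested structure and serialized as JSON. It is not Nexority's actual logic. The heading pattern (dotted numbers such as "1.2.3"), the function name `parse_sections`, the output field names, and the sample text are all assumptions made for this example.

```python
import json
import re

# Assumed heading format: a dotted number ("1", "1.2", "1.2.3"), an optional
# trailing dot, then a title. Real permits may number sections differently.
HEADING = re.compile(r"^(\d+(?:\.\d+)*)\.?\s+(.+)$")


def parse_sections(lines):
    """Build a nested section tree from lines of already-extracted text."""
    root = {"number": "", "title": "ROOT", "text": [], "children": []}
    stack = [root]  # currently open sections, outermost first
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        match = HEADING.match(line)
        if match:
            number, title = match.groups()
            depth = number.count(".") + 1
            # Close any open sections at the same level or deeper.
            del stack[depth:]
            node = {"number": number, "title": title, "text": [], "children": []}
            stack[-1]["children"].append(node)
            stack.append(node)
        else:
            # Body text belongs to the innermost open section,
            # which is how each line keeps its context.
            stack[-1]["text"].append(line)
    return root["children"]


# Made-up sample text, for illustration only.
sample = """
1 General Conditions
The permit holder must comply with all conditions.
1.1 Reporting
Reports are due quarterly.
1.1.1 Format
Submit reports as PDF.
2 Emissions Limits
Limits apply at all times.
"""

tree = parse_sections(sample.splitlines())
print(json.dumps(tree, indent=2))
```

The stack is what preserves context: each body line is attached to the innermost open section, so a value found under "1.1.1" can still be traced back to "1.1" and "1". One limitation of this simple pattern is that a body line beginning with a number (for example "30 days notice") would be mistaken for a heading, so a production parser would need stricter rules or layout cues from the detection model.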
Results & Impact
- Time Savings – Manual processing time was reduced by 80%, allowing the team to focus on high-value activities.
- Increased Accuracy – Data extraction accuracy improved to 93%, minimizing errors in nested and complex information.
- Faster Processing – Documents were processed four times faster, enabling the client to handle more permits in less time.
- Cost Savings – Automation reduced operational costs by 30%, optimizing resource allocation.
- Enhanced Usability – Structured outputs in JSON and Excel formats simplified data analysis and reporting.