Master 2.5 GB of unstructured specification documents with ease
Dr. Andreas Schilling is a Senior Software Engineer at eXXcellent solutions. He helps customers develop software solutions, from defining requirements in the early stages to building information systems that meet their needs.
Before joining eXXcellent solutions, Andreas Schilling studied Information Systems at the University of Bamberg, focusing on distributed systems and information management. He then pursued a PhD, studying collaboration dynamics in open source projects.
Tags: networkx pandas visualization knowledge-management analytics use-case python business
How do you kick-start a project based on 2.5 GB of unstructured specification documents? To answer this question, we present the lessons we learned from developing a Python-based knowledge management tool with a lightweight and intuitive browser frontend.
In particular, this talk covers the following topics:
Use pywin32 to access layout and content information in partially corrupt .doc and .docx files and write it to simple UTF-8 encoded JSON files.
Identify and categorize signal words in your specification.
Use pandas to build content-based recommender functionality.
Use networkx and py2cytoscape to visualize call sequences and semantic relationships in your specification.
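
To make the signal-word and JSON steps more concrete, here is a minimal sketch using only the Python standard library. It is not the tool presented in the talk: the category names, the word lists (loosely modelled on common requirement keywords such as "must", "should" and "may"), the sample paragraphs, and the helper names `categorize` and `to_json` are all assumptions made for illustration. The paragraphs stand in for text that would already have been extracted from the Word documents.

```python
import json
import re
from collections import defaultdict

# Hypothetical signal-word categories; the talk does not publish its own lists.
SIGNAL_WORDS = {
    "obligation": ["must", "shall", "required"],
    "recommendation": ["should", "recommended"],
    "option": ["may", "can", "optional"],
}


def categorize(paragraphs):
    """Return one record per paragraph with the signal words found per category."""
    records = []
    for index, text in enumerate(paragraphs):
        # Split into lowercase words made of letters only (Unicode-aware).
        tokens = set(re.findall(r"[^\W\d_]+", text.lower()))
        hits = defaultdict(list)
        for category, words in SIGNAL_WORDS.items():
            for word in words:
                if word in tokens:
                    hits[category].append(word)
        records.append({"id": index, "text": text, "signals": dict(hits)})
    return records


def to_json(records):
    """Serialize records; ensure_ascii=False keeps non-ASCII characters readable."""
    return json.dumps(records, ensure_ascii=False, indent=2)


# Illustrative sample paragraphs (assumed, not taken from a real specification).
paragraphs = [
    "The gateway must reject malformed requests.",
    "Operators may override the 5 °C threshold.",
    "This section describes the logging format.",
]

records = categorize(paragraphs)
print(to_json(records))
```

To store the result on disk, the string can be written with `open(path, "w", encoding="utf-8")` so that the file is UTF-8 regardless of the platform default. Matching on whole tokens rather than substrings avoids false hits such as "can" inside "cannot", at the cost of missing inflected forms.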