The Secret Sauce Behind Project Cortex: Good Metadata
Microsoft launched Project Cortex at its Ignite conference last November. It introduced the new service as a means to leverage all of the information stored in the various cloud systems (principally Microsoft 365) through Graph APIs and the application of AI techniques. The goal is to put knowledge management where it needs to be: in the flow, embedded into our everyday processes and enabled by, and presented through, the software we use on a regular basis.
Project Cortex's ingestion engine provides an interface for businesses to train the Azure Cognitive Services-driven machine learning algorithms on their content. It can ingest content from Microsoft 365, including SharePoint Online sites, but also from other systems and repositories using the existing connectors developed for Microsoft Search. Without a doubt, other vendors will quickly develop connectors to take advantage of these capabilities.
With this information, Project Cortex will produce and index a set of "Topics," which spurs the automatic creation of a topic page and the presentation of topic cards embedded into the interfaces of the Office 365 applications — hence my comment about embedding the knowledge management capabilities directly into your workflow. It will, of course, apply security trimming, just as it does with Microsoft Search results.
Sounds pretty sexy, no? AI! Machine learning! Azure Cognitive Services! But the foundation of all this magic is something much more humble: metadata. So how and where does metadata fit into the picture?
Related Article: Microsoft Project Cortex Ushers in the Age of Topic Computing
Project Cortex: When AI Creates Its Own Metadata
Businesses can use Azure Cognitive Services' text recognition, natural language processing (NLP) and image recognition capabilities to process their content and generate large amounts of new metadata to tag the same content items. In some respects this is recursive: AI feeding AI. The first stage is to process the content to generate metadata, and the second stage is to use that metadata in the creation and cataloging of the topics.
According to Microsoft, topic creation actually creates "knowledge entities," a new type of object class used by the Microsoft Graph. Metadata also comes into play when establishing the relationships between topics, i.e. the creation of ontologies. Ontologies represent the concepts or categories in a subject area or information domain and the relationships between them. So if multiple topics are tagged with the same metadata, then there is probably a relationship of some sort in play.
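The shared-metadata idea above can be illustrated with a short sketch. To be clear, this is a hypothetical illustration with invented topic and tag data — Project Cortex performs this kind of inference internally via the Microsoft Graph, not through a public API like this:

```python
# Hypothetical illustration: inferring candidate relationships between topics
# from shared metadata tags. Topic names and tags are invented example data.
from itertools import combinations


def related_topics(topic_tags, min_shared=1):
    """Return pairs of topics that share at least `min_shared` metadata tags."""
    related = {}
    for a, b in combinations(sorted(topic_tags), 2):
        shared = topic_tags[a] & topic_tags[b]
        if len(shared) >= min_shared:
            related[(a, b)] = shared
    return related


topic_tags = {
    "Contract Law": {"agreement", "liability", "negotiation"},
    "Civil Litigation": {"liability", "court", "discovery"},
    "Criminal Law": {"court", "sentencing"},
}

# "Contract Law" and "Civil Litigation" share the "liability" tag,
# suggesting an edge between them in the ontology.
print(related_topics(topic_tags))
```

Each shared tag becomes evidence for an edge in the ontology; raising `min_shared` would demand stronger evidence before proposing a relationship.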
Related Article: Using AI for Metadata Creation
Human Created and Curated Metadata
It's true: even with all of this super clever technology like NLP and machine learning in play, we humble humans (you have been watching Westworld, right?) still have a role.
Microsoft has demonstrated Project Cortex's ability to seed topics with taxonomy term sets, acronyms and lists of terms.
This is where human subject matter experts add considerable value: setting the automated system up, tweaking it and managing it to ensure good quality results. Luckily many companies have been working to develop metadata taxonomies for decades. The AI techniques don’t push them out of the picture, rather, they bring them to the fore: Project Cortex can use a taxonomic hierarchy of metadata terms, with synonyms acting not only as a seed, but as a means to check the output of the NLP processing. For example, if 20 entities are extracted from a document, how many of them correlate to terms in a taxonomy you have uploaded to the Microsoft Managed Metadata Service?
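The correlation check described above can be sketched in a few lines. The taxonomy structure and entity list here are invented examples, not the real Managed Metadata Service term store schema:

```python
# Hypothetical sketch: checking NLP-extracted entities against an uploaded
# taxonomy. Preferred terms map to their synonym sets; an entity "correlates"
# if it matches either a preferred term or one of its synonyms.


def match_entities(entities, taxonomy):
    """Map each extracted entity to the preferred taxonomy term it matches."""
    # Build a lookup from every label (preferred term or synonym) to its
    # preferred term, so matching is a simple dictionary probe.
    lookup = {}
    for term, synonyms in taxonomy.items():
        lookup[term] = term
        for syn in synonyms:
            lookup[syn] = term
    return {e: lookup[e.lower()] for e in entities if e.lower() in lookup}


taxonomy = {
    "contract law": {"agreements", "contracts"},
    "civil litigation": {"lawsuits"},
}

extracted = ["Contracts", "Lawsuits", "Blockchain"]
matches = match_entities(extracted, taxonomy)
# "Blockchain" has no taxonomy match, so only 2 of the 3 entities correlate.
```

The ratio of matched to extracted entities gives the subject matter expert a rough quality signal: a low correlation rate may mean either noisy NLP output or gaps in the taxonomy.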
My current role focuses on the legal industry. WAND Inc has been working in the metadata and taxonomies space for 35 years, and its legal taxonomy includes 3,622 terms and 1,868 synonyms across subject areas like courts, contract law, criminal law and civil litigation, covering over 40 different areas of legal practice management in total. Using this as an example for legal KM, that is a lot of topics with which to seed Project Cortex!
Related Article: The Intelligent Digital Workplace Is Already Here
The Human and AI Virtuous Circle
I am theorizing, or rather extrapolating capabilities, at this point, as I've never worked with Azure Cognitive Services text analytics. But I imagine it would be easy to use the APIs to compare entities extracted by the text analytics against a thesaurus to look for additional synonyms. The output from this could be reviewed by human subject matter experts and taxonomy specialists, and potentially added back into the Managed Metadata Service taxonomy as new terms, or as new synonyms for existing terms.
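Since this is speculation on my part, here is an equally speculative sketch of that review loop. The toy thesaurus and taxonomy are invented; the point is that candidate synonyms are queued for a human taxonomist rather than added automatically:

```python
# Speculative sketch of the human-in-the-loop step: an extracted entity whose
# thesaurus neighbours include an existing taxonomy term becomes a candidate
# synonym, proposed for review instead of being written straight to the term store.


def synonym_candidates(extracted_entities, thesaurus, taxonomy_terms):
    """Propose (entity, existing_term) pairs for human review."""
    candidates = []
    for entity in extracted_entities:
        for related in thesaurus.get(entity, set()):
            if related in taxonomy_terms:
                candidates.append((entity, related))
    return candidates


thesaurus = {"accord": {"agreement", "treaty"}}  # toy thesaurus data
taxonomy_terms = {"agreement"}

for entity, term in synonym_candidates(["accord", "blockchain"], thesaurus, taxonomy_terms):
    print(f"Review: add '{entity}' as a synonym of '{term}'?")
```

Each accepted candidate enriches the taxonomy, which in turn improves the next round of entity matching — the virtuous circle this section's title refers to.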
AI techniques like NLP, machine learning, text and image processing and more are going to provide a step change in the contextual information derived from ingested content: they will create a lot of new metadata, generating additional value when human experts review the quality of these new attributes by comparing them to existing metadata taxonomies.
It is good to know the "bags of mostly water" are still useful and not ready to give in to Skynet yet.
Related Article: Blade Runner's 2019 vs. Today: Which Is the Dystopia?
About the Author
Jed Cawthorne is principal evangelist at ShinyDocs, focusing on spreading the message of the benefits of good data and information management. Jed has over 20 years' experience in information and knowledge management, and over 25 years in IT.