The Secret Sauce Behind Project Cortex: Good Metadata
Microsoft launched Project Cortex at its Ignite conference last November. It introduced the new service as a means to leverage all of the information stored in the various cloud systems (principally Microsoft 365) through Graph APIs and the application of AI techniques. The goal is to put knowledge management where it needs to be: in the flow, embedded into our everyday processes and enabled by, and presented through, the software we use on a regular basis.
Project Cortex's ingestion engine provides an interface for businesses to train the Azure Cognitive Services-driven machine learning algorithms on their content. It can ingest content from Microsoft 365, including SharePoint Online sites, but also from other systems and repositories using the existing connectors developed for Microsoft Search. Without a doubt, other vendors will quickly develop connectors to take advantage of these capabilities.
With this information, Project Cortex will produce and index a set of "Topics," which spurs the automatic creation of a topic page and the presentation of topic cards embedded into the interfaces of the Office 365 applications — hence my comment about embedding the knowledge management capabilities directly into your workflow. It will, of course, apply security trimming, just as it does with Microsoft Search results.
Sounds pretty sexy, no? AI! Machine learning! Azure Cognitive Services! But the foundation of all this magic is something much more humble: metadata. So how and where does metadata fit into the picture?
Related Article: Microsoft Project Cortex Ushers in the Age of Topic Computing
Project Cortex: When AI Creates Its Own Metadata
Businesses can use Azure Cognitive Services' text recognition, natural language processing (NLP) and image recognition capabilities to process their content and generate large amounts of new metadata to tag the same content items. In some respects this is recursive: AI feeding AI. The first stage is to process the content to generate metadata, and the second stage is to use that metadata in the creation and cataloging of the topics.
According to Microsoft, topic creation actually creates "knowledge entities," a new type of object class used by the Microsoft Graph. Metadata also comes into play when establishing the relationships between topics, i.e. the creation of ontologies. Ontologies represent the concepts or categories in a subject area or information domain and the relationships between them. So if multiple topics are tagged with the same metadata, then there is probably a relationship of some sort in play.
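The shared-metadata idea above can be illustrated with a short sketch. To be clear, this is a hypothetical illustration with invented topic and tag data — Project Cortex performs this kind of inference internally via the Microsoft Graph, not through a public API like this:

```python
# Hypothetical illustration: inferring candidate relationships between topics
# from shared metadata tags. Topic names and tags are invented example data.
from itertools import combinations


def related_topics(topic_tags, min_shared=1):
    """Return pairs of topics that share at least `min_shared` metadata tags."""
    related = {}
    for a, b in combinations(sorted(topic_tags), 2):
        shared = topic_tags[a] & topic_tags[b]
        if len(shared) >= min_shared:
            related[(a, b)] = shared
    return related


topic_tags = {
    "Contract Law": {"agreement", "liability", "negotiation"},
    "Civil Litigation": {"liability", "court", "discovery"},
    "Criminal Law": {"court", "sentencing"},
}

# "Contract Law" and "Civil Litigation" share the "liability" tag,
# suggesting an edge between them in the ontology.
print(related_topics(topic_tags))
```

Each shared tag becomes evidence for an edge in the ontology; raising `min_shared` would demand stronger evidence before proposing a relationship.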
Related Article: Using AI for Metadata Creation
Human Created and Curated Metadata
It's true: even with all of this super clever technology like NLP and machine learning in play, we humble humans (you have been watching Westworld, right?) still have a role.
Microsoft has demonstrated Project Cortex's ability to seed topics with taxonomy term sets, acronyms and lists of terms.
This is where human subject matter experts add considerable value: setting the automated system up, tweaking it and managing it to ensure good quality results. Luckily many companies have been working to develop metadata taxonomies for decades. The AI techniques don’t push them out of the picture, rather, they bring them to the fore: Project Cortex can use a taxonomic hierarchy of metadata terms, with synonyms acting not only as a seed, but as a means to check the output of the NLP processing. For example, if 20 entities are extracted from a document, how many of them correlate to terms in a taxonomy you have uploaded to the Microsoft Managed Metadata Service?
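The correlation check described above can be sketched in a few lines. The taxonomy structure and entity list here are invented examples, not the real Managed Metadata Service term store schema:

```python
# Hypothetical sketch: checking NLP-extracted entities against an uploaded
# taxonomy. Preferred terms map to their synonym sets; an entity "correlates"
# if it matches either a preferred term or one of its synonyms.


def match_entities(entities, taxonomy):
    """Map each extracted entity to the preferred taxonomy term it matches."""
    # Build a lookup from every label (preferred term or synonym) to its
    # preferred term, so matching is a simple dictionary probe.
    lookup = {}
    for term, synonyms in taxonomy.items():
        lookup[term] = term
        for syn in synonyms:
            lookup[syn] = term
    return {e: lookup[e.lower()] for e in entities if e.lower() in lookup}


taxonomy = {
    "contract law": {"agreements", "contracts"},
    "civil litigation": {"lawsuits"},
}

extracted = ["Contracts", "Lawsuits", "Blockchain"]
matches = match_entities(extracted, taxonomy)
# "Blockchain" has no taxonomy match, so only 2 of the 3 entities correlate.
```

The ratio of matched to extracted entities gives the subject matter expert a rough quality signal: a low correlation rate may mean either noisy NLP output or gaps in the taxonomy.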
My current role focuses on the legal industry. WAND Inc has been working in the metadata and taxonomies space for 35 years, and its legal taxonomy includes 3,622 terms and 1,868 synonyms across subject areas like courts, contract law, criminal law and civil litigation, covering over 40 different areas of legal practice management in total. Using this as an example for legal KM, that is a lot of topics with which to seed Project Cortex!
Related Article: The Intelligent Digital Workplace Is Already Here
The Human and AI Virtuous Circle
I am theorizing, or rather extrapolating capabilities, at this point, as I've never worked with Azure Cognitive Services text analytics. But I imagine it would be easy to use the APIs to compare entities extracted by the text analytics against a thesaurus to look for additional synonyms. The output from this could be reviewed by human subject matter experts and taxonomy specialists, and potentially added back into the Managed Metadata Service taxonomy as new terms, or as new synonyms for existing terms.
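Since this is speculation on my part, here is an equally speculative sketch of that review loop. The toy thesaurus and taxonomy are invented; the point is that candidate synonyms are queued for a human taxonomist rather than added automatically:

```python
# Speculative sketch of the human-in-the-loop step: an extracted entity whose
# thesaurus neighbours include an existing taxonomy term becomes a candidate
# synonym, proposed for review instead of being written straight to the term store.


def synonym_candidates(extracted_entities, thesaurus, taxonomy_terms):
    """Propose (entity, existing_term) pairs for human review."""
    candidates = []
    for entity in extracted_entities:
        for related in thesaurus.get(entity, set()):
            if related in taxonomy_terms:
                candidates.append((entity, related))
    return candidates


thesaurus = {"accord": {"agreement", "treaty"}}  # toy thesaurus data
taxonomy_terms = {"agreement"}

for entity, term in synonym_candidates(["accord", "blockchain"], thesaurus, taxonomy_terms):
    print(f"Review: add '{entity}' as a synonym of '{term}'?")
```

Each accepted candidate enriches the taxonomy, which in turn improves the next round of entity matching — the virtuous circle this section's title refers to.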
AI techniques like NLP, machine learning, text and image processing and more are going to provide a step change in the contextual information derived from ingested content: they will create a lot of new metadata, generating additional value when human experts review the quality of these new attributes by comparing them to existing metadata taxonomies.
It is good to know the "bags of mostly water" are still useful and not ready to give in to Skynet yet.
Related Article: Blade Runner's 2019 vs. Today: Which Is the Dystopia?
About the Author
Jed Cawthorne is principal evangelist at ShinyDocs, focusing on spreading the message of the benefits of good data and information management. Jed has over 20 years' experience in information and knowledge management, and over 25 years in IT.