Clinical Research with FHIR

Microsoft Research · Beginner · 📐 ML Fundamentals · 5y ago

Key Takeaways

The video discusses the use of FHIR (Fast Healthcare Interoperability Resources) in clinical research, highlighting its importance in standardizing healthcare data and enabling interoperability between different systems. Microsoft Research's Health Cloud and Data team is working on incubating technologies for healthcare data and machine learning, with a focus on FHIR and its applications in clinical research.

Full Transcript

All right, it looks like it's about 11:05 here in Redmond, so we'll go ahead and get started. I want to welcome everybody to this research talk. I'm Doug Seven; I'll tell you a little bit about myself in just one second. The intention of this talk is to kick off a series: my team will be doing a series of talks, the first Wednesday of every month, related to healthcare data and research. We're kicking off this series with a talk about clinical research with an evolving, new standard called FHIR. We'll spend some time baselining what FHIR is and why it's relevant in this context, and then talk about how it would tie into clinical research. So think of this as a bit of an introductory talk that will give you the high-level flyover view of how you would work with healthcare data and healthcare standards in clinical research. Then, in the series of talks that we'll have, different people from the team will come and talk about different capabilities, like the Internet of Medical Things and working with connected devices through things like clinical trials, or the secondary use of clinical data through tools like de-identification and anonymization. So that's where we are today and where we're going over the next several months: we'll be sharing things we're working on. My goal today is to give you that high-level landscape and overview.

Let me just quickly tell you a little bit about who I am. I'm the senior director in a team called Health Cloud and Data, which is part of Health Next here in Microsoft Research. The team's been around for a number of years; I've been on this team for about two and a half years. Our job is to incubate technologies around the healthcare domain, specifically for the purpose of enabling organizations to improve healthcare through things like machine learning, artificial intelligence, and analytics. So a lot of what
we do is incubate products and technologies that either ship as individual products or ship as part of other existing products. We'll talk about some of those things, but the idea is that this enables clinical data to come into our cloud environments and be used in these different ways. Personally, I've been at Microsoft for 14 of the last 21 years. I started originally in 1999, but I've left a couple of times to do startup work. Most of my career has been dealing with incubations, working in different parts of Microsoft and outside of Microsoft on new businesses, new products, and new technologies. And like I said, for the last couple of years I've been here in Microsoft Research as part of the Health Next team working on this technology.

So that's enough about me; let's talk about where we're going and what we're going to do. My plan for the time we have here today is to first explain the role of the FHIR standard in data and interoperability, more to set some context for what's happening in healthcare and why this standard is relevant and important to what we're going to be talking about, in terms of how you then enable this data to be used in different research projects and research endeavors. In talking about that, I do want to spend some time talking about FHIR and the FAIR principles for data. I think this is relevant and important, and I want to tie into some of the conversation that's happening internally around the fair use of data, particularly in research, in AI, and things like that. So I just want to anchor the FHIR specification with the FAIR principles so you have a good sense of how that applies and why that's relevant. Then we'll spend some time talking about the research data flow. We'll actually talk about a data flow starting with how this data exists in the real world, in patient care, and then how you bring it into the environments we have for working
with that data and how you move through these different workflows, and in the process highlight some of the technologies and some of the things that we've been working on that we'll be talking about over the next few months. Bear in mind that these things are continuously evolving; our team is continuously working on them. What I talk about today conceptually exists in our landscape, but as we progress over the next few months and introduce some more deep-dive conversations, those technologies will evolve and change. So it's a good reason to come back and see the other conversations to know more about what's happening there.

FHIR is largely a standard around clinical data: textual data, medical record data. Before we end, I want to talk about how you would integrate medical imaging data into this. We have projects that have existed in research for a while, like InnerEye, for doing some of the work with medical imaging data, and I want to talk in this context about how you would integrate medical imaging data with the clinical data that's coming in in FHIR. Then I'll wrap up and give you a little bit of a look ahead as to what's coming from our team over the next few months in some of these conversations.

So let me start with the role of FHIR and interoperability and talk about the state of things and where we're going. Today, and you've probably all experienced this first hand, anecdotally, patient data is trapped in different silos depending on where you go. In a clinical space, you go to one provider and they have data; you go to another provider and they have different data, or some data. Maybe they are part of a network that shares data; maybe they're not. I was just having a conversation last night with a friend who's having difficulty getting access to her data from one of the providers
because the data is sort of trapped and locked away. This exists across a collection of systems, and these systems all exist for different purposes, whether it's your electronic medical record in a clinical environment, some imaging data that's part of a radiology scenario, some medication data, data from a clinical trial you may be part of, or some of the financial data. So this data is all over the place. I won't get into the problems that exist operationally with this in terms of the waste and the cost, but it's worth noting that from a research perspective you generally want more of this data. You don't want just one silo; you want to create a bigger picture of what you're researching. And it's difficult to get, because it is in different silos and in different locations. Not only do you have the business aspect of access (do you have the privilege and rights to get access?), you also have the technical challenge that it's in different places and in different formats.

So what you really need is for this data to be connected through some form of interoperability. The good news, at least in the United States, is that there's legislation requiring this data to become more accessible and more available. That legislation is paving the way for the rest of the world to think about how this data is accessible, although in some cases other parts of the world are way ahead of the United States in terms of availability of, access to, and unification of this data. The idea here is that through an agreed-upon set of interoperability rules, all of this data would be available and exchangeable, and those rules are based around a standard for data called FHIR, which we'll spend a little bit of time talking about. This is huge in terms of what it means in the
United States for healthcare data, in terms of unifying the standard for the data, how it's presented, how it's exchanged, and ultimately getting access to that data. I always make the analogy that this is, for healthcare, the equivalent of what USB is for multimedia. The standards for USB, while not legislatively mandated, mean that I can have my laptop and plug it into a printer or a webcam or a projector and it's just going to work, because the industry has agreed on a standard for data exchange and data interoperability. That's effectively what FHIR is for healthcare and clinical scenarios.

The challenge, though, is that it's not just about what's in your electronic medical record. We'll talk about what FHIR does for clinical data, and about the rules that are coming down that legislatively mandate that organizations that are custodians of healthcare data (providers, payers) have to make that data available through these standards. But there's lots of other data that may not be in those places: your education data, job status data, health and fitness data, biometrics, things like that. This data is relevant in many research scenarios, and in fact some of it is probably more relevant than what's in your medical record. Your medical record is only a piece of it. In fact, some research shows that the information in your medical record is probably 20 or 30 percent of what's relevant to your total health and well-being. So we do have to think about these other things. We often lump all of these into the category of social determinants of health, or social influencers of health, and it's important to recognize that they play a critical role in what we do. So we'll talk about what is possible by pulling data from the medical record, but also how you can extend that to include this
other data and what you can do with it.

This talk is about clinical research with FHIR, and I'm going to make the assumption that some people come in knowing what FHIR is and some people come in not knowing what FHIR is and wanting to know. By the way, I should have mentioned in the beginning that we have a Q&A panel for the live event. You can't ask questions directly verbally, or vocally I should say, but you can ask them in the Q&A panel, and some of my team are ready to answer questions as well.

FHIR is a standard; it's effectively a piece of paper. It's a standard created by a set of working groups across the healthcare industry, with representation from healthcare payers and providers, pharmaceutical companies, and tech companies like ourselves. It's designed to support and enable the exchange of clinical health information, primarily for the benefit of care, but also in support of other scenarios like research. So when we talk about FHIR, it really is just a specification, much like Bluetooth or USB, and then there are implementations of that specification.

This may seem slightly archaic to us at Microsoft, because we tend to be on the leading edge of technology, but healthcare is really on the laggard edge of technology. Healthcare is driven by fear and compliance, so oftentimes they're the last to adopt technologies. There are certainly some cases where healthcare leads the way, through research and things like that. But FHIR is really based on a set of widely used best practices that we see across the rest of the technology industries: the use of RESTful APIs, schemas, and data formats that are relatively clear in many cases. So there's a bit of education here, but for those of us at Microsoft, we're very
familiar with these things, and they are well proven and easy to implement.

The intention of FHIR is to be relatively broad. FHIR has been in development over the last 10 years or so, and it's put together by a set of working groups with all this representation from different places. The intent is to think about the wide variety of scenarios in which you would have clinical data and how it would be used, whether it be human or veterinary or other things. Certainly the emphasis in most cases is on human clinical care, but the intent behind the FHIR specification is to be quite broad: to support point of care, so that if I'm a patient, a provider has access to all of my medical history and medical information through this interoperability standard and can get that data from wherever it might exist and have it in one place, but also areas like public health, analytics, clinical trials, and research.

FHIR is defined as a collection of what they call resources. FHIR stands for Fast Healthcare Interoperability Resources, and the resources are those entities: a patient is a resource, an observation is a resource. We'll talk more about that in just a second, but it is intended to be quite broad. I mentioned that it's been developed by this international group, so it's not a U.S. standard; that's probably one of the more important pieces of this. In fact, we're seeing rapid adoption of the FHIR standard outside of the United States. We talk with lots of organizations in the UK, in Africa, and in Asia that are moving toward the adoption of FHIR because of what it's intended to do, which is unify the standard for how this data is represented and exchanged. So again, going back to my USB comparison, the standard for the data exchange is defined by FHIR: the
sort of shape of the plug and what data is on what wire, if you will, from an analogy perspective. But the internal operations of the system are completely orthogonal to the FHIR standard. In other words, if I can exchange the data in and out of my system according to the FHIR standard, it doesn't matter what happens inside my system. I can do anything I want as long as I can represent that data and receive that data according to the standard. That's hugely important when we start talking about unifying these different things and getting data from different places.

So let me take a second to show you a little bit of what FHIR looks like. The FHIR standard itself defines, as I mentioned in my analogy, the shape of the plug and the data that goes over the wire. In other words: what are the APIs, how do I exchange this data, and when I do exchange the data, what does it look like? What's the shape of that data? When we talk about the resources in FHIR, I think the current count is about 140 resources defined in the FHIR specification, and they go through a tiering of maturity. When a new use case is thought of, someone says, "We need to really think about clinical trials and research, so we need to have a resource for a research subject." A research subject is something that we will talk about. It starts at a maturity level of zero, and then through the working groups and through real-world use cases that resource gets exercised and validated. As it gets validated, it moves up in maturity level from a zero to a one to a two to a three, all the way up to the fifth level of maturity, if I recall my maturity levels correctly, which is the normative state, meaning that we've exercised this resource well enough that we understand what it is and it is not likely to change going forward. There's only about a dozen or so,
maybe two dozen, resources that have reached the normative state. You can imagine what they might be: patient, provider, some of the most commonly used resources in the specification. Everything else is at varying levels of maturity. You can go to the FHIR website, hl7.org/fhir, and you'll find all the information about the FHIR specification. There's a resources tab that you can click to see a list of all the resources, and each has a little number next to it that shows you what the maturity level is.

The resources are independent: a patient is a resource all by itself, a procedure is a resource all by itself, and they have relationships with one another. So this isn't a hierarchy necessarily, and it's not even a relational database necessarily, although you could probably represent it that way. It really is more like a graph of resources and their relationships to one another. I have an example here on the screen of a procedure. If a patient is going in for a procedure, and I'll be dramatic and say a heart transplant, that heart transplant procedure is its own resource. It's an instance of this definition, and it has a relationship to these other things. The procedure has a subject, and that subject is the patient; it's being performed by a practitioner; it is related to a diagnostic report and a condition; and it is all part of an encounter.

This is just an example that I wanted to show you, but let me also give you a look at what these things really look like. Here we have just a sample repo. These are synthetically generated patients, so there's no PHI here; these aren't real people. We used a tool to generate a bunch of patients, and we just have a little viewer. This
is just a sample app we have to show what this can be. Let me refresh and make sure I'm logged in and loaded up. So I've got a bunch of patients here, and if I look at this, I've got a 65-year-old patient named Iris784. I can click on this little FHIR icon, and what I see here is the actual FHIR resource. I'm going to point out a few things that are relevant as we proceed through the rest of the conversation; this is important stuff.

There's a unique URL associated with this: it's my FHIR server, slash Patient, and then slash this ID. You can see here there's the ID of the patient. This is a resource type defined as Patient; you might have a resource type of Observation, you might have an Encounter, and so on. This particular resource has a unique and durable ID. It has nothing to do with the medical record or with anything else; it is generated for and by the FHIR server instance. So anytime I make a request for this ID, I will get this patient, and that's really important when we talk about the reusability of data and some of the FAIR principles.

There are a few other things of note here. There's something called an extension. An extension is a way in which you can modify the resource. The resource has a definition, and you might want to extend that definition with some other information. In this case I'm adding a mother's maiden name and a birthplace as extensions to this patient. They're not part of the official specification, but by adding an extension I can extend the definition of patient without breaking the definition according to the specification. This is one of the values of FHIR. When we think about things like social determinants of health, where I want to add data that's not historically part of a medical record, extensions really give me the opportunity to do that. So I have something called a structure
definition, which is sort of the base object, if you will, for FHIR data, and from that structure definition I can define new data types that I want to work with and what's included in them. You can see that a mother's maiden name has a value string, a birthplace has an address, and an address is a type. So I have a lot of flexibility in what I can do here and how I can extend this data.

Furthermore, here I have an identifier that's part of a coding system, and the coding system is well defined and well known. In this case I'm dealing with a medical record number, and the medical record number is associated with a system and it has a value. It's the same with the Social Security number: the code system for the Social Security number is well defined and published. The key here, as we come to the FAIR principles later, is that some of these things are really important: the unique and durable ID, the extensibility, and the known code systems that I can work against and understand. These are open standards, well known and well published. So this is an example of a patient, and there are various attributes of a patient, like marital status and so on.

If I leave my patient, I can also go in and look at what a patient looks like, and here I can see that I have conditions, encounters, and observations. These are all other resources that are related to this patient. If I look at a particular condition, say this cardiac arrest condition, it's a resource of type Condition. It too has a unique and durable identifier, and it has a clinical status. If I look at this, I can see that it works with a known terminology for defining what it is. But more importantly, what I want to point out is that it has something called the subject, and that subject is that durable patient ID we talked
about before. It also has something called an encounter. An encounter is some point at which that patient and a provider came into contact with one another, and that encounter then consumes a number of other things. If I look at this cardiac arrest encounter, it'll tell me what it is: in this case it's an emergency encounter, it's for this particular patient, when the encounter started and ended, who the service provider was, and so on.

Through all of this, it's important to note that we also see version IDs. Every resource we looked at had a version ID, which means that I can version these resources. If I go back to my patient, I can see that I've got my ID and I've got my version; this is a version one patient. But if I update something about that patient, let's say their name changes, I can then go to a version two of that patient, and I can always go back and look at all the versions that exist within the system.

So that's just a quick tour of what FHIR looks like. Like I mentioned, there are about 140 or so different resource definitions for different things, but they all basically work the same way. You have this identity and versioning system. Here we see it at the top; in the earlier example we saw the version down at the bottom. But each resource has a type, a durable ID, a version, and a last-updated date. We have this concept of an extension, so we can extend these resources to include data that is relevant to our use case if we need to. And then we have a bunch of standard data that is dependent on that particular resource. A patient has, for example, a medical record number, a name, a gender, a birth date, and so on. A research subject would have things like consent and the patient ID associated with that research
subject, and so on. So there are different capabilities that are used there.

This then bridges us into the FAIR principles for data, which of course are the findability, the accessibility, the interoperability, and the reusability of the data. So let's talk about those for a second. What I want to do is connect these principles to the FHIR specification and explain why this is so relevant and so important. When the working groups come together to think about how to implement FHIR, these concepts are also top of mind. While they're thinking about how to support the exchange of data for care and for payments and things like that, they're also thinking about how to ensure that this data can be used in a variety of scenarios, including research. So the FAIR principles are pretty important in how we think about this.

Let's think about findable. I'm not going to go through everything here, but I want to point out some of the key principles behind it. You have to be able to find the data; it has to be reliably findable. In other words, you have to have a globally unique and durable identifier, which we talked about, and every resource in FHIR has that. You have the metadata that describes it. And the fourth note here, registered or indexed in a searchable resource, is key. Part of the FHIR specification is not only the CRUD operations, if you will (how do I create a new resource, how do I retrieve that resource, how do I update or delete that resource), but also how do I search for the resources that matter to me. For example, how do I find the cohort that I want to do my research on? The resources that are defined in FHIR are all also defined as having searchable attributes, so I can go in and search
for the things I care about and get that data back out. That's one of the key values here as we think about some of the research scenarios: as I bring this data into the environment that I'm going to be working in, how do I then find the parts of the data that I care about? There are search capabilities built into FHIR, so I can say I want to find patients with an encounter in a certain time window, or something like that. But there's also the ability to connect that data into other systems that provide different and maybe more flexible ways of doing things. For example, we have a customer who has brought data into FHIR and then connected it to Azure Search to enable different ways of doing search. We've connected FHIR to Power BI to enable you to visually filter through data in Power BI. So there are some different options there.

Accessible is the next principle. Again, we're starting to see a repeat of some of the things I pointed out intentionally while we were looking at the FHIR examples, in terms of how I can retrieve a resource by its ID using standard protocols, RESTful APIs, things like that. This is open, free, and universally implementable; anybody can implement this. By the way, we have built a FHIR server for Azure, and all we have done is implement the standard as it's written. We haven't really done anything super secret. We built some tooling to make it work well, but the implementation of the API is according to the specification. So we're adhering to these standards for the protocol being open, free, and universally implementable, and we've also supported the authentication and authorization rules. The FAIR principles state "where necessary"; in this case we're dealing with some of the most sensitive data that you could have, protected health information, the information about my personal health and
well-being, whatever it might include, so authentication and authorization are critical. We've done a couple of things. One thing we did is release an implementation of the FHIR standard as an open-source project on GitHub called the FHIR Server for Azure, and I'll have links to this at the end. That enables anybody to stand up a FHIR server in their own Azure subscription, and you have the ability to control some of the authentication and authorization around it. In other words, you can make it anonymously accessible if you want, although we don't recommend that because of the kind of data you're dealing with. When we run it as an Azure service, which we call the Azure API for FHIR, we mandate that you must use Azure Active Directory for authentication, because we do not believe that anybody should ever create an instance of a FHIR service, a service intended solely for the purpose of holding clinical data, and not have it securely locked down. So I think authentication and authorization are key here, and over time we're working on ways to finely control access, so that authentication and authorization could actually be controlled down to the individual resource where necessary.

In terms of interoperable, this is sort of a given; this is exactly what FHIR is about. The purpose of FHIR is to create interoperable data. We really have a published standard that's broadly accepted by the healthcare industry and the pharmaceutical industry, and data can then be exchanged according to it using common vocabularies and common terminology systems. Well-described, linked data is key. I don't need to spend a lot of time here, because this is the fundamental purpose behind the FHIR specification.

And then lastly, reusable. I think this is key, and some of this comes down to the implementation. I just noticed I forgot a closing parenthesis at the end of my last bullet there; I apologize for that. But
the goal of FAIR, of course, is to optimize the reuse of data, and that's largely what we think about when we think about these capabilities we're building. As you think about research work: I was having a conversation with Eric Horvitz the other day, and he said this so perfectly. You have all this data in these systems of truth: the electronic medical record, the PACS system with your DICOM data, maybe an IoT system for fitness data, maybe your billing data. All these systems of truth have this data, and the goal is not to replace those systems or move that data. What we're doing is creating a digital reflection of that data in the cloud (I'll credit Eric with those words) that enables us to then start to do something really interesting with that data. So as we talk about the reusability of data, that's really what this is: taking all this data in all these disparate locations, and maybe even integrating data that you wouldn't typically associate with health, like census data or education data that might relate to some of the social determinants. You bring that in, you format it in the correct way, and now you've got this digital reflection of all of this data that is reusable across a wide variety of scenarios.

Now the key, of course, as we think about some of these research scenarios and some of the work we're doing here at Microsoft on data and research, is how we create this digital reflection in a way that's accessible to all the people who need access to it, that's reusable so we're not recreating this data all the time, and that's well described and well documented. As we think about some of the tools that we're building for digital research, part of what we want to think about is how we associate the metadata around this to describe what this data is, so that it's
searchable and discoverable as a data set, not as individual data. To say: okay, I'm really interested in finding data around a particular health scenario that I want to do research on; how do I find that data within a particular environment, whether it's within Microsoft or within a health organization? So the reusability of data is kind of built into what we do here, but there's certainly work we can do around it to describe the data and make it more discoverable. From a reusability standpoint, one of the key bullets here at the very end is "meets domain-relevant community standards." Now again, FHIR is exclusively about clinical health data, and as such there are really strict rules for how we work with it because of the sensitivity of this data. In the United States, HIPAA (the Health Insurance Portability and Accountability Act) is the dominant rule, along with the privacy rules associated with HIPAA. The key here is we have to be diligent and serious about our approach to the security, safety, and privacy of this data. It is paramount. Nothing we do can compromise that in the systems that we've been building to support bringing this clinical data into Azure and into environments in which we can do research on it. So if you're partnered with a health organization and you're bringing their data into our cloud, one thing is you can sort of check the box: the services we're building for healthcare data, the Azure API for FHIR and the associated services around it, are designed from the very beginning to meet the requirements for protected health information globally, not just in the United States. So everywhere we deploy into an Azure region, we've met the rules for that region. HIPAA rules are sort of the bar for us. We have HITRUST certification, FedRAMP certification, things like that. So I'll talk about that as I dig into a little bit of what we've actually built before we get done, but
know that that's there, that's important, that's extremely relevant. The last thing we want to be doing is catching clinical health data from our customers in Excel files that we're storing who knows where, rather than actually having that data in a place that's meant for it, where it can be used in a meaningful way. And so that's the relationship between the FAIR principles and FHIR. I'm sure there's some review there for some people and not for others, but I think it's relevant. Now, when we talk about how we then start to apply this data in clinical research scenarios: what can we do with this data? I mentioned having all these sources of truth or data-generating environments: the electronic health record, the imaging environments, the IoT, the billing and finance systems. These are all those silos that I talked about at the beginning; they're all in different places. Sure, in time the hope is that the industry starts to unify and those things can be represented in a much more cohesive way, but for now that's our job. We're working to enable all of the data from these different environments to be brought into our environment, this ingest-and-enrich process in which we create that digital reflection of all this data. We persist that digital reflection, and then we enable access to find within that data set what is relevant and what you need for what you're doing, and then connect it to these systems of intelligence to do whatever research work you're doing, and then however you want to represent that. If you want to represent it in some kind of system of engagement, over on the right, there are tools to do that. There are lots of ways that can be enabled, and in fact you may have heard of some of the things that we're pushing out for our customers in terms of the Microsoft Cloud for Healthcare and what we're doing with Dynamics and Teams. All of these things are designed to work together. So I want
to kind of walk through this flow a little bit and talk first about what we've built and what we're doing. Let me actually come back to this slide; I want to hop a little bit out of order. I had a slide that I wanted to share before... well, let me just stay on the path and we'll talk about it in a second. So, ingest and enrich: different ways to get data in. There are three things I want to talk about, well, four things. One is the direct route: if data is in FHIR format, I've got a way to persist it; I can skip past some of this and just pull the data in. But if data is coming from different places, then there are different things I might want to do with it, and I've highlighted three things. One is device data, so data coming from a wearable or an in-facility device like a hospital bed or something like that. If you're doing research that is dependent upon these connected devices that are sending real-time or near-real-time data, maybe daily snapshots or something like that, you need to connect this, what we call the IoMT data, the Internet of Medical Things data. We've built an IoMT connector that will basically, in a nutshell, collect that data in its telemetry stream and then transform ... show you a couple of screenshots of how some of these things work and then work through this flow. The FHIR Converter: this is a UI that's more of a dev/test utility that we built, but we have a FHIR converter which can run as an API where you can pass in a message. Here in the top panel you see a traditional HL7 v2 message, which are often very difficult to work with. They're highly customized and you sort of need a decoder ring to understand them. You can build a template (we have some default templates, which you see in the lower left here), a template that basically parses through that HL7 v2 message and then transforms it into what's called a FHIR bundle, or a collection of FHIR resources. And so this tool
is just one example of how you can convert HL7 v2 or CDA documents to FHIR. So it gives you the ability to collect this data even if it's not in FHIR, transform it to FHIR, and bring it into our FHIR system. The other one is Text Analytics for Health, and like I mentioned, this is an extension of the Cognitive Services capabilities where Text Analytics has been trained on the healthcare domain, so you can do classification on clinical notes, for example. Lots of valuable data for research is often in the clinical notes, and somebody has to sit there and read through them and pull out the interesting data. In this case you can classify data that's in clinical notes as medical data, and then you could actually turn it into FHIR resources. There have been some internal demos on how we could turn these into FHIR resources and put them in the FHIR system. It's not part of the tool today, but you could imagine how you could easily get from one to the other: when you have things like a medication or a diagnosis or a symptom, some of these things could easily translate into FHIR resources. The next thing I want to talk about is persistence, and this is where I was thinking about maybe going out of order a little bit. What I want to share is the two things that we built. The first is what we call the Azure API for FHIR, and "API" is a little bit of a misnomer here because it's not just an API, it's actually a data management system. We've wrapped up some data tools for persisting and managing this data and exposed them through an API that adheres to the FHIR specification. So we leveraged the FHIR specification and we wrapped all the management of that data up in a compliance boundary, which is really important. If you're dealing with clinical data, there's the min bar of having to meet the HIPAA compliance rules, but even more so if you're starting to work with data that's government in nature,
the FedRAMP certifications; if you're dealing with European data, the GDPR rules. All of these things have been taken care of and built into the system. So the idea is that we've leveraged these standards, taken care of the compliance and management in this environment, and then we enable you to connect that data to all these other systems of intelligence and systems of engagement for what you're going to do with it and where you want to go. Today we have in Azure the Azure API for FHIR. It includes the ability to deploy an IoT connector, which is currently in public preview, so you can connect those devices up and have that data go right into the API for FHIR. Some of these other tools, like the FHIR connector and some of the other things I'm going to talk about, are coming. They're currently open source projects; we typically release things as open source and then bring them into the managed service as we go. One of the things you saw in the last slide that I mentioned was the Medical Imaging Server for DICOM. I'm going to come around to this at the end and talk about it, but I just want to mention it as another persistence tool here. So we can bring clinical data into the API for FHIR and we can bring imaging data into the Medical Imaging Server for DICOM, which I'll come back around to last. Next I want to talk about how you get data out. Once I've collected this data, I've essentially created this longitudinal view of data across a wide selection of systems. I collected EHR data, IoT data, whatever, and now I've got this longitudinal view, so for a patient I can see much more than I could see before. Now I need to find what I really want for my research. FHIR, as I mentioned, enables different search tools to say "I want to find this." But when we start talking about research, there are really strict rules around how this data gets used in research, and particularly what is considered
secondary use. Rhonda J Kumar is going to do a session on secondary use and de-identification in December, I believe. But this is critical. There are rules around what can be done with this data and how it can be viewed, and what we've started to do is build a set of tools that put some flexibility in your hands. If you're doing research, the first step is you just redact anything that's identifiable, but oftentimes that renders the data into a state that's not particularly useful for research. So what we're trying to do is build a set of tools and put the knobs and dials in your hands to say: okay, I need to properly de-identify this data for privacy adherence, but I still want some data here that I can work with. For instance, I can't include a date. An admission date is considered potentially identifiable information, because if I know that somebody went to a hospital at a particular time, once I can find that date and time then I can find everything else associated with it; I can find all of their medical data. And so a date is considered identifiable. But instead of redacting the date, you might still need dates to build a view of time spans and things that happened in your machine learning. So we can do things like date shifting: we can apply a random seed and shift dates back and forth, so you can still maintain some of the validity of the date data even though it's no longer 100 percent accurate. That is part of a project called the FHIR Tools for Anonymization, and we're bringing those capabilities into the Azure API for FHIR so they're built right in. So I can say I want to search for a cohort, I want to export that data as JSON data that I then want to pull into something like Azure Synapse or some of these other tools, but as I'm exporting it I want it to be de-identified, and here are the rules for how I want that data to be de-identified.
And so when we look at the de-identification rules, we have different ways of de-identifying things. You can see number one here: if the node is an address type. Addresses are identifiable; if I know where you live, I know who you are and I can find everything else about you. But country and state are considered general enough that you can keep them. So I can say, down below, I'm going to redact addresses, but I've added a rule here that says I want to keep the country and the state. So I can choose to override some of these rules. I can also choose to do different things beyond redact. In this case, for the IDs, I'm doing a crypto hash on the IDs rather than redacting them. I can also choose to redact them, which is the default rule (if it's identifiable, we'll just get rid of it), or I can do things like date shift. I mentioned date shift, and here you can see that for dates we've said date shift, and down below you can see the date shift scope is for the folder. So we've said everything in this set I want to date shift the same way. I don't want to accidentally make the admit and discharge dates shift apart or together; I want to shift them the same way, but I'm going to randomly shift them so they're not the correct dates. And then I can also do things like maintain or redact key data. Zip codes are a great one. In the United States, generally you can provide some amount of a zip code, because there's enough population in a particular zip code to keep it from being identifiable. But in some zip codes in the United States the population is so small that it's considered identifiable, and so you have to identify those zip codes and redact them completely. And so what you end up with, and here's just a screenshot of what it would look like in VS Code: on the left-hand side, this is a patient resource, and you can see we have things like gender, birth date, deceased time (this is a patient who's
unfortunately passed away, but we're doing research). There are some extensions for latitude and longitude for where this patient is, we have city, state, all these things. And then we have our de-identification rules, which are similar to the ones I just showed you on screen. When we process all of these FHIR resources against these rules, which is a utility you can run either as part of an Azure Data Factory workflow or, if you're doing some dev/test work, locally to make sure it's working the way you want, the end result is that things get taken out. Gender is considered not identifiable, so we can keep that. But here you can see the date shift that we applied to the birth date: the birth date is still there but it's been shifted randomly, as has the deceased time. And you can see the address is considered identifiable, so that was removed, or at least reduced down to the state, postal code, and country, and the postal code was even reduced down to just the first two digits. So this is how some of these de-identification tools can work. You're still able to preserve the data; the rules for how you de-identify the data remain in your hands. And then lastly, and we only have a few minutes left, I want to talk about how we connect into different systems of intelligence. There are a few ways to think about this. There's a little bit of "some assembly required": I could do things like export data, de-identify it, and what I get is a bunch of what are called NDJSON files that are put into a storage account. These are newline-delimited JSON files where each line is a different FHIR resource, dumped into a storage account. Then I can take those NDJSON files and bring them into tools like Azure Synapse. I say: okay, I've got JSON data, I'm going to go ahead and use the SQL on-demand capabilities of Azure Synapse, and I'll pull that JSON data in and
now I can work with it in Synapse. Or I can pull it into Databricks and create temporary tables out of the JSON data, join those tables together, and then create a new view of that data, a tabular view that's appropriate for machine learning or for analytics, where I might be joining different parts of that data together. So I can shape the data however I want with this some-assembly-required approach. And then we've also built some tools to connect the data automatically to some tools that you might want to use. The Power BI FHIR connector is one of those; it's already available in Power BI. Power BI has the option to choose FHIR as a data source, and you point it at your FHIR service, you authenticate against it, and we interrogate what's called the capability statement. The capability statement for a FHIR service just specifies what that particular FHIR service is capable of doing. Much like any other standard, I can choose my implementation. In this case you can see I've got different resources; if I scroll through, you can see there are lots of different resources here and all the things associated with them. So AdverseEvent, for example, is a resource. I may choose to include that or not include that in my implementation of FHIR. Now, for the FHIR server that we have in Azure, we have them all supported. So we just interrogate that capability statement to say what data is available from this FHIR server, and then represent that in Power BI and give you the ability to visualize that data in a variety of different ways. We're doing something similar right now with the Common Data Service to support Dynamics: building a sync agent that works in a similar way, that enables data to come out of the FHIR service and be represented in the Common Data Service and also send updates back into the FHIR service, so that we can start to work with all these different tools together in
different meaningful ways. And then lastly, we're really close to time here an
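
The de-identification choices described in the talk (one random date shift applied consistently across a set, a hash in place of ids, and an address reduced to state, country, and the first two postal-code digits) can be illustrated with a small sketch. This is not the FHIR Tools for Anonymization or its configuration format. The function names, the 50-day shift window, the keyed SHA-256 hash, and the sample patient are all assumptions made for illustration, and the sketch assumes full `YYYY-MM-DD` dates.

```python
import hashlib
import hmac
import random
from datetime import date, timedelta


def shift_days(seed: str, scope: str, max_days: int = 50) -> int:
    """Pick one random offset per scope so related dates move together."""
    return random.Random(f"{seed}:{scope}").randint(-max_days, max_days)


def shift_date(value: str, days: int) -> str:
    """Shift the date part of a FHIR date/dateTime; time of day is dropped."""
    return (date.fromisoformat(value[:10]) + timedelta(days=days)).isoformat()


def hash_id(value: str, key: bytes) -> str:
    """Replace an id with a keyed hash so equal ids still map to equal values."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def deidentify_patient(patient: dict, days: int, key: bytes) -> dict:
    """Hypothetical rule set: hash id, keep gender, shift dates, trim address."""
    out = {"resourceType": "Patient", "id": hash_id(patient["id"], key)}
    if "gender" in patient:  # treated as non-identifying in the talk
        out["gender"] = patient["gender"]
    for field in ("birthDate", "deceasedDateTime"):
        if field in patient:
            out[field] = shift_date(patient[field], days)
    # Keep only state, country, and the first two postal-code digits.
    out["address"] = [
        {
            "state": a.get("state"),
            "country": a.get("country"),
            "postalCode": a.get("postalCode", "")[:2],
        }
        for a in patient.get("address", [])
    ]
    return out


# Made-up sample resource, not real patient data.
patient = {
    "resourceType": "Patient",
    "id": "example-123",
    "gender": "female",
    "birthDate": "1950-06-15",
    "deceasedDateTime": "2019-03-04T10:15:00+00:00",
    "address": [{"line": ["1 Main St"], "city": "Boston", "state": "MA",
                 "postalCode": "02108", "country": "US"}],
}

days = shift_days(seed="demo-seed", scope="export-folder")
clean = deidentify_patient(patient, days, key=b"demo-key")
print(clean)
```

Because one offset is drawn for the whole scope, the interval between birth and death is unchanged even though neither date is the true one. The sketch does not cover the case raised in the talk where a low-population zip code has to be redacted entirely; that would need a population lookup that is out of scope here.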

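The capability statement step mentioned in the talk can also be illustrated. A FHIR server publishes a CapabilityStatement resource (served from `[base]/metadata`) whose `rest[].resource[]` entries list the resource types and interactions the server supports. The sketch below makes no network call; it parses a trimmed, made-up statement and lists what it declares. The sample content and the function name are assumptions, and this is not how the Power BI connector itself is implemented.

```python
import json

# Trimmed, hypothetical CapabilityStatement; a real one is much larger.
SAMPLE_CAPABILITY = json.loads("""
{
  "resourceType": "CapabilityStatement",
  "fhirVersion": "4.0.1",
  "rest": [
    {
      "mode": "server",
      "resource": [
        {"type": "Patient",
         "interaction": [{"code": "read"}, {"code": "search-type"}]},
        {"type": "Observation",
         "interaction": [{"code": "read"}]},
        {"type": "AdverseEvent",
         "interaction": [{"code": "read"}, {"code": "create"}]}
      ]
    }
  ]
}
""")


def supported_resources(capability: dict) -> dict:
    """Map each resource type the server declares to its interaction codes."""
    supported = {}
    for rest in capability.get("rest", []):
        if rest.get("mode") != "server":
            continue  # skip client-mode entries
        for resource in rest.get("resource", []):
            codes = [i["code"] for i in resource.get("interaction", [])]
            supported[resource["type"]] = codes
    return supported


resources = supported_resources(SAMPLE_CAPABILITY)
for resource_type, interactions in sorted(resources.items()):
    print(resource_type, interactions)
```

A client can use a mapping like this to decide which resource types to offer as tables, which matches the idea in the talk of asking the server what data is available before representing it.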
Original Description

Clinical research faces multiple challenges, including data sourcing, data standardization, and data management, just to get to the point where you can start to create data experiments to prove a hypothesis. This is further complicated by the fact that the healthcare industry is one of the last industries to still maintain their data in on-premise systems that have little or no interoperability with other systems. To do clinical research, you need to bring data together from disparate data sources and standardize the data in preparation for analytics and/or machine learning, but the data is locked away in these on-premise systems. This session will focus on how the emerging Fast Healthcare Interoperability Resources (FHIR) specification is making healthcare data more accessible, and how you can leverage this new standard to bring together disparate data sets and create longitudinal patient data – including social determinants of health – to support your research efforts. You will learn about the FHIR standard and how it supports the FAIRness of research data and how you can leverage capabilities in Azure to support your research.

Speaker: Doug Seven, FHIR

See all videos in this Health Data Series: https://www.youtube.com/playlist?list=PLD7HFcN7LXReDOD9tfbLHE0Cl20T_9ws9

This video teaches the importance of FHIR in clinical research and how it can be used to standardize healthcare data and enable interoperability between different systems. The video also covers the use of FHIR in machine learning and data analytics, and provides practical examples of how to work with FHIR-based data.

Key Takeaways

Export a cohort's data from the FHIR service
De-identify the data using the FHIR Tools for Anonymization
Store the exported data in a storage account as NDJSON files
Pull the NDJSON files into Azure Synapse
Use the SQL on-demand capabilities to analyze the data
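
The steps above end with NDJSON files that still have to be shaped into tables. As a rough sketch of that "some assembly required" step, the snippet below parses NDJSON text in which each line is one FHIR resource, groups the resources by type, and joins Observations to Patients to produce flat rows. In the talk this shaping happens in Azure Synapse or Databricks; plain Python is used here only to show the idea, and the sample data, column names, and function names are assumptions.

```python
import json
from collections import defaultdict

# Made-up export: one FHIR resource per line (newline-delimited JSON).
NDJSON_TEXT = "\n".join(json.dumps(r) for r in [
    {"resourceType": "Patient", "id": "p1", "gender": "female"},
    {"resourceType": "Patient", "id": "p2", "gender": "male"},
    {"resourceType": "Observation", "id": "o1",
     "subject": {"reference": "Patient/p1"},
     "code": {"coding": [{"system": "http://loinc.org", "code": "8867-4"}]},
     "valueQuantity": {"value": 72, "unit": "beats/minute"}},
    {"resourceType": "Observation", "id": "o2",
     "subject": {"reference": "Patient/p2"},
     "code": {"coding": [{"system": "http://loinc.org", "code": "8867-4"}]},
     "valueQuantity": {"value": 80, "unit": "beats/minute"}},
])


def read_ndjson(text: str) -> dict:
    """Parse NDJSON and group the resources by resourceType."""
    by_type = defaultdict(list)
    for line in text.splitlines():
        if line.strip():  # tolerate blank lines
            resource = json.loads(line)
            by_type[resource["resourceType"]].append(resource)
    return by_type


def observation_rows(by_type: dict) -> list:
    """Join each Observation to its Patient and flatten to one row each."""
    patients = {p["id"]: p for p in by_type["Patient"]}
    rows = []
    for obs in by_type["Observation"]:
        patient_id = obs["subject"]["reference"].split("/")[-1]
        rows.append({
            "patient_id": patient_id,
            "gender": patients.get(patient_id, {}).get("gender"),
            "code": obs["code"]["coding"][0]["code"],
            "value": obs["valueQuantity"]["value"],
            "unit": obs["valueQuantity"]["unit"],
        })
    return rows


rows = observation_rows(read_ndjson(NDJSON_TEXT))
for row in rows:
    print(row)
```

The flat rows are the kind of tabular view the talk describes as appropriate for analytics or machine learning; a SQL engine would express the same join between a patient table and an observation table.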

💡 FHIR is a critical standard for healthcare data and interoperability, and its use in clinical research can enable the creation of a digital reflection of clinical data that is reusable across various scenarios.
