Modelling Evolution

Data Skeptic · Beginner ·🏗️ Systems Design & Architecture ·2y ago

Key Takeaways

Modelling evolutionary processes using SLiM, a tool for simulating population genetics over time, with a focus on natural selection and its various sources.

Full Transcript

[Music]

Today on Data Skeptic: Animal Intelligence, we're talking about modeling evolution. I speak with Ben Haller, who is the primary developer on a project called SLiM. It's not worth defining that acronym, but it's a simulation system for ecological purposes. Becky, were you familiar with this before we started looking into the topic?

I was not, and I was really pleasantly surprised with this interview, how much I learned, and what a powerful tool this is. I can totally see why evolutionary biologists would want to use this.

Could you give a quick summary before we let Ben do the formal definition?

Sure. SLiM allows researchers to basically model evolution from a genetic standpoint, actually looking at the genetic code, but they're able to customize their simulations using different scripts to add in all sorts of functions and variables that you might not be able to do with other pieces of software. So if I wanted to model the evolution of some birds in response to a predator, but I wanted to know how mate preference (females liking blue feathers or something like that) interacts with, say, this predation selective pressure, you could potentially do that with a tool like SLiM. So the customization is just really wild. Evolutionary biologists are big fans of Shiny apps, if you're familiar with those, of making things for students where you can play with variables in evolutionary systems and see what happens. So this is like that on steroids. It's phenomenally cool.

Yeah, it's got its own scripting language, so that gives it a lot of that robustness and flexibility. It doesn't look too hard to learn. Certainly it's something you've got to sit down and really immerse yourself in, but after that, it's a powerful tool for a lot of researchers.

Absolutely. I'm super excited.

Well, let's get right into the interview.

[Music]

Well, the one constant in my own career is lifelong learning. I've never been able to rest on my laurels. So if you're looking to pivot
or to grow and become a data-savvy leader, consider the Georgia Tech Scheller College of Business. They offer a 100% online Business Analytics Graduate Certificate program. It teaches professionals how to analyze and interpret data. The online program has four courses: Business Analytics for Managers, Machine Learning for Business, Business Data Prep and Visualization, and Analysis of Unstructured Data. That is great coverage of topics to prep you to be a professional in any industry. This program definitely allows a work-life balance, so if you've got a lot going on, it'll fit in. If you want to push hard, you can take two courses a semester and graduate within a year. The Business Analytics Graduate Certificate can be your gateway to a Georgia Tech MBA: you can apply select credits towards the Georgia Tech full-time MBA, evening MBA, or online MS in Analytics. So are you ready to become a data-savvy leader? Get the skills necessary to do that. Applications for the Georgia Tech Scheller College of Business Business Analytics Graduate Certificate are open now for spring 2025. Visit techgradcertificates.com to learn more, and don't forget to apply by October 1st; that's the deadline. For more information, head over to techgradcertificates.com.

[Music]

Hi, I'm Ben Haller, and I work at Cornell University. I'm sort of a research programmer, in a sense. I work on a software program called SLiM, which I guess is what we're going to talk about today. I've been there since 2014, working with a professor named Philipp Messer, and I hope to work here for the rest of my career.

And can you tell us a little bit about how you got started? Were you always in academia and the sciences?

No, quite the opposite. I started out as a software engineer. I got into writing software when I was quite young, still in high school, even before that really, and worked at Berkeley Systems on a screen saver program called After Dark, and worked at Apple on some HTML stuff that underlay the Apple Store at the time, and
various things like that. Eventually I got tired of Silicon Valley culture and I wanted to do something that felt more meaningful, so I went back to school. I had to get my undergrad degree first, so I went to San Jose State and got my undergrad in biology, conservation biology, and then went off and got my PhD at McGill in Montreal, then ended up at Cornell doing what I'm doing now.

And besides leaving Silicon Valley, what drew you specifically to science?

Well, I've always been interested in evolutionary biology. When I was a little kid, my parents liked to read aloud to each other. My mom would be sewing or cooking or something like that, and my father would be reading a book to her, and often those books were Stephen Jay Gould and Richard Dawkins and things like that, the classics of evolutionary biology for a popular audience. I was always fascinated by those. I kind of fell into software engineering because it was just something that I was good at, and it was kind of easy money, but evolutionary biology was always the path not taken for me. So when I decided that I was tired of Silicon Valley, evolutionary biology was the obvious direction to go. And it's been nice because in my work now I'm able to use both of those backgrounds: I use the software engineering and the formal training in evolutionary biology together to do what I do.

For me, it's not obvious that that choice is on the table. At what point did you realize your software skills would serve you well when studying evolutionary biology?

It took longer than it probably should have. I didn't really figure it out for a long time. Even when I was applying to graduate schools, I've always been interested in birds and ornithology, and so I was applying to grad schools trying to get somebody to fund me to go up in the Andes and mist-net birds and do genetic analyses of hybridization zones and stuff like that. And eventually I realized, oh,
everybody actually is interested in me for my computational skills, so maybe that's what I need to do. Which is for the best; I don't think I would have been that good at mist-netting birds in the Andes anyway, so I'm better at what I'm doing now.

Well, let's introduce SLiM. Can you break down the acronym and tell us a little bit about the framework?

Yeah. Well, the acronym originally stood for Selection on Linked Mutations. That was the name that Philipp gave it. Philipp, who I work with, wrote the original version of SLiM back when he was, I guess, working as a postdoc at Stanford, and he made up the name. It's quite a nice name: it's catchy and memorable and short. But Selection on Linked Mutations was referring to what was special about SLiM back then, in like 2013: that it could simulate more than one mutation in a population at the same time, where each mutation is under selection. The dynamics of that are surprisingly complex, and you can't really model it analytically. You can't write down equations that represent what's going to happen in the system; you have to simulate it the way that SLiM does. So Philipp wrote SLiM to do that, and so he named it Selection on Linked Mutations. But nowadays SLiM does much, much more than that, and so the acronym is not so appropriate anymore. We've talked about maybe repurposing the acronym to stand for Simulating Life in Machines instead, but we haven't made a decision on that yet. So right now we just call it SLiM, and we don't really think about what it stands for.

Maybe we could break down some of the layers in SLiM. I know there's the SLiM core, but there's also the GUI. What are the components that make up the SLiM ecosystem?

SLiM runs simulations, and those simulations are scriptable. You write a script in a language called Eidos, and so Eidos is the first component of SLiM. It is a scripting language that I invented and that I wrote all the code for, but it's based kind of loosely on the language R, which a lot of people use in
academia and in grad school. A lot of people know R already, so basing Eidos on R made it more accessible to its audience. So the interpreter for the Eidos language is one part of SLiM. Then there's the SLiM core, which runs the core part of the simulations, except for the aspects that are scripted by the user. And then it has a GUI called SLiMgui. That's a graphical user interface, a visual modeling environment, an integrated development environment, whatever you want to call it. It's an application that you run on your computer that visually shows you what your model is doing and lets you play around with it interactively, which is really valuable for exploring the possibilities of a model and understanding what it does, in a way that is hard to get from just raw text output going into your terminal window or something like that.

Well, I'm not sure if you have the concept of personas around the software, something as formal as that, but do you have a sense of the types of users that find their way to SLiM?

When Philipp wrote it originally, it was pretty much a tool for population geneticists. That's a subfield of evolutionary biology that is primarily concerned with the dynamics of mutations over time in populations. Those mutations might be selected for, or they might be selected against, or they might be neutral, and population genetics thinks about how the frequencies of those mutations are going to change over time, how the course of evolution is going to unfold at a rather detailed level. So SLiM was for that. But I've been working on broadening SLiM's ambit over the years, and now it's for much more than that. It can model multiple species now. It can model ecological interactions between individuals and between species, so things like predation and parasitism and cooperation and mutualism and so forth can all be modeled in SLiM. You can even model behavior, and since we're in the animal behavior part of Data Skeptic, I think
that's maybe why you want to talk to me. But SLiM's modeling of behavior is mostly through scripting. It doesn't provide a lot of built-in behaviors for how individuals disperse or forage or choose their mates or things like that; instead, you do that through scripting.

And what does a hello world example look like in Eidos?

It would be just a couple of lines of code. You would declare something called an initialize() callback. Callbacks are little chunks of code that SLiM calls to do something, and the initialize() callback is your code that initializes your simulation. So your initialize() callback might set up a genetic structure: the length of the chromosome, the recombination rate, the mutation rate, and so forth. Then you would have what's called an early event, which is basically another kind of callback, that would create the initial subpopulation that you're going to simulate. It could create more than one subpopulation and set up migration rates between them and set different sizes for them and so forth, but the simplest hello world would just be a single subpopulation. And then you would have a final, termination kind of event that would set the end point of the simulation and produce some output. It might write out all of the mutations that have fixed in the population over the course of the simulation, for example, or whatever sort of output you want. Usually you want to know what your simulation did, so at the end of it you produce some output.

Well, I know just a touch about physics simulations, and if you take the time to do those right, if you do something with magnetism, you're going to follow Maxwell's equations. That truly is the source code of the universe, a simulation like that. When I was taught genetic algorithms, either in undergrad or grad, I can't remember, what I built was a toy problem. All the ideas were there, crossover, mutation, I had to iterate and stuff like that, but it didn't
in my head connect to the real world. How simulated versus close to reality does SLiM process the data?

I'm not sure whether the simulation that you wrote was an individual-based simulation or not, but SLiM is. This is also sometimes called agent-based. So we're modeling each individual organism in the SLiM model as it runs forward. The individuals choose mates and reproduce, new individuals get born, old individuals die. They can move around in space, they can interact with each other, they could do things like gathering resources, even learning; you can have individuals learn from what they encounter in the environment. But beyond the most basic behaviors like reproduction, most of the rest of what you would be doing in your SLiM model would be in your script, and so it's really up to you how biologically realistic you make it. You can model genetics up to the full genome scale, so you could do a model of all of the human chromosomes and the genetic structure within them, coding and non-coding regions and exons and introns and so forth. SLiM can certainly model at that level of detail, although the bigger the genetic model, the slower it gets, of course. And similarly with the spatial environment that the individuals live in: you can model just a single population that is what we call panmictic, where there's really no spatial structure and individuals have equal probability of interacting with every other individual; or you can model separate demes, where there are separate populations of individuals that are connected by migration but are otherwise isolated from each other; or you can model a continuous spatial landscape, where the individuals are really living on a 2D or even 3D landscape and interacting with the individuals that are nearby. So you can get to that level of biological realism, and your landscape can have terrain, elevation and temperature variation and so forth. So you can build quite a realistic environment.
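[Editor's sketch] The difference between a panmictic population and a continuous spatial landscape can be illustrated with a few lines of Python. This is only a toy under stated assumptions, not SLiM or Eidos code: the function names `panmictic_candidates` and `spatial_candidates`, the unit-square landscape, and the interaction radius of 0.1 are all hypothetical choices made for the example.

```python
import math
import random


def panmictic_candidates(focal, population):
    """Panmictic case: every other individual is an equally likely partner."""
    return [i for i in range(len(population)) if i != focal]


def spatial_candidates(focal, population, radius):
    """Continuous-space case: only individuals within `radius` can interact."""
    fx, fy = population[focal]
    return [
        i
        for i, (x, y) in enumerate(population)
        if i != focal and math.hypot(x - fx, y - fy) <= radius
    ]


rng = random.Random(1)
# 200 individuals scattered uniformly on a unit square (hypothetical landscape).
population = [(rng.random(), rng.random()) for _ in range(200)]

everyone = panmictic_candidates(0, population)
nearby = spatial_candidates(0, population, radius=0.1)
print(len(everyone), "possible partners without spatial structure")
print(len(nearby), "possible partners within the interaction radius")
```

In the panmictic case the focal individual can interact with all 199 others, while in the spatial case only a local subset qualifies. The scan over every individual shown here is the simple, linear-per-query approach; it is chosen for readability rather than speed.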
And similarly, you can build realistic behaviors, but you're going to have to write the script for those yourself, beyond a certain pretty foundational level that SLiM provides.

Well, in some of those cases I think of the agents as making decisions. It's a choice to choose a mate; it's a choice to gather resources or not. How does the simulation simulate the decision?

As SLiM simulates, it is executing what I call ticks. Ticks are just units of time, and within one tick a series of operations always occurs: reproduction, and then fitness evaluation, and then mortality based on the fitness values that were just calculated, and a couple of other sort of bookkeeping things, and then the next tick starts. Within each of those ticks, in an event like the reproduction phase of the tick cycle, individuals would be doing things like choosing mates and generating offspring and things of that sort. You could either go with SLiM's default behavior for that, which, for the Wright-Fisher model, for example, which SLiM supports, is that mates, and parents in general, would be chosen with a probability proportional to their fitness, so the higher fitness an individual is, the more likely it is to get reproductive opportunities. Or you could write a callback. That's the purpose of callbacks in your script: to override a default behavior like that in SLiM and say, well, no, I want my individuals to, say, look at the individuals that are near to them in space, take the five or ten closest ones, evaluate them in terms of how fit they look, how large they are, how many resources they have gathered, and select a mate based on those kinds of criteria, and maybe some stochasticity as well. You would plug into a particular point in SLiM's tick cycle and write a little chunk of callback code like that, which would implement whatever mate choice behavior you want: monogamy, or sequential mate choice like I just described, or any of a number of other things.

Yeah, when I'm
getting involved in a new tool or learning some new system, my instinct is, let me kind of go with the defaults, but then I'm definitely going to tinker. I'm going to go in there and customize some stuff, probably break it a few times, but play with all the knobs. Do you have a sense of what a typical user journey is like for someone getting involved with SLiM?

The first part of the user journey is getting past the learning curve. It is a complex piece of software. The manuals for SLiM and Eidos combined are, I think, over a thousand pages now. A lot of that is example recipes; the SLiM manual has, I think, more than 150 different example recipes now for how to construct a model that does this, that, or the other kind of thing. People can read those recipes, learn from them, and start to figure out how to develop a model of their own. So a lot of it is examples, but still, a thousand pages: it's a steep learning curve. I offer a SLiM workshop that helps people to get past the first part of that learning curve, with more of a gentle, systematic introduction to the things that pretty much everybody needs to know about it. Once you get past that point, paths really diverge enormously as far as how different people use SLiM. Some people want to do a small model that they can run on their local machine, whereas other people want to do a big model that takes potentially weeks to run, and so they need to do their model runs on a computing cluster. Some people use the GUI a lot, and other people don't because they just like the terminal. And of course it depends on the system that you're modeling. If you're modeling coral versus modeling rhinos versus modeling trees, the kind of model that you're going to be building is going to be very different, and you're going to be drawing on different example recipes from the SLiM manual and using different facilities within SLiM. So that's a tricky thing in developing SLiM: it has to be enormously
general purpose in being able to be adapted to all of these different kinds of goals, and multispecies ecological models as well, but at the same time it needs to be extremely performant, because individual-based simulations are notoriously slow and performance is almost always an issue for people.

Could you speak to some of the coding challenges, both solved and maybe any you're still facing in that process, in optimization specifically?

Absolutely. Optimization is sort of its own branch of software engineering, really. It's a dark art. You get into profiling the code using some kind of profiler, and there are various kinds that are often built into your development environment. Then, based on that profile, you find the hotspots in your code and you try to figure out how to make them faster. Sometimes that means just optimizing the code, hand-coding things in a more efficient way without changing the algorithm that you're using. But often the biggest performance gains come from changing the algorithm. You realize, oh, the way that I'm doing this is actually algorithmically inefficient, and even if I were to optimize the code, that might give me a 2x or 5x speedup, but if I change the algorithm to be something smarter, it might give me a 100x speedup, or a 1000x speedup even. So thinking about the algorithms that are used under the hood is the most important part of optimization, and computer scientists have a whole terminology for this. You can call something order n squared. That means that, depending on, say, the number of individuals that you're simulating, which you might call n, the model might be order n: it scales linearly in the number of individuals, so if you model 10 times as many individuals, it'll get 10 times slower. Or it might be order n squared, so then if you model 10 times as many individuals, it gets 100 times slower. So often a lot of optimization is trying to find a way from an n-squared or
even an n-cubed algorithm down to, say, an n log n algorithm, and that will often increase the performance enormously, especially for the biggest models, which are the ones where you really care the most.

Well, how technical does a user need to be if they want to adopt SLiM?

Well, I guess that depends on your perspective. Eidos is a pretty simple scripting language, so getting into Eidos and learning it well enough that you can write the SLiM script that you want to write is, from my perspective as a software engineer, not terribly technical. But for someone who's never programmed before, it's probably a pretty difficult thing to learn. Then there's the biological side of it as well. You need to understand the biological system that you're interested in well enough to have an idea of what a good model for that system would look like. Potentially, if you want to be modeling some concrete real-world biological system, you need to be technically savvy enough to know how to get the genome sequences that you want, either doing it yourself or getting them from some database or some publication somewhere. And then modeling itself is kind of an art form, and needs to be thought about carefully as well. There's a very famous saying from a modeler named George Box. He said, "All models are wrong, but some are useful," and this is a very important concept. You are never going to make a model, in SLiM or any other modeling environment, that is the truth, that is biologically accurate. You mentioned physics and physics models before. Maybe there's some hope of making a physics model that is the final, ultimate truth, and that is in some sense the same code that the universe is actually running. That is really not a realistic hope in the world of biology; it will never happen. There's always more biological detail going on in the real world than we could ever hope to put into a simulation, and so the art is in choosing what to put in and what to leave out. In some sense it's
like that old story about Leonardo and carving a sculpture: how do you carve a sculpture of David from a block of marble? Well, you just start with a block of marble and cut away everything that isn't David. It sounds very simple, but the devil is in the details. So how do you make a good model? Well, you just cut away everything that you don't need, that isn't essential, not just to the biological system that you're simulating but to the particular research question that you have. The right model is going to be different depending on what question you're trying to answer. So there's a lot to think about in that area to make a good model that has nothing to do with SLiM; it's just general modeling background and technique.

And could you share some details about the outputs of the simulations? When people are using SLiM, what are they looking for once the simulation has completed?

That also really depends a lot on the questions that you're trying to answer. You might output something very realistic-looking, the same kind of data that you would have from field work or wet lab work. That would be something like a genome sequence in VCF or FASTA format, things like that, which can be read and processed by many open-source tools that are out there. Or you might output something quite abstract. You can output a tree sequence, for example, so you might be particularly interested in the pattern of ancestry that results at the end of your simulation. For example, if you're running a big spatial simulation, you might wonder: at the end of the simulation you've got individuals across the whole landscape, but do they all trace their ancestry back to one particular area of the landscape in the end? What area is that, and why? How long does it take to trace back to that one spot? How much wandering around do the patterns of ancestry do on the landscape before they ultimately coalesce to that one location? All kinds of questions like that that you
could answer with a recorded tree sequence. Or you might have some specific question, like what's the distribution of runs of homozygosity in the final state of the model, or some question like that, and your model just might produce that kind of output directly by analyzing what the simulation has done.

I'd love to talk more about the modeling. To what degree do most users go in and tinker, and what does the data structure look like? What is a model in the system, I guess?

Well, the model in the system is really your script. Most or all of the tinkering goes on there. There are also some other input files, like the initial genetics that you want your model to start with, landscape maps that maybe define the spatial environment that the individuals are living in, migration rates, and parameter values like mutation rate and so forth. Those are all of the inputs, but mostly the model is the code that you've written. Under the hood, SLiM is doing all sorts of stuff with this. For interpreting the code in the first place, it builds a computer science data structure called an AST, that's an abstract syntax tree, which is a tree-shaped representation of the code that you put into SLiM. It turns out that it's often useful to represent code as a tree rather than as a linear text file, because it makes it easier for the interpreter to understand the structure of the code and execute it efficiently. So there are data structures like that. If you build a spatial model, then for searching for nearby individuals in the vicinity of some focal individual, SLiM builds a data structure called a k-d tree for you. That's another sort of standard computer science data structure that's very useful for doing fast spatial searches. It kind of breaks up the space that you're simulating into a hierarchy of nested boxes, in a way, so that you can find things that are close to other things by descending through that hierarchy of boxes, and it makes it, well, it makes it n log n instead
of order n squared, to use the terminology that I outlined before. So there are all sorts of data structures like that under the hood, but at the level of a user using SLiM, you don't really need to worry about those things at all. That's all in the C++ code that SLiM is implemented in, and almost never does the end user go into the C++ code; it's beyond the level of expertise that users of SLiM are expected to have. When you're scripting in SLiM, instead, you're interacting with SLiM in your code using some classes that are defined for you. Eidos is an object-oriented language, meaning that it supports thinking about problems in terms of the entities that the problem is composed of and how those entities interact with each other. That's a very natural paradigm for Eidos to support, because the individuals that you're simulating are objects, mutations are objects, populations are objects, and so forth. The user interacts with those kinds of objects, accesses properties, and makes method calls and so forth that tell SLiM what those objects should do, basically.

Well, you'd mentioned that there's support for agents that have learning capabilities. Could you expand on that?

That would, at the present time, all be done in script. Individuals that you're simulating in your model can have any kind of state attached to them that you want to define. So they could remember which other individuals they've interacted with in the past, for example. They could be playing repeated prisoner's dilemma games with each other, and they could remember, oh, I interacted with that guy before and he messed with me, and I don't want to cooperate with him again because I just got screwed. Or they could remember, oh, this area of the landscape had a lot of resources, and the one I'm in now doesn't seem so great; maybe I'll move back toward that area that was better. Any kind of learning like that that you want, you could attach
that learned state to the individuals in your model and make them behave in a way that's informed by those past experiences. Now, I actually listened to some of the previous animal behavior episodes of Data Skeptic, and it got me thinking that there's probably a lot more that SLiM could do to support learning in these kinds of models. I don't think that it could necessarily support learning at the individual level terribly usefully. That's probably best done in the user script, because there's going to be so much variety in the kind of learning you want your individuals to be doing; it's better to just script things that are that diverse. But SLiM probably could support learning at the level of instinct and evolved behaviors that are intrinsic, rather than learned through experience or through social learning. It's very early days for this thought, since it just came to me a couple of days ago, but I've been thinking that SLiM could support some sort of neural-network-type mechanism, or maybe something like a Gaussian-process-backed learning mechanism, where the experiences of individual organisms would end up being training data for that learning data structure, and the learning data structure would represent what the species as a whole has learned. Perhaps that would even evolve, in a sense, where one branch of the ancestry tree would have learned one thing in one environment and a different branch of the ancestry tree would have learned a different thing from living in a different environment. Then, if those two subpopulations came into contact, in a sense their two instinctive learned behavior sets would come into competition, and you would get to see which was actually superior in the shared environment. Things like that. I don't know how practical that is, of course. These machine learning algorithms require a lot of training data to train them, and even running on GPUs and so
forth, in the most efficient way possible, it can take quite a long time for them to get trained up to a level where they're useful. So I don't know whether that's really within SLiM's reach or not, but it's certainly an interesting idea that I'm going to think about more.

I mean, every agent has its own processor, so it's a big computational load to take on, but it could be a really interesting step forward in agent modeling software.

Yeah, and I think my goal for SLiM is that it be able to bridge the gaps between many different subfields in evolutionary biology. It now can simulate from the level of individual nucleotides, that sort of molecular biology level, all the way up to the level of ecosystems and communities at the top level. But behavior, since it has to be scripted entirely right now, is kind of missing from what SLiM provides built in. You can script it, but that is difficult, and it would be lovely to be able to bring behavioral ecology and evolutionary behavior and so forth into the fold as well. I think evolutionary biology, like many fields in science, of course, has kind of splintered into all of these different subfields that don't really talk to each other that much anymore, but they're all important to each other. They're all studying different facets of the same thing in the real world. Ultimately, one of my goals for SLiM is to build those bridges and get people talking to each other again, and make it possible to build a model that simulates nucleotides but also has animal behavior in it, and also has multiple species and ecology in it, and so forth. Because all of those things in reality interact, and they're all happening out there in nature at the same time, so dividing nature up into these little bits is not always the right way to look at it.

I don't know that you've had the chance to do a literature search, but do you see people citing results they're getting from SLiM?

Oh, sure. I mean, yeah, SLiM is quite widely
used there's probably I don't know well over a thousand publications out there now that used SLiM for modeling in one way or another and that's mostly still within the population genetics world we're only just starting to break into more evolutionary ecology and other branches of evolutionary biology because the multispecies capabilities only got added to SLiM I don't know about a year ago now it takes some time there's a lag time between when you add a feature and when you start seeing publications that are actually using that new feature that you added because science is slow but I think we're starting to get there or could you summarize some of the ways people use the multiple species option what has that opened up in terms of new research yeah well like I said we're only starting to see publications related to that but I can certainly speak to what people want to be able to simulate these processes processes like evolution don't happen in a vacuum when one species evolves that often has knock-on effects on other species that are in the same environment and that might be due to direct interactions you know if a wolf evolves to be better at hunting rabbits that's going to very directly impact the rabbits or it might happen through more indirect effects if an insect gets better at eating a particular plant then that might affect other insects or other animals that also depend on that plant either for food or for other purposes as well so you know when humans cut down forests that impacts all of the animals that live in that forest even though we're not deliberately going out and killing those animals directly we are indirectly killing those forest dwellers so all of these things are connected so people want to be able to model these kinds of co-evolutionary processes where the evolution of one species affects the evolution of another species there are also what are called eco-evolutionary dynamics where ecology and evolution are intertwined where the evolution
of a species changes its population size for example makes it uh more fit and therefore it can grow to be larger and therefore its ecological impact becomes larger on other species even if the change in the organism itself the evolutionary change didn't matter the ecological change might matter to other species yeah co-evolution eco-evolutionary dynamics people want to be able to model things like complex life cycles if you have a lot of parasitic organisms uh live in different host species at different points in their life cycle and move from host to host over time you know from a snail to a human to a you know whatever being able to model that is important to understanding for example what control measures might do if you try to control the snail population to reduce the number of parasites that are in an environment what effect is that really going to have when maybe there are alternate hosts and the snails might evolve away from the control measures that you're trying to impose and so forth to get a good picture of what's going to happen you might need a multispecies model all sorts of things like that I mean yeah it's hard to it's really all of the crazy diversity of the world is all multispecies interactions how do you make decisions about what new features to work on oh there's a lot of competing demands people always want their simulations to be faster of course so one thing that I've been working on quite a bit uh over the last year and a half two years is parallelizing SLiM so right now SLiM can only run on a single core and that isn't as much of a problem as you might think because you usually want to do many many runs of SLiM you want to look at different parameter values and even for a given set of parameter values you typically want to do lots of replicate runs because the outcome of a SLiM simulation is stochastic and so you don't want to get just a single result you want to sort of get a sense of what the distribution of results is going to be
for a given set of parameter values many many runs thousands of runs tens of thousands millions of runs you can kind of spray all those runs across all your available cores and things work out pretty well but sometimes people want to do really big models where they want to model every mosquito in North Africa or something like that and the population size is just enormous the genetic model maybe is enormous like the whole human genome the time scale that you want to model on might be enormous many many generations and there are you know various ways that you can try to tackle that problem but one is to try to make the model run across multiple cores and take advantage of parallel processing to be able to run faster that's a very complicated problem for a piece of software as complicated as SLiM there are so many different moving parts under the hood so many different loops that need to be parallelized and so forth so that's one thing is optimization is always a demand on my time new features people always have new stuff that they want to be able to model the behavior stuff that I mentioned uh just now is one perhaps fruitful area of future expansion but there are more basic things too SLiM only supports modeling a single chromosome intrinsically you can simulate multiple chromosomes but you have to do it by playing a trick of putting recombination breakpoints between your chromosomes with a recombination rate of 0.5 and that basically makes the different chunks in the simulation assort independently as if they were separate chromosomes it's kind of a hack it works fine so you can model you know the entire human genome but it's ugly it's a hack and SLiM really should support multiple chromosomes intrinsically so that's something I need to do improving various parts of the computational model you know right now SLiM started out kind of oriented toward Mendelian traits where a single locus controls some trait that is under selection but a lot of
what happens in real-world biology is quantitative traits or polygenic traits where there are many loci at different positions along the chromosome that are all affecting the same trait that's under selection you know human height is a good example there are many many different genes that influence human height each of which has a very small effect on height so it is still heritable and it still does follow Darwinian evolution in some sense but it's not Mendelian and SLiM wasn't originally built to model quantitative traits like that and so again like multiple chromosomes you can do it but you kind of do it by hacking it in in your script in a somewhat ugly way and it would be really nice if SLiM supported that more aesthetically and more intrinsically another demand on my time is you know developing the workshop materials uh improving the manual all of that kind of stuff it's actually probably at least 50% of my time is spent writing not coding answering questions on the slim-discuss support email list that takes quite a bit of time sometimes sometimes it feels like I'm trying to get something done and just every 15 minutes a new question is coming in and you know uh that keeps me busy so there are lots of different things to be juggled uh and I'm the only person that works full-time on SLiM but I do want to acknowledge some of the other contributors and collaborators uh in particular uh Philipp Messer who I work with at Cornell and have from the start also Peter Ralph at the University of Oregon who has contributed in really big ways to uh particularly tree sequence recording and continuous space modeling and then a whole bunch of other people that help out with SLiM in one way or another people that have submitted feature requests and bug reports and people that handle building the installers for different platforms and all kinds of stuff like that that's incredibly helpful so uh it is a team effort well an exciting project I think Ben where can
listeners go online to learn more about it or follow the progress well there's a SLiM homepage that is on the Messer Lab website um I assume you can in some way attach a URL to this podcast so yeah look for the URL to the SLiM homepage uh somewhere below this podcast and then there are the manuals that you can read there's the SLiM Workshop that you can go to all of that's all available online uh Google will find it for you yeah and we'll have some links in the show notes as well Ben thank you so much for taking the time to come on and tell us about the project all right well thank you very much for having me it's a real treat to get to talk about this to a wider audience than I usually reach [Music]
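
Editor's sketch: the interview describes running many stochastic replicates per parameter set and spraying those runs across the available cores. The Python below is a rough illustration of that idea only. It does not use SLiM; the toy neutral drift model and the names `drift_replicate` and `run_replicates` are assumptions made up for this example.

```python
import random
from concurrent.futures import ProcessPoolExecutor


def drift_replicate(job):
    """Run one stochastic replicate and return the final allele frequency.

    job is (pop_size, start_freq, generations, seed). This toy neutral
    drift model is a hypothetical stand-in for a real simulation run.
    """
    pop_size, start_freq, generations, seed = job
    rng = random.Random(seed)      # each replicate gets its own seeded stream
    copies = 2 * pop_size          # diploid: two gene copies per individual
    count = round(start_freq * copies)
    for _ in range(generations):
        if count in (0, copies):   # allele lost or fixed, nothing more changes
            break
        freq = count / copies
        # Every gene copy in the next generation is sampled from the current pool.
        count = sum(rng.random() < freq for _ in range(copies))
    return count / copies


def run_replicates(pop_size, start_freq, generations, seeds, workers=1):
    """Run one replicate per seed, optionally spread over several processes."""
    jobs = [(pop_size, start_freq, generations, seed) for seed in seeds]
    if workers == 1:
        return [drift_replicate(job) for job in jobs]
    # Replicates are independent, so each one can go to whichever core is free.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(drift_replicate, jobs))
```

Something like `run_replicates(50, 0.5, 100, range(200), workers=4)` would give 200 final frequencies whose spread approximates the distribution of outcomes for that parameter set. Because each replicate is seeded separately, the results should not depend on how many workers are used.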
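
Editor's sketch: the multiple-chromosome trick mentioned in the interview (a recombination rate of 0.5 at the boundary between chromosomes, so the chunks assort independently) can be pictured as building a recombination map. The Python below is a minimal illustration. The `(rates, ends)` layout, the 0-based positions, and the helper name `multi_chromosome_map` are assumptions for this example, not a statement of SLiM's actual interface.

```python
def multi_chromosome_map(chrom_lengths, within_rate):
    """Return (rates, ends) for several chromosomes laid end to end.

    ends[i] is the last 0-based base position of region i and rates[i] is
    the per-base recombination rate inside that region. Between two
    consecutive chromosomes sits a one-base region with rate 0.5, so the
    two sides are inherited independently of each other.
    """
    if not chrom_lengths or any(length < 2 for length in chrom_lengths):
        raise ValueError("each chromosome needs a length of at least 2")
    rates, ends = [], []
    last = -1                      # last position used so far
    for index, length in enumerate(chrom_lengths):
        if index > 0:
            # Boundary between the previous chromosome and this one.
            rates.append(0.5)
            ends.append(last + 1)
        last += length
        rates.append(within_rate)
        ends.append(last)
    return rates, ends
```

For chromosomes of 100 and 50 bases with a within-chromosome rate of 1e-8, this gives rates `[1e-8, 0.5, 1e-8]` and ends `[99, 100, 149]`. A rate of 0.5 is the value at which the two sides are as likely to be separated as to stay together, which is what independent assortment means.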
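
Editor's sketch: the contrast drawn in the interview between a Mendelian trait (one locus) and a quantitative trait (many loci, each with a very small effect) can be made concrete with a simple additive model. The additive assumption, the normally distributed effect sizes, and the names `small_effects` and `additive_phenotype` are illustrative choices for this example and do not come from the interview.

```python
import random


def small_effects(n_loci, sd, seed):
    """Draw one small effect size per locus (hypothetical normal effects)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, sd) for _ in range(n_loci)]


def additive_phenotype(genotype, effects):
    """Sum the contributions of all loci to a single trait value.

    genotype[i] is the number of copies (0, 1, or 2) of the trait-affecting
    allele carried at locus i; effects[i] is that allele's effect size.
    """
    if len(genotype) != len(effects):
        raise ValueError("genotype and effects must have the same length")
    return sum(copies * effect for copies, effect in zip(genotype, effects))
```

With a single locus this reduces to the Mendelian case, where one allele accounts for the whole trait difference. With hundreds of loci, no single allele matters much, but the summed trait value is still heritable and can still respond to selection.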

Original Description

Modeling evolutionary processes goes way beyond the Hardy-Weinberg Equilibrium we all learned in biology class. Natural selection comes from many sources, like resource availability, mate preferences, and competition. Modeling entire populations of organisms of different species is the holy grail of digital evolution. Join our discussion with evolutionary biologist and software engineer Ben Haller to learn about his work on SLiM and how it helps other biologists model population genetics over time.

This video discusses the modelling of evolutionary processes using SLiM, a tool for simulating population genetics over time. It covers the basics of natural selection and its various sources, and how SLiM can be used to model entire populations of organisms. The discussion is led by evolutionary biologist and software engineer Ben Haller.

Key Takeaways
  1. Learn about the Hardy-Weinberg Equilibrium
  2. Understand the sources of natural selection
  3. Explore the use of SLiM for simulating population genetics
  4. Model evolutionary processes using SLiM
  5. Analyze the results of the simulation
💡 Modelling evolutionary processes is crucial for understanding the dynamics of populations and the effects of natural selection, and SLiM provides a powerful tool for doing so.
