Data Scientist - Synthetic Population Engineer

Epistemix is looking for a data engineer to join our synthetic population team and advance the state of the art in synthetic populations. In this role, your work will help improve decisions that could affect millions of people around the world.

Synthetic data has the power to make simulation, modeling, and machine learning applications quicker and cheaper to deploy, and creates an opportunity to address limitations and biases present in available empirical data.

A synthetic population is a synthetic dataset that represents the attributes of, and relationships between, individual people in a real population and their environment. Synthetic populations empower customers to solve problems in domains where empirical data is unavailable, whether due to legitimate concerns for personal privacy or for other reasons.

At Epistemix, our synthetic population is the foundation on which our clients and internal professional services team build models and solutions. Having a synthetic population that is continuously updated and improved over time is critical to building trust in the models and solutions built with our platform. The successful candidate will report to our Director of Data Science and work closely with the engineering, customer success, and professional services teams.

Company Information

Epistemix helps customers increase the ROI of decisions by simulating the impact of strategies and interventions using synthetic populations. We deliver an integrated platform for creating, running, and analyzing the behavior of agent-based simulation models (ABMs) built on top of our synthetic population.

Our clients use these models to understand how the decisions and actions of individual people lead to large-scale, population-level outcomes. Our in-house professional services team works with clients to develop models in healthcare, insurance, marketing analytics, product demand planning, and government.

Epistemix’s mission is to pioneer the use of synthetic populations and simulation across industry, government, and academia to improve decision making for the benefit of all. Since its founding, Epistemix has been refining its technology and developing its client base.

Having recently completed our Series A funding, we are looking forward to the next phase of our growth and evolution.

Responsibilities

  • Identify and evaluate empirical datasets and use them to augment synthetic population attributes (e.g. health histories) and relationships (e.g. social networks).
  • Enhance geographical realism of places such as homes, workplaces, and schools.
  • Improve realism of our baseline behavioral model (e.g. synthetic individuals visit friends and relatives).
  • Contribute to best practices for standardizing the development of synthetic populations for countries around the world.
  • Develop software packages for visualizing synthetic populations, and create visualizations for marketing and productizing synthetic populations.
  • Deliver synthetic populations as self-contained products with comprehensive documentation that can be marketed to external users.
  • Develop workflows that enable external users to augment Epistemix synthetic populations with their own proprietary data.
  • Work with external vendors and marketplaces to expand the ecosystem of data providers that can be integrated with the synthetic population.
  • Support the synthetic populations team in engaging with customer success, professional services, and engineering teams to understand project-specific synthetic population requirements.
  • Key metrics you will influence include:
    • Synthetic population subscription sales
    • Number of third-party data providers in the data marketplace
    • Time for new users to integrate their data with the synthetic population
    • Number of new models created in the platform
    • Number of solutions generated in the platform

Qualifications

  • A PhD or master’s degree in data science or a relevant technical discipline like mathematics, statistics, computer science, or computational social science.
  • Proficiency in:
    • using Python for data science applications;
    • working with relational databases such as PostgreSQL (additional database management experience preferred);
    • working with geospatial data; and
    • working with simulation or machine learning models.
  • Empathy for the users and decision makers who rely on the synthetic population in their work, and the ability to translate its complexity into information that our users and customers can understand.
  • A passion for building the global standard for synthetic populations, improving decision making across social, health, economic, and environmental policy, and advancing data science into more commercial applications.
  • Proven track record of success building data products and/or data marketplaces.
  • A startup mentality, including an understanding of the risks and the ability to flex across the needs of an evolving team in a fast-paced environment.

Why Epistemix

By joining Epistemix, you will become part of a collaborative and quickly growing team that values curiosity and creativity. We are fully remote, with team members in both the United States and Europe. Benefits include our stock option program, flexible time off, eligibility to participate in the Epistemix Health and Welfare Program for employees in the United States, and the opportunity to apply your skill set to make an impact.

Start Date: March / April 2024

To Apply: Fill out your information here.

Questions: Email