MLOps Blog

Learnings From Building the ML Platform at Stitch Fix

32 min
7th August, 2023

This article was originally an episode of the ML Platform Podcast, a show where Piotr Niedźwiedź and Aurimas Griciūnas, together with ML platform professionals, discuss design choices, best practices, example tool stacks, and real-world learnings from some of the best ML platform professionals.

In this episode, Stefan Krawczyk shares his learnings from building the ML Platform at Stitch Fix.

You can watch it on YouTube or listen to it as a podcast. But if you prefer a written version, here you have it!

In this episode, you will learn about: 

  • Problems the ML platform solved for Stitch Fix
  • Serializing models
  • Model packaging
  • Managing feature requests to the platform
  • The structure of an end-to-end ML team at Stitch Fix


Piotr: Hi, everybody! This is Piotr Niedźwiedź and Aurimas Griciūnas, and you’re listening to the ML Platform Podcast.

Today we have invited a pretty unique and interesting guest, Stefan Krawczyk. Stefan is a software engineer and data scientist who has also worked as an ML engineer. He ran the data platform at his previous company and is a co-creator of the open-source framework Hamilton.

I also recently found out that you are the CEO of DAGWorks.

Stefan: Yeah. Thanks for having me. I’m excited to talk with you, Piotr and Aurimas.

What is DAGWorks?

Piotr: You have a super interesting background, and you have checked all the important boxes there are nowadays.

Can you tell us a little bit more about your current venture, DAGWorks?

Stefan: Sure. For those who don’t know DAGWorks, D-A-G is short for Directed Acyclic Graph. It’s a little bit of an homage to how we think and how we’re trying to solve problems. 

We want to stop the pain and suffering people feel with maintaining machine learning pipelines in production. 

We want to enable a team of junior data scientists to write code, take it into production, maintain it, and then when they leave, importantly, no one has nightmares about inheriting their code. 

At a high level, we are trying to make machine learning initiatives more human capital efficient by enabling teams to more easily get to production and maintain their model pipelines, ETLs, or workflows.

Piotr: The value from a high level sounds great, but as we dive deeper, there is a lot happening around pipelines, and there are different types of pains. 

How is it [DAGWorks solution] different from what is popular today? For example, let’s take Airflow, AWS SageMaker pipelines. Where does it [DAGWorks] fit?

Stefan: Good question. We’re building on top of Hamilton, which is an open-source framework for describing data flows. 

In terms of where Hamilton fits in, and where we’re starting, it’s helping you model the micro.

Airflow, for example, is a macro orchestration framework. You essentially divide things up into large tasks and chunks, but the software engineering that goes on within a task is the thing that you’re generally going to be updating and adding to over time as machine learning grows within your company, you have new data sources, or you want to create new models, right?

What we’re targeting first is helping you replace that procedural Python code with Hamilton code, which I can go into in a little more detail.

The idea is we want to help you enable a junior team of data scientists to not trip up over the software engineering aspects of maintaining the code within the macro tasks of something such as Airflow. 

Right now, Hamilton is very lightweight. People use Hamilton within an Airflow task. They use us within FastAPI, Flask apps, they can use us within a notebook. 

You could almost think of Hamilton as dbt for Python functions. It gives you a very opinionated way of writing Python. At a high level, it’s the layer above.

And then, we’re trying to build out features of the platform and the open source to be able to take Hamilton data flow definitions and help you auto-generate the Airflow tasks.

To a junior data scientist, it doesn’t matter if you’re using Airflow, Prefect, or Dagster. It’s just an implementation detail. What you use doesn’t help you make better models. It’s just the vehicle you use to run your pipelines.

Why have a DAG within a DAG? 

Piotr: This is procedural Python code. If I understood correctly, it is kind of a DAG inside the DAG. But why do we need another DAG inside a DAG?

Stefan: When you’re iterating on models, you’re adding a new feature, right? 

A new feature roughly corresponds to a new column, right? 

You’re not going to add a new Airflow task just to compute a single feature unless it’s some sort of big, massive feature that requires a lot of memory. The iteration you’re going to be doing is going to be within those tasks. 

In terms of the backstory of how we came up with Hamilton… 

At Stitch Fix, where Hamilton was created – the prior company that I worked at – data scientists were responsible for end-to-end development (i.e., going from prototype to production and then being on call for what they took to production). 

The team was essentially doing time series forecasting, where every month or every couple of weeks, they had to update their model to help produce forecasts for the business.

The macro workflow wasn’t changing, they were just changing what was within the task steps. 

But the team had been around a long time. They had a lot of code; a lot of legacy code. In terms of creating features, they were creating on the order of a thousand features.

Piotr: A thousand features?

Stefan: Yeah, I mean, in time series forecasting, it’s very easy to add features every month.

Say there’s marketing spend, or you’re trying to model or simulate something. For example, if there’s going to be marketing spend next month, how can we simulate demand?

So they were always continually adding to the code, but the problem was it wasn’t engineered in a good way. Adding new things was super slow, and they didn’t have confidence that adding or changing something wouldn’t break something else.

Rather than having to have a senior software engineer on each pull request to tell them, 

“Hey, decouple things,” 

“Hey, you’re gonna have issues with the way you’re writing,” 

we came up with Hamilton, which is a paradigm where essentially you describe everything as functions, and the function name corresponds exactly to an output. This was because one of the issues was: given a feature, can we map it to exactly one function? So you make the function name correspond to that output, and in the function’s arguments, you declare what’s required to compute it.

When you come to read the code, it’s very clear what the output is and what the inputs are. You also have the function docstring, because with procedural code, generally in script form, there is no natural place to stick documentation.
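
To make the paradigm concrete, here is a minimal sketch of what such functions might look like (the feature names are invented for illustration, not actual Stitch Fix features): each function's name is an output, and its argument names declare the inputs needed to compute it.

```python
import math


def avg_order_value(total_revenue: float, num_orders: float) -> float:
    """Average revenue per order: the function name IS the feature name."""
    return total_revenue / num_orders


def log_avg_order_value(avg_order_value: float) -> float:
    """Log-transformed feature; the argument name declares a dependency
    on the avg_order_value function above."""
    return math.log(avg_order_value)
```

Reading either function tells you exactly what it produces and what it depends on, and the docstring gives documentation a natural home.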

Piotr: Oh, you can put it above the line, right?

Stefan: It’s not…  you start staring at a wall of text. 

It’s easier from a grokking perspective in terms of just reading functions if you want to understand the flow of things. 

[With Hamilton] you’re not overwhelmed, you have the docstring, a function for documentation, but then also everything’s unit testable by default – they didn’t have a good testing story. 

In terms of the distinction between other frameworks with Hamilton, the naming of the functions and the input arguments stitches together a DAG or a graph of dependencies. 

In other frameworks – 

Piotr: So you do some magic on top of Python, right? To figure it out.

Stefan: Yep!

Piotr: How about working with it? Do IDEs support it?

Stefan: So IDEs? No. It’s on the roadmap to provide more plugins. But essentially, rather than having to annotate a function with a step and then manually specify the workflow from the steps, we short-circuit that through naming.
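
That "magic" can be sketched with standard-library introspection; this toy (not Hamilton's actual implementation) shows how function signatures alone are enough to stitch together a dependency graph.

```python
import inspect


def revenue(price: float, quantity: float) -> float:
    """Invented example output."""
    return price * quantity


def margin(revenue: float, cost: float) -> float:
    """Depends on revenue() above purely through its argument name."""
    return revenue - cost


def build_dag(funcs):
    """Map each output name to the input names its function declares."""
    return {f.__name__: list(inspect.signature(f).parameters) for f in funcs}
```

Calling `build_dag([revenue, margin])` yields `{"revenue": ["price", "quantity"], "margin": ["revenue", "cost"]}`: the whole dependency graph, with no step annotations or manual wiring.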

So that’s a long-winded way to say we started at the micro because that was what was slowing the team down. 

By transitioning to Hamilton, they were four times more efficient on that monthly task just because it was a very prescribed and simple way to add or update something.

It’s also clear and easy to know where to add it to the codebase, what to review, understand the impacts, and then therefore, how to integrate it with the rest of the platform.

How do you measure whether tools are adding value? 

Piotr: How do – and I think it is a question that I sometimes hear, especially from ML platform teams and their leaders, who need to justify their existence.

As you’ve been running the ML data platform team, how do you do that? How do you know whether the platform we are building, the tools we are providing to data science teams, or data teams are bringing value?

Stefan: Yeah, I mean, hard question, no simple answer.

If you can be data-driven, that is the best. But the hard part is people’s skill sets differ. So if you were to say, measure how long it takes someone to do something, you have to take into account how senior they are, how junior.

But essentially, if you have enough data points, then you can say something roughly on average: it used to take someone this amount of time, and now it takes this amount of time. You get the ratio and the value added there, and then you count how many times that thing happens. Then you can measure human time and, therefore, salary, and say this is how much savings we made – that’s just from looking at efficiencies.
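
As a toy version of that calculation (all numbers invented):

```python
# Hypothetical numbers: a task used to take 8 hours, now takes 2,
# happens 50 times a year, at a loaded cost of $100/hour.
hours_before, hours_after = 8, 2
occurrences_per_year = 50
hourly_cost = 100

hours_saved = (hours_before - hours_after) * occurrences_per_year
dollars_saved = hours_saved * hourly_cost
```

With these made-up inputs, that works out to 300 hours and $30,000 of savings per year from one workflow.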

The other way machine learning platforms help is by stopping production fires. You can look at what the cost of an outage is and then work backwards: “hey, if you prevent these outages, we’ve also provided this type of value.”

Piotr: Got it.

What are some use cases of Hamilton?

Aurimas: Maybe we’re getting one step a little bit back…

To me, it sounds like Hamilton is mostly useful for feature engineering. Do I understand this correctly? Or are there any other use cases?

Stefan: Yeah, that’s where Hamilton’s roots are. If you need something to help structure your feature engineering problem, Hamilton is great if you’re in Python. 

Most people don’t like their pandas code; Hamilton helps you structure that. But Hamilton works with any Python object type.

Most machines these days are large enough that you probably don’t need an Airflow right away, in which case you can model your end-to-end machine learning pipeline with Hamilton. 

In the repository, we have a few examples of what you can do end-to-end. I think Hamilton is a Swiss Army knife. We have someone from Adobe using it to help manage some prompt engineering work that they’re doing, for example. 

We have someone using it precisely for feature engineering, but within a Flask app. We have other people using the fact that it’s Python-type agnostic to orchestrate a data flow that generates some Python object.

So very, very broad. Its roots are feature engineering, but it’s definitely very easy to extend to a lightweight end-to-end machine learning pipeline. This is where we’re excited about extensions we’re going to add to the ecosystem. For example, how do we make it easy for someone to, say, pick up Neptune and integrate it?

Piotr: And Stefan, this part was interesting because I didn’t expect that and want to double-check. 

Would you also – let’s assume that we do not need a macro-level pipeline like this one run by Airflow, and we are fine with doing it on one machine. 

Would you also include steps that are around training a model, or is it more about data?

Stefan: No, I mean both. 

The nice thing with Hamilton is that you can logically express the data flow. You could do source, featurization, creating training set, model training, prediction, and you haven’t really specified the task boundaries. 

With Hamilton, you can logically define everything end-to-end. At runtime, you only specify what you want computed – it will only compute the subset of the DAG that you request.
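
A toy stand-in for that behavior (node names invented; Hamilton's real driver is more sophisticated): a recursive resolver that walks only the path of the DAG needed for the requested output.

```python
import inspect


def training_set(raw_data: list) -> list:
    """Invented featurization step."""
    return [x * 2 for x in raw_data]


def model(training_set: list) -> float:
    """Stand-in 'training': just average the features."""
    return sum(training_set) / len(training_set)


def expensive_report(raw_data: list) -> str:
    """Never runs unless explicitly requested."""
    raise RuntimeError("should not be computed")


FUNCS = {f.__name__: f for f in (training_set, model, expensive_report)}


def execute(output: str, inputs: dict):
    """Compute only the subgraph required for `output`."""
    if output in inputs:
        return inputs[output]
    fn = FUNCS[output]
    args = {name: execute(name, inputs) for name in inspect.signature(fn).parameters}
    return fn(**args)
```

Asking for `execute("model", {"raw_data": [1, 2, 3]})` computes `training_set` and `model` but never touches `expensive_report`.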

Piotr: But what about the for loop of training? Like, let’s say, 1,000 iterations of gradient descent inside. How would this work?

Stefan: You have options there… 

I want to say right now people would stick that within the body of a function – so you’ll just have one function that encompasses that training step. 

With Hamilton, junior people and senior people like it because you have the full flexibility of whatever you want to do within the Python function. It’s just an opinionated way to help structure your code.
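
For instance, a whole iterative training loop can live inside one function body; this sketch fits y = w * x by gradient descent on invented data.

```python
def trained_weight(xs: list, ys: list, steps: int = 1000, lr: float = 0.01) -> float:
    """One DAG node whose body runs a full gradient-descent loop."""
    w = 0.0
    for _ in range(steps):
        # Gradient of mean squared error with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= lr * grad
    return w
```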

Why doesn’t Hamilton have a feature store? 

Aurimas: Getting back to that table in your GitHub repository, a very interesting point that I noted is that you’re saying that you are not comparing to a feature store in any way. 

However, I then thought a little bit deeper about it… The feature store is there to store the features, but it also has this feature definition, like modern feature platforms also have feature compute and definition layer, right? 

In some cases, they don’t even need a feature store. You might be okay with just computing features both on training time and inference time. So I thought, why couldn’t Hamilton be set for that?

Stefan: You’re exactly right. I term it as a feature definition store. That’s essentially what the team at Stitch Fix built – just on the back of Git. 

Hamilton forces you to keep your functions separate from the context where they run. You’re forced to curate things into modules.

If you want to build a feature bank of code that knows how to compute things, Hamilton forces you to do that – and then you can share and reuse those feature transforms in different contexts very easily.

It forces you to align on naming, schema, and inputs. In terms of the inputs to a feature, they have to be named appropriately. 

If you don’t need to store data, you could use Hamilton to recompute everything. But if you need to store data as a cache, you put Hamilton in front of that: use Hamilton to compute, and then potentially push the result to something like Feast.

Aurimas: I also saw on the DAGWorks website, not the Hamilton one, that, as you already mentioned, you can train models inside a function as well. So let’s say you train a model inside a Hamilton function.

Would you be able to also somehow extract that model from storage where you placed it and then serve it as a function as well, or is this not a possibility?

Stefan: This is where Hamilton is really lightweight. It’s not opinionated about materialization. So that is where connectors or other things come in, as to where you push actual artifacts.

This is where it’s at a lightweight level. You would ask the Hamilton DAG to compute the model, you get the model out, and then the next line, you would save it or push it to your data store – you could also write a Hamilton function that kind of does that. 

The side effect of running the function is pushing it. But this is where we’re looking to expand and provide more capabilities: to make it more naturally pluggable within the DAG, so you specify how to build a model, and then, in the context where you run it, specify “I want to save the model and place it into Neptune.”

That’s where we’re heading, but right now, Hamilton doesn’t restrict how you would want to do that.

Aurimas: But could it pull the model and be used in the serving layer?

Stefan: Yes. One of the features of Hamilton is that with each function, you can switch out a function implementation based on configuration or a different module. 

For example, you could have two implementations of the function: one which takes a path to pull from S3 to pull the model, another one that expects the model or training data to be passed in to fit a model. 

There is flexibility in terms of function implementations and to be able to switch them out. In short, Hamilton the framework doesn’t have anything native for that… 

But we have flexibility in terms of how to implement that.
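
In plain Python, that swap might be sketched like this (Hamilton's actual mechanism is decorator- and module-based; the names below are invented):

```python
def model__from_path(model_path: str) -> dict:
    """Serving-time implementation: 'load' a model from a path (stubbed)."""
    return {"loaded_from": model_path}


def model__from_training(training_data: list) -> dict:
    """Training-time implementation: 'fit' a model on data (stubbed)."""
    return {"weights": sum(training_data)}


def resolve_model(config: dict):
    """Pick which implementation backs the `model` node, based on configuration."""
    return model__from_path if config.get("mode") == "serve" else model__from_training
```

The same DAG node name resolves to a loader in serving and a trainer in training, so the rest of the data flow is unchanged between the two contexts.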

Aurimas: You basically could do the end-to-end, both training and serving with Hamilton. 

That’s what I hear.

Stefan: I mean, you can model that. Yes.

Data versioning with Hamilton

Piotr: And what about data versioning? Like, let’s say, simplified form. 

I understand that Hamilton is more on the code side. When we version code, we’re versioning, maybe, the recipes for features, right?

Having that, what do you need on top to say, “yeah, we have versioned datasets?”

Stefan: Yeah, you’re right. With Hamilton, you describe your data flow in code. If you store it in Git, or have a structured way to version your Python packages, you can go back to any point in time and understand the exact lineage of computation.

But where the source data lives and what the output is, in terms of dataset versioning, is kind of up to you (i.e. your fidelity of what you want to store and capture). 

If you were to use Hamilton to create some sort of dataset or transform a dataset, you would store that dataset somewhere. If you stored the Git SHA and the configuration that you used to instantiate the Hamilton DAG with, and you store that with that artifact, you could always go back in time to recreate it, assuming the source data is still there. 
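
That bookkeeping can be sketched with a sidecar metadata file (fields invented; not the DAGWorks implementation):

```python
import json


def save_with_lineage(artifact_path: str, git_sha: str, dag_config: dict) -> str:
    """Write the Git SHA and DAG config next to the dataset artifact."""
    sidecar = artifact_path + ".meta.json"
    with open(sidecar, "w") as f:
        json.dump({"artifact": artifact_path,
                   "git_sha": git_sha,
                   "dag_config": dag_config}, f)
    return sidecar


def load_lineage(sidecar: str) -> dict:
    """Recover what's needed to re-instantiate the exact Hamilton DAG."""
    with open(sidecar) as f:
        return json.load(f)
```

Given the SHA and the config, you can check out the code and rebuild the DAG that produced the data, assuming the source data is still there.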

From building a platform at Stitch Fix, Hamilton has these hooks, or at least the ability to integrate with that. Now, this is part of the DAGWorks platform.

We’re trying to provide precisely a means to store and capture that extra metadata for you so you don’t have to build that component out so that we can then connect it with other systems you might have. 

Depending on your size, you might have a data catalog, maybe storing and emitting OpenLineage information, etc.

Definitely looking for ideas or early stacks to integrate with, but otherwise, we’re not opinionated. Where we can help with dataset versioning is to not only version the data: if it’s described in Hamilton, you can then go and recompute it exactly, because you know the code path that was used to transform things.

When did you decide Hamilton must be built?  

Aurimas: Maybe moving a little bit back to what you did at Stitch Fix and to Hamilton itself. 

When was the point when you decided that Hamilton needs to be built?

Stefan: Back in 2019. 

We only open-sourced Hamilton 18 months ago. It’s not a new library – it’s been running in Stitch Fix for over three years. 

The interesting part for Stitch Fix is it was a data science organization with over 100 data scientists with various modeling disciplines doing various things for the business.

I was part of the platform team that was engineering for data science. My team’s mandate was to streamline model productionization for teams. 

We thought, “how can we lower the software engineering bar?”

The answer was to give them the tooling abstractions and APIs such that they didn’t have to be good software engineers – MLOps best practices basically came for free. 

There was a team that was struggling, and the manager came to us to talk. He was like, “This code base sucks, we need help, can you come up with anything? I want to prioritize being able to do documentation and testing, and if you can improve our workflow, that’d be great,” which is essentially the requirements, right? 

At Stitch Fix, we had been thinking about “what is the ultimate end user experience or API from a platform to data scientist interaction perspective?” 

I think Python functions are a good interface: not an object-oriented interface that someone has to implement, just give me a function. There’s enough metaprogramming you can do with Python to inspect the function and know its shape, know the inputs and outputs, and you have type annotations, et cetera.

So, plus one for work-from-home Wednesdays. Stitch Fix had a no-meeting day, and I set aside a whole day to think about this problem.

I was like, “how can I ensure that everything’s unit testable, documentation friendly, and the DAG and the workflow is kind of self-explanatory and easy for someone to kind of describe.” 

In which case, I prototyped Hamilton and took it back to the team. My now co-founder and former colleague at Stitch Fix, Elijah, also came up with a second implementation, which was akin to more of a DAG-style approach.

The team liked my implementation better, but essentially the premise was the same: everything being unit testable, documentation friendly, and having a good integration testing story.

With data science code, it’s very easy to append a lot of code to the same scripts, and it just grows and grows. With Hamilton, it’s very easy: you don’t have to compute everything to test something. That was also part of the thought with building a DAG – Hamilton knows to only walk the paths needed for the things you want to compute.

But that’s roughly the origin story.

We migrated the team and got them onboarded. Pull requests ended up being faster. The team loves it. They’re super sticky. They love the paradigm because it definitely simplified their life more than what it was before.

Using Hamilton for Deep Learning & Tabular Data

Piotr: Previously you mentioned you’ve been working on over 1000 features that are manually crafted, right?

Would you say that Hamilton is more useful in the context of tabular data, or can it also be used for, let’s say, deep learning types of data, where you have a lot of features that are not manually developed?

Stefan: Definitely. Hamilton’s roots and sweet spot come from trying to manage and create tabular data for input to a model.

The team at Stitch Fix manages over 4,000 feature transforms with Hamilton. And I want to say –

Piotr: For one model?

Stefan: For all the models they create. Collectively, in the same code base, they have 4,000 feature transforms, which they can add to and manage, and it doesn’t slow them down.

On the question of other types, I wanna say “yeah.” Hamilton is essentially replacing some of the software engineering that you do. It really depends on what you have to do to stitch together a flow of data transforms for your deep learning use case.

Some people have said, “oh, Hamilton kind of looks a little bit like LangChain.” I haven’t looked at LangChain, which I know is something that people are using for large models to stitch things together. 

So, I’m not quite sure yet exactly where they think the resemblance is, but otherwise, if you had procedural code that you’re using with encoders, there’s likely a way that you can transcribe and use it with Hamilton.

One of the features that Hamilton has is that it has a really lightweight data quality runtime check. If checking the output of a function is important to you, we have an extensible way you can do it. 

If you’re using tabular data, there’s Pandera. It’s a popular library for describing schemas, and we have support for that. Otherwise, we have a pluggable way: if you’re producing some other object type, tensors or something, you can extend it to ensure the tensor meets the standards you expect.
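
The runtime-check idea can be sketched as a decorator (Hamilton's real feature is a `check_output` decorator with Pandera support; this toy validator and feature are invented):

```python
import functools


def check_output(validator):
    """Wrap a DAG node so its result is validated every time it runs."""
    def wrap(fn):
        @functools.wraps(fn)
        def inner(*args, **kwargs):
            result = fn(*args, **kwargs)
            if not validator(result):
                raise ValueError(f"{fn.__name__} produced an invalid output")
            return result
        return inner
    return wrap


@check_output(lambda xs: all(x >= 0 for x in xs))
def shifted_scores(raw_scores: list) -> list:
    """Invented feature: min-shifted scores, which should never be negative."""
    low = min(raw_scores)
    return [x - low for x in raw_scores]
```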

Piotr: Would you also calculate some statistics over a column or set of columns to, let’s say, use Hamilton as a framework for testing data sets? 

Like I’m not talking about verifying particular value in a column but rather statistic distribution of your data.

Stefan: The beauty of everything being Python functions, with the Hamilton framework executing them, is that we have flexibility with respect to the output of a function, which might happen to be a dataframe.

Yeah, we could inject something in the framework that takes summary statistics and emits them. Definitely, that’s something that we’re playing around with.

Piotr: When it comes to a combination of columns, let’s say you want to calculate some statistics or correlations between three columns, how does that fit into this function-represents-a-column paradigm?

Stefan: It depends on whether you want that to be an actual transform. 

You could just write a function that takes the input or the output of that data frame, and in the body of the function, do that – basically, you can do it manually. 

It really depends on where you’re coming from. If you’re doing it from a platform perspective and want to enable data scientists to capture various things automatically, then I would come from the platform angle of adding a decorator, something that wraps the function and can do the introspection that you want.
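
From the platform angle, such a wrapper might look like this sketch (pure Python, names invented): it computes summary statistics over whatever the node returns and emits them as a side effect.

```python
import functools
import statistics

CAPTURED_STATS = []  # stand-in for wherever a platform would emit metrics


def capture_stats(fn):
    """Record summary statistics of a node's numeric output automatically."""
    @functools.wraps(fn)
    def inner(*args, **kwargs):
        result = fn(*args, **kwargs)
        CAPTURED_STATS.append({
            "node": fn.__name__,
            "mean": statistics.mean(result),
            "stdev": statistics.pstdev(result),
        })
        return result
    return inner


@capture_stats
def spend_per_user(spend: list, users: list) -> list:
    """Invented feature: element-wise spend divided by users."""
    return [s / u for s, u in zip(spend, users)]
```

Data scientists write the function as usual; the platform decorator captures the distribution of its output on every run.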

Why did you open-source Hamilton?

Piotr: I’m going back to a story of Hamilton that started at Stitch Fix. What was the motivation to go open-source with it?

It is something curious for me because I’ve been at a few companies, and there are always some internal libraries and projects that people liked, but it’s not so easy, and not every project is the right candidate for going open and being truly used.

I’m not talking about adding a license file and making the repo public, but I am talking about making it live and really open.

Stefan: Yeah. My team had purview over build versus buy; we’d been looking across the stack. We created Hamilton back in 2019, and we were seeing very similar-ish things come out in open source, so we were like, “hey, I think we have a unique angle.” Of the other tools that we had, Hamilton was the easiest to open source.

For those who know, Stitch Fix also was very big on branding. If you ever want to know some interesting stories about techniques and things, you can look up the Stitch Fix Multithreaded blog. 

There was a tech branding team that I was part of, which was trying to get quality content out that helps the Stitch Fix brand, which helps with hiring.

In terms of motivations, that was the branding perspective: set a high-quality bar and put things out that look good for the brand.

And it just so happened that, of the things our team had built, Hamilton was the easiest to open source, and I think the most interesting.

We built things similar to MLflow, like configuration-driven model pipelines, but I wanna say that’s not quite as unique. Hamilton is a more unique angle on a particular problem. With both of those combined, it was like, “yeah, I think this is a good branding opportunity.”

And then in terms of the surface area of the library, it’s pretty small. You don’t need many dependencies, which makes it feasible to maintain from an open-source perspective. 

The requirements were also relatively low, since you just need Python 3.6 – now that 3.6 is sunset, it’s 3.7 – and it just kind of works.

From that perspective, I think it had a pretty good sweet spot: we likely wouldn’t have to add too many things to increase adoption and make it usable for the community, and the maintenance side of it was also kind of small.

The last part was a little bit of an unknown; “how much time would we be spending trying to build a community?” I couldn’t always spend more time on that, but that’s kind of the story of how we open-sourced it. 

I did spend a good couple of months trying to write a blog post for the launch. That took a bit of time, but it’s always a good means to get your thoughts down and clearly articulated.

Launching an open-source product

Piotr: How was the launch when it comes to adoption from the outside? Can you share with us how you promoted it? Did it work from day zero, or did it take some time to make it more popular?

Stefan: Thankfully, Stitch Fix had a blog with a reasonable amount of readership. I paired the launch with a blog post, and we got a couple of hundred stars in a couple of months. We have a Slack community that you can join.

I don’t have a comparison to say how well it did compared to something else, but people are adopting it outside of Stitch Fix. The UK Government Digital Service is using Hamilton for a national feedback pipeline.

There is a guy at IBM using it internally for a small internal search-tool kind of product. The problem with open source is you don’t know who’s using you in production, since telemetry and other things are difficult. People came in, created issues, and asked questions, which gave us more energy to be in there and help.

Piotr: What about the first useful pull request from external contributors?

Stefan: So we were fortunate to have a guy called James Lamb come in. He’s been on a few open-source projects, and he’s helped us with the repository documentation and structure. 

Basically, cleaning up and making it easy for an outside contributor to come in and run our tests and things like that. I want to say it’s kind of grunt work, but super valuable in the long run, since he gave feedback like, “hey, this pull request template is just way too long. How can we shorten it? You’re gonna scare off contributors.”

He gave us a few good pointers and help set up the structure a little bit. It’s repo hygiene that enables other people to kind of contribute more easily.

Stitch Fix biggest challenges 

Aurimas: Yeah, so maybe let’s also get back a little bit to the work you did at Stitch Fix. So you mentioned that Hamilton was the easiest one to open-source, right? If I understand correctly, you were working on a lot more things than that – not only the pipeline. 

Can you go a little bit into what the biggest problems at Stitch Fix were and how you tried to solve them as a platform team?

Stefan: Yeah, so take yourself back six years, right? There wasn’t the maturity in open-source tooling that’s available now. At Stitch Fix, if data scientists had to create an API for a model, they would be in charge of spinning up their own image on EC2 and running some sort of Flask app that integrated things.

Where we basically started was helping from the production standpoint: stabilization and ensuring better practices. We had a team that essentially made it easier to deploy backends on top of FastAPI, where the data scientists just had to write Python functions as the integration point.

That helped stabilize and standardize all the backend microservices, because the platform now owned the actual web service.

Piotr: So you’re kind of providing a Lambda-like interface to them?

Stefan: You could say a little more heavyweight. Essentially, we made it easy for them to provide a requirements.txt, a base Docker image, and the Git repository where the code lived, and we could create a Docker container with the web service and the code built in, then deploy it on AWS pretty easily.
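
Those inputs might translate into a build along these lines (a hypothetical sketch; the image name and paths are invented, not Stitch Fix's actual setup):

```dockerfile
# Hypothetical platform-maintained base image with the web service built in.
FROM company-registry/python-service-base:3.7

# Data scientists supply only their dependencies and code.
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY src/ /app/src/

# The platform owns the actual web service; user code plugs in as Python functions.
CMD ["python", "-m", "platform_service"]
```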

Aurimas: Do I hear the template repositories maybe? Or did you call them something different here?

Stefan: We weren’t quite template-based, but there were just a few things people needed to create a microservice and get it deployed. Once that was done, we looked at the various parts of the workflow.

One of the problems was model serialization and “how do you know what version of a model is running in production?” So we developed a little project called the model envelope. The idea, much like the metaphor of an envelope, was that you can stick things in it.

For example, you can stick in the model, but you can also stick a lot of metadata and extra information about it. The issue with model serialization is that you need pretty exact Python dependencies, or you can run into serialization issues.

If you reload models on the fly, you can run into issues where someone pushed a bad model, or it’s not easy to roll back. One of the ways things worked at Stitch Fix – or how they used to work – was that if a new model was detected, it would just automatically be reloaded.

But that was kind of a challenge from an operational perspective, to roll back or test things beforehand. With the model envelope abstraction, the idea was that you save your model and then provide some configuration through a UI, and then we could take the new model and auto-deploy a new service, where each model build was a new Docker container, so each service was immutable.

And it provided better constructs to push something out, make it easy to roll back, so we just switched the container. If you wanted to debug something, then you could just pull that container and compare it against something that was running in production. 

It also enabled us to insert a CI/CD-type pipeline without them having to put that into their model pipelines. In common frameworks right now, at the end of someone’s machine learning model pipeline ETL, you do all these kinds of CI/CD checks to qualify a model.

We abstracted that part out and made it something that people could add after they had created a model pipeline. That way, it was easier to change and update, and the model pipeline wouldn’t have to be updated if there was a bug and someone wanted to create a new test or something.

And so that’s roughly it. Model envelope was the name of it. It helped users to build a model and get it into production in under an hour.

We also had the equivalent for the batch side. Usually, if you want to create a model and then run it in batch somewhere, you would have to write the task. We had hooks to make a model run in Spark or on a large box.

People wouldn’t have to write that batch task to do batch prediction. Because at some level of maturity within a company, you start to have teams who want to reuse other teams’ models. In which case, we were the buffer in between, helping provide a standard way for people to take someone else’s model and run it in batch without them having to know much about it.

Serializing models in the Stitch Fix platform

Piotr: And Stefan, talking about serializing a model, did you also serialize the pre and post-processing of features to this model? How, where did you have a boundary? 

And second, which is very connected: how did you describe the signature of a model? Let’s say it’s a RESTful API, right? How did you do this?

Stefan: When someone saved the model, they had to provide a pointer to an object and the name of the function, or they provided a function directly.

We would use that function and introspect it, and as part of the model-saving API, we asked for what the input training data was and what a sample output was. That way, we could actually exercise the model a little bit when saving it, to introspect a bit more about the API. So if someone had passed in, say, a pandas data frame, we would go, hey, you need to provide some sample data for this data frame so we can understand, introspect, and create the function.

From that, we would then create a Pydantic schema on the web service side. So then, if you used FastAPI, you could go to the docs page, and you would have a nice, easy-to-execute REST-based interface that would tell you what features are required to run this model.
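
As a minimal sketch of that introspection idea, using only the standard library (the real system generated Pydantic models; the names here are invented for illustration):

```python
import inspect

def infer_schema(predict_fn, sample_input: dict) -> dict:
    """Build a JSON-schema-style description of a model's inputs from a
    sample row, after checking the function actually accepts those fields."""
    # Raises TypeError if the sample row doesn't match the function signature
    inspect.signature(predict_fn).bind(**sample_input)
    type_names = {int: "integer", float: "number", str: "string", bool: "boolean"}
    return {name: type_names.get(type(v), "object") for name, v in sample_input.items()}

def predict(age: int, avg_order_value: float) -> float:
    """A stand-in for a user's model function."""
    return 0.1 * age + 0.01 * avg_order_value

sample = {"age": 34, "avg_order_value": 52.0}
schema = infer_schema(predict, sample)
print(schema)  # {'age': 'integer', 'avg_order_value': 'number'}
```

A web service could use the resulting schema to validate requests before ever calling the model, which is what the generated docs page surfaced to users.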

So, in terms of what was stitched together in a model, it really depended, since we tried to treat Python as a black box in terms of serialization boundaries.

The boundary was really knowing what was in the function. People could write a function that included featurization as the first step before delegating to the model, or they had the option to keep the two separate, in which case, at call time, they would have to go to the feature store first to get the right features, which would then be passed in the request to compute a prediction in the web service.

So we weren’t exactly opinionated as to where the boundaries were, but it was something we kept coming back to, to try to help standardize a bit more. Since different use cases have different SLAs and different needs, sometimes it makes sense to stitch things together, and sometimes it’s easier to pre-compute, and then you don’t need to stick that with the model.

Piotr: And the interface for the data scientist building and serializing such a model was in Python; they were not leaving Python. Everything is in Python. And I like this idea of providing, let’s say, a sample input and sample output. It’s a very Pythonic way of doing things. Like unit testing, it is how we ensure that the signature is kept.

Stefan: Yeah, and from that sample input and output (ideally, it was actually the training set), we could pre-compute summary statistics, as you were alluding to. So whenever someone saved a model, we tried to provide things for free.

They didn’t have to think about data observability, but if you provided that data, we captured things about it. So then, if there was an issue, we had a breadcrumb trail to help you determine what changed: was it something about the data, or was it, hey, look, you included a new Python dependency, right?

And that kind of changes something, right? So, for example, we also introspected the environment that things ran in. Therefore, we could understand, down to the package level, what was in there.

And so then, when we ran the model in production, we tried to replicate those dependencies as closely as possible to ensure that, at least from a software engineering standpoint, everything should run as expected.
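
A sketch of what that environment breadcrumb trail could look like (hypothetical helper names; a real system would persist the snapshot alongside the model artifact):

```python
from importlib import metadata

def snapshot_environment() -> dict:
    """Record installed package versions, e.g. at model-save time."""
    return {dist.metadata["Name"]: dist.version for dist in metadata.distributions()}

def diff_environments(saved: dict, current: dict) -> dict:
    """Report packages that were added, removed, or changed version between
    the training environment and the serving environment."""
    changes = {}
    for name in saved.keys() | current.keys():
        if saved.get(name) != current.get(name):
            changes[name] = (saved.get(name), current.get(name))
    return changes

# Hand-built example of what such a diff might surface during debugging:
train_env = {"numpy": "1.24.0", "scikit-learn": "1.2.0"}
serve_env = {"numpy": "1.26.0", "scikit-learn": "1.2.0", "xgboost": "2.0.0"}
print(diff_environments(train_env, serve_env))
```

Pairing a snapshot like this with the saved model makes "you included a new Python dependency" an answerable question rather than a guess.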

Piotr: So it sounds like what today would be called a model packaging solution. And where did you store those envelopes? I understand that you had the envelope framework, but you had instances of those envelopes that were serialized models with metadata. Where did you store them?

Stefan: Yeah. I mean, pretty basic, you could say: S3. We stored them in a structured manner on S3, but we paired that with a database which had the actual metadata and a pointer. Some of the metadata would go into the database, so you could use it for querying.

We had a whole system where, for each envelope, you would specify tags. That way, you could hierarchically organize or query based on the tag structure you included with the model. And so then it was just one field in the row.

There was one field that was just a pointer to, like, hey, this is where the serialized artifact lives. So yeah, pretty basic, nothing too complex there.
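
As a toy illustration of that layout, here is a sketch using SQLite in place of the real metadata database and an S3-style URI string as the artifact pointer (all names and the schema are invented, not Stitch Fix's actual design):

```python
import json
import sqlite3

# In-memory stand-in for the metadata database; artifacts themselves live on S3.
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE envelopes (
    name TEXT, version INTEGER, tags TEXT, artifact_uri TEXT)""")

def register_envelope(name, version, tags, artifact_uri):
    """Record one saved model: its metadata, tags, and a pointer to the artifact."""
    db.execute("INSERT INTO envelopes VALUES (?, ?, ?, ?)",
               (name, version, json.dumps(tags), artifact_uri))

def find_by_tag(key, value):
    """Query envelopes by a tag, returning (name, version, artifact_uri)."""
    rows = db.execute("SELECT name, version, tags, artifact_uri FROM envelopes").fetchall()
    return [(n, v, uri) for n, v, t, uri in rows if json.loads(t).get(key) == value]

register_envelope("demand_forecast", 3,
                  {"team": "forecasting", "region": "US"},
                  "s3://models/demand_forecast/3/model.pkl")
print(find_by_tag("team", "forecasting"))
```

The point is the shape of the record: queryable metadata and tags in the database, with one field that is just a pointer to where the serialized artifact lives.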

How to decide what feature to build?

Aurimas: Okay, Stefan, so it sounds like everything evolved really naturally in the platform team. Teams needed to deploy models, so you created the envelope framework; then teams were struggling to define feature engineering code efficiently, so you created Hamilton.

Was there any case where someone came to you with a crazy suggestion that needed to be built, and you said no? How did you decide which features had to be built and which you rejected?

Stefan: Yeah. So I have a blog post on some of my learnings from building the platform at Stitch Fix. You could say that, usually, the requests we said “no” to came from someone wanting something super complex while also doing something speculative.

They wanted the ability to do something, but it wasn’t in production yet, and it was speculative, based around improving something where the business value was not yet known.

Unless it was a business priority and we knew this was a direction that had to be pursued, in which case we would say, sure, we’ll help you with that. Otherwise, we would basically say no. Usually, these requests came from people who think they’re pretty capable from an engineering perspective.

So we’re like, okay, no, you go figure it out, and then if it works, we can talk about ownership and taking it on. For example, we had one configuration-driven model pipeline: you could think of it as some YAML with Python code and SQL that enabled people to describe how to build a model pipeline that way.

So it was different from Hamilton, operating in more of a macro way, and we didn’t want to support it right away. But it grew to the point where other people wanted to adopt it, and so, given the complexity of managing and maintaining it, we came in, refactored it, and made it more general and broader, right?

And so that’s where I see a reasonable way to determine whether you say yes or no: one, if it’s not a business priority, it’s likely not worth your time; get them to prove it out, and then if it’s successful, assuming you had the conversation ahead of time, you can talk about adoption.

So, it’s not your burden. Sometimes people do get attached. You just have to be aware of their attachment: if it’s their baby, how are they going to hand it off to you? It’s something to think about.

But otherwise, I’m trying to think… some people wanted TensorFlow support – TensorFlow-specific support – but there was only one person using TensorFlow. We were like, “yeah, you can do things right now; yeah, we could add some stuff,” but thankfully, we didn’t invest our time, because the project they tried it on didn’t work out, and then they ended up leaving.

And so, in which case, glad we didn’t invest time there. So, yeah, happy to dig in more.

Piotr: It sounds very much like a product manager role.

Stefan: Yeah, so at Stitch Fix, we didn’t have product managers. The organization had a program manager. My team were our own product managers. That’s why I spent some of my time talking to people and managers, understanding pain points, but also understanding what’s going to be valuable for the business and where we should be spending time.


Piotr: I’m running a product at Neptune, and it is a good thing and at the same time challenging that you’re dealing with people who are technically savvy: they’re engineers, they can code, they can think in an abstract way.

Very often, when you hear the first iteration of a feature request, it’s actually a solution. You don’t hear the problem. I like this test, and maybe other ML platform teams can learn from it: do you have it in production?

Is it something that works, or is it something that you plan to move to production one day? As a first filter, I like this heuristic.

Stefan: I mean, you brought back a lot of memories. It’s like: “hey, can you do this?” “So, what’s the problem?” That is actually the one thing you have to learn to make your first reaction whenever someone using your platform asks for something: what is the actual problem? Because it could be that they found a hammer, and they want to use that particular hammer for that particular task.

For example, they wanted to do hyperparameter optimization and were asking, “can you do it this way?” And stepping back, we’re like, hey, we can actually do it at a little higher level, so you don’t have to think about it and we wouldn’t have to engineer it that way. In which case, a super important question to always ask is, “what is the actual problem you’re trying to solve?”

And then you can also ask, “what is the business value?” How important is this, et cetera, to really know, like how to prioritize?

Getting buy-in from the team

Piotr: So we have learned how you’ve been dealing with data scientists coming to you with feature requests. How did the other direction of communication work? How did you encourage people and teams to follow what you developed, what you proposed they do? How did you set the standards in the organization?

Stefan: Yeah, ideally, with any initiative we had, we found a particular, narrow use case and a team who needed it, would adopt it, and would use it as we developed it. There’s nothing worse than developing something and no one using it. That looks bad; managers ask, who’s using it?

  • So one is ensuring that you have a clear use case and someone who has the need and wants to partner with you. And then, only once that’s successful, start to think about broadening it, because you can use them as the use case and story. This is where, ideally, you have weekly or bi-weekly shareouts. We had what you could call an “algorithms beverage minute”, where essentially you could get up for a couple of minutes and talk about things.
  • And so yeah, we definitely had to live the internal dev-tools evangelization, because at Stitch Fix, the data scientists had the choice to not use our tools if they didn’t want to, if they wanted to engineer things themselves. So we definitely had to go the route of: we can take these pain points off of you, you don’t have to think about them; here’s what we’ve built; here’s someone who’s using it, and they’re using it for this particular use case. Awareness, therefore, is a big one, right? You’ve got to make sure people know about the solution, that it is an option.
  • Documentation. We actually had a little tool that enabled you to write Sphinx docs pretty easily. So we ensured that for the model envelope, Hamilton, and every other tool we built, we had Sphinx docs set up, so we could point people to the documentation and show snippets and things.
  • The other is, from our experience, the telemetry that we put in. One nice thing about a platform is that you can put in as much telemetry as you want. So when anyone was using something and there was an error, we would get a Slack alert on it. And we would try to be on top of that and ask them, what are you doing?

Maybe try to engage them to ensure that they were successful in doing things correctly. You can’t do that with open-source, and it is, unfortunately, slightly invasive. But otherwise, most people are only willing to adopt things maybe a couple of times a quarter.

So you need to have the thing in the right place, at the right time, for when they have that moment, to be able to get started and over the hump, since getting started is the biggest challenge. Therefore, you try to provide the documentation, examples, and ways to make that as small a jump as possible.

How did you assemble a team for creating the platform?

Aurimas: Okay, so were you at Stitch Fix from the very beginning of the ML platform, or had it already evolved before you joined?

Stefan: Yeah, when I got there, it was a pretty basic, small team. In the six years I was there, it grew quite a bit.

Aurimas: Do you know how it was created? Why was it decided that it was the correct time to actually have a platform team?

Stefan: No, I don’t know the answer to that, but the two guys who headed things up were Eric Colson and Jeff Magnusson.

Jeff Magnusson has a pretty famous post about how engineers shouldn’t write ETL. If you Google that, you’ll see the post that describes the philosophy of Stitch Fix, where we wanted to create full-stack data scientists: if they can do everything end to end, they can move faster and do things better.

With that thesis, though, there’s a certain scale limit: it’s hard to hire people who have all the skills to do everything full-stack in data science, right? And so it was really their vision that, hey, a platform team builds tools of leverage, right?

It’s something… I don’t know what data you have, but my cursory knowledge of machine learning initiatives is that generally there’s a ratio of engineers to data scientists of, like, 1:1 or 1:2. But at Stitch Fix, if you just take the part of the platform team that was focused on helping with pipelines, right?

The ratio was closer to 1:10. So in terms of the leverage of engineers relative to what data scientists can do, I think you have to understand what a platform does, and then you also have to know how to communicate it.

So, given your earlier question, Piotr, about how you measure the effectiveness of platform teams: I don’t know what conversations they had to get headcount, so potentially you do need a little bit of help, or at least to think in terms of communicating it: hey, yes, this team is going to be second-order, because we’re not going to be directly impacting and producing a feature, but if we can make the people who are doing it more effective and efficient, then it’s going to be a worthwhile investment.

Aurimas: When you say engineers and data scientists, do you assume that a Machine Learning Engineer is an engineer, or is he or she more of a data scientist?

Stefan: Yeah, I count them. The distinction between a data scientist and a machine learning engineer, you could say, is that one maybe has the connotation of doing a little bit more online kinds of things, right?

And so they need to do a little bit more engineering. But I think there’s a pretty small gap. For me, actually, my hope is that when people use Hamilton, we enable them to do more, and they can actually switch their title from data scientist to machine learning engineer.

Otherwise, I kind of lump them into the data scientist bucket in that regard. Platform engineering was specifically what I was talking about.

Aurimas: Okay. And did you see any evolution in how teams were structured throughout your years at Stitch Fix? Did you change the composition of these end-to-end machine learning teams composed of data scientists and engineers?

Stefan: It really depended on their problem, because the forecasting teams were very much offline batch. That worked fine; they didn’t have to engineer anything too complex from an online perspective.

But on the personalization teams, where SLAs and client-facing things started to matter, they definitely started hiring people with a little bit more experience there. We’re not tackling that yet, I would say, but with DAGWorks, we’re trying to lower the software engineering bar needed to build and maintain model pipelines.

For the recommendation stack and producing recommendations online, there isn’t anything simplifying that, in which case you still need a stronger engineering skill set: if, over time, you’re managing a lot of microservices that talk to each other, or you’re managing SLAs, you do need a bit more engineering knowledge to do well.

So, if anything, that was the split that started to emerge: anyone doing more client-facing, SLA-bound work was slightly stronger on the software engineering side, while everyone else was fine being great modelers with lower software engineering skills.

Aurimas: And when it comes to roles that are not necessarily technical, would you embed them into those ML teams like project managers or subject matter experts? Or is it just plain data scientists?

Stefan: I mean, some of it landed on the shoulders of the data scientist team: who they were partnering with, right? They were generally partnering with someone within the organization, in which case, you could say, the two of them collectively were product-managing things. So we didn’t have explicit product manager roles.

I think at the scale Stitch Fix started to grow to, project management was really a pain point: how do we bring that in, and who does that? So it really depends on the scale.

The product, what you’re doing, what it’s touching, determines whether you start to need that. But yeah, it was definitely something the org was thinking about when I was still there: how do you structure things to run more efficiently and effectively? And how exactly do you draw the bounds of a team delivering machine learning?

If you’re working with the inventory team, who manage inventory in a warehouse, for example, what is the team structure there? That was still being shaped out, right? When I was there, it was very separate. They worked together, but they had different managers, right?

Not reporting to each other, but working on the same initiative. So, it worked well when we were small. You’d have to ask someone there now as to what’s happening, but otherwise, I would say it depends on the size of the company and the importance of the machine learning initiative.

Model monitoring and production

Piotr: I wanted to ask about monitoring models in production and keeping them live, because it sounds pretty similar to the software space, okay? The data scientists here are like the software engineers, and the ML platform team can be like the DevOps team.

What about the people who make sure it stays live? How did that work?

Stefan: With the model envelope, we provided deployment for free. That meant, you could say, the only thing the data scientists were responsible for was the model.

And we tried to structure things in a way that, like, hey, bad models shouldn’t reach production because we have enough of a CI validation step that, like the model, you know, shouldn’t be an issue. 

And so the only thing that would break in production would be an infrastructure change, which the data scientists aren’t responsible for or capable of fixing.

So that was our job; it was my team’s responsibility.

I think we were on call for something like over 50 services, because that’s how many models were deployed with us. And we were the frontline, precisely because, most of the time, if something was going to go wrong, it was likely going to be something to do with infrastructure.

We were the first point, but they were also on the call chain. Actually, I’ll step back. Once any model was deployed, we were both on call, just to make sure that it deployed and was running. But then it would slightly bifurcate: we would do the first escalation, because if it’s infrastructure, the data scientist can’t do anything. But otherwise, you need to be on call too, because if the model is actually making some weird predictions, we can’t fix that, in which case you’re the person who has to debug and diagnose it.

Piotr: Sounds like something with data, right? Data drift.

Stefan: Yeah, data drift, something upstream, et cetera. And so this is where better model observability and data observability helps. So trying to capture and use that. 

There are many different ways, but the nice thing with what we had set up is that we were in a good position to capture inputs at training time, and also, because we controlled the web service and its internals, we could log and emit things that came in.

So then we had pipelines to build and reconcile. If you want to ask the question, “is there training-serving skew?”, you, as a data scientist or machine learning engineer, didn’t have to build that in. You just had to turn on logging in your service.

Then we had to turn on some other configuration downstream, but we provided a way that you could push it to an observability solution to then compare production features versus training features.
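
As a simplified sketch of that comparison (a real system would use proper drift statistics over logged requests; this just compares means against the training distribution, with invented numbers):

```python
from statistics import mean, stdev

def summarize(values):
    """Summary statistics captured at training time, stored with the model."""
    return {"mean": mean(values), "stdev": stdev(values)}

def drifted(train_stats, serving_values, threshold=3.0):
    """Flag a feature whose serving mean is more than `threshold` training
    standard deviations away from the training mean."""
    shift = abs(mean(serving_values) - train_stats["mean"])
    return shift > threshold * train_stats["stdev"]

# Training-time feature values and their pre-computed summary:
train_feature = [10.0, 11.0, 9.5, 10.5, 10.2]
stats = summarize(train_feature)

# Values logged by the web service in production:
logged_ok = [10.1, 9.9, 10.4]
logged_bad = [25.0, 26.5, 24.8]
print(drifted(stats, logged_ok))   # False
print(drifted(stats, logged_bad))  # True
```

The point is the division of labor: training-time statistics come for free at model-save time, serving-time values come for free from logging, and the platform reconciles the two downstream.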

Piotr: Sounds like you provided a very comfortable interface for your data scientists.

Stefan: Yeah, that’s the idea. Truth be told, that’s what I’m trying to replicate with DAGWorks, right? Provide the abstractions that allow anyone to have that experience we built at Stitch Fix.

But yeah, data scientists hate migrations. And so part of the reason to focus on an API is that if we wanted to change things underneath from a platform perspective, we wouldn’t have to say, hey, data scientists, you need to migrate, right? That was also part of why we focused so heavily on these API boundaries: we could make our life simpler, but also theirs as well.

Piotr: And can you share how big the team of data scientists and the ML platform team were, in terms of the number of people, at the time you worked at Stitch Fix?

Stefan: I think, at its peak, it was about 150 total, data scientists and platform team together.

Piotr: And the team was 1:10?

Stefan: So we had a platform team… I think the ratio was roughly 1:4 or 1:5 in total, because we had a whole platform team helping with UIs and a whole platform team focusing on the microservices and online architecture, right? So, not pipeline-related.

There was more work required, you could say, from an engineering perspective: integrating APIs, machine learning, and other stuff in the business. So the actual ratio was 1:4 or 1:5, but that’s because a large component of the platform team was doing more things around building platforms to help integrate and debug machine learning recommendations, et cetera.

Aurimas: But what were the sizes of the machine learning teams? Probably not hundreds of people in a single team, right?

Stefan: They varied, you know, like eight to ten. Some teams were that large, and others were five, right?

It really depended on the vertical and who they were helping with respect to the business. You can think of it as roughly scaling with the modeling. We were in the UK and the US, so there were regions, and then there were different business lines: men’s, women’s, kids, right?

You could think of data scientists on each combination, right? So it really depended on where they were needed, but yeah, anywhere from teams of three to eight or ten.

How to be a valuable MLOps Engineer?

Piotr: There is a lot of information and content on how to become a data scientist, but there is an order of magnitude less on being an MLOps engineer or a member of an ML platform team.

What do you think is needed for a person to be a valuable member of an ML platform team? And what is the typical ML platform team composition? What type of people do you need to have?

Stefan: I think you need to have empathy for what people are trying to do. If you have done a bit of machine learning, a little bit of modeling, then when someone comes to you with a thing, you can ask, what are you trying to do?

You have a bit more understanding, at a high level, of what they can do, right? And having built things yourself and lived the pains definitely helps with that empathy. So it helps if you’re an ex-practitioner; that’s kind of what my path was.

I built models, and I realized I liked building the infrastructure around the models, to ensure that people can do things effectively and efficiently, more than building the actual models. So yeah, the skill set may be slightly changing from what it was six years ago, just because there’s a lot more maturity in open-source and the vendor market. There’s a bit of a meme, or trope, that MLOps is VendorOps.

If you’re going to integrate and bring in solutions that you’re not building in-house, then you need to understand a little bit more about abstractions and what do you want to control versus tightly integrate. 

So: empathy, having some background, and then the software engineering skill set of having built things. In my blog post, I frame it as a two-layer API.

Ideally, you should never expose the vendor API directly. You should always have a veneer wrapped around it so that you control some aspects, so that the people you’re providing the platform for don’t have to make decisions.

So, for example, where should the artifact be stored? Like the saved file: that should be something that you, as a platform, take care of. Even though that could be something the vendor API requires to be provided, you can make that decision for the user.
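
A minimal sketch of that two-layer idea, with an invented stand-in for the vendor call (not any real vendor's API):

```python
# Hypothetical vendor API (names invented for illustration): the raw call
# forces the caller to decide where the artifact lives.
def vendor_save_model(model_bytes: bytes, artifact_uri: str) -> str:
    return artifact_uri  # pretend we uploaded the bytes and return the location

# The platform's veneer: data scientists never pick storage locations.
class PlatformModelStore:
    def __init__(self, bucket: str = "s3://ml-platform-artifacts"):
        self._bucket = bucket

    def save(self, team: str, model_name: str, version: int,
             model_bytes: bytes) -> str:
        # The platform owns the layout convention; swapping vendors later
        # only changes this class, not every data scientist's code.
        uri = f"{self._bucket}/{team}/{model_name}/{version}/model.bin"
        return vendor_save_model(model_bytes, artifact_uri=uri)

store = PlatformModelStore()
print(store.save("personalization", "ranker", 7, b"\x00"))
```

The second layer is what makes migrations painless: the vendor-facing parameter is filled in by the platform, so changing the storage backend never shows up in user code.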

This is where I kind of say, if you’ve lived the experience of managing and maintaining vendor APIs you’re gonna be a little better at it the next time around. But otherwise, yeah. 

And then, if you have a DevOps background as well, or have built things to deploy yourself, having worked in smaller places, you also understand the production implications and the toolset available that you can integrate with.

You can get a pretty reasonable way with Datadog just on service deployment, right?

But if you want to really understand what’s within the model, it’s important to understand why training-serving skew matters, right? Having seen it done, and having some of the empathy to understand why you need to do it, leads you to the bigger picture of how things fit end to end. If you have the macro picture, I think that helps you make better micro decisions.

The road ahead for ML platform teams

Piotr: Okay, makes sense. Stefan, a question because I think when it comes to topics we wanted to cover, we are doing pretty well. I am looking at the agenda. Is there anything we should ask, or would you like to talk?

Stefan: Good question. 

Let’s see, I’m just looking at the agenda as well. Yeah, I mean, I think, in terms of the future, right?

I think to me Stitch Fix tried to enable data scientists to do things end-to-end. 

The way I interpreted it is that if you enable data practitioners, in general, to be able to do more self-service, more end-to-end work, they can take business domain context and create something that iterates all the way through. 

Therefore, they have a better feedback loop to understand whether it’s valuable or not, rather than the more traditional approach where people are still in a handoff model. And in that case, there’s a question of who you’re designing tools for. Are you trying to target engineers, machine learning engineers, with these kinds of solutions?

Does that mean the data scientist has to become a software engineer to be able to use your solution to do things self-service? There is the other extreme, which is low code/no code, but I think that’s limiting. Most of those solutions are SQL or some sort of custom DSL, which I don’t think lends itself well to taking that knowledge or skill set and applying it in another job. It only works if the next place is using the same tool, right?

And so, my kind of belief here is that if we can simplify the tools, the software engineering kind of abstraction that’s required, then we can better enable this kind of self-service paradigm that also makes it easier for platform teams to also kind of manage things and hence why I was saying if you take a vendor and you can simplify the API, you can actually make it easier for a data scientist to use, right? 

So that is where my thesis is that if we can make it lower the software engineering bar to do more self-service, you can provide more value because that same person can get more done. 

But then also, if it’s constructed in the right way, you’re also going to, this is where the thesis with Hamilton is and kind of DAGWorks, that you can kind of more easily maintain things over time so that when someone leaves, it’s not, no one has nightmares inheriting things, which is really where, like at Stitch Fix, we made it really easy to get to production, but teams because the business moved so quickly and other things, they spent half their time trying to keep machine learning pipelines afloat. 

And so this is where I think, you know, part of the reason was that we enabled them to do so much on their own, too much engineering, right?

Skills required for building robust tools

Stefan: I’m curious, what do you guys think in terms of who the ultimate target should be, and the level of software engineering skill required to enable self-service model building and machine learning pipelines?

Aurimas: What do you mean specifically?

Stefan: I mean, if self-serve is the future, what is the software engineering skill set it requires?

Aurimas: To me, at least as I see it, self-service is the future, first of all. But I don’t really see, at least from experience, that there are platforms right now that data scientists themselves could work against end to end.

In my experience, there is always a need for a machine learning engineer who sits between the data scientists and the platform, unfortunately. But the goal should probably be that a person with the skill set of a current data scientist is able to work end to end. That’s what I believe.

Piotr: I think it is kind of a race. Things that used to be hard six years ago are easy today, but at the same time, techniques have gotten more complex.

Like today, we have great foundational models and encoders. The models we’re building are more and more dependent on other services. The abstraction will no longer be a dataset, some preprocessing, training, post-processing, model packaging, and then an independent web service, right?

It is getting more and more dependent on external services. So yes, if we are repeating ourselves, and we will be repeating ourselves, let’s make it self-service friendly. But with the development of techniques and methods in this space, it will be a kind of race: we will solve some things, but we will introduce new complexity. Especially when you’re trying to do something state of the art, you’re not thinking about making it simple to use at the beginning; rather, you’re thinking about whether you will be able to do it at all, right?

So new techniques are usually not friendly or easy to use. Once they become more common, we make them easier to use.

Stefan: I was gonna say, or at least jump off what Piotr is saying: one of the techniques I use for designing APIs is to actually try to design the API first, before building anything.

I think what Piotr was saying is that it’s very easy for an engineer, and I’ve run into this problem myself, to go bottom up. It’s like, I want to build this capability, and then I want to expose how people use it.

And I actually think inverting that, asking first what experience I want someone to get from the API and then working downward, has been very enlightening in terms of how much you can simplify. It’s very easy from the bottom up to include all these concerns, because the natural tendency of an engineer is to want to enable anyone to do anything.

But when you want to simplify things, you really need to ask: what is the eighty-twenty? This is where the Python ethos of “batteries included” comes in, right?

So how can you make this as easy as possible for the core set of people who want to use it?
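To make Stefan’s “design the API first” point concrete: Hamilton’s user-facing paradigm is plain Python functions, where a function’s name declares an output and its parameter names declare its dependencies. The sketch below is a hypothetical, stdlib-only toy resolver that mimics that idea (the function names `spend_per_signup` and `acquisition_cost` and the `execute` helper are illustrative, not Hamilton’s actual implementation); the point is that the data scientist only writes declarative functions, and the engine underneath can change without breaking their code.

```python
import inspect

def spend_per_signup(spend: list, signups: list) -> list:
    """Output 'spend_per_signup' depends on raw inputs 'spend' and 'signups'."""
    return [s / n for s, n in zip(spend, signups)]

def acquisition_cost(spend_per_signup: list) -> list:
    """Depends on another function purely by matching its parameter name."""
    return [round(v, 2) for v in spend_per_signup]

def execute(funcs, outputs, inputs):
    """Minimal resolver: recursively satisfy each requested output by name."""
    available = dict(inputs)                    # raw inputs already computed
    by_name = {f.__name__: f for f in funcs}    # output name -> function

    def resolve(name):
        if name not in available:
            fn = by_name[name]
            # Each parameter name is a dependency; resolve it first.
            kwargs = {p: resolve(p) for p in inspect.signature(fn).parameters}
            available[name] = fn(**kwargs)
        return available[name]

    return {out: resolve(out) for out in outputs}

result = execute(
    [spend_per_signup, acquisition_cost],
    ["acquisition_cost"],
    {"spend": [10.0, 30.0], "signups": [2, 4]},
)
print(result)  # {'acquisition_cost': [5.0, 7.5]}
```

Designing this surface first, before any engine exists, is exactly the top-down inversion Stefan describes: the eighty-twenty API is “write named functions,” and all orchestration concerns stay below it.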

Final words

Aurimas: Agreed, agreed, actually. 

So we are almost running out of time, so maybe one last question. Stefan, if you want to leave our listeners with some idea, or maybe promote something, now is the right time.

Stefan: Yeah. 

So if you are terrified of inheriting your colleagues’ work, or maybe you’re a new person joining a company and you’re terrified of the pipelines you’re inheriting, right?

I would say I’d love to hear from you. Hamilton is still a pretty early open-source project, and it’s very easy to get started. We have a roadmap that’s being shaped and formed by input and opinions. So check it out if you want an easy way to maintain and collaborate as a team on your model pipelines, since individuals build models, but teams own them.

I think that requires a different skill set and discipline to do well. So come check out Hamilton and tell us what you think. As for the DAGWorks platform, at the time of recording we’re still in closed beta. We have a waitlist and an early-access form that you can fill out if you’re interested in trying the platform.

Otherwise, search for Hamilton and give us a star on GitHub. Let me know your experience. We’d love to ensure that as your ML ETLs and pipelines grow, your maintenance burden doesn’t.


Aurimas: So, thank you for being here with us today, and for a really good conversation. Thank you.

Stefan: Thanks for having me, Piotr, and Aurimas.
