Some Thoughts on Rules Engines

February 21, 2021

Spend enough time working with domain experts on a problem and someone is bound to say "we should build a rules engine to solve this problem." Often this comes along with the goal of letting domain experts express rules themselves rather than having to involve engineers. However, once you start to dig into this problem you'll discover that "rules engine" can mean many different things to many different people.

I've seen this problem a few times over the past couple of years, and I've come to think it's useful to break down your rules engine across a few important axes. Doing this will allow you to have clear discussions on what problems you need to solve and what you need to build to solve them.

The questions

What data do your rules have access to? Do your rules need to arbitrarily grab data from a database, or is there a set of well-defined features that people need to express their rules?
When will your rules run? Do they run on every request? When some set of known events happen within your system? Whenever a data pipeline kicks off?
What actions will your rules take when they pass? Do they just need to output true/false, or do they need to output some more complex actions? If the latter, what do you do when a set of rules conflict?
How complex are the rules you want to express? Do you essentially want to run arbitrary logic to match rules, or do you have a set of simple if/else conditions that you need to follow?
How do you want to express your rules? Are abstractions in a programming language okay? Or do you need something simpler?

All of these taken together form a set of constraints for your rules engine. If you want to express arbitrary logic over arbitrary data, then you're probably going to need to write a business-readable DSL or use something like Drools. If you can simplify across some dimension, you can probably save yourself some work.

Looking at this in practice

Let's think about how all this works in practice by using a few examples.

Radar from Stripe

Stripe's Radar product has a rules engine which allows merchants to configure how they should handle charges. It answers the above questions in the following ways:

What data do your rules have access to? A constrained set of features that Stripe provides.
When will your rules run? On every charge to your business.
What actions will your rules take when they pass? Block, allow, review, or request 3DS for the charge. Conflicts are resolved by choosing the first rule that decided on an action.
How complex are the rules you want to express? Fairly simple, a few if/else conditions.
How do you want to express your rules? A simplified DSL that any non-engineer can configure.

This approach constrains users across almost every dimension, but the resulting rules engine is simple enough that almost anyone can use it.
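To make the "first rule wins" conflict resolution concrete, here is a minimal Python sketch of an engine with these constraints. It is not Stripe's implementation or rule syntax: the `Rule` class, the `evaluate` function, the feature names (`amount`, `risk_score`, `card_country`), and the fallback to a default action are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Mapping, Sequence

# Hypothetical curated feature set; rules can only see what is in this mapping.
Features = Mapping[str, object]


@dataclass(frozen=True)
class Rule:
    action: str                            # "block", "allow", "review", or "request_3ds"
    condition: Callable[[Features], bool]  # a simple predicate over the features


def evaluate(rules: Sequence[Rule], features: Features, default: str = "allow") -> str:
    """Return the action of the first rule whose condition matches."""
    for rule in rules:
        if rule.condition(features):
            return rule.action
    return default  # assumption: when nothing matches, fall back to a default


rules = [
    Rule("block", lambda f: f["risk_score"] > 90),
    Rule("review", lambda f: f["amount"] > 1000 and f["card_country"] != "US"),
    Rule("request_3ds", lambda f: f["amount"] > 500),
]

charge = {"amount": 1500, "risk_score": 40, "card_country": "CA"}
print(evaluate(rules, charge))
```

The example charge matches both the review rule and the 3DS rule, and the engine returns "review" because that rule comes first. Ordering the rule list is the only conflict-resolution mechanism, which is part of what keeps this style of engine easy to reason about.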

FXL From Facebook

At the other end of the spectrum is FXL, a language that Facebook developed for fighting spam. It has slightly different answers to these questions.

What data do your rules have access to? "A Multitude". We don't know for sure, but probably more than the O(10s) of features Radar allows.
When will your rules run? Presumably something like "whenever a content creation request comes in."
What actions will your rules take if they pass? They may log requests, block posts, or warn users. Other actions and conflict resolution are undefined.
How complex are the rules you want to express? Seems like they can get quite complicated.
How do the rules get expressed? A hand-rolled language called FXL.

FXL's rules engine gives users more power at the expense of some complexity. The rules are actual code rather than fancy configuration. This creates spam-fighting superpowers at the price of making rules harder for non-engineers to read and write.
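As a rough illustration of "rules as actual code", here is a Python sketch. It does not use FXL syntax and says nothing about how Facebook's system is built; `recent_post_count`, `spam_rule`, `run_rules`, the thresholds, and the action names are all hypothetical. The point is only that a rule written as code can fetch data on demand, branch freely, and emit several actions.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for an arbitrary data source a rule might query.
POSTS_IN_LAST_HOUR = {"alice": 2, "bob": 250}


def recent_post_count(user: str) -> int:
    return POSTS_IN_LAST_HOUR.get(user, 0)


def spam_rule(request: Dict[str, str]) -> List[str]:
    """A rule written as code: it looks up data, branches, and returns actions."""
    actions: List[str] = []
    count = recent_post_count(request["user"])
    if count > 100:
        actions.append("block_post")
        if "http" in request["text"]:
            actions.append("log_request")
    elif count > 50:
        actions.append("warn_user")
    return actions


RULES: List[Callable[[Dict[str, str]], List[str]]] = [spam_rule]


def run_rules(request: Dict[str, str]) -> List[str]:
    # Collect every action from every rule; conflict resolution is left open here.
    return [action for rule in RULES for action in rule(request)]
```

Nothing in this sketch stops a rule from doing something expensive or surprising, which is the trade-off: more expressive power, and more that a rule author has to understand.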

SQL-Based Rules Engines

Another common pattern is the SQL-based rules engine, which has an example here. The general idea of these is often "our domain experts already understand SQL, so let's allow them to just write their rules in a language they know." Again, this can fit into our framework.

What data do your rules have access to? Anything in your database.
When will your rules run? Probably in response to a web request.
What actions will your rules take if they pass? Some set of actions defined as stored procedures. Resolution is again unclear.
How complex are the rules you want to express? Moderately complex: not trivial, but not arbitrarily complicated either. We could probably express some pretty complex stuff in raw SQL if we needed to, though.
How do the rules get expressed? SQL statements that follow a well-defined form.

These engines fit somewhere in between the two approaches above. They seem useful when you want to return a simple classification result, but they may get harder to work with if you want to take more complex actions.
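Here is a small sketch of the classification case using Python's built-in `sqlite3` module. The `orders` table, the rule names, and the "well-defined form" (each rule is a `SELECT` that takes an `:order_id` parameter and returns a row when it matches) are assumptions for illustration. SQLite has no stored procedures, so this sketch only reports which rules matched and leaves actions out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL, country TEXT)")
conn.executemany(
    "INSERT INTO orders (id, amount, country) VALUES (?, ?, ?)",
    [(1, 25.0, "US"), (2, 5000.0, "US"), (3, 300.0, "FR")],
)

# Assumed rule form: a SELECT that returns at least one row when the rule matches.
RULES = {
    "large_order": "SELECT 1 FROM orders WHERE id = :order_id AND amount > 1000",
    "foreign_order": "SELECT 1 FROM orders WHERE id = :order_id AND country <> 'US'",
}


def matching_rules(conn: sqlite3.Connection, order_id: int) -> list:
    """Return the names of all rules whose query yields a row for this order."""
    matched = []
    for name, sql in RULES.items():
        if conn.execute(sql, {"order_id": order_id}).fetchone() is not None:
            matched.append(name)
    return matched


print(matching_rules(conn, 2))
```

Because every rule is just a query, a rule author can join against any table in the database. That is where the extra power over a curated feature set comes from, and also why deciding what to do when several rules match is left to whatever code consumes the result.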

Using the framework

I'm not yet sure exactly how to use this framework, but it has been useful for me to ask these questions whenever someone says "we need a rules engine." I hypothesize that if you want to keep the expression language simple, you'll need to invest more engineering time to provide curated data to your rules engine and limit what it can do.

I also imagine that these approaches can be mixed and matched for maximum effectiveness. For example, you could have a rules engine which is based on a business-readable DSL, and provide some tooling for non-engineers to create rules for common use cases. This allows for iteration speed in the common case without sacrificing power when you really need something complex.
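One way to picture that mix, as a hedged Python sketch: simple rules are generated from the kind of (feature, operator, value) triple a form could produce, while complex rules are written by hand, and both end up behind the same interface. The names `threshold_rule`, `velocity_spike`, and the feature keys are hypothetical.

```python
import operator
from typing import Callable, Dict

Rule = Callable[[Dict[str, float]], bool]

# The small set of comparisons the hypothetical form would offer.
OPERATORS = {">": operator.gt, "<": operator.lt, "==": operator.eq}


def threshold_rule(feature: str, op: str, value: float) -> Rule:
    """Build a rule from a triple that non-engineer tooling could produce."""
    compare = OPERATORS[op]
    return lambda features: compare(features[feature], value)


def velocity_spike(features: Dict[str, float]) -> bool:
    """A hand-written rule for a case the simple form cannot express."""
    return features["posts_last_hour"] > 10 * max(features["posts_daily_avg"], 1.0)


# Both kinds of rule share one interface, so the engine treats them the same.
rules: Dict[str, Rule] = {
    "high_amount": threshold_rule("amount", ">", 1000),
    "velocity_spike": velocity_spike,
}

features = {"amount": 1500.0, "posts_last_hour": 5.0, "posts_daily_avg": 2.0}
matched = [name for name, rule in rules.items() if rule(features)]
print(matched)
```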

Discussion, links, and tweets

Hey! Thanks for reading! If you like what you read and want more, you can follow me on Twitter.

Follow @maltzj