Is machine learning the silver bullet in underwriting?

Using machine learning to underwrite property insurance has been UrbanStat’s main focus for the last three years. It started as a simple Minimum Viable Product (MVP). We had our highs and lows and made many mistakes, and every time we see a new data set we are surprised by how much we are still learning. For engineers, machine learning is a simple concept despite the complicated math behind it. For non-engineers, it is an abstract concept: you input your data, and it generates magical results. Ergo, we get the question `… but how?` very often.

Machine learning is a powerful tool that helps many different industries. It always requires lengthy data preparation, and certain problems also demand research into, or a real understanding of, an entire industry and its regulations. Insurance is unquestionably one of them.

Predicting which policyholders will file a claim within the next 12 months sounds like a true supervised learning problem at first. However, when you start thinking about how the industry works, you see that there are no real `True Positives` or `True Negatives`. Supervised learning problems require a historical data set with known outcomes. Let’s say you are trying to identify fraudulent cases; the algorithms require a historical data set in which the claims that were actually fraudulent are marked as such. When it comes to claim prediction, there is the problem of `lack of claims`. Most of the time, we don’t truly know what happens to customers who switch carriers. Did they stay claim-free? Another issue is the trouble of defining success. Since insurance is a long-term game, a customer who didn’t claim anything for five years could file a large claim in year 6. If you score this customer `High` in year 4, is your algorithm successful or not? When you measure your performance in year 4, your algorithm fails on this policyholder; when you run a long-term performance measurement, however, the scoreboard reflects an entirely different story.
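The year-4 versus year-6 example can be made concrete with a small sketch. The function name, the policy-year numbering, and the way the evaluation window is defined are all hypothetical and chosen only for illustration; this is not a description of how UrbanStat measures performance.

```python
# Hypothetical policyholder: claim-free for five years, large claim in year 6.
claim_years = {6}       # policy years in which a claim was filed
scored_in_year = 4      # year in which the model scored the policy `High`


def high_score_was_right(scored_in_year, claim_years, horizon_years):
    """Return True if a claim falls in the `horizon_years` years after scoring.

    Assumption: the evaluation window starts the year after the score is issued.
    """
    window = range(scored_in_year + 1, scored_in_year + horizon_years + 1)
    return any(year in claim_years for year in window)


# Judged over the next 12 months only, the `High` score looks like a miss.
short_term = high_score_was_right(scored_in_year, claim_years, horizon_years=1)

# Judged over a three-year window, the same score looks like a hit.
long_term = high_score_was_right(scored_in_year, claim_years, horizon_years=3)

print(short_term, long_term)
```

The same prediction is counted as a failure or a success depending only on the horizon chosen for the measurement, which is why the definition of success has to be agreed on before any model is evaluated.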

There are also a few technical obstacles insurance carriers need to overcome. One of the most significant is the imbalance between the customers who claim and those who don’t. Most insurance companies have a claim frequency of around 1-6%. It means that out of every 100 policyholders, only 1-6 will file a claim. Among those 1-6 policyholders, there will also be significant differences in the causes and amounts of claims (e.g. from a few hundred dollars to millions of dollars). Our algorithms try to identify those 1-6 policyholders and rank them based on their predicted profitability so the insurance carriers can come up with fairer terms and pricing for their entire portfolio. It means that only 1-6% of the data tells us the story we want to hear. This is one of the very first things insurance carriers need to solve, and the good news is that there are a few solutions.
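To see why this imbalance matters, consider a minimal sketch with a made-up portfolio of 1,000 policies and a 3% claim frequency. The numbers are hypothetical, and the inverse-frequency class weighting shown at the end is just one common remedy for imbalanced data, not necessarily the approach UrbanStat uses.

```python
from collections import Counter

# Hypothetical portfolio: 1,000 policies, 3% claim frequency (within the 1-6% range).
labels = [1] * 30 + [0] * 970   # 1 = filed a claim, 0 = claim-free

# A naive "model" that predicts no claim for every policyholder.
predictions = [0] * len(labels)

# Accuracy looks excellent even though the model has learned nothing.
accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)

# Recall on the claim class: the share of actual claims the model found.
claims_found = sum(p == 1 and y == 1 for p, y in zip(predictions, labels))
recall = claims_found / sum(labels)

# One common remedy: weight each class inversely to its frequency,
# using n_samples / (n_classes * class_count).
counts = Counter(labels)
weights = {cls: len(labels) / (len(counts) * n) for cls, n in counts.items()}

print(f"accuracy={accuracy:.2f} recall={recall:.2f}")
print({cls: round(w, 2) for cls, w in sorted(weights.items())})
```

With these weights, each claim counts roughly 32 times as much as a claim-free policy during training, so a model can no longer score well by ignoring the few policyholders who actually claim.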

Insurance is a highly regulated industry. The way insurance carriers can use these technologies may be constrained by regulators, depending on the state or country they operate in. The industry should not pass human biases on to the algorithms, and we need to be very careful about that. This is why UrbanStat never uses PII (gender, age, ethnicity, etc.) or credit scores/financial risk information. We always tell our clients that we don’t want any of the personal information present in the policy or claim files. The only personal information we use is an address, and we use it only to understand location-based risks, not to come up with socio-economic segmentation. Excluding personal information makes things interesting because we don’t know anything about the customers we are trying to score. Often the actual underwriters have more information about the very same customers, because the algorithms lack the underwriter’s personal experience and knowledge. This is why the algorithms don’t have an ongoing bias, but they definitely have inherited bias, which is a topic for another article.

Regardless of the barriers, this is a fascinating problem to work on. We believe that machine learning will change how the insurance industry underwrites. We don’t necessarily believe that underwriting will be entirely replaced by machines (although some people think otherwise).

Our carrier partners have been generating amazing results, improving their loss ratios by anywhere from 2 to 17 points. We recently looked at about eight years of data from about 10 carriers (UrbanStat clients) on 3 different continents. On average, our clients’ loss ratio dropped by 10 points after they started using our platform. The technology alone couldn’t achieve this. The tech (machine learning) is not the silver bullet here. The actual silver bullet is the combination of machine learning, traditional probabilistic modeling, and, last but not least, human intuition. We call this combination `the Three Pillars of Risk Analysis`.

To learn more and quickly leverage what we’ve already successfully deployed for our carrier partners contact us at
