5 Ways Data is Transforming the Insurance Industry

St. John’s Tobin School of Risk Conference on Transforming Insurance

I recently spoke at a conference on the use of data in the insurance industry at the St. John’s School of Risk in New York City. Here are five key things I took away:

1. All insurance companies aspire to use more data, but few are able to operationalize it

Operationalizing Data Is a Challenge

The insurance industry has a renewed realization of the value of data due to AI and machine learning. Predictive models have myriad applications in the insurance industry, including optimizing customer acquisition, delivering personalized service, processing claims efficiently, intelligently underwriting policies, and detecting fraud more effectively. The common ingredient needed to build and train predictive models is operational and business data. Luckily, the industry has access to lots of this data, from both internal and external sources. One person sitting next to me at the conference worked at a reinsurance company and had the title of “Data Hunter”; her sole job was to seek out new data sources to help the company. However, operationalizing AI can be challenging for incumbent insurance companies whose traditional IT infrastructure cannot scale to take advantage of new data sources, and whose internal data is locked up in silos that are incompatible with each other.

Implication

One of the biggest opportunities the industry incumbents have at their disposal is to break down the corporate data silos. (See my general blog on data silos.) Imagine a data platform that has the customer, policy and claims data all stored in one place so that new policy underwriting could consider previous claims and leverage data from the underwriting process such as policy modifications.

2. Early movers are using external data in really interesting ways

Sensors in Homes Transform P&C Insurance Via Prevention

If there is a common thread that has the potential to transform all the segments and the lines of business of the insurance industry, it is the use of external data. New data sources are transformative for the insurance industry because they can make customer interactions seamless to increase brand loyalty, make critical business processes such as claims management efficient and even help implement preventive practices that can improve the overall profitability of the industry. Let’s explore some of these new data sources.

  • Automobiles equipped with sensors (telematics) and mobile apps will automate the process of claims management. Armed with data from sensors, insurance companies will no longer be dependent on the parties involved in an incident to determine liability. Furthermore, the application of artificial intelligence and machine learning (ML) to this data will enable insurance companies to resolve claims and pay for damages in a matter of days rather than weeks or months. Sensor data and ML will also play a significant role in identifying fraudulent claims and preventing claims altogether, improving the profitability of the insurer. I love the ad where the pregnant mother insists that her husband not speed to the hospital so that she retains the safe-driving rating on her mobile app: a financial incentive, delivered through a mobile app, not to speed.
  • Sensors are relevant not only to automobile insurance but also to the property and casualty (P&C) business. Smart devices such as thermostats, smoke detectors, and security systems represent only the first step toward preventing an adverse event. Once we have sensors in our homes and offices that can detect events such as fires and leaks before they happen and notify the relevant agencies or the homeowner, the potential losses that insurance companies have to cover each year will be significantly reduced.
  • According to eMarketer, about 22% of the U.S. population owns a wearable device. These devices track physical activity and vital signs, enabling insurers to price life insurance policies based on the lifestyle of the applicant. The insurance companies that figure out how to leverage this data as part of their underwriting and pricing process will also enjoy a first-mover advantage in targeting the healthiest and most profitable segment of the population.
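To make the wearable-based pricing idea concrete, here is a minimal sketch in Python. The risk signals, thresholds, and discount percentages are invented for illustration only; they are not actuarial guidance or any insurer's actual method.

```python
# Hypothetical illustration: adjust a base life-insurance premium using
# wearable activity data. All factors and thresholds are invented.

def adjusted_premium(base_premium: float, avg_daily_steps: float,
                     resting_heart_rate: float) -> float:
    """Return a premium adjusted by simple lifestyle signals."""
    factor = 1.0
    if avg_daily_steps >= 10_000:      # active lifestyle: discount
        factor -= 0.10
    elif avg_daily_steps < 4_000:      # sedentary lifestyle: surcharge
        factor += 0.10
    if resting_heart_rate > 80:        # elevated resting heart rate: surcharge
        factor += 0.05
    return round(base_premium * factor, 2)

print(adjusted_premium(100.0, 12_000, 62))  # -> 90.0 (discount)
print(adjusted_premium(100.0, 3_000, 85))   # -> 115.0 (surcharge)
```

A real pricing model would of course be fitted to claims experience rather than hand-coded, but the data flow, from device signal to premium, is the same.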

Implication

Insurance companies need a data platform with three defining attributes. First, it must be capable of storing data from diverse sources, including the ones mentioned above, and it must scale from terabytes to petabytes, and from a few nodes to hundreds, simply by adding commodity servers. Scale-out capability is essential.

Second, the platform itself must be able to power mission-critical applications, as well as facilitate data analysis. Insights should not be decoupled from the application but rather be inextricably linked.

Third, the platform must offer functionality to build, train, and operationalize predictive models using machine learning. It should be able to store internally and externally generated data, accelerate model training, track workflow, and push these models into production, all in one system.
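As a rough sketch of that build-train-operationalize loop happening in one place, the toy example below "trains" a deliberately simple threshold model on synthetic claims data, then serializes, reloads, and scores with it. The claim-fraud framing, the data, and the threshold model are all invented for illustration; a real platform would substitute genuine ML tooling.

```python
# Sketch of build -> train -> operationalize in one system, with a toy
# threshold "model" standing in for real ML. Data is synthetic.
import pickle

# Synthetic training data: (claim_amount, is_fraud) pairs.
training = [(9_500, 1), (8_800, 1), (9_100, 1), (8_200, 1),
            (1_200, 0), (2_500, 0), (700, 0), (4_000, 0)]

def train(data):
    """Pick the claim-amount threshold that best separates the labels."""
    candidates = sorted(amount for amount, _ in data)
    def accuracy(t):
        return sum((amount > t) == bool(label) for amount, label in data)
    return max(candidates, key=accuracy)

threshold = train(training)              # "train" the model

blob = pickle.dumps(threshold)           # "push to production":
deployed = pickle.loads(blob)            # serialize, reload, and score
print([int(amount > deployed) for amount in (9_000, 1_500)])  # -> [1, 0]
```

The point is not the model but the workflow: training data, the trained artifact, and the production scorer never leave one system.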

3. Incumbents will need technology-based solutions to thrive against InsurTechs and industry disruption

InsurTech Upstarts Are Hard To Catch

InsurTechs are companies that use technology to disrupt the traditional insurance industry. They tend to be smaller entrepreneurial companies with roots in data, artificial intelligence, and mobile application development. For example, companies like DataCubes and Friss are using data science to transform and accelerate core insurance functions such as commercial underwriting and fraud detection. Others, such as Metromile and Root Insurance, are reinventing core insurance products, such as usage-based auto insurance priced on the driving distance and habits of their customers.

InsurTechs are disrupting the industry not only through the application of technology but also by reshaping consumer expectations and demands. According to McKinsey research, more than $10 billion has been invested in the InsurTech sector since 2012.

Implication

In order to compete effectively against InsurTechs, incumbents must reinvent and modernize the applications that have been the source of their competitive advantage. These are the same applications that InsurTechs are targeting with artificial intelligence and machine learning. Incumbents have rich data sources and experienced personnel trained in technologies such as SQL. Rather than trying to duct-tape together various components of traditional IT infrastructure and acquire hard-to-find skills, insurance companies should consider a unified platform that can effectively manage both operational and analytical data using SQL. A unified platform enables market leaders to build predictive algorithms at the database level. In-database machine learning can greatly accelerate the speed of decision making and help incumbents fend off pesky InsurTechs.
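The unified-platform pattern, scoring data with SQL where it lives rather than exporting it to a separate system, can be approximated in miniature with Python's built-in sqlite3 module. The claims table, its numbers, and the SQL rule standing in for an in-database model are all hypothetical.

```python
# Approximate sketch of the unified-platform pattern: operational data is
# queried and scored with SQL in place, not exported elsewhere. sqlite3
# stands in for the operational database; the data is made up.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE claims (id INTEGER, amount REAL, days_open INTEGER)")
conn.executemany("INSERT INTO claims VALUES (?, ?, ?)",
                 [(1, 9500.0, 2), (2, 1200.0, 30), (3, 8400.0, 1)])

# A simple rule expressed directly in SQL, standing in for an in-database
# predictive model that scores rows where they live.
flagged = [row[0] for row in conn.execute(
    "SELECT id FROM claims WHERE amount > 7000 AND days_open < 5")]
print(flagged)  # -> [1, 3]
```

A production system would push a trained model, not a hand-written WHERE clause, down to the data, but the architectural idea, computation moving to the data instead of data moving to the computation, is the same.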

4. Data Lakes still plague insurers

Drowning in Data Lakes

In an effort to manage Big Data effectively and to drive real-time analytics and decisions, the insurance sector invested heavily in data lakes. These data lakes were built using commercial Hadoop distributions: a flexible number of independent open-source compute engines joined in a common platform to achieve scale. However, the data lake's schema-on-read approach allowed insurance companies to bypass the process of defining which tables contain what data and how they are connected to each other, resulting in repositories built haphazardly.

Data lake projects have begun to fail because insurance companies, like companies in other industries, placed a priority on storing all enterprise data in a central location with the goal of making it available to all developers (an uber data warehouse, if you will) rather than thinking about how the data would power applications. As a result, Hadoop clusters have devolved into gateways of enterprise data pipelines that filter, process, and transform data that is then exported to other databases and data marts for downstream reporting. Data in data lakes almost never finds its way into a real business application. The result is a massive set of disparate compute engines, operating on disparate workloads, all sharing the same storage, which is very difficult to manage. (See my blog on data lakes and Hadoop.)

Implication

As discussed in my above-mentioned blog post, insurance companies are increasingly under pressure to demonstrate the value of their data lake. I suggest that they focus on the operational applications first and then work back to the required data.

By focusing on modernizing applications with data and intelligence, insurance companies will be able to develop apps that leverage data to predict what might happen in the future. Insurance companies can then proactively make in-the-moment decisions, without human intervention, that result in superior business outcomes.

5. Regulators have made data governance and ML transparency paramount

Machine Learning Needs To Be Transparent And Traceable

Given immense data volumes and diverse data sources, the real value of AI and ML is best achieved when an application is capable of making intelligent decisions at scale without human intervention. However, once achieved, this capability gives rise to the perception of a “black box”, where most business personnel do not fully understand why or how a certain action was taken by the predictive model. Transparency is not just nice to have; it is critical for use cases where the insurance company must be able to document and defend its decisions, such as the denial of a claim or an insurance policy. Regulators will increasingly press insurance companies to explain the inner workings of their predictive models, especially where models are used in underwriting and pricing to determine premiums, to ensure the absence of any discriminatory practices.

Implication

Data governance provides a framework that defines how data is sourced, managed, and used in an ecosystem. This framework enhances the enterprise's confidence in its data and in the actions taken based on analyzing that data. At a time when the insurance industry is undergoing a major transformation, companies need a robust framework that provides visibility into data lineage, the transformations that have been performed on the data, and how it is used. This same framework must also cover predictive and machine learning models. Insurance companies must be able to demonstrate to regulators all the experiments their data scientists have performed, which model was put into production, and how it was modified over time. Data governance must therefore be an integral part of the platform used by data scientists to build, train, and operationalize models.

To accomplish this goal, consider a platform that gives data scientists the ability to experiment freely. Building predictive models is an iterative process that requires data scientists to continuously tweak their models and assess the impact of those changes on model accuracy. To keep track of their experiments, data scientists need a tool like MLflow, which has a built-in capability to track and document the parameters and results of each iteration. In this way, data scientists can objectively demonstrate, to internal stakeholders and external regulators alike, the rationale for placing a specific model into production. They can also demonstrate the absence of any discriminatory practices.
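The experiment-tracking pattern itself is simple enough to sketch in a few lines of Python. This is only a stand-in for what a tool like MLflow records through its tracking API; the parameter names and accuracy numbers here are invented.

```python
# Minimal stand-in for experiment tracking: each training run records its
# parameters and resulting metric, so the choice of production model can
# be justified later. Real tools such as MLflow provide this through a
# tracking API; this sketch only mimics the idea. Numbers are hypothetical.

experiments = []

def log_run(params: dict, accuracy: float) -> None:
    """Record one training run's settings and result."""
    experiments.append({"params": params, "accuracy": accuracy})

# Three hypothetical iterations of the same model.
log_run({"learning_rate": 0.10, "depth": 3}, accuracy=0.81)
log_run({"learning_rate": 0.05, "depth": 5}, accuracy=0.86)
log_run({"learning_rate": 0.01, "depth": 8}, accuracy=0.84)

# The run promoted to production, with the full audit trail behind it.
best = max(experiments, key=lambda run: run["accuracy"])
print(best["params"])  # -> {'learning_rate': 0.05, 'depth': 5}
```

The audit trail is the point: every iteration, including the rejected ones, remains on record, which is exactly what a regulator asking "why this model?" needs to see.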

If you’d like to learn more about application modernization in the insurance industry, the team at Splice Machine (where I’m CEO and co-founder) has created a white paper that reflects the work we’ve done with some of the world’s leading insurers.