This book presents an overview of big data methods applied to insurance problems. It is a multi-author work that gives a fairly complete view of five important aspects of the subject, each presented by well-known authors with complementary profiles and expertise (data scientists, actuaries, statisticians, engineers). The topics range from classical data analysis methods (including learning methods such as machine learning) to the impact of big data on the present and future insurance market.

The terms big data, megadata and massive data refer to datasets so vast that not only popular data management methods but also the classical methods of statistics (for example, inference) lose their meaning or can no longer be applied. The exponential growth of computing power, together with the convergence of data analysis and artificial intelligence, makes it possible to develop new analysis methods for the gigantic databases that are increasingly common in the insurance sector, as this book shows.

The first chapter, written by Romain Billot, Cécile Bothorel and Philippe Lenca (IMT Atlantique, Brest), is a sound introduction to big data and its application to insurance. It focuses on the impact of megadata, showing that hundreds of millions of people generate billions of bytes of data each day. The classical characterization of big data by the 5 Vs (volume, velocity, variety, veracity, value) is well illustrated and enriched with further Vs such as variability and validity. To remedy the shortcomings of classical data management techniques, the authors describe how both data and, where possible, tasks can be parallelized across several computers. The main IT tools, including Hadoop, are presented, together with their relationship to platforms specialized in decision-making solutions and the problem of migrating to a data-driven strategy. The application to insurance is illustrated through three examples.
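To make the parallelization theme of Chapter 1 concrete, here is a minimal sketch of the map-reduce pattern that Hadoop popularized, reduced to a single machine: the data are split into shards, a map step processes each shard independently and in parallel, and a reduce step merges the partial results. The records, field names and shard counts below are hypothetical illustrations, not material from the book.

```python
from collections import Counter
from multiprocessing import Pool

# Hypothetical claim records: (policy_type, claim_amount).
CLAIMS = [
    ("auto", 1200.0), ("home", 3400.0), ("auto", 800.0),
    ("health", 150.0), ("auto", 2500.0), ("home", 990.0),
]

def split_into_shards(records, n_shards):
    """Partition the dataset so each shard can be processed independently."""
    return [records[i::n_shards] for i in range(n_shards)]

def map_shard(shard):
    """Map step: count claims per policy type within a single shard."""
    counts = Counter()
    for policy_type, _amount in shard:
        counts[policy_type] += 1
    return counts

def reduce_counts(partial_counts):
    """Reduce step: merge the per-shard counts into a global result."""
    total = Counter()
    for counts in partial_counts:
        total.update(counts)  # Counter update adds per-key counts
    return total

if __name__ == "__main__":
    shards = split_into_shards(CLAIMS, n_shards=2)
    with Pool(processes=2) as pool:            # task parallelism: two workers
        partial = pool.map(map_shard, shards)  # data parallelism: one shard each
    print(reduce_counts(partial))  # Counter({'auto': 3, 'home': 2, 'health': 1})
```

On a real cluster the shards live on different machines and the framework handles distribution and fault tolerance, but the map and reduce steps keep exactly this shape.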
The second chapter, written by Gilbert Saporta (CNAM, Paris), reviews the transition from classical data analysis methods to big data, showing how much big data owes to data analysis and artificial intelligence, notably through supervised and unsupervised learning methods. The author also emphasizes methods for validating predictive models, since the ultimate goal of big data is not only to build gigantic, structured databases, but also, and above all, to describe and predict from a given set of parameters.

The third chapter, written by Franck Vermet (EURIA, Brest), presents the statistical learning methods most commonly used in actuarial work, applicable to many areas of life and non-life insurance. It explains the distinction between supervised and unsupervised learning and gives a rigorous yet clear account of each of the main methods, particularly the most widely used ones (decision trees, neural networks trained by gradient backpropagation, support vector machines, boosting, stacking, etc.).
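As a toy illustration of the themes of Chapters 2 and 3, namely fitting a supervised learning model and, crucially, validating it on data not used for training, the sketch below fits a gradient-boosting classifier (one of the methods Chapter 3 surveys) to synthetic motor insurance data with scikit-learn. The features, risk structure and parameter choices are invented for the example.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_score, train_test_split

rng = np.random.default_rng(0)

# Hypothetical policyholder features: driver age, vehicle power, annual mileage.
n = 5000
X = np.column_stack([
    rng.uniform(18, 80, n),        # age (years)
    rng.uniform(40, 200, n),       # vehicle power (kW)
    rng.uniform(1_000, 40_000, n), # annual mileage (km)
])
# Synthetic "at least one claim" label: young drivers of powerful cars and
# high-mileage drivers are made riskier on purpose.
risk = 0.02 + 0.3 * (X[:, 0] < 30) * (X[:, 1] > 120) + 0.1 * (X[:, 2] > 30_000)
y = rng.random(n) < risk

# Holdout validation: never judge a predictive model on its own training data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)
model = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)

# Discrimination on unseen data (AUC), plus k-fold cross-validation on the
# training set as a complementary estimate of out-of-sample performance.
auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
cv_auc = cross_val_score(model, X_train, y_train, cv=5, scoring="roc_auc")
print(f"holdout AUC: {auc:.3f}  5-fold CV AUC: {cv_auc.mean():.3f}")
```

The same validation discipline applies unchanged to the other methods Chapter 3 covers (decision trees, neural networks, support vector machines, stacking and so on).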
The last two chapters are written by insurance professionals. In Chapter 4, Florence Picard (Institute of Actuaries, Paris) describes the present and future insurance market in the light of the development of big data. She details its implementation in the insurance sector, in particular the impact of big data on management methods, marketing, new insurable risks and data security, and pertinently highlights the emergence of new managerial techniques that reinforce the importance of continuous training.

Emmanuel Berthelé (Optimind Winter, Paris), also an actuary, is the author of the fifth and final chapter. He presents the main uses of big data in insurance, particularly pricing and product offerings, automobile and telematics insurance, index-based insurance, fraud detection and reinsurance. He also emphasizes the regulatory constraints specific to the sector (Solvency II, ORSA, etc.) and the current restrictions on the use of certain algorithms, imposed by auditability requirements, which will undoubtedly be relaxed in the future.

Finally, a fundamental observation emerges from these last two chapters, urging insurers to preserve the mutualization principle, the founding principle of insurance, because, as Emmanuel Berthelé puts it: “Even if the volume of data available and the capacities induced in the refinement of prices increase considerably, the personalization of price is neither fully feasible nor desirable for insurers, insured persons and society at large.”

In conclusion, this book shows that big data is essential for the development of insurance, provided that the necessary safeguards are put in place. It is clearly addressed to insurance and bank managers, to master’s students in actuarial science, computer science, finance and statistics, and, of course, to students in the growing number of master’s programs in big data.

Introduction written by Marine CORLOSQUET-HABART and Jacques JANSSEN.