Recent developments in information systems technologies have resulted in computerizing many applications in various business areas. Data has become a critical resource in many organizations, and therefore, efficient access to data, sharing the data, extracting information from the data, and making use of the information have become urgent needs. As a result, there have been many efforts not only to integrate the various data sources scattered across several sites, but also to extract information from these databases in the form of patterns and trends and to carry out data analytics. These data sources may be databases managed by database management systems, or they could be data warehoused in a repository from multiple data sources.

The advent of the World Wide Web in the mid-1990s has resulted in even greater demand for managing data, information, and knowledge effectively. During this period, the services paradigm was conceived, which has now evolved into providing computing infrastructures, software, databases, and applications as services. Such capabilities have resulted in the notion of cloud computing. Over the past 5 years, developments in cloud computing have exploded, and we now have several companies providing infrastructure, software, and application computing platforms as services.

As the demand for data and information management increases, there is also a critical need for maintaining the security of the databases, applications, and information systems. Data, information, applications, the web, and the cloud have to be protected from unauthorized access as well as from malicious corruption. The approaches to secure such systems have come to be known as cyber security.

The significant developments in data management and analytics, web services, cloud computing, and cyber security have evolved into the areas called big data management and analytics (BDMA) and big data security and privacy (BDSP). The U.S.
Bureau of Labor Statistics defines big data as a collection of large datasets that cannot be analyzed with normal statistical methods. The datasets can represent numerical, textual, and multimedia data. Big data is popularly defined in terms of five Vs: volume, velocity, variety, veracity, and value. BDMA requires handling huge volumes of data, both structured and unstructured, arriving at high velocity. By harnessing big data, we can achieve breakthroughs in several key areas such as cyber security and healthcare, resulting in increased productivity and profitability. Not only do big data systems have to be secure, but big data analytics also has to be applied to cyber security applications such as insider threat detection.

This book will review the developments in both BDMA and BDSP and discuss the issues and challenges in securing big data as well as applying big data techniques to solve problems. We will focus on a specific big data analytics technique called stream data mining as well as approaches to applying this technique to insider threat detection. We will also discuss several experimental systems, infrastructures, and education programs we have developed at The University of Texas at Dallas on both BDMA and BDSP.

We have written two series of books for CRC Press on data management/data mining and data security. The first series consists of 10 books. Book #1 (Data Management Systems Evolution and Interoperation) focused on general aspects of data management and also addressed interoperability and migration. Book #2 (Data Mining: Technologies, Techniques, Tools, and Trends) discussed data mining. It essentially elaborated on Chapter 9 of Book #1. Book #3 (Web Data Management and Electronic Commerce) discussed web database technologies, with e-commerce as an application area. It essentially elaborated on Chapter 10 of Book #1.
Book #4 (Managing and Mining Multimedia Databases) addressed both multimedia database management and multimedia data mining. It elaborated on both Chapter 6 of Book #1 (for multimedia database management) and Chapter 11 of Book #2 (for multimedia data mining). Book #5 (XML, Databases and the Semantic Web) described XML technologies related to data management. It elaborated on Chapter 11 of Book #3. Book #6 (Web Data Mining and Applications in Business Intelligence and Counter-terrorism) elaborated on Chapter 9 of Book #3. Book #7 (Database and Applications Security) examined security for the technologies discussed in each of our previous books. It focuses on the technological developments in database and applications security. It is essentially the integration of information security and database technologies. Book #8 (Building Trustworthy Semantic Webs) applies security to semantic web technologies and elaborates on Chapter 25 of Book #7. Book #9 (Secure Semantic Service-Oriented Systems) is an elaboration of Chapter 16 of Book #8. Book #10 (Developing and Securing the Cloud) is an elaboration of Chapters 5 and 25 of Book #9.

Our second series of books at present consists of four books. Book #1 is Design and Implementation of Data Mining Tools. Book #2 is Data Mining Tools for Malware Detection. Book #3 is Secure Data Provenance and Inference Control with Semantic Web. Book #4 is Analyzing and Securing Social Networks. Book #5, which is the current book, is Big Data Analytics with Applications in Insider Threat Detection. For this series, we are converting some of the practical aspects of our work with students into books. The relationships between our texts will be illustrated in Appendix A.

ORGANIZATION OF THIS BOOK

This book is divided into five parts, each describing some aspect of the technology that is relevant to BDMA and BDSP. The major focus of this book will be on stream data analytics and its applications in insider threat detection.
In addition, we will discuss some of the experimental systems we have developed and some of the challenges involved. Part I, consisting of six chapters, will describe supporting technologies for BDMA and BDSP, including data security and privacy, data mining, cloud computing, and the semantic web. Part II, consisting of six chapters, provides a detailed overview of the techniques we have developed for stream data analytics. In particular, we will describe our techniques for novel class detection in data streams. Part III, consisting of nine chapters, will discuss the applications of stream analytics for insider threat detection. Part IV, consisting of six chapters, will discuss some of the experimental systems we have developed based on BDMA and BDSP. These include secure query processing for big data as well as social media analysis. Part V, consisting of seven chapters, discusses some of the challenges for BDMA and BDSP. In particular, it discusses securing the Internet of Things as well as our plans for developing experimental infrastructures for BDMA and BDSP.

DATA, INFORMATION, AND KNOWLEDGE

In general, data management includes managing the databases, interoperability, migration, warehousing, and mining. For example, the data on the web has to be managed and mined to extract information, patterns, and trends. Data could be in files, relational databases, or other types of databases such as multimedia databases. Data may be structured or unstructured. We repeatedly use the terms data, data management, database systems, and database management systems in this book. We elaborate on these terms in the appendix. We define data management systems to be systems that manage the data, extract meaningful information from the data, and make use of the information extracted. Therefore, data management systems include database systems, data warehouses, and data mining systems.
Data could be structured, such as the data found in relational databases, or it could be unstructured, such as text, voice, imagery, and video.

There have been numerous discussions in the past to distinguish between data, information, and knowledge. In some of our previous books on data management and mining, we did not attempt to clarify these terms. We simply stated that data could be just bits and bytes or it could convey some meaningful information to the user. However, with the web and also with increasing interest in data, information, and knowledge management as separate areas, in this book we take a different approach by differentiating between these terms as much as possible. For us, data is usually some value such as a number, an integer, or a string. Information is obtained when some meaning or semantics is associated with the data, such as "John's salary is 20K." Knowledge is something that you acquire through reading and learning, and as a result you understand the data and information and take actions. That is, data and information can be transformed into knowledge when uncertainty about the data and information is removed from someone's mind. It should be noted that it is rather difficult to give strict definitions of data, information, and knowledge. Sometimes we will also use these terms interchangeably. Our framework for data management discussed in the appendix helps clarify some of the differences. To be consistent with the terminology in our previous books, we will also distinguish between database systems and database management systems. A database management system is the component that manages the database containing persistent data. A database system consists of both the database and the database management system.

FINAL THOUGHTS

The goal of this book is to explore big data analytics techniques and apply them to cyber security, including insider threat detection.
We will discuss various concepts, technologies, issues, and challenges for both BDMA and BDSP. In addition, we present several of the experimental systems in cloud computing and secure cloud computing that we have designed and developed at The University of Texas at Dallas. We have used some of the material in this book, together with the numerous references listed in each chapter, for graduate-level courses at The University of Texas at Dallas on "Big Data Analytics" as well as on "Developing and Securing the Cloud." We have also provided several experimental systems developed by our graduate students.

It should be noted that the field is expanding very rapidly, with several open source tools and commercial products for managing and analyzing big data. Therefore, it is important for the reader to keep up with the developments of the various big data systems. However, security cannot be an afterthought. Therefore, while the technologies for big data are being developed, it is important to include security from the outset.