Apache Kylin: A Large-Scale Online Analytical Processing (OLAP) Platform on Hadoop
What's Kylin
kylin /ˈkiːˈlɪn/ 麒麟: n. (in Chinese art) a mythical animal of composite form
Extreme OLAP Engine for Big Data
- Kylin is an open source distributed analytics engine from eBay that provides a SQL interface and multi-dimensional analysis (OLAP) on Hadoop, supporting extremely large datasets
- Open sourced on Oct 1st, 2014
- Accepted as an Apache Incubator project on Nov 25th, 2014
http://kylin.io, eBay Inc.

Big Data Era
- More and more data is becoming available on Hadoop
- Limitations in existing Business Intelligence (BI) tools
  - Limited support for Hadoop
  - Data size growing exponentially
  - High latency of interactive queries
  - Scale-up architecture
- Challenges in adopting Hadoop as an interactive analysis system
  - The majority of analyst groups are SQL savvy
  - No mature SQL interface on Hadoop
  - OLAP capability in the Hadoop ecosystem is not ready yet

Business Needs for Big Data Analysis
- Sub-second query latency on billions of rows
- ANSI SQL for both analysts and engineers
- Full OLAP capability to offer advanced functionality
- Seamless integration with BI tools
- Support for high cardinality and high dimensions
- High concurrency: thousands of end users
- Distributed, scale-out architecture for large data volumes

Why not build an engine from scratch?

Analytics Query Taxonomy
Kylin is designed to accelerate the performance of more than 80% of analytics queries on Hadoop.
[Figure: query pyramid, from top to bottom]
- High Level Aggregation: very high level, e.g. GMV by site by vertical by weeks (Strategy)
- Analysis Query: middle level, e.g. GMV by site by vertical by category (level x), past 12 weeks (OLAP)
- Drill Down to Detail: detail level (summary table) (Operation)
- Low Level Aggregation: first-level aggregation
- Transaction Level: transaction data (OLTP)

Technical Challenges
- Huge data volume: table scans
- Big table joins: data shuffling
- Analysis at different granularities: runtime aggregation is expensive
- MapReduce jobs: batch processing

OLAP Cube: Balance between Space and Time
- Cuboid = one combination of dimensions
- Cube = all combinations of dimensions (all cuboids)
- Base vs. aggregate cells; ancestor vs. descendant cells; parent vs. child cells
  1. (9/15, milk, Urbana, Dairy_land)
  2. (9/15, milk, Urbana, *)
  3. (*, milk, Urbana, *)
  4. (*, milk, Chicago, *)
  5. (*, milk, *, *)

From Relational to Key-Value
[Figure: relational rows with three dimension values and one measure, which appear to read (2010, us, tech, 15.09), (2010, us, tech, 20.34), (2011, cn, baby, 100.22) and (2012, us, tech, 10.87), are mapped to key-value pairs. Each key is a cuboid cell such as (2010, us, tech), (2010, *, tech), (*, us, tech) or (*, *, *), and each value is the aggregated measure for that cell.]
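
The cuboid definition and the relational-to-key-value mapping above can be illustrated with a small in-memory sketch. This is not Kylin's actual build process. It only shows the idea that every combination of dimensions is one cuboid, and that each cell becomes a key whose value is a pre-aggregated measure. The dimension names (year, country, category), the helper names, and the sample rows are assumptions loosely modeled on the values in the figure.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical dimension names; the slides only show the values.
DIMENSIONS = ("year", "country", "category")

# Sample fact rows: three dimension values followed by one measure.
ROWS = [
    ("2010", "us", "tech", 15.09),
    ("2010", "us", "tech", 20.34),
    ("2011", "cn", "baby", 100.22),
    ("2012", "us", "tech", 10.87),
]


def cuboids(n_dims):
    """Yield each cuboid as a tuple of kept dimension indexes (2**n_dims in total)."""
    for k in range(n_dims, -1, -1):
        yield from combinations(range(n_dims), k)


def build_cube(rows, n_dims):
    """Map each cell key ('*' marks an aggregated-away dimension) to its summed measure."""
    cube = defaultdict(float)
    for *dims, measure in rows:
        for kept in cuboids(n_dims):
            key = tuple(dims[i] if i in kept else "*" for i in range(n_dims))
            cube[key] += measure
    return dict(cube)


def is_ancestor(a, b):
    """True if cell a strictly generalizes cell b (a is an ancestor of b)."""
    return a != b and all(x == "*" or x == y for x, y in zip(a, b))


cube = build_cube(ROWS, len(DIMENSIONS))

# A query on any dimension combination is now a key lookup, not a table scan.
print(len(list(cuboids(len(DIMENSIONS)))))      # number of cuboids
print(round(cube[("2010", "us", "tech")], 2))   # base-cuboid cell
print(round(cube[("*", "us", "tech")], 2))      # aggregated over year
print(round(cube[("*", "*", "*")], 2))          # grand total
```

With three dimensions there are 2^3 = 8 cuboids, which is the space side of the space/time trade-off: storage grows exponentially with the number of dimensions, while each aggregate query reduces to a single lookup. The `is_ancestor` helper mirrors the cell list above: (*, milk, Urbana, *) is an ancestor of (9/15, milk, Urbana, Dairy_land), whereas (*, milk, Chicago, *) is not.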