At Spark Summit 2017, Matei Zaharia gave a talk titled "Trends for Big Data and Apache Spark in 2017", offering an in-depth analysis of new trends in big data application development and of how data is being applied across a range of fields.

The opening slides recap three production use cases:

Spark Streaming at Bing Scale (Kaarthik Sivashanmugam, Microsoft, @kaarthikss)
The Bing-scale problem is log merging: merging Bing query events with click events. The design is a lambda architecture in which batch and stream processing share the same C# library (Spark Streaming in C#). The slide diagram shows raw logs, Databus, and an event merge pipeline with a 10-minute app-time window.

Apache Spark @Scale: A 60 TB+ production use case (Sital Kedia, Shuojie Wang, Avery Ching, Facebook)
The slide excerpts a Facebook write-up on using analytics for data-driven decisions and on batch analytics run against internal data stores, including Hive (contributed to Apache Hive by Facebook in 2009).
[Chart: job latency in hours, Hive vs. Spark]

Credit Fraud Prevention with Spark and Graph Analysis (Chris D'Agostino, VP Technology, Capital One, @chrisdagostino)

This talk
What are the new trends for big data apps in 2017?
Work to address them at Databricks + elsewhere

Three Key Trends
1) Hardware: compute bottleneck
2) Users: democratizing access to big data
3) Applications: production apps

Hardware trends
           2010             2017
Storage    100 MB/s (HDD)   1000 MB/s (SSD)
Network    1 Gbps           10 Gbps
CPU        3 GHz            3 GHz
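
The 2010 and 2017 hardware figures on the slides lend themselves to a quick back-of-the-envelope comparison. The following is a minimal sketch in Python that uses only the numbers shown on the slides; the dictionary layout and the helper name `growth_factors` are illustrative assumptions and not something from the talk.

```python
# Figures as they appear on the "Hardware trends" slides (2010 vs. 2017).
HARDWARE = {
    "storage_mb_per_s": {2010: 100, 2017: 1000},  # HDD in 2010, SSD in 2017
    "network_gbps": {2010: 1, 2017: 10},
    "cpu_ghz": {2010: 3, 2017: 3},
}


def growth_factors(table, start=2010, end=2017):
    """Return the end/start ratio per resource (hypothetical helper)."""
    return {name: values[end] / values[start] for name, values in table.items()}


if __name__ == "__main__":
    for name, factor in growth_factors(HARDWARE).items():
        print(f"{name}: {factor:.0f}x")
```

Storage and network throughput each grow by a factor of 10 while the CPU clock rate stays flat, which is the arithmetic behind the "compute bottleneck" trend.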