Streaming Pipeline using Dataflow source code

qqsilky21871 14 0 ZIP 2021-04-06 17:04:48

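Streaming Pipeline using DataFlow (under construction)

This is part of the Introduction to Apache Beam using Python repository. Here we will learn the basics of Apache Beam in order to create streaming pipelines, building one step by step. The complete process is divided into the following parts:

1. Reading the data from Pub/Sub
2. Parsing the data
3. Filtering the data
4. Performing type conversion
5. Data wrangling
6. Deleting unwanted columns
7. Inserting the data into BigQuery

Motivation

Over the past two years I have been on a steep learning curve, building up my skills to move into machine learning and cloud computing. This project is a practice project for everything I have learned. It is the first of many to come.

Libraries/frameworks used

Built with

Cloning the repository

# clone this repo:
git clone https://github.com/adityasolanki205/Streaming-Pipeline-using-DataFlow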
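The middle steps listed above (parsing, filtering, type conversion, data wrangling, and deleting unwanted columns) can be sketched as plain per-record Python functions. This is only an illustration under stated assumptions, not the repository's actual code: the column names, the space-delimited message format, and the label mapping are all hypothetical, since the listing does not give the schema. In an Apache Beam pipeline, functions like these would typically be applied with `beam.Map` and `beam.Filter` between a Pub/Sub read and a BigQuery write; here they are chained in ordinary Python so the sketch runs without any cloud resources.

```python
# Hypothetical column names and label mapping; the real schema is not
# given in this listing, so treat these as placeholders.
COLUMNS = ["account_status", "duration_months", "credit_amount", "purpose"]
STATUS_LABELS = {"A11": "below_zero", "A12": "low_balance"}


def parse_line(line):
    """Parse: split one space-delimited message into a dict keyed by column."""
    return dict(zip(COLUMNS, line.strip().split(" ")))


def is_valid(record):
    """Filter: keep records where every expected column is present and non-empty."""
    return all(record.get(col) for col in COLUMNS)


def convert_types(record):
    """Type conversion: cast the numeric columns from str to int."""
    out = dict(record)
    out["duration_months"] = int(out["duration_months"])
    out["credit_amount"] = int(out["credit_amount"])
    return out


def wrangle(record):
    """Data wrangling: replace a coded value with a readable label if known."""
    out = dict(record)
    code = out["account_status"]
    out["account_status"] = STATUS_LABELS.get(code, code)
    return out


def drop_columns(record, unwanted=("purpose",)):
    """Delete unwanted columns: return a copy without the listed keys."""
    return {k: v for k, v in record.items() if k not in unwanted}


def run_steps(lines):
    """Apply the steps in order to each incoming line; skip invalid records."""
    rows = []
    for line in lines:
        record = parse_line(line)
        if not is_valid(record):
            continue
        rows.append(drop_columns(wrangle(convert_types(record))))
    return rows


if __name__ == "__main__":
    # The second line is incomplete, so the filter step drops it.
    for row in run_steps(["A11 6 1169 A43", "A12 48"]):
        print(row)
```

Keeping each step as a small function that takes and returns a dict makes it straightforward to test locally before wiring the same functions into a streaming runner. Note that `convert_types` raises `ValueError` on non-numeric input, so a production pipeline would likely need to route such records elsewhere.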

File list

Streaming-Pipeline-using-Dataflow-master.zip (approximately 39 files)

Streaming-Pipeline-using-Dataflow-master

generating_data.py 4KB

publish_to_pubsub.py 776B

publish_to_pubsub.ipynb 2KB

data

german-original.data 78KB

german.data 78KB

.ipynb_checkpoints

german-checkpoint.data 78KB

Book1.xlsx 100KB

__pycache__

Test.cpython-37.pyc 4KB

Testing.cpython-37.pyc 4KB

generating_data.cpython-37.pyc 3KB

output

simpleoutput.txt-00000-of-00001 78KB

Convert_datatype.txt-00000-of-00001 449KB

beam-temp-testing.txt-865077c8798211eb8de37440bb0a5a10

7c9f22fd-f2e5-4996-a618-9f89b15df98c.testing.txt 5KB

complete_output.txt-00000-of-00001 470KB

beam-temp-Filtered_Data.txt-c411140676be11eba9e67440bb0a5a10

2c2c5d61-edd9-4941-a565-ca23f651d445.Filtered_Data.txt 10KB

Delete_Unwanted_Columns.txt-00000-of-00001 470KB

SplitPardo.txt-00000-of-00001 468KB

Filtered_Data.txt-00000-of-00001 462KB

DataWrangle.txt-00000-of-00001 513KB

beam-temp-Filtered_Data.txt-468a3b8076c011eb9d9b7440bb0a5a10

9456b0a9-3e51-467f-b7db-8710be785578.Filtered_Data.txt 466KB

.ipynb_checkpoints

Filtered_Data-checkpoint.txt-00000-of-00001 462KB

Convert_datatype-checkpoint.txt-00000-of-00001 449KB

complete_output-checkpoint.txt-00000-of-00001 470KB

DataWrangle-checkpoint.txt-00000-of-00001 513KB

.txt-00000-of-00001-checkpoint 78KB

SplitPardo-checkpoint.txt-00000-of-00001 468KB

Delete_Unwanted_Columns-checkpoint.txt-00000-of-00001 470KB

streaming-pipeline.py 6KB

.ipynb_checkpoints

generating_data-checkpoint.ipynb 6KB

streaming-pipeline-checkpoint.py 6KB

publish_to_pubsub-checkpoint.ipynb 72B

generating_data-checkpoint.py 4KB

Local-checkpoint.py 6KB

README-checkpoint.md 16KB

generating_data_testing-checkpoint.py 4KB

Testing-checkpoint.py 5KB

README.md 16KB

generating_data_testing.py 4KB

generating_data.ipynb 6KB

