An Evaluation of DNN Architectures for Page Segmentation of Historical Newspapers

divide_80426 9 0 .pdf 2021-01-22 14:01:00


An Evaluation of DNN Architectures for Page Segmentation of Historical Newspapers

One important and particularly challenging step in the optical character recognition (OCR) of historical documents with complex layouts, such as newspapers, is the separation of text from non-text content (e.g. page borders or illustrations). This step is commonly referred to as page segmentation. While various rule-based algorithms have been proposed, the applicability of Deep Neural Networks (DNNs) to this task has recently gained a lot of attention. In this paper, we perform a systematic evaluation of 11 different published DNN backbone architectures and 9 different tiling and scaling configurations for separating text, tables or table column lines. We also show the influence of the number of labels and the number of training pages on the segmentation quality, which we measure using the Matthews Correlation Coefficient. Our results show that (depending on the task) Inception-ResNet-v2 and EfficientNet backbones work best, vertical tiling is generally preferable to other tiling approaches, and training data comprising 30 to 40 pages is sufficient in most cases.
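
The abstract names the Matthews Correlation Coefficient (MCC) as its quality measure but does not spell it out. As a rough illustration only, the sketch below computes the standard binary MCC, (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)), between a ground-truth mask and a predicted mask. The binary text/non-text setup, the function names `matthews_corrcoef` and `flatten`, and the toy masks are assumptions made for this example; the paper works with several labels and may use a multi-class variant.

```python
import math


def matthews_corrcoef(y_true, y_pred):
    """Binary MCC for two equal-length label sequences (1 = text, 0 = non-text)."""
    if len(y_true) != len(y_pred):
        raise ValueError("label sequences must have the same length")

    # Count the four cells of the confusion matrix.
    tp = tn = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if t and p:
            tp += 1
        elif not t and not p:
            tn += 1
        elif p:
            fp += 1
        else:
            fn += 1

    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Common convention: report 0 when a row or column of the matrix is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom


def flatten(mask):
    """Turn a 2D pixel mask (list of rows) into a flat list of labels."""
    return [px for row in mask for px in row]


# Toy 2x4 masks (hypothetical data): 1 marks a text pixel.
truth = [[1, 1, 0, 0],
         [1, 1, 0, 0]]
pred = [[1, 1, 1, 0],
        [1, 0, 0, 0]]

score = matthews_corrcoef(flatten(truth), flatten(pred))
print(f"MCC = {score:.3f}")
```

MCC ranges from -1 (every pixel wrong) through 0 (no better than chance) to 1 (perfect agreement). Unlike plain pixel accuracy it stays informative when one class dominates the page, which is typical for page borders or thin table column lines.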
