Peng, Xujun.
Probabilistic random field based method for annotated machine printed documents preprocessing.
Record type:
Bibliographic - language material, printed : Monograph/item
Title/Author:
Probabilistic random field based method for annotated machine printed documents preprocessing.
Author:
Peng, Xujun.
Extent:
124 p.
Notes:
Source: Dissertation Abstracts International, Volume: 72-04, Section: B, page: 2211.
Contained By:
Dissertation Abstracts International 72-04B.
Subject:
Computer Science.
ISBN:
9781124475561
Abstract:
Today, the convenience of search, both on the personal computer hard disk and on the web, is essentially limited to machine-printed text documents and images because of the poor accuracy of handwriting recognizers. The proposed research will advance the state-of-the-art in realizing search of hand-annotated documents. We will primarily target machine-printed documents which have been annotated by hand by multiple writers in an office/collaborative environment. In applications where the annotations are action instructions (such as, "make 4 copies", "remove Figure X" etc.) we can envision the proposed system serving as the front end of an OCR-based NLP module. We expect that the techniques developed in this dissertation will be also useful for retrieval of pages from material in languages for which accurate OCRs do not exist.
Electronic resource:
http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=3440333
Thesis (Ph.D.)--State University of New York at Buffalo, 2011.
LDR 03659nam 2200325 4500
001 365281
005 20120516132903.5
008 121018s2011 ||||||||||||||||| ||eng d
020 $a 9781124475561
035 $a (UMI)AAI3440333
035 $a AAI3440333
040 $a UMI $c UMI
100 1 $a Peng, Xujun. $3 475305
245 1 0 $a Probabilistic random field based method for annotated machine printed documents preprocessing.
300 $a 124 p.
500 $a Source: Dissertation Abstracts International, Volume: 72-04, Section: B, page: 2211.
500 $a Adviser: Venu Govindaraju.
502 $a Thesis (Ph.D.)--State University of New York at Buffalo, 2011.
520 $a Today, the convenience of search, both on the personal computer hard disk and on the web, is essentially limited to machine-printed text documents and images because of the poor accuracy of handwriting recognizers. The proposed research will advance the state-of-the-art in realizing search of hand-annotated documents. We will primarily target machine-printed documents which have been annotated by hand by multiple writers in an office/collaborative environment. In applications where the annotations are action instructions (such as, "make 4 copies", "remove Figure X" etc.) we can envision the proposed system serving as the front end of an OCR-based NLP module. We expect that the techniques developed in this dissertation will be also useful for retrieval of pages from material in languages for which accurate OCRs do not exist.
520 $a The main research task proposed is that of segmenting handwritten text, machine-printed text, noise, and overlapped text, sometimes referred to as the task of "ink separation". Prior techniques primarily use histogram thresholding and analysis of the connectivity of strokes. These algorithms, although effective, rely on heuristic rules of spatial constraints and are not scalable across applications. We have developed a system composed of three parts: binarization of document images (focusing on documents captured with hand-held devices), a boosted tree classifier that performs the initial classification, and a Markov random field (MRF) based approach that re-labels the initial segments based on their statistical dependencies within a neighborhood. The MRF-based binarization will provide a reliable binarized document image for segmentation even under bad illumination. The boosted tree will allow dividing the training data set into several small clusters and using a simple classifier to solve the initial labeling at the cluster (homogeneous) level. The overlapped text will be further separated using an MRF-based method.
520 $a The isolated handwritten textual blocks will be indexed (unsupervised) based on writing instrument, style, ink color, etc. as being possible indicators of different writers. We have shown the ability to selectively remove the annotations belonging to a particular writer and allow the end user of the system to view an unmarked document even though the original document image is marked up. This feature will be accomplished by intelligent document restoration whereby the removal of overlapping strokes does not damage the underlying machine-printed text.
520 $a We have performed experiments on a large document dataset and report results.
590 $a School code: 0656.
650 4 $a Computer Science. $3 423143
690 $a 0984
710 2 $a State University of New York at Buffalo. $b Computer Science and Engineering. $3 475195
773 0 $t Dissertation Abstracts International $g 72-04B.
790 1 0 $a Govindaraju, Venu, $e advisor
790 1 0 $a Corso, Jason $e committee member
790 1 0 $a Scott, Peter $e committee member
790 $a 0656
791 $a Ph.D.
792 $a 2011
856 4 0 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=3440333
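
Illustrative note: the abstract in this record describes re-labeling initial classifier output with a Markov random field so that each segment's label agrees with its neighborhood. The record gives no implementation details, so the following is only a minimal sketch of that general idea, not the dissertation's method. It assumes a regular grid of patches, a Potts smoothness term, and iterated conditional modes (ICM) as the inference procedure; the function name `icm_relabel`, the parameter `beta`, and the label meanings are all hypothetical.

```python
import numpy as np


def icm_relabel(unary_cost, beta=1.0, max_iters=10):
    """Re-label a grid with iterated conditional modes (ICM).

    unary_cost: array of shape (H, W, K); entry [y, x, k] is the cost of
    giving patch (y, x) label k (for example, a negative log probability
    from an initial classifier). Lower is better.
    beta: Potts penalty paid for each 4-neighbor with a different label.
    """
    unary_cost = np.asarray(unary_cost, dtype=float)
    h, w, k = unary_cost.shape
    # Start from the classifier's own best guess at every patch.
    labels = unary_cost.argmin(axis=2)
    for _ in range(max_iters):
        changed = False
        for y in range(h):
            for x in range(w):
                cost = unary_cost[y, x].copy()
                for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w:
                        # Every label except the neighbor's current one pays beta.
                        penalty = np.full(k, float(beta))
                        penalty[labels[ny, nx]] = 0.0
                        cost += penalty
                best = int(cost.argmin())
                if best != labels[y, x]:
                    labels[y, x] = best
                    changed = True
        if not changed:
            break  # No patch changed label in a full sweep.
    return labels


# Hypothetical example: label 0 = machine print, label 1 = handwriting.
# Every patch strongly prefers label 0 except the center, which weakly
# prefers label 1.
demo = np.zeros((3, 3, 2))
demo[:, :, 1] = 2.0
demo[1, 1] = [0.6, 0.4]
smoothed = icm_relabel(demo, beta=1.0)
```

With `beta=0.0` the neighborhood term vanishes and the result is just the per-patch classifier decision; a larger `beta` lets agreeing neighbors override a weak, isolated decision. ICM is chosen here only because it is short and easy to follow.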