Web text information extraction and mining method