A Review of Automatic Text Information Extraction Technology for Web Pages