Image-level classification by hierarchical structure learning with visual and semantic similarities