Design Scheme for a Large-Scale Document Similarity Computation Engine