attention over learned object embeddings enables complex visual reasoning