3DIS: Depth-driven decoupled instance synthesis for text-to-image generation