Neural text-to-speech (TTS) can produce quality close to natural speech if a sufficient amount of high-quality speech material is available for training. However, acquiring speech data for TTS training is costly and time-consuming, especially if the goal is to generate different speaking styles. In this work, we show that we can transfer speaking style across speakers and improve the quality of synthetic speech by training a multi-speaker multi-style (MSMS) model with long-form recordings, in addition to regular TTS recordings. In particular, we show that 1) multi-speaker modeling improves the overall TTS quality, 2) the proposed MSMS approach outperforms the pre-training and fine-tuning approach when utilizing additional multi-speaker data, and 3) the long-form speaking style is highly rated regardless of the target text domain.