Topic mining on microblogging sites such as Twitter, with their sheer scale of instant messages and social network information, is a challenging problem. Although many text mining techniques and generative probabilistic models have been developed for static plain-text corpora, they tend to perform poorly on microblogs because they ignore that microblogs are temporally ordered and tied to social network information. In this paper, we propose a novel topic model, MicroBlog-Topics over Time (MB-ToT), which aims at comprehensive topic analysis in microblogs. First, we assume each topic is a mixture distribution influenced by both word co-occurrences and the timestamps of microblogs. This allows MB-ToT to capture the changes of each topic over time. Second, we use users’ intrinsic interests, social contact relations, and #hashtags to improve the topic mining result. Finally, we present a Gibbs sampling implementation for the inference of MB-ToT. We evaluate MB-ToT and compare it with state-of-the-art methods on a real dataset. In our experiments, MB-ToT outperforms the state-of-the-art methods by a large margin in terms of both perplexity and KL-divergence. We also show that the quality of the latent topics generated by MB-ToT is promising.
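The abstract names perplexity and KL-divergence as evaluation measures but does not give their formulas. As a rough illustration only, the sketch below uses the standard textbook definitions; the function names, the inputs (per-document held-out log-likelihoods and token counts), and the smoothing constant `eps` are assumptions made for this example and are not taken from the paper.

```python
import math

def perplexity(doc_log_likelihoods, doc_token_counts):
    """Standard held-out perplexity: exp(-sum_d log p(w_d) / sum_d N_d).

    Assumed inputs: natural-log likelihood of each held-out document
    and the number of tokens in each document.
    """
    total_ll = sum(doc_log_likelihoods)
    total_tokens = sum(doc_token_counts)
    return math.exp(-total_ll / total_tokens)

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same support.

    Terms with p_i == 0 contribute nothing; q_i is floored at eps
    (an assumption of this sketch) to avoid division by zero.
    """
    return sum(
        pi * math.log(pi / max(qi, eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )
```

Lower perplexity indicates that a model assigns higher probability to held-out text, and KL(p || q) is zero exactly when the two distributions coincide. How the paper applies KL-divergence (for example, between which distributions) is not stated in the abstract.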
Liu, Shaopeng; Yin, Jian; Ouyang, Jia; and Lin, Piyuan
"MB-ToT: An Effective Model for Topic Mining in Microblogs,"
Applied Mathematics & Information Sciences: Vol. 08, Article 37.
Available at: https://dc.naturalspublishing.com/amis/vol08/iss1/37