Selecting genes from microarray gene expression datasets has become an important research topic, because such data typically contain a large number of genes and a small number of samples. To avoid information loss, this work uses neighborhood mutual information to evaluate the relevance between genes. First, an improved Relief feature selection algorithm is proposed to create candidate feature subsets. Then, the cohesion degree of an object's neighborhood and the coupling degree between the neighborhoods of objects are defined based on neighborhood mutual information. Furthermore, a new method of initializing cluster centers for the Fuzzy C-means (FCM) algorithm is proposed; FCM is a clustering method that allows one data point to belong to two or more clusters. Moreover, because the neighborhood rough set is an effective tool for extracting and selecting features, a novel gene selection algorithm based on FCM and the neighborhood rough set is proposed. Finally, to evaluate the performance of the proposed approach, we apply it to five well-known gene expression datasets. Experimental results show that the proposed approach selects genes effectively and achieves high and stable classification performance.
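To illustrate the statement that FCM lets one data point belong to several clusters, the following is a minimal sketch of the standard FCM iteration in Python with NumPy. It is not the paper's algorithm: the initial centers are passed in by the caller rather than derived from neighborhood mutual information, and the names `memberships`, `fuzzy_c_means`, the fuzzifier `m`, and the tolerance `tol` are assumptions made for illustration.

```python
import numpy as np

def memberships(X, centers, m=2.0):
    """Return the fuzzy membership matrix U (n_samples x n_clusters)."""
    # Euclidean distance from every sample to every center.
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    d = np.maximum(d, 1e-12)  # avoid division by zero at a center
    # Standard FCM rule: u_ij is proportional to d_ij^(-2/(m-1)).
    inv = d ** (-2.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)

def fuzzy_c_means(X, init_centers, m=2.0, n_iter=100, tol=1e-6):
    """Plain FCM starting from caller-supplied centers (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    centers = np.asarray(init_centers, dtype=float).copy()
    for _ in range(n_iter):
        Um = memberships(X, centers, m) ** m
        # Each center is the membership-weighted mean of all samples.
        new_centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
        shift = np.linalg.norm(new_centers - centers)
        centers = new_centers
        if shift < tol:
            break
    return memberships(X, centers, m), centers
```

Each row of the returned matrix sums to 1, so a sample lying between two groups receives a partial membership in each instead of a single hard label.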
Xu, Jiucheng; Xu, Tianhe; Sun, Lin; and Ren, Jinyu
"An Efficient Gene Selection Technique based on Fuzzy C-means and Neighborhood Rough Set,"
Applied Mathematics & Information Sciences: Vol. 08, Article 51.
Available at: https://dc.naturalspublishing.com/amis/vol08/iss6/51