Keys are essential for data management. Because of the hierarchical structure and syntactic flexibility of XML, mining keys from XML data is more complex and difficult than mining keys from relational databases. Discovering keys from XML data also poses several practical challenges, such as keys that are not clearly specified, the storage of an enormous number of keys, and the need for efficient mining algorithms. To bridge the gap between theory and practice, we propose novel approximate measures of support and confidence for XML keys, based on the number of null values on key paths. During mining, inference rules are used to derive new keys, and a two-phase reasoning process yields a target set of approximate keys together with its reduced set. We conducted experiments on ten benchmark XML datasets from XMark and on four files from the UW XML Repository. The results show that the approach is feasible and efficient and that it can discover effective keys in a variety of XML data.
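To make the idea of null-aware support and confidence concrete, here is a minimal sketch. It is not the paper's definition. It assumes a candidate key is given as a target path plus a list of relative key paths, and it uses hypothetical measures: support is the share of target nodes on which every key path is non-null, and confidence is the share of those complete nodes whose key-value tuple is unique. The function name `approx_key_measures` and the sample document are illustrative assumptions. It relies on ElementTree's `findtext`, which returns `None` when no element matches the path.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def approx_key_measures(root, target_path, key_paths):
    """Hypothetical support/confidence for a candidate XML key.

    Assumed definitions (not taken from the paper):
      support    = share of target nodes whose key paths are all non-null
      confidence = share of those complete nodes whose key-value tuple is unique
    """
    targets = root.findall(target_path)
    if not targets:
        return 0.0, 0.0
    # A key path is treated as null when findtext finds no matching element.
    tuples = [tuple(t.findtext(p) for p in key_paths) for t in targets]
    complete = [tp for tp in tuples if None not in tp]
    support = len(complete) / len(targets)
    if not complete:
        return support, 0.0
    # Count how often each complete key-value tuple occurs.
    counts = Counter(complete)
    unique = sum(1 for tp in complete if counts[tp] == 1)
    confidence = unique / len(complete)
    return support, confidence


# Hypothetical sample: the third person has no email, so that key path is null.
SAMPLE = """
<site><people>
  <person><name>Ann</name><email>a@x</email></person>
  <person><name>Bob</name><email>b@x</email></person>
  <person><name>Ann</name></person>
  <person><name>Cy</name><email>a@x</email></person>
</people></site>
"""

root = ET.fromstring(SAMPLE)
print(approx_key_measures(root, ".//person", ["name", "email"]))  # (0.75, 1.0)
```

Under these assumed measures, `{name, email}` holds on every person that has both values but misses one node because of a null, while `{email}` alone has the same support and a lower confidence because `a@x` repeats. Separating the two numbers shows how null values can lower support without counting as key violations.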
Liu, Yijun; Ye, Feiyue; Liu, Jixue; and He, Sheng
"Mining Approximate Keys based on Reasoning from XML Data,"
Applied Mathematics & Information Sciences: Vol. 08, Article 59.
Available at: https://dc.naturalspublishing.com/amis/vol08/iss4/59