Journal of East China Normal University (Natural Science), 2018, Vol. 2018, Issue (1): 76-90. doi: 10.3969/j.issn.1000-5641.2018.01.008


Data cleaning on probabilistic RDF database

WANG Zhen, LIN Xin   

  1. Shanghai Key Laboratory of Multidimensional Information Processing, East China Normal University, Shanghai 200062, China
Received: 2016-12-03; Online: 2018-01-25; Published: 2018-01-11

Abstract: Because errors and noise are introduced while data are acquired and analyzed, uncertain data arise in many domains and have become an important factor affecting data quality. Uncertain data can be stored in probabilistic databases, where queries return answers annotated with confidence values. However, as uncertainty accumulates and propagates, query results may become less usable, so it is desirable to reduce the uncertainty of the underlying data. This paper addresses the problem of increasing the certainty of answers to RDF (Resource Description Framework) graph queries via crowdsourcing. The basic idea is to ask the crowd to decide whether the relationships represented by selected edges are correct. We introduce three algorithms for selecting the edge whose verification maximizes the reduction in uncertainty. Finally, we evaluate these algorithms experimentally and show that the unstable pruning algorithm and the stable pruning algorithm perform better in terms of efficiency.
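To make the edge-selection idea concrete, the following is a minimal brute-force sketch, not the paper's unstable or stable pruning algorithms. It assumes (hypothetically) that edges exist independently with the given probabilities, that a query answer exists only if all of its edges exist, that uncertainty is measured as the sum of the binary entropies of the answer probabilities, and that the crowd always answers correctly. Under these assumptions, each candidate edge is scored by the expected drop in uncertainty after the crowd confirms or rejects it. All function and variable names are illustrative.

```python
import math
from typing import Dict, FrozenSet, List

Edge = str
Answer = FrozenSet[Edge]  # hypothetical: an answer is the set of edges it matches


def binary_entropy(p: float) -> float:
    """Entropy in bits of a Bernoulli(p) variable; 0 when p is 0 or 1."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def answer_prob(answer: Answer, probs: Dict[Edge, float]) -> float:
    """Assumption: an answer exists iff all its edges exist, edges independent."""
    result = 1.0
    for e in answer:
        result *= probs[e]
    return result


def uncertainty(answers: List[Answer], probs: Dict[Edge, float]) -> float:
    """Hypothetical measure: sum of per-answer binary entropies."""
    return sum(binary_entropy(answer_prob(a, probs)) for a in answers)


def expected_reduction(edge: Edge, answers: List[Answer],
                       probs: Dict[Edge, float]) -> float:
    """Expected uncertainty drop if the crowd (assumed correct) checks `edge`."""
    p = probs[edge]
    before = uncertainty(answers, probs)
    if_true = uncertainty(answers, {**probs, edge: 1.0})   # crowd confirms edge
    if_false = uncertainty(answers, {**probs, edge: 0.0})  # crowd rejects edge
    return before - (p * if_true + (1 - p) * if_false)


def select_edge(answers: List[Answer], probs: Dict[Edge, float]) -> Edge:
    """Brute force: score every edge used by some answer and take the best."""
    candidates = set().union(*answers)
    return max(candidates, key=lambda e: expected_reduction(e, answers, probs))


probs = {"e1": 0.9, "e2": 0.5, "e3": 0.6}
answers = [frozenset({"e1", "e2"}), frozenset({"e2", "e3"})]
print(select_edge(answers, probs))  # prints "e2"
```

Here `e2` is chosen because it is shared by both answers and is the most uncertain edge: rejecting it resolves both answers at once. Scoring every edge this way re-evaluates all answers for each candidate, which is the kind of cost that pruning strategies aim to avoid.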

Key words: probabilistic RDF graph, crowdsourcing, data cleaning
