TY - JOUR
T1 - Knowledge discovery in sociological databases
T2 - An application on general society survey dataset
AU - Pan, Zhiwen
AU - Li, Jiangtian
AU - Chen, Yiqiang
AU - Pacheco, Jesus
AU - Dai, Lianjun
AU - Zhang, Jun
N1 - Publisher Copyright:
© 2019, Zhiwen Pan, Jiangtian Li, Yiqiang Chen, Jesus Pacheco, Lianjun Dai and Jun Zhang.
PY - 2019/12/9
Y1 - 2019/12/9
AB - Purpose: The General Social Survey (GSS) is a government-funded survey that examines the socio-economic status, quality of life and structure of contemporary society. The GSS data set is regarded as one of the authoritative sources that government and organizational practitioners use to make data-driven policies. Previous analytic approaches for GSS data sets combined expert knowledge with simple statistics. By utilizing emerging data mining algorithms, we propose a comprehensive data management and data mining approach for GSS data sets. Design/methodology/approach: The approach operates in two phases: a data management phase, which improves the quality of GSS data by performing attribute pre-processing and filter-based attribute selection; and a data mining phase, which extracts hidden knowledge from the data set through prediction, classification, association and clustering analysis. Findings: The experimental evaluation yields the following findings: performing attribute selection on the GSS data set can improve the performance of both classification and clustering analysis; all of the data mining analyses can effectively extract hidden knowledge from the GSS data set; and the knowledge generated by the different analyses can, to some extent, cross-validate each other. Originality/value: By leveraging data mining techniques, the proposed approach can explore knowledge in a fine-grained manner with minimal human interference. Experiments on the Chinese General Social Survey data set are conducted to evaluate the performance of the approach.
KW - Crowdsourced big data and analytics
KW - Data management
KW - Data mining
KW - Knowledge discovery
UR - http://www.scopus.com/inward/record.url?scp=85123213979&partnerID=8YFLogxK
U2 - 10.1108/IJCS-09-2019-0023
DO - 10.1108/IJCS-09-2019-0023
M3 - Article
AN - SCOPUS:85123213979
SN - 2398-7294
VL - 3
SP - 315
EP - 332
JO - International Journal of Crowd Science
JF - International Journal of Crowd Science
IS - 3
ER -