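Continuous environmental observation data are an important driver of complex-system work such as weather modeling, smart technology innovation, and system-level scientific research. Because data sources, application domains, and application requirements differ, managing highly heterogeneous real-time data and providing effective services for finding, identifying, integrating, and reusing them is extremely challenging. Data infrastructures manage data throughout their lifecycle and thereby offer standardized data query, access, and processing services to the data-driven scientific research and application innovation built on top of them. However, the construction of data infrastructures is often confined to specific domains and constrained by legacy systems. The lack of a unified reference, together with the difficulty of exchanging data and services between infrastructures, severely restricts upper-layer applications, particularly interdisciplinary research. To address these challenges in building big data infrastructures, and drawing on the experience of the European environmental big data reference model in constructing environmental big data infrastructures, this paper proposes an Agricultural Big Data Reference Model as a reference for the construction of agricultural big data infrastructures in China and for the data interoperability challenges involved. Two use cases demonstrate the role of the proposed model in improving requirements analysis for big data infrastructures and in connecting legacy systems for data interoperability.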
Big data infrastructures provide services for managing data throughout their lifecycle and allow users to effectively discover and access data for different application purposes. These emerging infrastructures enable system-level, data-centric research; third-party innovation, however, often requires data from different sources. The construction of big data infrastructures therefore faces important interoperability challenges, which arise from the diverse ways in which data are acquired, annotated, and identified in different research domains. Moreover, the evolution of individual infrastructures is often driven by the specific interests of researchers in their respective domains and by the constraints of legacy technology. The ENVRI Reference Model (ENVRI RM), an output of the EU H2020 ENVRI and ENVRIplus projects, targets these challenges in the context of environmental sciences by modeling environmental research infrastructures with a multi-viewpoint framework comprising five viewpoints: science, information, computation, engineering, and technology. Each viewpoint describes concrete aspects of a system definition and provides a mechanism for improving interoperability across the whole system as well as alignment with existing legacy systems. The challenges encountered in constructing the Shanghai Agricultural Big Data Infrastructures are similar to those addressed by the ENVRI RM, which makes this work an ideal setting for testing how well the ENVRI RM generalizes to other domains. Using the ENVRI RM as a reference, this paper presents an Agricultural Reference Model that retains the five viewpoints but adapts them to the specifics of the agricultural domain, in order to address the problems encountered in revising and upgrading the Shanghai Agricultural Big Data Infrastructures. Two use cases demonstrate its effectiveness. The first improves the requirements engineering procedure by using the community and role context captured in the Agricultural Reference Model. The second upgrades a large number of existing systems to increase their interconnection via the interoperability mechanisms that the Agricultural Reference Model provides.