Journal of East China Normal University(Natural Science)

Design of experimental data governance module for chemical material formulation

Yiming YU, Yuchen HONG, Ye WANG, Qiwen DONG

2022, 2022 (5): 1-13. doi: 10.3969/j.issn.1000-5641.2022.05.001

As a new type of production factor, data elements are becoming key to enterprise development. As an important component of national economic production, the chemical material industry must upgrade its data information systems in line with its construction needs. In this regard, a tailored data governance module is proposed for managing experimental formula data in the chemical material industry. The module proposes data standards and specifications suited to the current business scenario of the enterprise. The system obtains data from the front end, improves its quality, stores it in a database, evaluates the data at the back end, and finally returns it to the front end for display, thereby creating a closed-loop negative feedback system.

Data application system for multi-source data-aided material research and development

Yuchen HONG, Yiming YU, Ye WANG, Qiwen DONG

2022, 2022 (5): 14-25. doi: 10.3969/j.issn.1000-5641.2022.05.002

The material research industry generates a wide variety of data from many sources. The resulting data islands call for an application platform that integrates data collection, management, and application so that the value of the data can be better utilized for development. For this purpose, this study proposes a data management system that handles the entire data management lifecycle, from collection to application, for the material research industry, including thematic database, formula performance prediction, and formula correlation analysis capabilities. It is expected that this development will promote data-driven innovations and improvements within the industry.

Optimization of HTAP data synchronization based on query frequency

Yongjin TANG, Jiabo SUN, Peng CAI

2022, 2022 (5): 26-35. doi: 10.3969/j.issn.1000-5641.2022.05.003

A hybrid transactional/analytical processing (HTAP) system must concurrently support both transaction processing and query analysis. To eliminate interference between them, HTAP systems typically assign separate copies of the data to the two workloads, handling online transaction processing (OLTP) and online analytical processing (OLAP) requests separately and synchronizing data between the copies through log replay. An HTAP system aims to synchronize OLTP data to the OLAP side efficiently, thereby providing fresher data access. The speed of sending and replaying the logs of the tables to be queried is a key factor affecting data freshness. In this paper, building on the table grouping based parallel log replay method and the characteristics of the HTAP workload, a log sending and replay method is proposed based on the query frequency of the OLAP side. While ensuring data consistency, this method raises the processing priority of the logs of frequently queried tables, achieving efficient log sending and replay and making the data of frequently queried tables visible first, thereby ensuring the freshness of the HTAP system.

Development and application of blockchain-based inspection certificates in the steel supply chain

Yingjie YANG, Gang QIN, Shugang CHEN, Furong YIN, Zhao ZHANG, Cheqing JIN

2022, 2022 (5): 36-47. doi: 10.3969/j.issn.1000-5641.2022.05.004

This study investigates the application of consortium blockchain technology to steel inspection certificates to facilitate the issuing of trustworthy, accountable, transferable, divisible, and electronic certificates for industrial internet users. With the help of blockchain technology, steel plants provided authentic certificate data to Ouyeel and authorized Ouyeel to produce and issue electronic inspection certificates for internet users based on the trustworthy certificate data and online business transactions. Smart contracts were developed for quality data recording, ownership transfer, certificate history, and so on. The blockchain-based steel inspection certificates were tamper-proof and could be verified by scanning the 2D barcode on the certificates. Thus, the costs of printing, mailing, and archiving for manufacturers, distributors, and end users are significantly reduced. Functional and performance tests demonstrate the high efficiency and availability of the system.

Benchmarking join order selection of query optimizers

Ting CHEN, Zhaokun XIANG, Jinkai XU, Rong ZHANG

2022, 2022 (5): 48-60. doi: 10.3969/j.issn.1000-5641.2022.05.005

Join order selection, i.e., the determination of the cheapest join order from available alternatives, is one of the most critical tasks in query optimization. The enormous search space of join orders makes it difficult to find an optimal join order efficiently. Although there are many optimization algorithms for join order selection, existing benchmarks are unsuitable for evaluating these strategies because they cannot configure the depths of the joins or cover all join styles. To effectively evaluate the quality of the join order selection algorithms used in an optimizer, a generic evaluation tool for join order selection is implemented in this study. The tool uses a primary key-based deterministic data generation method for portable application scenario migration, a join order sampling algorithm to reduce the investigated join spaces, and a result-guided parameter instantiation algorithm to support valid query generation. We applied the tool to OceanBase and PostgreSQL, and the experimental results show its effectiveness in evaluating the performance of join order selection in query optimizers in a generic and efficient manner.

Research on contract architecture and data privacy for education-oriented blockchain applications

Chaoran HUANG, Xing TONG, Zhao ZHANG, Cheqing JIN, Yingjie YANG, Gang QIN

2022, 2022 (5): 61-72. doi: 10.3969/j.issn.1000-5641.2022.05.006

As the internet drives toward “digital transformation”, education equity and data trustworthiness pose significant challenges in development. Blockchain, as a distributed ledger technology with tamperproof data, is jointly maintained by multiple parties and can solve equity and trustworthiness issues in scenarios such as educational resource allocation, intellectual property rights, and student information authentication. Although blockchain is capable of addressing these core education problems, its data immutability and transparency constrain the upgrading of smart contracts and expose sensitive data in blockchain applications. Hence, educational applications are difficult to update and the privacy of educational data is poorly protected. To address the problem of limited smart contract upgrades, this study proposes an efficient and fully decoupled blockchain smart contract architecture. The proposed architecture decouples contracts into proxy logical contracts, proxy data contracts, logical contracts, and data contracts, achieving an average reduction of 28.2% in upgrade costs compared with traditional methods. Moreover, we combined on- and off-chain collaboration to optimize transactions under the decoupled contract architecture and reduce data migration while updating contracts by integrating the underlying blockchain storage tree, optimized to reduce latency by half. To solve the problem of privacy protection, we propose a privacy data protection scheme based on permission management and local differential privacy (LDP) to improve data privacy security while reducing the negative impact on blockchain performance. Finally, these solutions were integrated and implemented into an educational platform comprising a trusted knowledge exchange community and student growth system.

Dynamic simulation for cloud database runtime environment

Shuhong YOU, Qian SU, Rong ZHANG

2022, 2022 (5): 73-89. doi: 10.3969/j.issn.1000-5641.2022.05.007

This study proposes a comprehensive and general database environment simulation tool that can accurately, efficiently, and dynamically simulate both a normal runtime environment and broken resource situations across multiple dimensions. The tool can help users customize the required test scenarios, reduce the difficulty of database benchmarking, improve testing efficiency, and make benchmark results a more useful reference. Experiments under customized runtime environments demonstrate the superiority of this tool.

Data-driven open source software supply chain maintenance risk analysis method

Qing SUN, Guanyu LIANG, Yanjun WU, Bin WU, Chunqi TIAN, Wei WANG

2022, 2022 (5): 90-99. doi: 10.3969/j.issn.1000-5641.2022.05.008

The software supply chain has become interwoven with the entire software system development process. In recent years, security incidents related to the software supply chain have occurred frequently, and its security has become a global issue. Software maintainability, as one of the important attributes of software quality, reflects the difficulty of software maintenance activities. Although open source software supply chains have become increasingly popular in recent years, research into their maintainability remains extremely limited. Based on the above considerations and combined with a traditional research approach to software maintainability risk, this paper explores a unique analysis perspective regarding the maintainability risk of open source software and proposes a quality model of open source supply chain software maintainability. The model measures nine software attributes, including team health, activity, dependency influence, test integrity, external dependency, and understandability, based on 16 metrics that reflect the maintainability of the open source software supply chain. In addition, based on data from the GitHub hosting platform and the npm sub-ecosystem (including software information, dependencies, behavioral data generated during the development of each software, and so on), the maintainability indicators of different projects at the same time, and of the same project within different time periods, are calculated and compared, confirming the rationality of the proposed method. Using the model proposed herein, the maintainability of the open source software supply chain can be effectively evaluated, thereby guiding software design and reconstruction and the development of higher quality software systems.

Research on user behavior portrait and subject mining in the express logistics field during Coronavirus epidemic

Jiling LI, Baolin LI, Songru YAN

2022, 2022 (5): 100-114. doi: 10.3969/j.issn.1000-5641.2022.05.009

Based on logistics-field blog post data from Weibo from November 2019 to May 2022, the user behaviors of express logistics services in the context of the Coronavirus epidemic are profiled. Using grounded theory and abstract clustering methods, five user behaviors and 22 subject contents are abstracted, and the corresponding user profile is generated. This paper further discusses the subject contents, the subject evolution, and the analysis of group differences. The results show that user satisfaction with logistics services was similar, and the dissatisfaction was diversified with obvious escalation. Variables of transportation efficiency and logistics guarantee were the main factors affecting the evaluation, and the development of the epidemic affected the concerns and attitudes of the subject contents, which had obvious group differences at different degrees.

Correlation operation based on intermediate layers for knowledge method

Haojie WU, Yanjie WANG, Wenbing CAI, Fei WANG, Yang LIU, Peng PU, Shaohui LIN

2022, 2022 (5): 115-125. doi: 10.3969/j.issn.1000-5641.2022.05.010

Convolutional neural networks have made remarkable achievements in artificial intelligence, such as blockchain, speech recognition, and image understanding. However, improvement in model performance is accompanied by a substantial increase in the computational and parameter overhead, leading to a series of problems, such as slow inference speed, large memory consumption, and difficulty of deployment on mobile devices. Knowledge distillation is a typical model compression method that transfers knowledge from the teacher network to the student network to improve the latter’s performance without any increase in the number of parameters. How to extract representative knowledge for distillation has become the core issue in this field. In this paper, we present a new knowledge distillation method based on an intermediate correlation operation, which, with the help of data augmentation, captures the learning and transformation process of image features at each intermediate layer stage of the network. We model this feature transformation procedure using a correlation operation to extract a new representation from the teacher network to guide the training of the student network. The experimental results demonstrate that our method achieves the best performance on both the CIFAR-10 and CIFAR-100 datasets in comparison with previous state-of-the-art methods.

Text matching based on multi-dimensional feature representation

Ming WANG, Te LI, Dingjiang HUANG

2022, 2022 (5): 126-135. doi: 10.3969/j.issn.1000-5641.2022.05.011

Text semantic matching is the basis of many natural language processing tasks and is required in many scenarios, such as search and question-answering systems. In practical application scenarios, the efficiency of text semantic matching is crucial. Although the representation-based semantic-matching model is less accurate than the interactive model, it is more efficient. The key to improving the performance of representation-based semantic-matching models is to extract sentence vectors with high-level semantic features. On this basis, this paper presents the design of a feature-fusion module and a feature-extraction module based on the ERNIE model to obtain sentence vectors with multidimensional semantic features. Further, a semantic prediction loss function is designed to improve the model's ability to capture semantic information. Finally, the accuracy on the Baidu Qianyan dataset reaches 0.851, which indicates good performance.

Survey of few-shot instance segmentation methods

Xueming ZHOU, Dingjiang HUANG

2022, 2022 (5): 136-146. doi: 10.3969/j.issn.1000-5641.2022.05.012

Instance segmentation is an important task in computer vision. In recent years, the development of meta- and few-shot learning has promoted the combination of computer vision learning tasks, which has overcome the bottleneck of detection and classification with regard to objects that are difficult to manually label and those with high labeling costs. Although great progress has been made with few-shot semantic segmentation and object detection, instance segmentation based on few-shot learning has not become a research hotspot until very recently. Beginning with an overview of few-shot instance segmentation, existing approaches are divided into categories of anchor-based and anchor-free algorithms. The architectures and primary technologies behind those approaches are respectively discussed, and common datasets and evaluation indices are described. Additionally, advantages and disadvantages of algorithm performance are analyzed, and future development directions and challenges are presented.

Capacitated route planning for supermarket distribution based on order splitting

Xiao PAN, Dongna LU, Shuhai WANG

2022, 2022 (5): 147-164. doi: 10.3969/j.issn.1000-5641.2022.05.013

Vehicle stowage and route planning are common problems for various delivery methods related to supermarket distribution. In order to resolve these problems, we propose capacitated route planning for supermarket order distribution based on order splitting. We construct the problem model with the goal of minimizing the total cost of delivery. Combined with real cases, an improved gray wolf optimization algorithm adding a genetic mutation operation is proposed. The effectiveness of the model and algorithm is verified by comparing performance with the genetic algorithm. The results show that when the total demand of supermarkets is close to an integer multiple of the vehicle capacity, our proposed planning approach is better. This is mainly reflected in the fact that the order splitting plan can make full use of vehicle capacity, reduce the empty driving rate for vehicles, and reduce the total distribution cost.

Truck capacity prediction based on self-attention mechanism in the bulk logistic industry

Xiaobian MIAO, Jiajun LIAO, Huajie MEI, Chong FENG, Jiali MAO

2022, 2022 (5): 165-183. doi: 10.3969/j.issn.1000-5641.2022.05.014

Capacity prediction plays an important role in smart logistics, and its results are important for improving the accuracy of capacity scheduling and truck-cargo matching. Existing research on capacity prediction in urban road networks aims to determine the number of available vehicles in future periods, whereas capacity prediction in bulk logistics aims to predict information on the trucks (e.g., the truck’s identity document (ID)) that will carry certain types of goods for different flows, which is closely related to whether the trucks can return to the steel plant within the expected time (called capacity accessibility). In bulk logistics, it is necessary to take into account the time spent on both trips: from the steel plant to the customer’s business and back to the steel plant. Since trucks need to stop several times during long-distance transportation and the length of each stop varies, the uncertainty of stopping time makes it difficult to predict the transportation delivery time accurately. In addition, the freight platform only assigns capacity to one-way transport tasks (i.e., from the steel plant to the customer’s business), and the return trip (i.e., back to the steel plant) is determined by the truck drivers, which leads to a lack of return trajectories and makes it challenging to predict when trucks will return to the steel plant. To address these challenges, based on the datasets of waybills, trucks, trajectories, and transport endpoints of logistics enterprises, we extract stay behavior features, transport endpoint features, and environmental features. Then, the self-attention mechanism is introduced to obtain the weights of different features on the time consumption of each of the two trips to further improve the accuracy of capacity accessibility prediction. On this basis, a truck capacity prediction method based on the self-attention mechanism is proposed, including capacity candidate set generation based on historical flow similarity, capacity accessibility prediction based on the self-attention mechanism, and capacity carrier flow prediction based on long short-term memory (LSTM). Finally, comparison experiments on real logistics datasets show that the proposed method has higher prediction accuracy and can provide powerful decision support for the optimization of capacity scheduling in bulk logistics.

Two-level trajectory data partition algorithm based on query cost

Mengnan LIU, Jianqiu XU

2022, 2022 (5): 184-194. doi: 10.3969/j.issn.1000-5641.2022.05.015

Trajectory data are typically large in scale and require frequent updates; hence, there are high performance requirements for trajectory queries. To improve the query efficiency of trajectory data, a two-level trajectory partition algorithm is presented herein. In the first partition, trajectory data are divided into sub-trajectories based on an optimized minimum bounding rectangle (MBR) to improve the approximation of the trajectory data. In the second partition, the sub-trajectories are grouped by a grid structure according to spatio-temporal characteristics. An R-tree packing method is proposed based on the partition algorithm, and the divided trajectory data are packed into the R-tree from bottom to top. Finally, experiments compare the proposed method with methods based on the average number or average size of trajectory segments and on combined movement features; the results show that it offers better query performance than the methods based on the average number of trajectory segments and on combined movement features, improving query efficiency by 43% and 30.5%, respectively.

Indexing and query technology for steel logistic data

Tao ZOU, Rongtao QIAN, Jiali MAO

2022, 2022 (5): 195-207. doi: 10.3969/j.issn.1000-5641.2022.05.016

With digital transformation and the development of iron and steel logistics, the scale of iron and steel logistic data has rapidly expanded, and traditional relational databases can no longer meet the storage and query needs. Considering that distributed NoSQL (not only SQL) databases offer simple scalability, fast reads and writes, and low cost, in this study, distributed cloud storage and NoSQL technologies are used to store and build indexes for massive steel logistic data, improving the storage capacity and query performance for the logistic data. First, Spark is used to associate and fuse the data from different sources, and the historical and real-time data generated by the freight platform are stored and managed in a hierarchical manner. Spatiotemporal and attribute indexes are then built for the three types of queries mainly involved in steel transportation to achieve efficient querying of multi-source logistic data. Finally, the experimental results based on real steel logistic data show that the proposed scheme is superior to traditional relational database methods in terms of data writing, storage, and querying, and can effectively support the storage and querying of massive logistic data.

Tree structure grid minimal cost repair problem and its corresponding algorithm based on fault prediction

Yang CAI, Danhong TANG, Jiajun CHEN, Zhixin XU, Limeng YANG, Ming WANG, Xueming ZHOU, Dingjiang HUANG

2022, 2022 (5): 208-218. doi: 10.3969/j.issn.1000-5641.2022.05.017

A fault prediction algorithm based on a long short-term memory (LSTM) network and a minimal cost repair generation algorithm for tree-structured networks are proposed in this study to predict possible anomalies from a large amount of historical data and to identify effective fault treatments. The minimal cost repair operation sequence is generated using dynamic programming, so that a valid sequence of operation orders can be produced quickly. The results of this study indicate that the proposed algorithms can effectively reduce the dispatch error rate, improve dispatch efficiency, and reduce the failure time of power grid systems, and can therefore be used to reduce the associated economic loss.

Prediction of remaining useful life of aeroengines based on the Transformer with multi-feature fusion

Yilin MA, Huiling TAO, Qiwen DONG, Ye WANG

2022, 2022 (5): 219-232. doi: 10.3969/j.issn.1000-5641.2022.05.018

As the core components of aircraft, engines play a vital role during flight. Accurate prediction of the remaining useful life of an aeroengine supports prognostics and health management, thus preventing major accidents and saving maintenance costs. Existing methods give little consideration to different time steps or to the relationship between different sensors and operating conditions. To address this, a remaining useful life prediction method based on the Transformer is proposed, which fuses multi-feature outputs from different encoder layers. This method takes two inputs with different time steps, analyzes the relationship between the sensors using permutation entropy, and extracts features independently from the operating condition data. The experimental results on the public aeroengine dataset CMAPSS (Commercial Modular Aero-Propulsion System Simulation) show that the proposed method is superior to other advanced remaining useful life prediction methods.
