Computer Science content in our journal

    Rule extraction and reasoning for fusing relation and structure encoding
    Jimi HU, Weibing WAN, Feng CHENG, Yuming ZHAO
    Journal of East China Normal University(Natural Science)    2025, 2025 (1): 97-110.   DOI: 10.3969/j.issn.1000-5641.2025.01.008

    The domain knowledge graph exhibits incompleteness and semantic complexity, which lead to shortcomings in the extraction and selection of rules and thereby weaken its inferential capabilities. A rule extraction model that integrates relation and structure encoding is proposed to address this issue. A multidimensional embedding is obtained by extracting relational and structural information from the target subgraph and encoding it as features. A self-attention mechanism is designed to integrate the relational and structural information, enabling the model to better capture dependency relationships and local structural information in the input sequence. This enhancement improves the model's contextual understanding and expressive capability, thus addressing the challenges of rule extraction and selection in complex semantic situations. Experimental results on an actual industrial dataset of automotive component failures and on public datasets demonstrate improvements of the proposed model on link prediction and rule quality evaluation tasks. When the rule length is 3, an average increase of 7.1 percentage points in mean reciprocal rank (MRR) and an average increase of 8.6 percentage points in Hits@10 are observed; for a rule length of 2, the average increases are 7.4 percentage points in MRR and 3.9 percentage points in Hits@10. This confirms the effectiveness of relational and structural information in rule extraction and inference.
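    The fusion-and-attention step can be sketched as plain scaled dot-product self-attention over concatenated relation and structure encodings. This is a minimal illustration with toy vectors and no learned projections; the paper's actual encoders and attention design are not reproduced here.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def self_attention(seq):
    """Scaled dot-product self-attention: queries = keys = values = input."""
    d = len(seq[0])
    out = []
    for q in seq:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in seq]
        w = softmax(scores)
        out.append([sum(wj * v[i] for wj, v in zip(w, seq))
                    for i in range(d)])
    return out

# Fuse a relation encoding and a structure encoding by concatenation,
# then let attention mix information across the sequence elements.
relation_enc = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
structure_enc = [[0.5, 0.5], [0.2, 0.8], [0.9, 0.1]]
fused = [r + s for r, s in zip(relation_enc, structure_enc)]
attended = self_attention(fused)
```

Each output row is a convex combination of the fused input rows, so relational and structural signals from every position contribute to every updated embedding.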

    Research on software classification based on the fusion of code and descriptive text
    Yuhang CHEN, Shizhou WANG, Zhengting TANG, Liangyu CHEN, Ningkang JIANG
    Journal of East China Normal University(Natural Science)    2025, 2025 (1): 46-58.   DOI: 10.3969/j.issn.1000-5641.2025.01.004

    Third-party software systems play a significant role in modern software development. Software developers build software by retrieving appropriate dependency libraries from third-party software repositories based on their requirements, avoiding reinventing the wheel and thus speeding up development. However, retrieving third-party dependency libraries can be challenging. Typically, third-party software repositories provide preset tags (categories) for software developers to search. When a software system's preset tags are incorrectly labeled, however, developers cannot find the libraries they require, and this inevitably affects the development process. This study proposes a software clustering model to address these challenges. The model combines method vectors, method importance, and text vectors to assign software of unknown category to known categories. In addition, because no publicly available dataset exists for this problem, we built a dataset and made it publicly available. The clustering model was tested on this self-built dataset comprising 30 categories and software systems from the Maven repository. The prediction accuracy was 70% for one candidate (top-1) and 90% for three candidates (top-3). The experimental results show that our model can help software developers find suitable software, can be useful for classifying software systems in open-source repositories, and can assist developers in quickly locating third-party libraries.

    Knowledge-distillation-based lightweight crop-disease-recognition algorithm
    Wenjing HU, Longquan JIANG, Junlong YU, Yiqian XU, Qipeng LIU, Lei LIANG, Jiahao LI
    Journal of East China Normal University(Natural Science)    2025, 2025 (1): 59-71.   DOI: 10.3969/j.issn.1000-5641.2025.01.005

    Crop diseases are one of the main factors threatening crop growth. Machine-learning algorithms can efficiently detect large-scale crop diseases, enabling timely treatment and improving crop yield and quality. In large-scale agricultural scenarios, owing to limitations in power supply and other conditions, high-computing-power devices such as servers cannot be supported. Most existing deep-network models require high computing power and cannot be deployed easily on low-power embedded devices, hindering the accurate identification of large-scale crop diseases in practice. Hence, this paper proposes a lightweight crop-disease-recognition algorithm based on knowledge distillation. A student model based on a residual structure and the attention mechanism is designed, and knowledge distillation is applied to complete transfer learning from the ConvNeXt model, yielding a lightweight model that maintains high-precision recognition. The experimental results show that the classification accuracy for images of 39 types of crop diseases is 98.72% at a model size of 2.28 MB, which satisfies the requirements for deployment on embedded devices and provides a practical and efficient solution for crop-disease recognition.
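    The knowledge-distillation transfer described above is conventionally a weighted sum of a temperature-softened KL term (teacher vs. student) and a cross-entropy term on the true label. A minimal sketch follows; the temperature T, weight alpha, and logits are illustrative assumptions, not the paper's settings.

```python
import math

def softmax_t(logits, T):
    """Softmax with temperature T (T > 1 softens the distribution)."""
    m = max(logits)
    es = [math.exp((x - m) / T) for x in logits]
    s = sum(es)
    return [e / s for e in es]

def distillation_loss(student_logits, teacher_logits, label, T=4.0, alpha=0.7):
    """alpha * T^2 * KL(teacher_T || student_T) + (1 - alpha) * CE(student, label).
    The T^2 factor keeps gradient magnitudes comparable across temperatures."""
    p = softmax_t(teacher_logits, T)
    q = softmax_t(student_logits, T)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    ce = -math.log(softmax_t(student_logits, 1.0)[label])
    return alpha * T * T * kl + (1 - alpha) * ce
```

When the student's logits match the teacher's, the KL term vanishes and only the label cross-entropy remains.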

    Surface-height- and uncertainty-based depth estimation for Mono3D
    Yinshuai JI, Jinhua XU
    Journal of East China Normal University(Natural Science)    2025, 2025 (1): 72-81.   DOI: 10.3969/j.issn.1000-5641.2025.01.006

    Monocular three-dimensional (3D) object detection is a fundamental but challenging task in autonomous driving and robotic navigation. Directly predicting object depth from a single image is essentially an ill-posed problem. Geometry projection is a powerful depth estimation method that infers an object's depth from its physical height and its projected height in the image plane. However, height estimation errors are amplified in the inferred depth. In this study, the physical and projected heights of object surface points (rather than the height of the object itself) were estimated to obtain several depth candidates. In addition, the uncertainties of the heights were estimated, and the final object depth was obtained by assembling the depth predictions according to these uncertainties. Experiments demonstrated the effectiveness of the depth estimation method, which achieved state-of-the-art (SOTA) results on the monocular 3D object detection task of the KITTI dataset.
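    The geometry-projection idea (depth from a physical height and its projected pixel height under a pinhole camera) and the uncertainty-based assembly can be sketched as follows. Inverse-variance weighting is one common way to fuse candidates; the paper's exact weighting scheme may differ, and the focal length and heights below are illustrative.

```python
def depth_from_height(focal_px, physical_h_m, projected_h_px):
    """Pinhole projection: depth Z = f * H / h, so a small error in the
    projected height h is amplified in Z."""
    return focal_px * physical_h_m / projected_h_px

def fuse_by_uncertainty(depths, sigmas):
    """Assemble depth candidates by inverse-variance weighting:
    low-uncertainty candidates dominate the final estimate."""
    ws = [1.0 / (s * s) for s in sigmas]
    return sum(w * z for w, z in zip(ws, depths)) / sum(ws)

# A 1.5 m surface height projecting to 54 px under a 720 px focal length
# gives a 20 m depth candidate.
z = depth_from_height(720.0, 1.5, 54.0)
```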

    Time series uncertainty forecasting based on graph augmentation and attention mechanism
    Chaojie MEN, Jing ZHAO, Nan ZHANG
    Journal of East China Normal University(Natural Science)    2025, 2025 (1): 82-96.   DOI: 10.3969/j.issn.1000-5641.2025.01.007

    To improve the ability to predict future events and effectively address uncertainty, we propose a network architecture based on graph augmentation and attention mechanisms for uncertainty forecasting in multivariate time series. By introducing an implicit graph structure and integrating graph neural network techniques, we capture the mutual dependencies among sequences to model the interactions between time series. We use attention mechanisms to capture temporal patterns within a sequence, modeling the dynamic evolution of the time series. We use the Monte Carlo dropout method to approximate the model parameters and model the predicted sequences as a stochastic distribution, thus achieving accurate uncertainty forecasting in time series. The experimental results indicate that this approach maintains a high level of prediction precision while providing reliable uncertainty estimates that can inform decision-making tasks.
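    Monte Carlo dropout as described (keeping dropout active at inference, running several stochastic forward passes, and reading the mean as the prediction and the spread as the uncertainty) can be shown in a minimal pure-Python form. The tiny linear "model" and dropout rate here are illustrative, not the paper's network.

```python
import random
import statistics

def forward(x, weights, p_drop=0.2, rng=random):
    """One stochastic pass: each connection is dropped with probability
    p_drop; survivors are scaled by 1/(1 - p_drop) (inverted dropout),
    so the expected output matches the deterministic model."""
    keep = 1.0 - p_drop
    return sum((w / keep) * xi
               for w, xi in zip(weights, x)
               if rng.random() > p_drop)

def mc_dropout_predict(x, weights, n_samples=200, seed=0):
    """Mean of the stochastic passes = prediction; their spread = uncertainty."""
    rng = random.Random(seed)
    ys = [forward(x, weights, rng=rng) for _ in range(n_samples)]
    return statistics.mean(ys), statistics.stdev(ys)

pred_mean, pred_std = mc_dropout_predict([1.0] * 4, [1.0] * 4)
```

The deterministic model would output 4.0 here; the sampled mean is close to it, and the nonzero spread is the uncertainty estimate.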

    Knowledge graph completion by integrating textual information and graph structure information
    Houlong FAN, Ailian FANG, Xin LIN
    Journal of East China Normal University(Natural Science)    2025, 2025 (1): 111-123.   DOI: 10.3969/j.issn.1000-5641.2025.01.009

    Based on path-query information, we propose a graph attention model that effectively integrates textual and graph structure information in knowledge graphs, thereby enhancing knowledge graph completion. For textual information, a dual encoder based on pre-trained language models is used to separately obtain embedding representations of entities and path-query information. Additionally, an attention mechanism is employed to aggregate path-query information, which is used to capture graph structural information and update entity embeddings. The model was trained using contrastive learning, and experiments were conducted on multiple knowledge graph datasets, with good results achieved in both transductive and inductive settings. These results demonstrate the advantage of combining pre-trained language models with graph neural networks to effectively capture both textual and graph structural information, thereby enhancing knowledge graph completion.

    Label-perception augmented causal analysis of mental health over social media
    Yiping LIANG, Luwei XIAO, Linlin WANG
    Journal of East China Normal University(Natural Science)    2025, 2025 (1): 124-137.   DOI: 10.3969/j.issn.1000-5641.2025.01.010

    Online social media are frequently used by people as a way of expressing their thoughts and feelings. Among the vast number of online posts, some are concerning, expressing potential grievances and mental illness. Identifying such posts, along with the potential causes of the mental health problems, is an important task. Observation of these posts reveals a label co-occurrence phenomenon in their contexts, i.e., the semantics of multiple candidate labels appear in the context of a single sample, which interferes with the modeling and prediction of label patterns. To mitigate the impact of this phenomenon, we propose a label-aware data augmentation method, which leverages large-scale pre-trained language models with excellent text comprehension to identify potential candidate labels, abates the noise from irrelevant co-occurring labels by estimating sample-independent label semantic strengths, and constructs well-performing classifiers with pre-trained language models. Extensive experiments validate the effectiveness of our model on the recent datasets Intent_SDCNL and SAD.

    Purging diffusion models through CLIP based fine-tuning
    Ping WU, Xin LIN
    Journal of East China Normal University(Natural Science)    2025, 2025 (1): 138-150.   DOI: 10.3969/j.issn.1000-5641.2025.01.011

    Diffusion models have revolutionized text-to-image synthesis, enabling users to generate high-quality and imaginative artworks from simple natural-language text prompts. Unfortunately, owing to the large and unfiltered training datasets, inappropriate content such as nudity and violence can be generated by these models. To deploy such models with a higher level of safety, we propose a novel method, directional contrastive language-image pre-training (CLIP) loss-based fine-tuning, dubbed CLIF. This method uses a directional CLIP loss to suppress the model's ability to generate inappropriate content. CLIF is lightweight and immune to circumvention. To demonstrate its effectiveness, we propose a benchmark called categorized toxic prompts (CTP) to evaluate the propensity of text-to-image diffusion models to generate inappropriate content. As shown by our experiments on the CTP and common objects in context (COCO) datasets, CLIF significantly suppresses inappropriate generation while preserving the model's ability to produce general content.

    Infrared small target detection algorithm deployed on HiSilicon Hi3531
    Xiaoxue FU, Chang HUANG
    Journal of East China Normal University(Natural Science)    2025, 2025 (1): 151-164.   DOI: 10.3969/j.issn.1000-5641.2025.01.012

    To address the large computational complexity, poor real-time performance, and deployment difficulties of current algorithms, and to meet the stringent real-time and accuracy requirements of infrared detection systems, this paper proposes a lightweight algorithm deployed on domestically produced embedded chips, termed YOLOv5-TinyHisi. The algorithm makes lightweight modifications to the backbone network structure based on the characteristics of infrared small targets. Additionally, it uses the SIoU loss to optimize the bounding-box regression error, thereby enhancing the localization accuracy for infrared small targets. The model is deployed on the Hi3531DV200, using the chip's integrated neural network inference engine (NNIE) to accelerate network inference. Experimental results on public datasets demonstrate that the algorithm achieves a 1.52% improvement in mean average precision (mAP) over YOLOv5, while significantly reducing the parameter count and model size. On the Hi3531DV200, inference on a single image with a resolution of 1280 × 512 pixels reaches 35 frames per second (FPS), with a recall rate of 95%, meeting the real-time and accuracy requirements of the infrared detection system.

    Machine-learning-based model checker performance prediction
    Chengyu ZHANG, Jiayi ZHU, Yihao HUANG, Di YANG, Jianwen LI, Weikai MIAO, Di YAN, Bin GU, Naijun ZHAN, Geguang PU
    Journal of East China Normal University(Natural Science)    2024, 2024 (4): 18-29.   DOI: 10.3969/j.issn.1000-5641.2024.04.002

    An and-inverter graph (AIG) is a representation of electrical circuits typically passed as input into a model checker. In this paper, we propose an AIG structural encoding that we use to extract the features of AIGs and construct a portfolio-based model checker called Liquid. The underlying concept of the proposed structural encoding is the enumeration of all possible AIG substructures, with the frequency of each substructure encoded as a feature vector for use in subsequent machine-learning processes. Because the performance of model-checking algorithms varies across different AIGs, Liquid combines multiple such algorithms and selects the algorithm appropriate for a given AIG via machine learning. In our experiments, Liquid outperformed all state-of-the-art model checkers in the portfolio, achieving a high prediction accuracy. We further studied the effectiveness of Liquid from several perspectives.
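    The substructure-frequency encoding can be illustrated at depth one: count each (source type, target type, inverted) edge pattern in the AIG and normalize the counts into a feature vector. This is a simplified stand-in for the paper's enumeration of all possible substructures; the node types and the tiny graph below are illustrative.

```python
from collections import Counter

def aig_edge_features(node_type, edges):
    """Count every (src_type, dst_type, inverted) edge pattern and normalize;
    the resulting frequencies form a fixed-order feature vector suitable
    for a downstream machine-learning model."""
    counts = Counter((node_type[u], node_type[v], inv) for u, v, inv in edges)
    total = sum(counts.values())
    keys = sorted(counts)
    return keys, [counts[k] / total for k in keys]

# Tiny AIG: two primary inputs feed one AND gate, one edge inverted.
node_type = {0: "IN", 1: "IN", 2: "AND"}
edges = [(0, 2, False), (1, 2, True)]
keys, vec = aig_edge_features(node_type, edges)
```

Because the vector is a normalized frequency profile, AIGs of different sizes map to comparable feature spaces, which is what lets a portfolio selector generalize across circuits.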

    A formal verification method for embedded operating systems
    Yang WANG, Jingcheng FANG, Xiong CAI, Zhipeng ZHANG, Yong CAI, Weikai MIAO
    Journal of East China Normal University(Natural Science)    2024, 2024 (4): 1-17.   DOI: 10.3969/j.issn.1000-5641.2024.04.001

    The operating system is the core and foundation of the entire computer system. Its reliability and safety are vital because faults or vulnerabilities in the operating system can lead to system crashes, data loss, privacy breaches, and security attacks. In safety-critical systems, any error in the operating system can result in significant loss of life and property. Ensuring the safety and reliability of the operating system has always been a major challenge in industry and academia. Current methods for verifying an operating system's safety include software testing, static analysis, and formal methods. Formal methods are the most promising for ensuring the operating system's safety and trustworthiness: mathematical models can be established, and the system can be formally analyzed and verified to discover potential errors and vulnerabilities. For an operating system, formal methods can verify the correctness and completeness of its functions as well as system safety. A formal verification scheme for embedded operating systems is proposed herein on the basis of existing formal verification achievements for operating systems. This scheme uses the VCC (verified C compiler), CBMC (C bounded model checker), and PAT (process analysis toolkit) tools to verify the operating system at the unit, module, and system levels, respectively. The scheme, successfully applied to a task-scheduling architecture case of a real operating system, exhibits a degree of generality for analyzing and verifying embedded operating systems.

    Research and design of data synchronization schemes of postgraduate information systems based on microservice
    Huiling TAO, Yilin MA, Ye WANG, Qiwen DONG
    Journal of East China Normal University(Natural Science)    2024, 2024 (2): 42-52.   DOI: 10.3969/j.issn.1000-5641.2024.02.006

    With the popularization of university information systems and the increase in their usage frequency, teachers and students have higher requirements for data consistency, accuracy, timeliness, and completeness. The original data synchronization scheme, which used extensible markup language (XML), suffers from low synchronization efficiency and difficulty of expansion. The open-source tool DataX can synchronize data between various heterogeneous databases without damaging the source database. This study used DataX to improve the original data synchronization scheme and proposed different data synchronization schemes for various business requirements and application scenarios in the construction of a university postgraduate information system. Additionally, to address DataX's limitation that a single read feeds only a single write during start-up and execution, a method in which one read serves multiple writes was designed. Comparison experiments show that the optimized scheme improves data synchronization efficiency, has better scalability, and can meet the data synchronization requirements of universities.
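    The one-read, multiple-writes idea can be sketched as a simple fan-out: a single pass over the source feeds every target sink, instead of re-reading the source once per target. This is illustrative only; DataX's actual reader/writer channel architecture is not reproduced here.

```python
def fan_out(reader, writers):
    """One read pass over the source; each record is handed to every
    writer sink, so N targets cost one source scan instead of N."""
    for record in reader:
        for w in writers:
            w.append(record)

# Two downstream sinks filled from a single iteration over the source.
sink_a, sink_b = [], []
fan_out(iter([{"id": 1}, {"id": 2}]), [sink_a, sink_b])
```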

    Collaborative stranger review-based recommendation
    Luping FENG, Liye SHI, Wen WU, Jun ZHENG, Wenxin HU, Wei ZHENG
    Journal of East China Normal University(Natural Science)    2024, 2024 (2): 53-64.   DOI: 10.3969/j.issn.1000-5641.2024.02.007

    Review-based recommendation mainly exploits textual information that reflects the characteristics of items and the preferences of users. However, most existing approaches overlook the influence of information from hidden strangers on the selection of reviews for the target user. Information from strangers can more accurately measure the relative feelings of the user and complement the target user's expression, leading to more refined user modeling. Recently, several studies have attempted to incorporate information from similar strangers but ignore the information of other strangers. In this study, we propose a stranger collaborative review-based recommendation model that makes effective use of information from strangers to improve the accuracy and richness of user modeling. Specifically, to capture potential user preferences elaborately, we first design a collaborative stranger attention module that considers the textual similarities and preference interactions between the target user and the hidden strangers implied by the reviews. We then develop a collaborative gating module to dynamically integrate information from strangers at the preference level based on the characteristics of the target user-item pair, effectively filtering strangers' preferences and enriching target user modeling. Finally, we apply a latent factor model to accomplish the recommendation task. Experimental results demonstrate the superiority of our model compared to state-of-the-art methods on real-world datasets from various sources.

    Dual-path network with multilevel interaction for one-stage visual grounding
    Yue WANG, Jiabo YE, Xin LIN
    Journal of East China Normal University(Natural Science)    2024, 2024 (2): 65-75.   DOI: 10.3969/j.issn.1000-5641.2024.02.008

    This study explores multimodal understanding and reasoning for one-stage visual grounding. Existing one-stage methods extract visual feature maps and textual features separately and then perform multimodal reasoning to predict the bounding box of the referred object. These methods suffer from two weaknesses: first, the pre-trained visual feature extractors introduce text-unrelated visual signals into the visual features, which hinders multimodal interaction; second, the reasoning process lacks visual guidance for language modeling. These shortcomings limit the reasoning ability of existing one-stage methods. We propose a low-level interaction to extract text-related visual feature maps, and a high-level interaction to incorporate visual features in guiding the language modeling and further performing multistep reasoning on visual features. Based on the proposed interactions, we present a novel network architecture called the dual-path multilevel interaction network (DPMIN). Experiments on five commonly used visual grounding datasets demonstrate the superior performance of the proposed method and its real-time applicability.

    Parallel block-based stochastic computing with adapted quantization
    Yongzhuo ZHANG, Qingfeng ZHUGE, Edwin Hsing-Mean SHA, Yuhong SONG
    Journal of East China Normal University(Natural Science)    2024, 2024 (2): 76-85.   DOI: 10.3969/j.issn.1000-5641.2024.02.009

    The computation and storage demands of deep neural network models make them unsuitable for deployment on embedded devices with limited area and power. To address this, stochastic computing reduces the storage and computational complexity of neural networks by representing data as stochastic sequences and performing arithmetic operations such as addition and multiplication with basic logic units. However, short stochastic sequences cause discretization errors when converting network weights from floating-point numbers to stochastic sequences, which reduces the inference accuracy of stochastic computing network models. Longer stochastic sequences improve the representation range and alleviate this problem, but they also incur longer computational latency and higher energy consumption. We propose a differentiable quantization function based on the Fourier transform. During training, the function improves the matching of the model to stochastic sequences, reducing the discretization error during data conversion and preserving the accuracy of stochastic computing neural networks even with short stochastic sequences. Additionally, we present an adder designed to enhance the accuracy of the operation unit and parallelize computation by chunking inputs, thereby reducing latency. Experimental results demonstrate a 20% improvement in model inference accuracy compared to other methods, as well as a 50% reduction in computational latency.
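    The core stochastic-computing primitive (representing a value in [0, 1] as a bitstream whose fraction of 1s approximates the value, and multiplying two values with a bitwise AND of independent streams) can be sketched as follows. The stream length chosen here is illustrative of the accuracy/latency trade-off the abstract describes.

```python
import random

def to_stream(p, n, seed):
    """Unipolar encoding: a length-n bitstream whose fraction of 1s
    approximates p; longer streams give lower discretization error."""
    rng = random.Random(seed)
    return [1 if rng.random() < p else 0 for _ in range(n)]

def sc_multiply(a, b):
    """Stochastic multiplication: a single AND gate per bit position,
    valid when the two streams are statistically independent."""
    return [x & y for x, y in zip(a, b)]

def value(stream):
    """Decode a stream back to its represented value."""
    return sum(stream) / len(stream)

a = to_stream(0.5, 4096, seed=1)
b = to_stream(0.5, 4096, seed=2)
prod = value(sc_multiply(a, b))  # approximates 0.5 * 0.5 = 0.25
```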

    Multi-view and multi-pose lock pin point cloud model reconstruction based on turntable
    Xin LU, Chang HUANG, Zhiwei JIN
    Journal of East China Normal University(Natural Science)    2024, 2024 (2): 86-96.   DOI: 10.3969/j.issn.1000-5641.2024.02.010

    The surface structure of a container lock pin is complex, making it difficult to establish a point cloud model with high surface feature integrity. Therefore, a multi-view and multi-pose point cloud model reconstruction algorithm based on a turntable is proposed to recover the complete surface features of the lock pin. Because sensors at a fixed height are paired with rotating turntables in most scenarios, the collected surface features are usually partially missing. The algorithm first uses the parameter calibration results of the turntable to stitch multi-view three-dimensional point clouds and establish a fixed-pose point cloud model. Then, through the proposed improved spherical projection algorithm, the pose of the lock pin on the turntable is changed to establish a point cloud model in another pose. Finally, the point cloud models in multiple poses are merged to complete the surface features. Experimental results show that the proposed algorithm can build a lock-pin point cloud model with high surface feature integrity.

    Infrared small-target detection method based on double-layer local energy factor
    Lingxiao TANG, Chang HUANG
    Journal of East China Normal University(Natural Science)    2024, 2024 (2): 97-107.   DOI: 10.3969/j.issn.1000-5641.2024.02.011

    Infrared small-target detection has always been an important technology in infrared tracking systems. Current approaches for small-target detection in complex backgrounds, inspired by the human visual system, are prone to generating false alarms and exhibit sluggish detection speeds. Building on the multiscale local contrast measure using a local energy factor (MLCM-LEF) method, an infrared small-target detection method based on a double-layer local energy factor (DLEF) is proposed. Target detection is performed from the perspectives of local energy difference and local brightness difference. The double-layer local energy factor describes the difference between a small target and the background from the energy perspective, and a weighted luminance difference factor detects the target from the brightness perspective. The infrared small target is extracted by a two-dimensional Gaussian fusion of the two processing results. Finally, the image mean and standard deviation are used for adaptive threshold segmentation to extract the small infrared target. In experimental tests on public datasets, this method improved background suppression compared with the MLCM-LEF algorithm and reduced single-frame detection time by one-third.

    Group contrastive learning for weakly-supervised 3D point cloud semantic segmentation
    Zhihong ZHENG, Haichuan SONG
    Journal of East China Normal University(Natural Science)    2024, 2024 (2): 108-118.   DOI: 10.3969/j.issn.1000-5641.2024.02.012

    Three-dimensional point cloud semantic segmentation is an essential task for 3D visual perception and has been widely used in autonomous driving, augmented reality, and robotics. However, most methods work under a fully-supervised setting, which relies heavily on fully annotated datasets. Many weakly-supervised methods use pseudo-labeling to retrain the model and reduce annotation time; however, previous methods have failed to address the confirmation bias induced by false pseudo labels. In this study, we propose a novel weakly-supervised 3D point cloud semantic segmentation method based on group contrastive learning, constructing contrast between positive and negative sample groups selected from pseudo labels. Pseudo labels compete with each other within group contrastive learning, reducing the gradient contribution of falsely predicted pseudo labels. Results on three large-scale datasets show that our method outperforms state-of-the-art weakly-supervised methods with minimal labeling annotations and even surpasses the performance of some classic fully-supervised methods.
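    The group contrast can be sketched as one softmax over a positive group versus a negative group: because the pseudo-labeled positives share a single softmax, they compete with each other, and a falsely labeled member contributes a smaller share of the gradient. Cosine similarity and the temperature value here are illustrative assumptions, not the paper's exact loss.

```python
import math

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def group_contrastive_loss(anchor, positives, negatives, t=0.1):
    """-log of the positive group's share of the softmax mass; pulls the
    anchor toward its pseudo-labeled group and pushes it from negatives."""
    pos = sum(math.exp(cosine(anchor, p) / t) for p in positives)
    neg = sum(math.exp(cosine(anchor, n) / t) for n in negatives)
    return -math.log(pos / (pos + neg))

# Anchor aligned with its positive group: loss is near zero.
loss_good = group_contrastive_loss([1.0, 0.0],
                                   [[1.0, 0.0], [0.9, 0.1]],
                                   [[-1.0, 0.0]])
```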

    Hidden layer Fourier convolution for non-stationary texture synthesis
    Xinxin HE, Haichuan SONG
    Journal of East China Normal University(Natural Science)    2024, 2024 (2): 119-130.   DOI: 10.3969/j.issn.1000-5641.2024.02.013

    The remarkable achievements of deep learning in computer vision have led to significant development in example-based texture synthesis. Texture synthesis models using neural networks mainly comprise local components, such as convolution and up/down sampling, which are unsuitable for capturing the irregular structural attributes of non-stationary textures. Inspired by the duality between the frequency and space domains, a non-stationary texture synthesis method based on hidden-layer Fourier convolution is proposed in this study. The proposed method uses a generative adversarial network as the basic architecture, splits features along the channel dimension in the hidden layer, and builds a local branch in the image domain and a global branch in the frequency domain to account for both visual perception and structural information. Experimental results show that this method can handle structurally challenging non-stationary texture exemplars. Compared with state-of-the-art methods, it yields better results in learning and expanding large-scale structures.
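    The frequency-space duality behind the global branch can be demonstrated in one dimension: a pointwise product in the frequency domain equals a circular convolution in the signal domain, so a single frequency-domain operation gives every output sample a global receptive field. This is a naive 1-D DFT sketch; the paper's hidden-layer version operates on 2-D feature maps with learned spectral weights.

```python
import cmath

def dft(x):
    """Naive discrete Fourier transform of a real sequence."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * j * k / n) for k in range(n))
            for j in range(n)]

def idft(X):
    """Inverse DFT, returning the real parts."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * j * k / n) for k in range(n)).real / n
            for j in range(n)]

def fourier_conv(x, h):
    """Circular convolution via the convolution theorem:
    IDFT(DFT(x) * DFT(h)) -- a global operation in one step."""
    return idft([a * b for a, b in zip(dft(x), dft(h))])

x = [1.0, 2.0, 3.0, 4.0]
h = [1.0, 0.0, 0.5, 0.0]
y = fourier_conv(x, h)
```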

    An image caption generation algorithm based on decoupling commonsense association
    Jiawei LIU, Xin LIN
    Journal of East China Normal University(Natural Science)    2024, 2024 (2): 131-142.   DOI: 10.3969/j.issn.1000-5641.2024.02.014

    The image caption generation algorithm based on decoupling commonsense association aims to eliminate the interference of commonsense associations between entity types on model reasoning and to improve the fluency and accuracy of the generated captions. To address relational sentences in current image captions that conform to common sense but not to the image content, the algorithm first uses a novel training method to increase the relationship-detection model's attention to the real relationships in the image and improve the accuracy of relationship reasoning. Then, a relation-aware entity interaction method performs targeted information interaction between related entities, strengthening the relationship information. The experimental results show that the proposed algorithm can correct some commonsense false relationships, generate more accurate image captions, and achieve better results on various evaluation metrics.
