Computer Science content in our journal

    Rule extraction and reasoning for fusing relation and structure encoding
    Jimi HU, Weibing WAN, Feng CHENG, Yuming ZHAO
    Journal of East China Normal University(Natural Science)    2025, 2025 (1): 97-110.   DOI: 10.3969/j.issn.1000-5641.2025.01.008

    The domain knowledge graph exhibits incompleteness and semantic complexity, which lead to shortcomings in the extraction and selection of rules and thereby weaken its inferential capabilities. A rule extraction model that integrates relation and structure encoding is proposed to address this issue. A multidimensional embedding is obtained by extracting relational and structural information from the target subgraph and encoding it as features. A self-attention mechanism is designed to integrate the relational and structural information, enabling the model to better capture dependency relationships and local structural information in the input sequence. This enhancement improves the model's contextual understanding and expressive capability, thus addressing the challenges of rule extraction and selection in complex semantic situations. Experimental results on an actual industrial dataset of automotive component failures and on public datasets demonstrate improvements of the proposed model on link prediction and rule quality evaluation tasks. When the rule length is 3, an average increase of 7.1 percentage points in mean reciprocal rank (MRR) and an average increase of 8.6 percentage points in Hits@10 are observed; for a rule length of 2, the average increases are 7.4 percentage points in MRR and 3.9 percentage points in Hits@10. This confirms the effectiveness of relational and structural information in rule extraction and inference.
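    The fusion-and-attention step can be sketched as plain scaled dot-product self-attention over concatenated relation and structure encodings. This is a minimal illustration with toy vectors and no learned projections; the paper's actual encoders and attention design are not reproduced here.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def self_attention(seq):
    """Scaled dot-product self-attention: queries = keys = values = input."""
    d = len(seq[0])
    out = []
    for q in seq:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in seq]
        w = softmax(scores)
        out.append([sum(wj * v[i] for wj, v in zip(w, seq))
                    for i in range(d)])
    return out

# Fuse a relation encoding and a structure encoding by concatenation,
# then let attention mix information across the sequence elements.
relation_enc = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
structure_enc = [[0.5, 0.5], [0.2, 0.8], [0.9, 0.1]]
fused = [r + s for r, s in zip(relation_enc, structure_enc)]
attended = self_attention(fused)
```

Each output row is a convex combination of the fused input rows, so relational and structural signals from every position contribute to every updated embedding.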

    Research on software classification based on the fusion of code and descriptive text
    Yuhang CHEN, Shizhou WANG, Zhengting TANG, Liangyu CHEN, Ningkang JIANG
    Journal of East China Normal University(Natural Science)    2025, 2025 (1): 46-58.   DOI: 10.3969/j.issn.1000-5641.2025.01.004

    Third-party software systems play a significant role in modern software development. Software developers build software by retrieving appropriate dependency libraries from third-party software repositories based on their requirements, avoiding reinventing the wheel and thus speeding up development. However, retrieving third-party dependency libraries can be challenging. Typically, third-party software repositories provide preset tags (categories) for software developers to search. When a software system's preset tags are incorrectly labeled, however, developers cannot find the libraries they require, and this inevitably affects the development process. This study proposes a software clustering model to address these challenges. The model combines method vectors, method importance, and text vectors to assign software of unknown category to known categories. In addition, because no publicly available dataset exists for this problem, we built a dataset and made it publicly available. The clustering model was tested on this self-built dataset comprising 30 categories and software systems from the Maven repository. The prediction accuracy was 70% for one candidate (top-1) and 90% for three candidates (top-3). The experimental results show that our model can help software developers find suitable software, can be useful for classifying software systems in open-source repositories, and can assist developers in quickly locating third-party libraries.

    Knowledge-distillation-based lightweight crop-disease-recognition algorithm
    Wenjing HU, Longquan JIANG, Junlong YU, Yiqian XU, Qipeng LIU, Lei LIANG, Jiahao LI
    Journal of East China Normal University(Natural Science)    2025, 2025 (1): 59-71.   DOI: 10.3969/j.issn.1000-5641.2025.01.005

    Crop diseases are one of the main factors threatening crop growth. Machine-learning algorithms can efficiently detect large-scale crop diseases, enabling timely treatment and improving crop yield and quality. In large-scale agricultural scenarios, owing to limitations in power supply and other conditions, high-computing-power devices such as servers cannot be supported. Most existing deep-network models require high computing power and cannot be deployed easily on low-power embedded devices, hindering the accurate identification of large-scale crop diseases in practice. Hence, this paper proposes a lightweight crop-disease-recognition algorithm based on knowledge distillation. A student model based on a residual structure and the attention mechanism is designed, and knowledge distillation is applied to complete transfer learning from the ConvNeXt model, yielding a lightweight model that maintains high-precision recognition. The experimental results show that the classification accuracy for images of 39 types of crop diseases is 98.72% at a model size of 2.28 MB, which satisfies the requirements for deployment on embedded devices and provides a practical and efficient solution for crop-disease recognition.
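    The knowledge-distillation transfer described above is conventionally a weighted sum of a temperature-softened KL term (teacher vs. student) and a cross-entropy term on the true label. A minimal sketch follows; the temperature T, weight alpha, and logits are illustrative assumptions, not the paper's settings.

```python
import math

def softmax_t(logits, T):
    """Softmax with temperature T (T > 1 softens the distribution)."""
    m = max(logits)
    es = [math.exp((x - m) / T) for x in logits]
    s = sum(es)
    return [e / s for e in es]

def distillation_loss(student_logits, teacher_logits, label, T=4.0, alpha=0.7):
    """alpha * T^2 * KL(teacher_T || student_T) + (1 - alpha) * CE(student, label).
    The T^2 factor keeps gradient magnitudes comparable across temperatures."""
    p = softmax_t(teacher_logits, T)
    q = softmax_t(student_logits, T)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    ce = -math.log(softmax_t(student_logits, 1.0)[label])
    return alpha * T * T * kl + (1 - alpha) * ce
```

When the student's logits match the teacher's, the KL term vanishes and only the label cross-entropy remains.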

    Surface-height- and uncertainty-based depth estimation for Mono3D
    Yinshuai JI, Jinhua XU
    Journal of East China Normal University(Natural Science)    2025, 2025 (1): 72-81.   DOI: 10.3969/j.issn.1000-5641.2025.01.006

    Monocular three-dimensional (3D) object detection is a fundamental but challenging task in autonomous driving and robotic navigation. Directly predicting object depth from a single image is essentially an ill-posed problem. Geometry projection is a powerful depth estimation method that infers an object's depth from its physical height and its projected height in the image plane. However, height estimation errors are amplified in the inferred depth. In this study, the physical and projected heights of object surface points (rather than the height of the object itself) were estimated to obtain several depth candidates. In addition, the uncertainties of the heights were estimated, and the final object depth was obtained by assembling the depth predictions according to these uncertainties. Experiments demonstrated the effectiveness of the depth estimation method, which achieved state-of-the-art (SOTA) results on the monocular 3D object detection task of the KITTI dataset.
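    The geometry-projection idea (depth from a physical height and its projected pixel height under a pinhole camera) and the uncertainty-based assembly can be sketched as follows. Inverse-variance weighting is one common way to fuse candidates; the paper's exact weighting scheme may differ, and the focal length and heights below are illustrative.

```python
def depth_from_height(focal_px, physical_h_m, projected_h_px):
    """Pinhole projection: depth Z = f * H / h, so a small error in the
    projected height h is amplified in Z."""
    return focal_px * physical_h_m / projected_h_px

def fuse_by_uncertainty(depths, sigmas):
    """Assemble depth candidates by inverse-variance weighting:
    low-uncertainty candidates dominate the final estimate."""
    ws = [1.0 / (s * s) for s in sigmas]
    return sum(w * z for w, z in zip(ws, depths)) / sum(ws)

# A 1.5 m surface height projecting to 54 px under a 720 px focal length
# gives a 20 m depth candidate.
z = depth_from_height(720.0, 1.5, 54.0)
```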

    Time series uncertainty forecasting based on graph augmentation and attention mechanism
    Chaojie MEN, Jing ZHAO, Nan ZHANG
    Journal of East China Normal University(Natural Science)    2025, 2025 (1): 82-96.   DOI: 10.3969/j.issn.1000-5641.2025.01.007

    To improve the ability to predict future events and effectively address uncertainty, we propose a network architecture based on graph augmentation and attention mechanisms for uncertainty forecasting in multivariate time series. By introducing an implicit graph structure and integrating graph neural network techniques, we capture the mutual dependencies among sequences to model the interactions between time series. We use attention mechanisms to capture temporal patterns within a sequence, modeling the dynamic evolution of the time series. We use the Monte Carlo dropout method to approximate the model parameters and model the predicted sequences as a stochastic distribution, thus achieving accurate uncertainty forecasting in time series. The experimental results indicate that this approach maintains a high level of prediction precision while providing reliable uncertainty estimates that can inform decision-making tasks.
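    Monte Carlo dropout as described (keeping dropout active at inference, running several stochastic forward passes, and reading the mean as the prediction and the spread as the uncertainty) can be shown in a minimal pure-Python form. The tiny linear "model" and dropout rate here are illustrative, not the paper's network.

```python
import random
import statistics

def forward(x, weights, p_drop=0.2, rng=random):
    """One stochastic pass: each connection is dropped with probability
    p_drop; survivors are scaled by 1/(1 - p_drop) (inverted dropout),
    so the expected output matches the deterministic model."""
    keep = 1.0 - p_drop
    return sum((w / keep) * xi
               for w, xi in zip(weights, x)
               if rng.random() > p_drop)

def mc_dropout_predict(x, weights, n_samples=200, seed=0):
    """Mean of the stochastic passes = prediction; their spread = uncertainty."""
    rng = random.Random(seed)
    ys = [forward(x, weights, rng=rng) for _ in range(n_samples)]
    return statistics.mean(ys), statistics.stdev(ys)

pred_mean, pred_std = mc_dropout_predict([1.0] * 4, [1.0] * 4)
```

The deterministic model would output 4.0 here; the sampled mean is close to it, and the nonzero spread is the uncertainty estimate.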

    Knowledge graph completion by integrating textual information and graph structure information
    Houlong FAN, Ailian FANG, Xin LIN
    Journal of East China Normal University(Natural Science)    2025, 2025 (1): 111-123.   DOI: 10.3969/j.issn.1000-5641.2025.01.009

    Based on path-query information, we propose a graph attention model that effectively integrates textual and graph structure information in knowledge graphs, thereby enhancing knowledge graph completion. For textual information, a dual encoder based on pre-trained language models is used to separately obtain embedding representations of entities and path-query information. Additionally, an attention mechanism is employed to aggregate path-query information, which is used to capture graph structural information and update entity embeddings. The model was trained using contrastive learning, and experiments were conducted on multiple knowledge graph datasets, with good results achieved in both transductive and inductive settings. These results demonstrate the advantage of combining pre-trained language models with graph neural networks to effectively capture both textual and graph structural information, thereby enhancing knowledge graph completion.

    Label-perception augmented causal analysis of mental health over social media
    Yiping LIANG, Luwei XIAO, Linlin WANG
    Journal of East China Normal University(Natural Science)    2025, 2025 (1): 124-137.   DOI: 10.3969/j.issn.1000-5641.2025.01.010

    Online social media are frequently used by people as a way of expressing their thoughts and feelings. Among the vast number of online posts, some are concerning, expressing potential grievances and mental illness. Identifying such posts, along with the potential causes of the mental health problems, is an important task. Observation of these posts reveals a label co-occurrence phenomenon in their contexts, i.e., the semantics of multiple candidate labels appear in the context of a single sample, which interferes with the modeling and prediction of label patterns. To mitigate the impact of this phenomenon, we propose a label-aware data augmentation method, which leverages large-scale pre-trained language models with excellent text comprehension to identify potential candidate labels, abates the noise from irrelevant co-occurring labels by estimating sample-independent label semantic strengths, and constructs well-performing classifiers with pre-trained language models. Extensive experiments validate the effectiveness of our model on the recent datasets Intent_SDCNL and SAD.

    Purging diffusion models through CLIP based fine-tuning
    Ping WU, Xin LIN
    Journal of East China Normal University(Natural Science)    2025, 2025 (1): 138-150.   DOI: 10.3969/j.issn.1000-5641.2025.01.011

    Diffusion models have revolutionized text-to-image synthesis, enabling users to generate high-quality and imaginative artworks from simple natural-language text prompts. Unfortunately, owing to the large and unfiltered training datasets, inappropriate content such as nudity and violence can be generated by these models. To deploy such models with a higher level of safety, we propose a novel method, directional contrastive language-image pre-training (CLIP) loss-based fine-tuning, dubbed CLIF. This method uses a directional CLIP loss to suppress the model's ability to generate inappropriate content. CLIF is lightweight and immune to circumvention. To demonstrate its effectiveness, we propose a benchmark called categorized toxic prompts (CTP) to evaluate the propensity of text-to-image diffusion models to generate inappropriate content. As shown by our experiments on the CTP and common objects in context (COCO) datasets, CLIF significantly suppresses inappropriate generation while preserving the model's ability to produce general content.

    Infrared small target detection algorithm deployed on HiSilicon Hi3531
    Xiaoxue FU, Chang HUANG
    Journal of East China Normal University(Natural Science)    2025, 2025 (1): 151-164.   DOI: 10.3969/j.issn.1000-5641.2025.01.012

    To address the large computational complexity, poor real-time performance, and deployment difficulties of current algorithms, and to meet the stringent real-time and accuracy requirements of infrared detection systems, this paper proposes a lightweight algorithm deployed on domestically produced embedded chips, termed YOLOv5-TinyHisi. The algorithm makes lightweight modifications to the backbone network structure based on the characteristics of infrared small targets. Additionally, it uses the SIoU loss to optimize the bounding-box regression error, thereby enhancing the localization accuracy for infrared small targets. The model is deployed on the Hi3531DV200, using the chip's integrated neural network inference engine (NNIE) to accelerate network inference. Experimental results on public datasets demonstrate that the algorithm achieves a 1.52% improvement in mean average precision (mAP) over YOLOv5, while significantly reducing the parameter count and model size. On the Hi3531DV200, inference on a single image with a resolution of 1280 × 512 pixels reaches 35 frames per second (FPS), with a recall rate of 95%, meeting the real-time and accuracy requirements of the infrared detection system.

    Machine-learning-based model checker performance prediction
    Chengyu ZHANG, Jiayi ZHU, Yihao HUANG, Di YANG, Jianwen LI, Weikai MIAO, Di YAN, Bin GU, Naijun ZHAN, Geguang PU
    Journal of East China Normal University(Natural Science)    2024, 2024 (4): 18-29.   DOI: 10.3969/j.issn.1000-5641.2024.04.002

    An and-inverter graph (AIG) is a representation of electrical circuits typically passed as input into a model checker. In this paper, we propose an AIG structural encoding that we use to extract the features of AIGs and construct a portfolio-based model checker called Liquid. The underlying concept of the proposed structural encoding is the enumeration of all possible AIG substructures, with the frequency of each substructure encoded as a feature vector for use in subsequent machine-learning processes. Because the performance of model-checking algorithms varies across different AIGs, Liquid combines multiple such algorithms and selects the algorithm appropriate for a given AIG via machine learning. In our experiments, Liquid outperformed all state-of-the-art model checkers in the portfolio, achieving a high prediction accuracy. We further studied the effectiveness of Liquid from several perspectives.
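    The substructure-frequency encoding can be illustrated at depth one: count each (source type, target type, inverted) edge pattern in the AIG and normalize the counts into a feature vector. This is a simplified stand-in for the paper's enumeration of all possible substructures; the node types and the tiny graph below are illustrative.

```python
from collections import Counter

def aig_edge_features(node_type, edges):
    """Count every (src_type, dst_type, inverted) edge pattern and normalize;
    the resulting frequencies form a fixed-order feature vector suitable
    for a downstream machine-learning model."""
    counts = Counter((node_type[u], node_type[v], inv) for u, v, inv in edges)
    total = sum(counts.values())
    keys = sorted(counts)
    return keys, [counts[k] / total for k in keys]

# Tiny AIG: two primary inputs feed one AND gate, one edge inverted.
node_type = {0: "IN", 1: "IN", 2: "AND"}
edges = [(0, 2, False), (1, 2, True)]
keys, vec = aig_edge_features(node_type, edges)
```

Because the vector is a normalized frequency profile, AIGs of different sizes map to comparable feature spaces, which is what lets a portfolio selector generalize across circuits.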

    A formal verification method for embedded operating systems
    Yang WANG, Jingcheng FANG, Xiong CAI, Zhipeng ZHANG, Yong CAI, Weikai MIAO
    Journal of East China Normal University(Natural Science)    2024, 2024 (4): 1-17.   DOI: 10.3969/j.issn.1000-5641.2024.04.001

    The operating system is the core and foundation of the entire computer system. Its reliability and safety are vital because faults or vulnerabilities in the operating system can lead to system crashes, data loss, privacy breaches, and security attacks. In safety-critical systems, any error in the operating system can result in significant loss of life and property. Ensuring the safety and reliability of the operating system has always been a major challenge in industry and academia. Current methods for verifying an operating system's safety include software testing, static analysis, and formal methods. Formal methods are the most promising for ensuring the operating system's safety and trustworthiness: mathematical models can be established, and the system can be formally analyzed and verified to discover potential errors and vulnerabilities. For an operating system, formal methods can verify the correctness and completeness of its functions as well as system safety. A formal verification scheme for embedded operating systems is proposed herein on the basis of existing formal verification achievements for operating systems. This scheme uses the VCC (verified C compiler), CBMC (C bounded model checker), and PAT (process analysis toolkit) tools to verify the operating system at the unit, module, and system levels, respectively. The scheme, successfully applied to a task-scheduling architecture case of a real operating system, exhibits a degree of generality for analyzing and verifying embedded operating systems.

    Research and design of data synchronization schemes of postgraduate information systems based on microservice
    Huiling TAO, Yilin MA, Ye WANG, Qiwen DONG
    Journal of East China Normal University(Natural Science)    2024, 2024 (2): 42-52.   DOI: 10.3969/j.issn.1000-5641.2024.02.006

    With the popularization of university information systems and the increase in their usage frequency, teachers and students have higher requirements for data consistency, accuracy, timeliness, and completeness. The original data synchronization scheme, which used extensible markup language (XML), suffers from low synchronization efficiency and difficulty of expansion. The open-source tool DataX can synchronize data between various heterogeneous databases without damaging the source database. This study used DataX to improve the original data synchronization scheme and proposed different data synchronization schemes for various business requirements and application scenarios in the construction of a university postgraduate information system. Additionally, to address DataX's limitation that a single read feeds only a single write during start-up and execution, a method in which one read serves multiple writes was designed. Comparison experiments show that the optimized scheme improves data synchronization efficiency, has better scalability, and can meet the data synchronization requirements of universities.
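    The one-read, multiple-writes idea can be sketched as a simple fan-out: a single pass over the source feeds every target sink, instead of re-reading the source once per target. This is illustrative only; DataX's actual reader/writer channel architecture is not reproduced here.

```python
def fan_out(reader, writers):
    """One read pass over the source; each record is handed to every
    writer sink, so N targets cost one source scan instead of N."""
    for record in reader:
        for w in writers:
            w.append(record)

# Two downstream sinks filled from a single iteration over the source.
sink_a, sink_b = [], []
fan_out(iter([{"id": 1}, {"id": 2}]), [sink_a, sink_b])
```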

    Collaborative stranger review-based recommendation
    Luping FENG, Liye SHI, Wen WU, Jun ZHENG, Wenxin HU, Wei ZHENG
    Journal of East China Normal University(Natural Science)    2024, 2024 (2): 53-64.   DOI: 10.3969/j.issn.1000-5641.2024.02.007

    Review-based recommendation mainly exploits textual information that reflects the characteristics of items and the preferences of users. However, most existing approaches overlook the influence of information from hidden strangers on the selection of reviews for the target user. Information from strangers can more accurately measure the relative feelings of the user and complement the target user's expression, leading to more refined user modeling. Recently, several studies have attempted to incorporate information from similar strangers but ignore the information of other strangers. In this study, we propose a stranger collaborative review-based recommendation model that makes effective use of information from strangers to improve the accuracy and richness of user modeling. Specifically, to capture potential user preferences elaborately, we first design a collaborative stranger attention module that considers the textual similarities and preference interactions between the target user and the hidden strangers implied by the reviews. We then develop a collaborative gating module to dynamically integrate information from strangers at the preference level based on the characteristics of the target user-item pair, effectively filtering strangers' preferences and enriching target user modeling. Finally, we apply a latent factor model to accomplish the recommendation task. Experimental results demonstrate the superiority of our model compared to state-of-the-art methods on real-world datasets from various sources.

    Dual-path network with multilevel interaction for one-stage visual grounding
    Yue WANG, Jiabo YE, Xin LIN
    Journal of East China Normal University(Natural Science)    2024, 2024 (2): 65-75.   DOI: 10.3969/j.issn.1000-5641.2024.02.008

    This study explores multimodal understanding and reasoning for one-stage visual grounding. Existing one-stage methods extract visual feature maps and textual features separately and then perform multimodal reasoning to predict the bounding box of the referred object. These methods suffer from two weaknesses: first, the pre-trained visual feature extractors introduce text-unrelated visual signals into the visual features, which hinders multimodal interaction; second, the reasoning process lacks visual guidance for language modeling. These shortcomings limit the reasoning ability of existing one-stage methods. We propose a low-level interaction to extract text-related visual feature maps, and a high-level interaction to incorporate visual features in guiding the language modeling and further performing multistep reasoning on visual features. Based on the proposed interactions, we present a novel network architecture called the dual-path multilevel interaction network (DPMIN). Experiments on five commonly used visual grounding datasets demonstrate the superior performance of the proposed method and its real-time applicability.

    Parallel block-based stochastic computing with adapted quantization
    Yongzhuo ZHANG, Qingfeng ZHUGE, Edwin Hsing-Mean SHA, Yuhong SONG
    Journal of East China Normal University(Natural Science)    2024, 2024 (2): 76-85.   DOI: 10.3969/j.issn.1000-5641.2024.02.009

    The computation and storage demands of deep neural network models make them unsuitable for deployment on embedded devices with limited area and power. To address this, stochastic computing reduces the storage and computational complexity of neural networks by representing data as stochastic sequences and performing arithmetic operations such as addition and multiplication with basic logic units. However, short stochastic sequences cause discretization errors when converting network weights from floating-point numbers to stochastic sequences, which reduces the inference accuracy of stochastic computing network models. Longer stochastic sequences improve the representation range and alleviate this problem, but they also incur longer computational latency and higher energy consumption. We propose a differentiable quantization function based on the Fourier transform. During training, the function improves the matching of the model to stochastic sequences, reducing the discretization error during data conversion and preserving the accuracy of stochastic computing neural networks even with short stochastic sequences. Additionally, we present an adder designed to enhance the accuracy of the operation unit and parallelize computation by chunking inputs, thereby reducing latency. Experimental results demonstrate a 20% improvement in model inference accuracy compared to other methods, as well as a 50% reduction in computational latency.
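    The core stochastic-computing primitive (representing a value in [0, 1] as a bitstream whose fraction of 1s approximates the value, and multiplying two values with a bitwise AND of independent streams) can be sketched as follows. The stream length chosen here is illustrative of the accuracy/latency trade-off the abstract describes.

```python
import random

def to_stream(p, n, seed):
    """Unipolar encoding: a length-n bitstream whose fraction of 1s
    approximates p; longer streams give lower discretization error."""
    rng = random.Random(seed)
    return [1 if rng.random() < p else 0 for _ in range(n)]

def sc_multiply(a, b):
    """Stochastic multiplication: a single AND gate per bit position,
    valid when the two streams are statistically independent."""
    return [x & y for x, y in zip(a, b)]

def value(stream):
    """Decode a stream back to its represented value."""
    return sum(stream) / len(stream)

a = to_stream(0.5, 4096, seed=1)
b = to_stream(0.5, 4096, seed=2)
prod = value(sc_multiply(a, b))  # approximates 0.5 * 0.5 = 0.25
```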

    Multi-view and multi-pose lock pin point cloud model reconstruction based on turntable
    Xin LU, Chang HUANG, Zhiwei JIN
    Journal of East China Normal University(Natural Science)    2024, 2024 (2): 86-96.   DOI: 10.3969/j.issn.1000-5641.2024.02.010

    The surface structure of a container lock pin is complex, making it difficult to establish a point cloud model with high surface feature integrity. Therefore, a multi-view and multi-pose point cloud model reconstruction algorithm based on a turntable is proposed to recover the complete surface features of the lock pin. Because sensors at a fixed height are paired with rotating turntables in most scenarios, the collected surface features are usually partially missing. The algorithm first uses the parameter calibration results of the turntable to stitch multi-view three-dimensional point clouds and establish a fixed-pose point cloud model. Then, through the proposed improved spherical projection algorithm, the pose of the lock pin on the turntable is changed to establish a point cloud model in another pose. Finally, the point cloud models in multiple poses are merged to complete the surface features. Experimental results show that the proposed algorithm can build a lock-pin point cloud model with high surface feature integrity.

    Infrared small-target detection method based on double-layer local energy factor
    Lingxiao TANG, Chang HUANG
    Journal of East China Normal University(Natural Science)    2024, 2024 (2): 97-107.   DOI: 10.3969/j.issn.1000-5641.2024.02.011

    Infrared small-target detection has always been an important technology in infrared tracking systems. Current approaches for small-target detection in complex backgrounds, inspired by the human visual system, are prone to generating false alarms and exhibit sluggish detection speeds. Building on the multiscale local contrast measure using a local energy factor (MLCM-LEF) method, an infrared small-target detection method based on a double-layer local energy factor (DLEF) is proposed. Target detection is performed from the perspectives of local energy difference and local brightness difference. The double-layer local energy factor describes the difference between a small target and the background from the energy perspective, and a weighted luminance difference factor detects the target from the brightness perspective. The infrared small target is extracted by a two-dimensional Gaussian fusion of the two processing results. Finally, the image mean and standard deviation are used for adaptive threshold segmentation to extract the small infrared target. In experimental tests on public datasets, this method improved background suppression compared with the MLCM-LEF algorithm and reduced single-frame detection time by one-third.

    Group contrastive learning for weakly-supervised 3D point cloud semantic segmentation
    Zhihong ZHENG, Haichuan SONG
    Journal of East China Normal University(Natural Science)    2024, 2024 (2): 108-118.   DOI: 10.3969/j.issn.1000-5641.2024.02.012

    Three-dimensional point cloud semantic segmentation is an essential task for 3D visual perception and has been widely used in autonomous driving, augmented reality, and robotics. However, most methods work under a fully-supervised setting, which relies heavily on fully annotated datasets. Many weakly-supervised methods use pseudo-labeling to retrain the model and reduce annotation time; however, previous methods have failed to address the confirmation bias induced by false pseudo labels. In this study, we propose a novel weakly-supervised 3D point cloud semantic segmentation method based on group contrastive learning, constructing contrast between positive and negative sample groups selected from pseudo labels. Pseudo labels compete with each other within group contrastive learning, reducing the gradient contribution of falsely predicted pseudo labels. Results on three large-scale datasets show that our method outperforms state-of-the-art weakly-supervised methods with minimal labeling annotations and even surpasses the performance of some classic fully-supervised methods.
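    The group contrast can be sketched as one softmax over a positive group versus a negative group: because the pseudo-labeled positives share a single softmax, they compete with each other, and a falsely labeled member contributes a smaller share of the gradient. Cosine similarity and the temperature value here are illustrative assumptions, not the paper's exact loss.

```python
import math

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def group_contrastive_loss(anchor, positives, negatives, t=0.1):
    """-log of the positive group's share of the softmax mass; pulls the
    anchor toward its pseudo-labeled group and pushes it from negatives."""
    pos = sum(math.exp(cosine(anchor, p) / t) for p in positives)
    neg = sum(math.exp(cosine(anchor, n) / t) for n in negatives)
    return -math.log(pos / (pos + neg))

# Anchor aligned with its positive group: loss is near zero.
loss_good = group_contrastive_loss([1.0, 0.0],
                                   [[1.0, 0.0], [0.9, 0.1]],
                                   [[-1.0, 0.0]])
```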

    Hidden layer Fourier convolution for non-stationary texture synthesis
    Xinxin HE, Haichuan SONG
    Journal of East China Normal University(Natural Science)    2024, 2024 (2): 119-130.   DOI: 10.3969/j.issn.1000-5641.2024.02.013

    The remarkable achievements of deep learning in computer vision have led to significant development in example-based texture synthesis. Texture synthesis models using neural networks mainly comprise local components, such as convolution and up/down sampling, which are unsuitable for capturing the irregular structural attributes of non-stationary textures. Inspired by the duality between the frequency and space domains, a non-stationary texture synthesis method based on hidden-layer Fourier convolution is proposed in this study. The proposed method uses a generative adversarial network as the basic architecture, splits features along the channel dimension in the hidden layer, and builds a local branch in the image domain and a global branch in the frequency domain to account for both visual perception and structural information. Experimental results show that this method can handle structurally challenging non-stationary texture exemplars. Compared with state-of-the-art methods, it yields better results in learning and expanding large-scale structures.
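    The frequency-space duality behind the global branch can be demonstrated in one dimension: a pointwise product in the frequency domain equals a circular convolution in the signal domain, so a single frequency-domain operation gives every output sample a global receptive field. This is a naive 1-D DFT sketch; the paper's hidden-layer version operates on 2-D feature maps with learned spectral weights.

```python
import cmath

def dft(x):
    """Naive discrete Fourier transform of a real sequence."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * j * k / n) for k in range(n))
            for j in range(n)]

def idft(X):
    """Inverse DFT, returning the real parts."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * j * k / n) for k in range(n)).real / n
            for j in range(n)]

def fourier_conv(x, h):
    """Circular convolution via the convolution theorem:
    IDFT(DFT(x) * DFT(h)) -- a global operation in one step."""
    return idft([a * b for a, b in zip(dft(x), dft(h))])

x = [1.0, 2.0, 3.0, 4.0]
h = [1.0, 0.0, 0.5, 0.0]
y = fourier_conv(x, h)
```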

    An image caption generation algorithm based on decoupling commonsense association
    Jiawei LIU, Xin LIN
    Journal of East China Normal University(Natural Science)    2024, 2024 (2): 131-142.   DOI: 10.3969/j.issn.1000-5641.2024.02.014

    The image caption generation algorithm based on decoupling commonsense association aims to eliminate the interference of commonsense associations between entity types on model reasoning and to improve the fluency and accuracy of the generated captions. To address relational sentences in current image captions that conform to common sense but not to the image content, the algorithm first uses a novel training method to increase the relationship-detection model's attention to the real relationships in the image and improve the accuracy of relationship reasoning. Then, a relation-aware entity interaction method performs targeted information interaction between related entities, strengthening the relationship information. The experimental results show that the proposed algorithm can correct some commonsense false relationships, generate more accurate image captions, and achieve better results on various evaluation metrics.
