Base paper Title: Consistent Interpretation of Ensemble Classifiers in Trojan-Horse Detection
Modified Title: Interpreting Ensemble Classifiers Consistently for Trojan-Horse Detection
Abstract
Hardware trojan classification/detection systems (HTDs) based on machine or deep
learning have recently been proven to be effective. However, the existence of irrelevant
features as well as class imbalance reduces the effectiveness of these models. To address these
issues, this work describes a hardware trojan detection method based on gate-level net-list
structural features. To begin with, SMOTE-Tomek is used for data augmentation. The best
features are then selected using a hybrid feature selection technique that combines the filter
and wrapper. The results show that using the optimal features and tuned parameters, KNORA-
U and KNORA-E, dynamic ensemble classifiers, outperform existing techniques with area
under the receiver operating characteristic curve (AUC-ROC) values of 0.988 and 0.982,
respectively. The circuit and systems (CAS) lab dataset is used to analyze these evaluations.
Furthermore, knowing the details of the prediction is crucial for the model’s
transparency and generalizability. Accordingly, using a model-agnostic framework such
as SHapley Additive exPlanations (SHAP), it is shown that, in addition to other features, the
number of references is consistent across models and has a significant impact on prediction.
Due to consistent interpretations, this methodology strengthens the hardware security
professionals’ trust in HTDs.
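The SMOTE-Tomek augmentation step mentioned above can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the function names, the toy feature vectors, the choice of `k`, and the decision to drop both members of each Tomek link are assumptions made for illustration. It also assumes at least two minority (trojan) samples and Euclidean distance.

```python
import math
import random

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic minority points by interpolating from a random
    minority point toward one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted((p for p in minority if p is not base),
                            key=lambda p: euclidean(base, p))[:k]
        target = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> target
        synthetic.append(tuple(b + gap * (t - b) for b, t in zip(base, target)))
    return synthetic

def nearest_index(points, i):
    """Index of the point closest to points[i], excluding i itself."""
    return min((j for j in range(len(points)) if j != i),
               key=lambda j: euclidean(points[i], points[j]))

def remove_tomek_links(X, y):
    """Drop both members of every Tomek link, i.e. every pair of mutual
    nearest neighbours that carry different labels (single pass)."""
    nn = [nearest_index(X, i) for i in range(len(X))]
    drop = set()
    for i, j in enumerate(nn):
        if nn[j] == i and y[i] != y[j]:
            drop.update((i, j))
    keep = [i for i in range(len(X)) if i not in drop]
    return [X[i] for i in keep], [y[i] for i in keep]

def smote_tomek(X, y, minority_label=1, k=2, seed=0):
    """Oversample the minority class up to the majority count, then clean."""
    minority = [x for x, lab in zip(X, y) if lab == minority_label]
    n_majority = len(X) - len(minority)
    new_points = smote(minority, n_majority - len(minority), k=k, seed=seed)
    X_aug = list(X) + new_points
    y_aug = list(y) + [minority_label] * len(new_points)
    return remove_tomek_links(X_aug, y_aug)

# Made-up two-feature data: label 0 = normal net, label 1 = trojan net.
X_toy = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.3, 0.3), (0.1, 0.4), (0.4, 0.2),
         (10.0, 10.0), (10.5, 10.2)]
y_toy = [0, 0, 0, 0, 0, 0, 1, 1]
X_res, y_res = smote_tomek(X_toy, y_toy)
```

In practice a maintained library implementation would normally be preferred over a hand-written one; the sketch only shows the two stages (interpolation-based oversampling, then removal of borderline pairs) in isolation.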
Existing System
The incredible advancements in semiconductor technology have a significant impact
on the Internet of Things (IoT) applications such as personal health monitoring in our daily
lives [1], [2], [3]. The hardware is the foundation of these systems and integrated circuits (ICs)
are its core components [4]. Due to the distributed nature of chip production, IC firms are
forced to rely on untrustworthy third-party intellectual property vendors, as well as global
outsourcing foundries, for design and fabrication in order to save money and reduce time to
market [5], [6]. This could lead to the development of synchronous viruses, such as hardware
trojans (HTs), that cause malicious changes to integrated circuits [7]. HTs can modify the
functionality, decrease the reliability, leak important data from ICs, and sometimes cause a
denial of service [8], [9]. Because of their stealthy nature and miniature size, they frequently escape
regular design verification as well as post-manufacturing tests [10]. Hence, HT detection has
grown in importance as a research topic in the IC industry. The three types of detection
techniques currently in use are pre-silicon/design-time, post-silicon/test-time and run-time.
Pre-silicon trojan detection has recently gained popularity because it is preferable to detect
trojans as soon as possible in order to eliminate unnecessary challenges caused by using a
design with inserted trojans [11]. It is further classified into static and dynamic detection. While
dynamic detection is less expensive, it is not recommended due to limited test coverage and
the need for a golden model as a reference [12]. Static detection converts the detection problem
into a binary classification problem in machine learning. Due to its effectiveness and generalizability,
machine learning-based static detection is becoming more popular nowadays. This strategy
collects HT-related information from design source code, such as gate-level net-list or register
transfer level (RTL) code, without simulating the circuit. In recent times, some HTDs have
used machine learning classifiers such as SVM and RFC to classify trojan nets [13], [14], [15],
[16]. However, an increasing number of HTDs have started to use deep learning strategies such as
recurrent neural networks (RNN), generative adversarial networks (GAN), and long short-term
memory (LSTM) [17], [18], [19]. The majority of these techniques have excellent detection
accuracy and low false positive rates. However, when humans are involved, their growing
complexity is a major drawback, since these models cannot provide any details about the reasons
underlying their decisions. As a result, an explanation of the model’s prediction is necessary.
Currently, [20] provides some explanations for HTD findings based on feature importance, but
it is model dependent.
Drawback in Existing System
Lack of Transparency: Ensemble models, especially complex ones like Random
Forests or Gradient Boosting Machines, are often considered black boxes. It can be
challenging to interpret the decision-making process of the entire ensemble, making it
difficult to understand how individual models contribute to the final decision.
Adversarial Attacks: Trojan-Horse attacks may be designed to exploit the
vulnerabilities of ensemble models. Adversarial attacks can manipulate the input data
to deceive individual classifiers within the ensemble, leading to inconsistent and
potentially incorrect predictions.
Dynamic Nature of Trojan Attacks: Trojan attacks evolve over time, and new attack
strategies may emerge. Ensembles may struggle to adapt quickly to new attack patterns,
leading to a lag in detection capabilities.
Difficulty in Identifying Trojan Signatures: Trojans often manifest as subtle changes
in the model's decision boundary, making them difficult to detect. Inconsistent
interpretations across ensemble members may hinder the identification of Trojan
signatures or patterns, as the contribution of individual classifiers might vary.
Proposed System
The proposed methodology consists of data summarization, pre-processing, evaluation of base
models, model performance improvement, classification, and interpretation.
Define metrics to assess the consistency of feature importance across different base
classifiers. This ensures that the ensemble's interpretation is stable and reliable, even
when using diverse models within the ensemble.
The proposed method can only detect whether a third-party IP is suspicious (i.e., trojan-
infected) or not, but it is unable to pinpoint the trojan’s location within the chip.
In the proposed framework, summary plots are used to analyze the most influential
features in the prediction of Trojan detection. Additionally, decision plots are employed
to identify the relationships between the value of a feature and its impact on the
prediction.
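The summary and decision plots above are built from per-feature Shapley values. As a rough illustration of that underlying quantity, the sketch below computes exact Shapley values by enumerating feature coalitions, using only the standard library. It is not the SHAP library's API: the function names, the toy scoring function, and the convention of replacing "absent" features with a baseline value are assumptions made for this example. The cost grows exponentially with the number of features, so it is only practical for tiny inputs.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict at x, relative to a baseline input.
    Features outside a coalition take their baseline value (an assumption)."""
    n = len(x)

    def value(coalition):
        mixed = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(mixed)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi

def mean_abs_importance(predict, samples, baseline):
    """Global ranking signal of the kind a summary plot shows:
    mean absolute Shapley value per feature over a set of samples."""
    totals = [0.0] * len(baseline)
    for x in samples:
        for i, v in enumerate(shapley_values(predict, x, baseline)):
            totals[i] += abs(v)
    return [t / len(samples) for t in totals]

# Hypothetical toy scorer over three made-up net-list features,
# with an interaction between features 0 and 2.
def toy_score(f):
    return 2.0 * f[0] + 1.0 * f[1] + f[0] * f[2]

phi = shapley_values(toy_score, [1.0, 2.0, 3.0], [0.0, 0.0, 0.0])
```

A useful sanity check is the efficiency property: the values in `phi` sum to the difference between the prediction at the input and the prediction at the baseline.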
Algorithm
Model-Agnostic Interpretability Techniques: Employ model-agnostic
interpretability techniques that can be applied to any machine learning model, including
ensemble classifiers. Techniques such as SHAP values, LIME (Local Interpretable
Model-agnostic Explanations), and Partial Dependence Plots can provide insights into
the contribution of individual features across the ensemble.
Feature Importance Consistency Metrics: Define metrics to assess the consistency
of feature importance across different base classifiers. Consistent feature importance
rankings can provide more reliable insights into the relevant features for Trojan-Horse
detection.
Training with Interpretable Features: If possible, train the ensemble on features that
are inherently more interpretable. Using domain-specific features or incorporating
explainable representations of input data can enhance the interpretability of the model.
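The source does not specify which consistency metrics are used, so the following is only one possible sketch of a feature-importance consistency check. It compares per-model importance vectors with Spearman rank correlation and a top-k Jaccard overlap, then averages over all model pairs. The function names, model labels, and importance numbers are hypothetical placeholders, not results; the rank computation assumes no tied importance values and at least two features.

```python
from itertools import combinations

def ranks(importances):
    """Rank 1 = most important feature. Assumes no tied values."""
    order = sorted(range(len(importances)), key=lambda i: -importances[i])
    r = [0] * len(importances)
    for position, idx in enumerate(order, start=1):
        r[idx] = position
    return r

def spearman(a, b):
    """Spearman rank correlation between two importance vectors (no ties)."""
    ra, rb = ranks(a), ranks(b)
    n = len(a)
    d2 = sum((x - y) ** 2 for x, y in zip(ra, rb))
    return 1 - 6 * d2 / (n * (n * n - 1))

def top_k_overlap(a, b, k):
    """Jaccard overlap between the top-k feature sets of two models."""
    def top(imp):
        return set(sorted(range(len(imp)), key=lambda i: -imp[i])[:k])
    ta, tb = top(a), top(b)
    return len(ta & tb) / len(ta | tb)

def mean_pairwise(metric, importance_by_model):
    """Average a pairwise metric over every pair of models."""
    pairs = list(combinations(importance_by_model.values(), 2))
    return sum(metric(a, b) for a, b in pairs) / len(pairs)

# Placeholder importance vectors over four features (illustrative only).
importance_by_model = {
    "model_a": [0.50, 0.30, 0.15, 0.05],
    "model_b": [0.45, 0.35, 0.05, 0.15],
    "model_c": [0.40, 0.10, 0.30, 0.20],
}
rank_agreement = mean_pairwise(spearman, importance_by_model)
top2_agreement = mean_pairwise(lambda a, b: top_k_overlap(a, b, 2),
                               importance_by_model)
```

Values near 1 for either score suggest that the models agree on which features matter; low or negative rank correlation signals that the explanations diverge and deserve a closer look.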
Advantages
Improved Transparency: Consistent interpretation makes the decision-making
process of the ensemble more transparent. This transparency is crucial for security
professionals and end-users to understand how the ensemble arrives at its predictions,
promoting trust in the Trojan-Horse detection system.
Effective Troubleshooting: In cases where the ensemble may produce unexpected
results or false positives/negatives, consistent interpretation enables effective
troubleshooting. Analysts can trace back the decision process, identify sources of
inconsistency, and refine the model or address data quality issues.
Facilitates Model Maintenance: When interpreting ensemble classifiers consistently,
it becomes easier to maintain and update the detection system over time. Understanding
how each base classifier contributes to the ensemble's decisions facilitates model
updates, retraining, and improvements without sacrificing interpretability.
Identification of Adversarial Attacks: Inconsistencies in the interpretation of
ensemble classifiers can be indicative of adversarial attacks or attempts to manipulate
the model. By ensuring consistency, the detection system becomes more robust against
adversarial attempts to deceive or evade detection.
Hardware Specification
Processor : Intel Core i3 processor
RAM : 4 GB
Hard disk : 500 GB
Software Specification
Operating System : Windows 10/11
Front End : Python
Back End : MySQL Server
IDE Tools : PyCharm