Analysis of Embedding Locations in the Subband Frequency DCT on Scanned Images

Uploading an identity card as an image for account verification or online transactions can pose a threat to application users. If the application is hacked, irresponsible persons can steal the identity card. Therefore, the image must be protected for authentication. The technique proposed in this study is watermarking. A watermark in the form of a binary image is embedded into the image as proof of ownership using the Discrete Cosine Transform (DCT), which works in the frequency domain. Different watermark embedding locations were analysed in each 8 × 8 DCT block. The imperceptibility of the watermarked images relative to the original images was assessed using PSNR (Peak Signal to Noise Ratio) and SSIM (Structural Similarity Index Measure), while the robustness of the embedded watermark was assessed using NCC (Normalized Cross Correlation). The results show a PSNR ≥ 54 dB with a watermark strength of 0,1 and an average SSIM ≥ 0,9 on four scanned images in BMP format with a resolution of 100 DPI. Embedding in the green component at middle frequencies maintains a good balance between imperceptibility and robustness. In contrast, the red component at low frequency is vulnerable to brightness +20 and contrast +50 attacks, with an average NCC ≤ 0,85.


Introduction
Identity card information is now widely shared for account verification and online transactions, in line with the development of information technology. Weaknesses in a system threaten users when the security and protection of their data are not guaranteed. Identity card theft can occur if an irresponsible person hacks a user's account.
Many techniques have been proposed to protect images, such as cryptography, watermarking, and digital signatures [1]. The technique proposed in this study is watermarking. Digital watermarking embeds information into digital images to verify ownership and authenticity [2]. Image watermarking techniques can be divided into spatial-domain and frequency-domain techniques. The spatial domain works directly on the pixels, while the frequency domain works on the transform coefficients of the image [3].
In this study, the proposed watermark embedding works in the frequency domain using the Discrete Cosine Transform (DCT). The DCT converts a signal into fundamental frequency components, representing the image as a sum of sinusoids of various magnitudes and frequencies. The image is segmented into non-overlapping blocks and the DCT is applied to each block, resulting in three frequency subbands, namely low, middle, and high frequencies [3].
Several watermark embedding methods have been proposed in the last five years. In 2017, a watermark was embedded using the DCT method in the Y-component at high frequency, with the coordinate (7,7) as the embedding location, resulting in a PSNR of 40,82 dB [4]. A combination of watermarking techniques on medical images used the Fast Discrete Curvelet Transform (FDCuT) and the Discrete Cosine Transform (DCT) with different embedding scale factors. The results showed the highest PSNR with a scale factor of α=2 and the highest NCC with a scale factor of α=8, indicating that a higher scale factor increases robustness but reduces imperceptibility [5]. An analysis of copyright concealment techniques using the DCT was carried out on RGB images in JPG and BMP formats. The results show the highest average PSNR value with the BMP format [6].
An essential parameter of watermarking is the location of the watermark embedding in the original image [7]. Few previous studies have implemented different embedding locations in each 8×8 DCT block. Therefore, this research selects different embedding locations in the low, middle, and high frequencies based on the cosine graph, whose values range from −1 to 1 over the principal values 0 to π of its argument, which represents the frequency wave. The AC DCT coefficients in each frequency subband are sorted to obtain the coordinates of the selected pixels. This study aims to analyse the location of the watermark embedding in the Discrete Cosine Transform frequency subbands on scanned images. The embedding technique and the selection of the embedding coefficient largely determine the imperceptibility and robustness [8].

Research Methods
This study involves two processes: watermark embedding and watermark extraction. The proposed method is a watermarking technique in the frequency domain using the Discrete Cosine Transform (DCT) with a block size of 8×8. Watermark embedding and extraction are performed on each colour component (RGB) and frequency subband. The AC DCT coefficients are sorted to obtain a different embedding location in each 8×8 block based on the frequency subband. The original image is a colour (RGB) image and the watermark is a binary image.

Watermark Embedding
The proposed watermark embedding is shown in Figure 1. The condition that an original image must meet in the watermark embedding process using the Discrete Cosine Transform (DCT) is stated in Equation (1).
$$B = \frac{M \times N}{8 \times 8} \tag{1}$$
where $B$ is the maximum number of 8×8 blocks that can be embedded with watermark bits, $M$ is the side length of the original image, and $N$ is the side width of the original image. From Equation (1), it follows that a watermark can be embedded into the original image if it meets Equation (2).
$$m \times n \le B \tag{2}$$
where $m$ is the side length of the watermark image and $n$ is the side width of the watermark image. Once the embedding condition is met, the original image is divided into 8×8 blocks across all colour components (RGB). At this stage, the colour intensity values of the original image are obtained and 128 is subtracted from each [9]. Then, every 8×8 block in all colour components (RGB), after the subtraction of 128, is transformed by Equation (3) to obtain the DCT coefficients.
$$F(u,v) = \frac{1}{4}\, C(u)\, C(v) \sum_{x=0}^{7} \sum_{y=0}^{7} f(x,y) \cos\left[\frac{(2x+1)u\pi}{16}\right] \cos\left[\frac{(2y+1)v\pi}{16}\right] \tag{3}$$
for $u = 0,1,2,\ldots,7$ and $v = 0,1,2,\ldots,7$, where $C(k)$ is $1/\sqrt{2}$ for $k = 0$ and 1 otherwise, $f(x,y)$ is the colour intensity value in the original image after 128 has been subtracted, and $\pi = 180°$.
The DCT coefficients are divided into three frequency subbands: low, middle, and high frequencies [10]. The distribution of the proposed DCT coefficients is similar to that of previous studies [11]. Each frequency subband is declared as a variable and initialised with the pixel coordinates shown in Table 1; the DC coefficient (0,0) is not used, in order to maintain good visual quality [2]. In the next stage, the AC DCT coefficients are sorted in three steps: taking the AC DCT coefficients for the colour component (RGB) and frequency subband, sorting them in ascending order (from smallest to largest), and fetching the selected index as the embedding location, as shown in Algorithm 1.
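The forward transform in Equation (3) can be illustrated with a short, unoptimised Python sketch (the study itself was implemented in C#). It assumes the standard 8×8 DCT with the 1/4 scaling and the level shift of 128 described above; the function names `scale` and `dct_8x8` are hypothetical and not taken from the study.

```python
import math

BLOCK = 8  # block size used throughout the study


def scale(k: int) -> float:
    """Scaling factor C(k): 1/sqrt(2) for k = 0 and 1 otherwise."""
    return 1 / math.sqrt(2) if k == 0 else 1.0


def dct_8x8(block):
    """Return the 8x8 DCT coefficients of an 8x8 list of intensities (0..255)."""
    # Level shift: subtract 128 from every intensity before transforming.
    shifted = [[p - 128 for p in row] for row in block]
    coeffs = [[0.0] * BLOCK for _ in range(BLOCK)]
    for u in range(BLOCK):
        for v in range(BLOCK):
            total = 0.0
            for x in range(BLOCK):
                for y in range(BLOCK):
                    total += (shifted[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / 16)
                              * math.cos((2 * y + 1) * v * math.pi / 16))
            coeffs[u][v] = 0.25 * scale(u) * scale(v) * total
    return coeffs
```

For a flat block, all the energy ends up in the DC coefficient at (0,0) and every AC coefficient is zero, which is one reason the DC coefficient is left untouched during embedding.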
In Algorithm 1, the index variable is declared with an initial value of 0 because the first element of an array has index 0, and the index stays smaller than the total number of elements in the array. The array holds the AC DCT coefficients at the pixel coordinates previously declared for the frequency subband. Because these coefficients are not initially in order, they are sorted from smallest to largest, and the index is the position of a coefficient in the sorted sequence. The index to take follows from the target value for each subband: for low frequencies the target is −1 (the lowest coefficient in the subband), for middle frequencies it is 0 or close to 0, and for high frequencies it is 1 (the highest coefficient in the subband). Therefore, for low frequencies the coefficient at the first index is taken. For middle frequencies, all negative AC DCT coefficients are first changed to positive and the coefficients are then sorted from smallest to largest, so that the first index holds the coefficient closest to 0. For high frequencies the last index is taken, which is the number of AC DCT coefficients minus 1, because a list of n elements has its last element at index n − 1. Then, the colour intensity values of the watermark image are extracted. In this study, a binary image has the value 255 for white and 0 for black when displayed. The value 255 is changed to 1, and the value 0 is kept. The next step is to modify the AC DCT coefficient selected by the sorting with Equation (4).
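The selection rules described for Algorithm 1 can be sketched as follows. This is an illustration in Python rather than the authors' implementation: the function name `select_location` is hypothetical, and the subband coordinate list must be supplied by the caller because the coordinates of Table 1 are not reproduced here.

```python
def select_location(coeffs, coords, subband):
    """Pick the embedding coordinate (u, v) inside one 8x8 DCT block.

    coeffs  : 8x8 list of DCT coefficients
    coords  : list of (u, v) AC coordinates belonging to the subband
    subband : "low", "middle" or "high"
    """
    if subband == "low":
        # Ascending sort, first index: the smallest coefficient.
        ranked = sorted(coords, key=lambda uv: coeffs[uv[0]][uv[1]])
        return ranked[0]
    if subband == "middle":
        # Negative values made positive, ascending sort, first index:
        # the coefficient closest to 0.
        ranked = sorted(coords, key=lambda uv: abs(coeffs[uv[0]][uv[1]]))
        return ranked[0]
    if subband == "high":
        # Ascending sort, last index (n - 1): the largest coefficient.
        ranked = sorted(coords, key=lambda uv: coeffs[uv[0]][uv[1]])
        return ranked[len(ranked) - 1]
    raise ValueError("subband must be 'low', 'middle' or 'high'")
```

Because the selection depends on the coefficient values of each block, the chosen coordinate generally differs from block to block, which is what gives every 8×8 block its own embedding location.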
In Equation (4), the DCT coefficient selected by the sorting is modified using the watermark strength, which adjusts the tradeoff between imperceptibility and robustness [12], and one bit of binary data (0 or 1), where a bit value of 0 becomes −1 and a bit value of 1 is unchanged. A higher watermark strength degrades the quality of the watermarked image but improves the accuracy of the extraction process [13]. In this study, the watermark strength used is 0,1, which was proposed in a previous study that obtained a PSNR value of ≥ 40 dB [2]. After all the watermark bits are embedded, all DCT coefficients in the colour components (RGB) are returned to the spatial domain by Equation (5).
$$f'(x,y) = \frac{1}{4} \sum_{u=0}^{7} \sum_{v=0}^{7} C(u)\, C(v)\, F'(u,v) \cos\left[\frac{(2x+1)u\pi}{16}\right] \cos\left[\frac{(2y+1)v\pi}{16}\right] \tag{5}$$
where $F'(u,v)$ is the DCT coefficient that has been embedded with a watermark bit, $C(\cdot)$ is the same scaling factor as in Equation (3), and $\pi = 180°$. Next, all 8×8 blocks are recombined at the original image size, and 128 is added to the colour intensity values in all colour components (RGB) to produce the watermarked image.
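A minimal Python sketch of the coefficient update and the return to the spatial domain is given below. The additive update in `embed_bit` is an assumption made only for illustration, since the text describes the ingredients of Equation (4) but the exact form may differ; the inverse transform assumes the standard 8×8 inverse DCT followed by adding 128. All function names are hypothetical, and the strength 0,1 is written as 0.1 in code.

```python
import math


def embed_bit(coeff: float, bit: int, strength: float = 0.1) -> float:
    """Modify one selected coefficient with one watermark bit.

    Assumption: additive rule coeff + strength * w, with bit 0 mapped
    to w = -1 and bit 1 kept as w = 1.
    """
    w = 1 if bit == 1 else -1
    return coeff + strength * w


def scale(k: int) -> float:
    """Scaling factor C(k): 1/sqrt(2) for k = 0 and 1 otherwise."""
    return 1 / math.sqrt(2) if k == 0 else 1.0


def idct_8x8(coeffs):
    """Inverse 8x8 DCT; returns unrounded intensities with 128 added back."""
    block = [[0.0] * 8 for _ in range(8)]
    for x in range(8):
        for y in range(8):
            total = 0.0
            for u in range(8):
                for v in range(8):
                    total += (scale(u) * scale(v) * coeffs[u][v]
                              * math.cos((2 * x + 1) * u * math.pi / 16)
                              * math.cos((2 * y + 1) * v * math.pi / 16))
            block[x][y] = 0.25 * total + 128
    return block
```

The sketch returns floating-point intensities; how the values are rounded and clipped to the 0 to 255 range before saving is left open here because the text does not specify it.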

Watermark Extraction
The proposed watermark extraction is shown in Figure 2. The extraction process requires the original image, which is useful for proving ownership. If a second party adds a new watermark to a watermarked image, or manipulates it with digital image attacks, and then claims it as their own, the original owner can use the original image to prove ownership through watermark insertion and detection.
The original image and the watermarked image are divided into 8×8 blocks, and 128 is subtracted from the colour intensity values (RGB) of both images. After that, every 8×8 block of the two images in all colour components (RGB) is transformed to the frequency domain with Equation (3). The watermark extraction process is carried out on the same colour component (RGB) and the same frequency subband as the embedding.
The next step is sorting the AC DCT coefficients as given in Algorithm (1). The sorting is done only on the original image, so that the pixel coordinates in each 8×8 block are the same as in the watermark embedding process. Then, the AC DCT coefficient of the original image at the pixel coordinates selected by the sorting is compared with the AC DCT coefficient of the watermarked image at the same pixel coordinates, as given in Algorithm (2). If the AC DCT coefficient of the watermarked image > the AC DCT coefficient of the original image, the extracted watermark bit = 1; if the AC DCT coefficient of the watermarked image < the AC DCT coefficient of the original image, the extracted watermark bit = 0. This is repeated until the number of extracted bits matches the size of the original watermark image. To display the result as an image, the value 1 is changed to 255, and the value 0 remains 0.
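The comparison rule described for Algorithm (2) is small enough to sketch directly. The Python below is illustrative only; the function names are hypothetical, and treating equal coefficients as bit 0 is an assumption, since the text only covers the greater-than and less-than cases.

```python
def extract_bit(original_coeff: float, watermarked_coeff: float) -> int:
    """Return 1 if the watermarked coefficient is larger, otherwise 0.

    Assumption: equal coefficients are read as bit 0.
    """
    return 1 if watermarked_coeff > original_coeff else 0


def bits_to_pixels(bits):
    """Map extracted bits to display intensities: 1 -> 255 (white), 0 -> 0 (black)."""
    return [255 if b == 1 else 0 for b in bits]


# One (original, watermarked) coefficient pair per 8x8 block, taken at the
# coordinate selected by sorting the ORIGINAL block's coefficients.
pairs = [(2.0, 2.1), (-4.0, -4.1), (0.5, 0.6)]
bits = [extract_bit(o, w) for o, w in pairs]
pixels = bits_to_pixels(bits)
```

Since the coordinates are derived from the original image alone, an attack on the watermarked image cannot change which coefficient is inspected, only its value.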

Result and Discussion
The test was carried out on four images scanned with the HP DeskJet Ink Advantages 2135 printer at a resolution of 100 DPI and saved in BMP format, as shown in Table 2. The watermark image, a combination of letters and numbers in Arial font, is shown in Table 3. The proposed embedding and extraction were implemented in C# (SharpDevelop 5.1) on a device with an Intel(R) Core(TM) i5-1035G1 CPU @ 1.00GHz 1.19 GHz, 8,00 GB RAM, a 64-bit operating system, an x64-based processor, and Windows 10 Home Single Language.

Imperceptibility Analysis
A good watermark is invisible to the human eye, so that the eye cannot distinguish between the original and the watermarked images. Reference [14] stated that the image watermarking technique aims at high imperceptibility of the embedded watermark with minimal distortion in the watermarked image. For imperceptibility, the original image and the watermarked image are evaluated using two image quality assessment metrics commonly used in previous studies [7], [12], [15], [16], [17], namely Peak Signal to Noise Ratio (PSNR) and Structural Similarity Index Measure (SSIM). PSNR is a classic metric for estimating the similarity between two images [18]. Typical PSNR values are between 30 and 50 dB for 8-bit depth; the higher, the better. The PSNR qualification for an invisibly watermarked image is above 45 dB on average [14]. PSNR is measured in decibels (dB) and can be obtained by Equation (6).
$$PSNR = 10 \log_{10}\left(\frac{255^2}{MSE}\right) \tag{6}$$
where 255 is the maximum pixel value for an 8-bit image depth and the amount of noise affecting the measured signal is represented by the MSE. Mean Square Error (MSE) is a measure of how well a method reconstructs or restores an image relative to the original image. A smaller MSE value indicates a better processing result, that is, the processed image is closer to the original image [19]. MSE can be obtained by Equation (7).
$$MSE = \frac{1}{M \times N} \sum_{i=1}^{M} \sum_{j=1}^{N} \left[ I_1(i,j) - I_2(i,j) \right]^2 \tag{7}$$
where $M$ and $N$ are the lengths of the rows and columns of the two images (original image and watermarked image), $I_1(i,j)$ is the intensity of the image before processing, and $I_2(i,j)$ is the intensity of the image after processing.
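As an illustration of Equations (6) and (7), the two metrics can be computed for a single colour channel as follows. This is a plain-Python sketch with hypothetical function names; returning infinity for identical images is a convention chosen here, not something stated in the study.

```python
import math


def mse(img1, img2):
    """Mean square error between two equally sized 2-D lists of intensities."""
    rows, cols = len(img1), len(img1[0])
    total = sum((img1[i][j] - img2[i][j]) ** 2
                for i in range(rows) for j in range(cols))
    return total / (rows * cols)


def psnr(img1, img2, peak=255):
    """PSNR in dB; identical images give infinity because the MSE is 0."""
    error = mse(img1, img2)
    if error == 0:
        return math.inf
    return 10 * math.log10(peak ** 2 / error)
```

An MSE of exactly 1 corresponds to about 48,13 dB, so the reported PSNR values of ≥ 54 dB imply an MSE well below 1.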
The Structural Similarity Index Measure (SSIM) was designed to improve on traditional methods, such as PSNR, which are inconsistent with human visual perception [20]. Recent research focuses more on analysing the features of the Human Visual System (HVS). Combining knowledge of the HVS and human perception in an objective assessment of image quality can increase its accuracy [19]. SSIM measures similarity using a combination of three heuristic factors, namely the comparison of luminance, contrast, and structure [12]. The SSIM value can be obtained by Equation (8).
$$SSIM(x,y) = \frac{(2\mu_x \mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)} \tag{8}$$
where $\mu_x = \frac{1}{N_p}\sum_{i=1}^{N_p} x_i$ and $\mu_y = \frac{1}{N_p}\sum_{i=1}^{N_p} y_i$ are the mean intensities, $\sigma_x^2$ is the variance of the image before processing, $\sigma_y^2$ is the variance of the image after processing, $\sigma_{xy}$ is the covariance of both images, $C_1 = (K_1 L)^2$; $C_2 = (K_2 L)^2$, $L = 255$ if the image depth is 8 bits, $K_1 = 0,01$; $K_2 = 0,03$ are the default values of the scalar constants, $x_i$ is the colour intensity value in the image before processing, $y_i$ is the colour intensity value in the image after processing, and $N_p$ is the number of pixels. SSIM values range from 0 to 1. When the two images are identical, SSIM equals 1 [20].
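A compact way to see how the quantities in Equation (8) fit together is the single-window sketch below, which computes one SSIM value over all pixels of a channel. This is an assumption made for brevity: many implementations average SSIM over local windows instead, and the study does not state which variant was used. The function name and the use of the (n − 1) normalisation for variance and covariance are likewise assumptions.

```python
def ssim_global(x, y, dynamic_range=255, k1=0.01, k2=0.03):
    """SSIM over two equally sized flat lists of intensities (one window)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("inputs must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Unbiased (n - 1) estimates; this normalisation is an assumption.
    var_x = sum((a - mean_x) * (a - mean_x) for a in x) / (n - 1)
    var_y = sum((b - mean_y) * (b - mean_y) for b in y) / (n - 1)
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    c1 = (k1 * dynamic_range) ** 2  # stabilises the luminance term
    c2 = (k2 * dynamic_range) ** 2  # stabilises the contrast/structure term
    numerator = (2 * mean_x * mean_y + c1) * (2 * cov_xy + c2)
    denominator = (mean_x ** 2 + mean_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator
```

Note that this general form can become negative when the two inputs are negatively correlated, so the 0 to 1 range applies to the cases of interest here, where the watermarked image closely follows the original.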
The PSNR and SSIM values for all test images are given in Table 4. The proposed method for watermark embedding produces a PSNR value of ≥ 54 dB with an average SSIM value of ≥ 0,9. A comparison of imperceptibility for watermark embedding in each colour component (RGB) and frequency subband, in terms of the average PSNR value, is given in Figure 3.
Figure 3 shows that embedding a watermark at the middle frequency in the red component produces high imperceptibility, with an average PSNR value of ≥ 55 dB. In contrast, the PSNR for the low frequency in the blue component is lower than that for the high frequency in the green component. The choice of colour component for embedding the watermark therefore affects imperceptibility. When the colour intensity values of each colour component (RGB) of the original images in Table 2 are extracted and displayed as images, the results show that lighter images predominantly produce higher PSNR values than darker images. A lighter image has higher intensity values or grey levels than a darker image.

Robustness Analysis
Robustness is the property that the embedded watermark can still be detected after common signal processing manipulations [21]. To analyse the robustness performance, the watermark extracted under attack was evaluated by Normalized Cross Correlation (NCC) and Bit Error Rate (BER) between the original watermark and the extracted watermark [12], [22]. Several previous studies have also used these two metrics to measure watermarking resistance [14], [16], [17]. The correlation function can determine the closeness between two digital images [19]. The NCC value ranges from 0 to 1. The extracted watermark is similar to the embedded watermark when the NCC value is close to 1. The average NCC value is 1 in the case of no distortion and pixelation distortion attacks [23]. NCC can be calculated by Equation (9).
In Equation (9), $I_1(i,j)$ is the image intensity value before processing, and $I_2(i,j)$ is the image intensity value after processing. Bit Error Rate (BER) calculates the bit error rate between the original and extracted watermark [12]. A lower BER value indicates less distortion in the recovered watermark [14]. In contrast to NCC, a BER value of 1 means that the extracted watermark is not similar to the original or has been destroyed. BER can be calculated by Equation (10).
$$BER = \frac{1}{m \times n} \sum_{i=1}^{m} \sum_{j=1}^{n} w_{i,j} \oplus w'_{i,j} \tag{10}$$
where $w_{i,j}$ and $w'_{i,j}$ are the original and extracted watermark bits with size $(m \times n)$, and $\oplus$ refers to the XOR operation [12].
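The two robustness metrics can be sketched in Python for binary watermarks stored as 2-D lists of 0 and 1. BER follows the XOR definition given above. For NCC, several normalisations are in common use and the form of Equation (9) is not reproduced here, so the version below (cross product divided by the square root of the product of the two energies) is an assumption; the function names are hypothetical.

```python
import math


def ncc(original, extracted):
    """Normalised cross correlation of two equally sized 2-D bit arrays.

    Assumption: sum(a*b) / sqrt(sum(a*a) * sum(b*b)).
    """
    flat_o = [v for row in original for v in row]
    flat_e = [v for row in extracted for v in row]
    numerator = sum(a * b for a, b in zip(flat_o, flat_e))
    denominator = math.sqrt(sum(a * a for a in flat_o) * sum(b * b for b in flat_e))
    return numerator / denominator if denominator else 0.0


def ber(original, extracted):
    """Fraction of bits that differ (XOR) between the two watermarks."""
    flat_o = [v for row in original for v in row]
    flat_e = [v for row in extracted for v in row]
    errors = sum(a ^ b for a, b in zip(flat_o, flat_e))
    return errors / len(flat_o)
```

With this form, an unchanged watermark gives an NCC of 1 and a BER of 0, and every flipped bit raises the BER by one over the number of watermark bits.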
The results show an average NCC of 1 and an average BER of 0 for all original images that were watermarked and not attacked. The watermark extracted from the same colour component (RGB) and frequency subband as the embedding was similar to the original watermark. For robustness evaluation, the watermarked images were subjected to attacks in the form of brightness +20 and contrast +50. The NCC and BER values under the brightness +20 attack are given in Table 5 for low frequency, Table 6 for middle frequency, and Table 7 for high frequency. At low frequency, the highest NCC value is 0,9580 with a BER of 0,0614, and the lowest NCC is 0,7476 with a BER of 0,3429. At middle frequency, the highest NCC value is 0,9758 with a BER of 0,0357, and the lowest NCC is 0,7995 with a BER of 0,2771. At high frequency, the highest NCC value is 0,9875 with a BER of 0,0186, and the lowest NCC is 0,9220 with a BER of 0,1129. The NCC and BER values under the contrast +50 attack are given in Table 8 for low frequency, Table 9 for middle frequency, and Table 10 for high frequency. At low frequency, the highest NCC value is 0,9420 with a BER of 0,0843, and the lowest NCC is 0,7007 with a BER of 0,3957. At middle frequency, the highest NCC value is 0,9650 with a BER of 0,0514, and the lowest NCC is 0,7560 with a BER of 0,3343. At high frequency, the highest NCC value is 0,9836 with a BER of 0,0243, and the lowest NCC is 0,8519 with a BER of 0,2100.
An extracted watermark image with an NCC value of ≤ 0,90 looks damaged and is difficult to recognise, whereas one with an NCC value of ≥ 0,98 is almost identical to the original watermark. Embedding watermarks at low frequencies is vulnerable to both attacks. A brightness adjustment makes each pixel in the image brighter by shifting the output transformation to the left by the desired constant amount; in this case, the constant 20 is added to each pixel value. This results in a small change in pixel values. A brighter image has higher grey-scale values. Likewise, images with a higher level of contrast generally display higher grey-scale levels. Contrast enhancement in colour (RGB) images is usually accomplished by converting the image into a colour space that has a luminosity layer. The contrast adjustment is performed on the luminosity layer only, and then the image is converted back to the RGB colour space. Manipulating luminosity affects pixel intensity. The luminance signal is formed by combining 30 % red, 59 % green, and 11 % blue from the colour signal. In terms of light waves, red has low energy and blue has greater energy. In these results, the green component remains stable under the brightness +20 and contrast +50 attacks, while the red component is susceptible to both attacks, in contrast to its high imperceptibility.
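The brightness attack and the luminance weighting mentioned above can be sketched as follows. This Python illustration is not the tool used in the study: the function names are hypothetical, clipping the result to the 0 to 255 range is an assumption, and the contrast +50 attack is omitted because its exact formula is not given.

```python
def brighten(channel, amount=20):
    """Add a constant to every pixel of one colour channel.

    Assumption: results are clipped to the valid 8-bit range 0..255.
    """
    return [[min(255, max(0, p + amount)) for p in row] for row in channel]


def luminance(r, g, b):
    """Luminance as 30 % red, 59 % green and 11 % blue."""
    return 0.30 * r + 0.59 * g + 0.11 * b


attacked = brighten([[100, 250], [0, 236]], amount=20)
# Pixels that are already near white saturate at 255 after the shift.
```

Because green carries the largest weight in the luminance signal, an adjustment applied to the luminosity layer is distributed unevenly across the three colour components.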
For every original image that is watermarked in any colour component or frequency subband with an average PSNR value of ≥ 55 dB, the extracted watermark can still be recognised by the human eye, in contrast to cases with an average PSNR value below 55 dB. Comparison graphs of NCC under both attacks are given in Figure 4 and Figure 5 to show the difference in watermark resistance between the colour components (RGB) and the frequency subbands of the proposed method.

Conclusions and Future Research
In this study, the embedding technique was applied to each colour component (RGB) and each Discrete Cosine Transform (DCT) frequency subband. Each 8×8 block is embedded with a 1-bit watermark at a different embedding location. Before the watermark bit is embedded, the AC coefficients of the selected frequency subband are sorted. Each frequency subband represents the selected AC waveform and coefficient based on the cosine graph generated from the transformation. The analysis was carried out against the watermarking criteria of imperceptibility and robustness. The proposed scheme is simple, but its results can inform the selection of suitable colour components and frequency subbands for embedding. The results show that the green colour component and the middle frequency balance imperceptibility and robustness. The red colour component produces good imperceptibility but not robustness, while the low frequency does not meet either of the two watermarking criteria. The proposed method successfully recovered the embedded watermark with the highest NCC of ≥ 0,98. In future work, the watermark extraction from this research can be extended with edge detection to recognise the numbers and letters in the extracted watermark, in order to build identity card authentication applications.