In an IoT environment, many devices periodically transmit data. However, most of the data are redundant, and the sensor itself may not have a good criterion for deciding whether to transmit. A static rule may work in one specific scenario but becomes ineffective when the usage of the sensor changes. Hence, we design an algorithm to address this data redundancy. The algorithm iteratively separates an image into smaller regions: in each round, it chooses the region with the highest variability and separates it into four regions. In the end, the regions have different sizes and each is represented by its average value, so the more variable an area is, the higher the density of regions there. In this paper, we present a method to reduce the file size produced by a thermal sensor that senses the temperature of a surface and outputs a two-dimensional gray-scale image. In our evaluation, the file size is $50\%$ less than JPEG at $0.5\%$ distortion, and up to $93\%$ less at $2\%$ distortion.
First, we study the Panasonic Grid-EYE sensor, a thermal camera that outputs an $8\times8$-pixel image with $2.5^\circ C$ accuracy and $0.25^\circ C$ resolution at $10$ frames per second. In normal mode, its current consumption is $4.5$ mA. Because it is a low-resolution infrared array sensor, we can install it in a house without the privacy issues that a surveillance camera may cause.
When someone walks under a Grid-EYE sensor, we see some pixels with higher temperature than others. Figure~\ref{fig:GridEye} shows an example image from the Grid-EYE sensor. The sensor values look like a cone: the pixels covering the head have the highest temperature, the body is lower, and the legs are the lowest apart from the background. This is because the farther a body part is from the camera, the wider the area each pixel covers and the larger the ratio of background temperature in that pixel; also, the head is not covered by clothes, so its surface temperature is higher than elsewhere. While a person walks around an area, the air in the area becomes warmer, and the shape of the human becomes harder to recognize.
\begin{figure}[htbp]
\centering
\label{fig:GridEye}
\end{figure}
The data we used are from a solitary elder's home. We deployed four Grid-EYE sensors at the corners of her living room and recorded thermal video for three weeks at a data rate of $10$ frames per second.
\subsection{FLIR ONE PRO}
If we save a frame in a readable format, it takes about $380$ bytes of storage.
Huffman coding is a lossless data compression method. On average, it reduces the frame size from $64$ bytes to $40.7$ bytes, with a standard deviation of $6$ bytes.
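To make the construction concrete, the following minimal Python sketch builds a Huffman code table over the byte values of a frame with \texttt{heapq}; it illustrates the general technique and is not the exact encoder used in our system.
\begin{verbatim}
import heapq
from collections import Counter

def huffman_code(data: bytes) -> dict:
    # Each heap entry: (frequency, tie-breaker, {symbol: codeword}).
    heap = [(freq, i, {sym: ""})
            for i, (sym, freq) in enumerate(Counter(data).items())]
    heapq.heapify(heap)
    if len(heap) == 1:                       # degenerate: one distinct symbol
        return {sym: "0" for sym in heap[0][2]}
    tie = len(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)      # merge the two rarest subtrees
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c1.items()}
        merged.update({s: "1" + w for s, w in c2.items()})
        heapq.heappush(heap, (f1 + f2, tie, merged))
        tie += 1
    return heap[0][2]

frame = bytes([22, 22, 23, 22, 25, 22, 23, 22])   # toy 8-pixel frame
table = huffman_code(frame)
bits = "".join(table[b] for b in frame)
print(table, len(bits), "bits vs", 8 * len(frame), "bits uncompressed")
\end{verbatim}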
\subsubsection{Z-score Threshold}
We can transmit only the pixels with higher temperature, since thermal sensors are mostly used to detect heat sources. The Z-score is defined as $z =\frac{\chi-\mu}{\sigma}$, where $\chi$ is the temperature value, $\mu$ is the mean temperature, and $\sigma$ is the standard deviation of the temperature. In our earlier work~\cite{Shih17b}, we use the Z-score instead of a static threshold to detect humans, because the background temperature may differ by $10^\circ C$ between day and night, while a person walking through the sensing area of the Grid-EYE raises the reading by only $2^\circ C$ to $3^\circ C$. Hence, a static threshold cannot reliably detect humans. In~\cite{Shih17b}, we keep only the pixels with a Z-score higher than $2$; with this threshold and Huffman coding, the frame size drops from $64$ bytes to $12.6$ bytes, with a standard deviation of $2.9$ bytes.
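A minimal sketch of this thresholding, assuming the frame is an $8\times8$ NumPy array of temperatures (the function name and toy values are ours):
\begin{verbatim}
import numpy as np

def hot_pixels(frame: np.ndarray, z_threshold: float = 2.0):
    # Keep only pixels whose Z-score exceeds the threshold.
    mu, sigma = frame.mean(), frame.std()
    z = (frame - mu) / sigma              # z = (x - mu) / sigma per pixel
    ys, xs = np.where(z > z_threshold)
    return list(zip(ys, xs, frame[ys, xs]))

frame = np.full((8, 8), 22.0)   # background around 22 degrees C
frame[3:5, 4] = 25.0            # a warm "person" two pixels tall
print(hot_pixels(frame))        # only the two warm pixels survive
\end{verbatim}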
\subsubsection{Gaussian Function Fitting}
Since the shape of a human in a thermal image looks like a cone, we may fit the image with a Gaussian function. A Gaussian function $y = Ae^{-(x-B)^2/2C^2}$ has three parameters $A$, $B$, and $C$: $A$ is the height of the cone, $B$ is the position of the cone's peak, and $C$ controls the width of the cone. We let the pixel with the highest temperature be the peak of the cone, so we only need to adjust $A$ and $C$ to fit the image. Guo~\cite{guo2011simple} provides a fast way to obtain the fitting Gaussian function. In our testing, the fit has about $0.5^\circ C$ root-mean-square error and needs only $5$ bytes to store the position of the peak and the two parameters.
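The idea behind the fast fit can be illustrated with a simplified, non-iterative variant: with the peak fixed at the origin, taking logarithms gives $\ln y = \ln A - x^2/2C^2$, which is linear in $x^2$, so a single least-squares line fit recovers $A$ and $C$. The sketch below is a one-dimensional toy example of this linearization, not the full weighted iterative scheme of~\cite{guo2011simple}.
\begin{verbatim}
import numpy as np

def fit_gaussian_1d(x, y):
    # Fit y = A * exp(-x**2 / (2*C**2)) with the peak fixed at x = 0.
    x2 = np.asarray(x, dtype=float) ** 2
    ln_y = np.log(np.asarray(y, dtype=float))
    slope, intercept = np.polyfit(x2, ln_y, 1)  # ln_y ~ slope*x2 + intercept
    return np.exp(intercept), np.sqrt(-1.0 / (2.0 * slope))  # A, C

x = np.arange(-3, 4)
y = 3.0 * np.exp(-x ** 2 / (2 * 1.5 ** 2))   # known A = 3.0, C = 1.5
print(fit_gaussian_1d(x, y))                 # recovers (3.0, 1.5)
\end{verbatim}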
This section presents the proposed method, which produces a data array smaller than a JPEG image when some error in the data can be tolerated. We use the images captured by FLIR ONE PRO. In a thermal image, the temperature variation between nearby pixels is very small except at the edges of objects. Hence, we can separate an image into several regions; the pixels in the same region have similar values, so representing the region by its average value does not cause too much error. However, precisely separating an image into polygonal regions takes a lot of computation time, the edge of each region is hard to describe, and deciding the number of regions is also a problem. Hence, to describe regions efficiently, we require every region to be a rectangle, and a region can only be separated into $4$ regions by cutting it in half at the middle, both horizontally and vertically. The image starts as a single region, and each round adds $3$ regions, since cutting a region yields $4$ pieces.
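A minimal sketch of this split rule, writing regions as half-open pixel ranges (a representation we choose for illustration):
\begin{verbatim}
def split(region):
    # Split a rectangle (y0, y1, x0, x1) into four quadrants at its midpoints.
    y0, y1, x0, x1 = region
    ym, xm = (y0 + y1) // 2, (x0 + x1) // 2
    return [(y0, ym, x0, xm), (y0, ym, xm, x1),
            (ym, y1, x0, xm), (ym, y1, xm, x1)]

print(split((0, 480, 0, 640)))
# [(0, 240, 0, 320), (0, 240, 320, 640),
#  (240, 480, 0, 320), (240, 480, 320, 640)]
\end{verbatim}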
Our method is shown in Figure~\ref{fig:SystemArchitecture}. The data structure initialization only needs to be done once if the size of the image does not change. A thermal image is loaded into our data structure and separated into several regions. Finally, the output data are encoded by Huffman coding and transmitted to the database. When users want to use the image, they can get the encoded data from the database.
To help us choose which region should be separated, we give every region a score: the variability of a region $R$, measured as its sum of squared deviations from the mean,
\begin{center}
\begin{tabular}{c}
$\textit{score}(R) = \sum\limits_{X\in R} (X-\bar{X})^2 = \sum\limits_{X\in R} X^2 - |R|\,\bar{X}^2$
\end{tabular}
\end{center}
By the equation above, we only need the squared sum and the mean of a region to compute its score. We can use a segment tree with four-way branching to store all possible regions and their scores. Each node stores the width and height ranges it covers, the sum $\sum\limits_{X\in R} X$, and the squared sum $\sum\limits_{X\in R} X^2$ of the pixels in its region. By the property of the segment tree, the root is node $0$, and each node $X_i$ has four children $X_{i\times4+1}$, $X_{i\times4+2}$, $X_{i\times4+3}$, and $X_{i\times4+4}$. Hence, we only need to allocate one large array and recursively process all nodes from the root. Algorithm~\ref{code:SegmentTreePreprocess} shows how we generate the tree.
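A minimal sketch of the per-node bookkeeping, with field names of our own choosing; a parent's count, sum, and squared sum are just the totals of its four children's fields, so scores never require rescanning pixels.
\begin{verbatim}
from dataclasses import dataclass

@dataclass
class Node:
    count: int       # |R|, number of pixels in the region
    total: float     # sum of pixel values in the region
    sq_total: float  # sum of squared pixel values in the region

    def score(self) -> float:
        # Sum of squared deviations: sum(X^2) - |R| * mean^2.
        mean = self.total / self.count
        return self.sq_total - self.count * mean * mean

children = [Node(4, 88.0, 1938.0), Node(4, 90.0, 2026.0),
            Node(4, 88.0, 1936.0), Node(4, 100.0, 2510.0)]
parent = Node(sum(c.count for c in children),
              sum(c.total for c in children),
              sum(c.sq_total for c in children))
print(parent.score())   # variability of the merged 16-pixel region
\end{verbatim}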
\begin{algorithm*}[h]
\caption{Segment Tree Preprocess}
For region selection, we use a priority queue to retrieve the region under consideration.
\end{algorithmic}
\end{algorithm*}
The complexity of our algorithm can be separated into three parts. The first part is initializing the segment tree, whose size depends on the size of the image: if the number of pixels is $N$, the height of the segment tree is $O(log(N))$ and the number of nodes is $O(N)$, so initialization takes $O(N)$ time. The second part is loading the image, which traverses the whole tree from the leaves to the root. Since the segment tree is stored in an array, this also takes $O(N)$ time. The third part is separating regions. In each round, we pop one element from the heap and push four elements into it, so after separating the image $K$ times the heap holds $3K+1$ elements. Each pop and push takes $O(log(K))$, and performing them $5K$ times takes $O(Klog(K))$.
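A minimal sketch of this selection loop (Python's \texttt{heapq} is a min-heap, so scores are negated; for clarity the sketch recomputes each region's score directly instead of reading it from the segment tree, and the function name is ours):
\begin{verbatim}
import heapq
import numpy as np

def separate(image: np.ndarray, rounds: int):
    def score(y0, y1, x0, x1):
        r = image[y0:y1, x0:x1].astype(float)
        return float((r ** 2).sum() - r.size * r.mean() ** 2)

    h, w = image.shape
    heap = [(-score(0, h, 0, w), (0, h, 0, w))]
    for _ in range(rounds):
        _, (y0, y1, x0, x1) = heapq.heappop(heap)      # O(log K) pop
        ym, xm = (y0 + y1) // 2, (x0 + x1) // 2
        for q in ((y0, ym, x0, xm), (y0, ym, xm, x1),
                  (ym, y1, x0, xm), (ym, y1, xm, x1)):
            if q[0] < q[1] and q[2] < q[3]:            # skip empty quadrants
                heapq.heappush(heap, (-score(*q), q))  # O(log K) push
    # Each remaining region is represented by its average value.
    return [(r, image[r[0]:r[1], r[2]:r[3]].mean()) for _, r in heap]

regions = separate(np.random.rand(16, 16) * 3 + 22, rounds=10)
print(len(regions))   # 3 * 10 + 1 = 31 regions
\end{verbatim}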
To evaluate the effectiveness of the proposed method, we compress a thermal image at different ratios with our method and compare against JPEG images of different quality levels and a PNG image, a lossless bitmap format. We set the camera at the ceiling with the view direction perpendicular to the ground, and the image size is $480\times640$ pixels. The JPEG images are generated by OpenCV $3.3.0$, with image quality from $1$ to $99$.
Figure~\ref{fig:4KMy} and Figure~\ref{fig:4KJpeg} show the difference between JPEG and our method. The JPEG image is generated at image quality level $3$, and the image of our method uses $1390$ rounds of separation and is compressed by Huffman coding. In this case, Huffman coding reduces our image size by $39\%$.
\label{fig:4KJpeg}
\end{figure}
Figure~\ref{fig:compareToJpeg} shows that the file size can be reduced by more than $50\%$ compared to a JPEG image when both have $0.5\%$ ($0.18^\circ C$) root-mean-square error, and our method has $82\%$ less error when both images are $4$KB. The file-size percentages are relative to the PNG image.
\begin{figure}[ht]
\centering
The computing time for a $480 \times 640$ image on a Raspberry Pi 3 is as follows.
\subsubsection{Data Structure Initialization}
$0.233997$ seconds.
\subsubsection{Image Loading}
$1.268126$ seconds.
\subsubsection{Region Separation}
About $4.6$ microseconds per separation. Figure~\ref{fig:computeTime} shows the computation time of separating regions.