A survey report on software watermarking technique with graph coloring approach

by admin on April 9, 2011

A survey report on software watermarking technique with graph coloring approach

Abstract In the recent times the main aspect of any service provider or any product provider is the security. Everybody is very much conscious about the security of contents because in this internet savvy world, hacking information is nowadays just a matter of joke to the professional hackers. Nowadays digitization of information is a very important because digital can be used very efficiently and it can be sent through internet in a few seconds to anywhere of this world. So, there is a need to protect the digital content. Software watermarking is a unique process that gives a digital content protection from the hackers. Several processes are invented to make a digital content software watermarked. This survey report covers the prestigious work that has been done on software watermarking using graph-coloring approach.
1. Introduction This article surveys the software watermarking [1] technique with graph coloring approach. The primary object of this software watermarking is to prevent any digital content from being tampered. The main advantage of this software watermarking approach is, ·It prove the ownership of the content ·It make the content an anti-pirated By the software watermarking technique we can prevent the pirates using reverse algorithm and de-compilation technique to steal software. Intellectual property protection or IPP [7] is basically a theoretical framework to evaluate watermarking techniques. The main aspects of any efficient watermarking technique are credibility and overhead. Software Watermarking  [1] is basically a way of hiding the digital information into a digital content (Cover Text) to prevent piracy. In this approach watermark is a special data structure and cover text is a software program. The hidden digital information is invisible to the user. It is of two types, ·Static ·Dynamic There are two main types of approach in software watermarking, · Frequency domain approach · Path Based ·Graph-Based In graph theory’s term, graph coloring [6] is a special case of “graph labeling”. Graph coloring is of several types, Vertex Coloring is to color a graph in such a way that no two adjacent vertices share the same color. Edge Coloring is to Color a graph in such a way that no two adjacent edges share the same color. Face Coloring is to Color a planer graph in such a way that no two faces that share a boundary have the same color. The chromatic polynomial counts the number of ways a graph can be colored using no more than a given number of colors. The smallest number of colors needed to color a graph G is called its chromatic number, ?(G). Graph coloring is computationally hard. It is NP-complete to decide if a given graph admits a k-coloring for given k except for the cases k = 1 and k = 2. Especially, it is NP-hard to compute the chromatic number. Graph coloring remains NP-complete even on planer graphs of degree at most 4.The graph coloring problem is, Given an Undirected Graph G=(V , E) where V= Set of  Vertices and E=Set of Edges , it is required to find out an assignment of colors to vertices, such that no two vertices which are connected by an edge would  get the same color. The main idea behind the solution is, once a vertex is assigned a color then all the vertices which are connected to that are refrained from using the same color. This basic idea is used in software watermarking [7] approaches. Whenever the system get an 1(of the main content) color it and when in case of 0 color it with different color or color the MISes with different but minimum color to make it a distinguished one. The approaches of watermarking techniques for graph coloring problem both credibility and overhead are maintained properly. 2. Software Watermarking approaches A static algorithm [3] recognizes the watermark by examining the (source or compiled) code of the watermarked program. A dynamic algorithm recognizes the watermark by examining the state of the program after it has been executed with a special finite input sequence (i1,………in). Thus, for a dynamic algorithm the recognizer will have the signature R(Pw[i1,………in])) -> W where P[I] is the state of program P after input I. Mr. Stern, Hachez, Koeune and Quisquater [2] invented an algorithm for frequency domain software watermarking approach , this approach is called SHKQ algorithm [8]. The basic performance of SHQK algorithm stated below, 1. Watermarks can be embedded by re-ordering parts of the code [9], where such re-ordering can be shown to be semantics preserving. 2. Watermarks can be embedded by inserting new (non-functional) code in the program, such that this code encodes a watermark number. The three algorithms [2] Vector Extraction, Watermark Embedding, and Watermark Extraction below formalize these steps. Vector Extraction: 1. Define n as a security parameter. 2. Define a vector S= (s1,………..sn) of n ordered groups of machine language instructions. 3. For each group i in S, count the frequency c of the group in the code, and form the vector c = (c1,……….cn). Return c. Watermark Embedding: 1. Apply the vector extraction step to obtain a vector c of length n. 2. Choose an n-coordinate vector w= (w1,……………..wn) whose coefficients are randomly distributed following a normal law with standard deviation ? 3. Modify the code in such a way that the new extracted vector c?= c + w Watermark Extraction 1. Set a detection threshold ?, (0 < ? < 1). 2. Apply the vector extraction step to obtain a vector c from the original (non watermarked) application. 3. Apply the extraction step to obtain a vector d from the watermarked application. 4. Compute a similarity measure Q between d-c and w. 5. If Q is higher than then the algorithm outputs marked, else it outputs unmarked. C. Collberg, E. Carter, S. Debray, A. Huntwork, C. Linn, M. Stepp [5] discussed path-based watermarking where the basic idea is to embed the mark in the branching structure of the program. There are some consequences of this approach, First, the forward branches that a program takes are an essential part of what makes the program unique. This makes branches inherently difficult to change or remove, making path-based watermarks resilient to many distortive attacks. Second, branches are inherently binary in nature (they are either taken or not taken), which makes it easy to embed a bit-string. Third, path-based watermarking lends itself well to error-correction and tamper proofing. Fourth, branches are ubiquitous in real programs, hopefully making path-based marks insusceptible to statistical attacks. Two basic algorithms [3] have been discussed below for graph-based approach, Venkatesan et. al. [10] discussed The Graph-Theoretic Watermark (GTW) algorithm whose main objective is to embed a watermark by inserting a code segment which has no semantic effect Fig.1: The GTW algorithm. Dashed control flow edges connect the watermark graph G with the cover control flow graph P. The watermark nodes of Pw are marked to distinguish them from the nodes of P The basic embedding technique is stated below, 1. Encode the integer watermark value W as a control-flow graph (CFG) G?e(W). 2. Construct the watermarked program Pw by combining G and the program CFG P. 3. Connect G and P by adding bogus control-flow edges. 4. Mark the nodes (basic blocks) of G. This process is illustrated in Figure 1. In order for the watermark to be stealthy G and P should be tightly integrated. Integration is achieved by performing a random walk over the nodes of G and P and adding edges until Pw is uniformly dense. The watermark is extracted from Pw as follows: 1. Make the nodes of the watermark graph G is the set of marked blocks from 2. Make the edges of G be the set of control-flow edges between marked blocks in . 3. Compute the watermark using the decoder d (G). Collberg and Thomborson [11] discussed the algorithm that embeds the watermark in a dynamically built, linked, graph structure. Fig. 2: The Collberg-Thomborson watermarking algorithm 1. At A, the watermark number is encoded into a graph G. 2. At B, G is split into a number of sub graphs. 3. At C, each sub graph is converted into a code sequence that builds the graph. Pw is constructed by inserting the graph-building statements along a special execution path  <P1,P2,P3,P4> of the program. 4. At D, the recognition sequence is started by executing with a special input sequence. This causes the program to follow the execution path Pw and the watermark graph G to be built on the heap. 5. At E, G is extracted from the heap and decoded into W. In graph based approach two main techniques are discussed below, A] Graph theoretic approach B] Graph coloring approach Christian Collberg, Andrew Huntwork, Edward Carter, Gregg Townsend [4] discussed this approach to achieve a technique for software watermarking; in this approach the GTW algorithm has been used. The GTW embedding algorithm takes as input application code P, watermark code W, secret keys w1 and w2 , and integers m and n. GTWSM uses a smaller and simpler set of parameters. Values of m and n are inferred from P, W, and w1. The clustering step is unkeyed, so w1 is unused. Thus, our implementation takes as input application code P, a secret key w, and a watermark value. 1.The watermark value v is split into k values, { v1,………, vk-1} 2.The split values are encoded as directed graphs { G1,………, Gk-1} 3.The generated graphs are converted into CFGs  {W1,………, Wk-1} by generating executable code for each basic block. 4.The applications clusters are identified 5.The watermark is merged with the application by adding control-flowed edges to the graphs. 6.Each basic block is marked to indicate whether it is part of the watermark. Embedding: In GTWSM we accept an integer value for transformation into a watermark CFG. The recognition process performs the inverse transformation from CFG to integer. First, the embedding process involves several steps; secondly, splitting the watermark value into small integers; thirdly, constructing directed graphs that encode these values; generating code that corresponds to the graphs; and connecting the code to the program. Watermark Value Splitting: GTWSM splits a watermark value v into a multiset S of k integers, k >? 2. Empirically, we have determined that values of k between 5 and 15 produce watermark methods that are neither overly large nor overly numerous. A watermark value v is split as follows: 1. Compute the minimum exponent l such that v can be represented using k-1digits of base 2l. 2. Split the value v into digits (v0, v1,……,vk-2) such that 0 <? vj < 2lv = ?k-2j=02jlvj. and 3. Encode the digits in the multi set {s0, s1,……,, sk-1} where s0 = l-1si = si-1 + vi-1. and Encoding Integers as Graphs: Each integer is converted into a graph for embedding in the application. Several issues must be considered when choosing a graph encoding: 2. The graph must have the structure of a valid CFG. It should have a header node within-degree zero and out degree one from which every node is reachable, and it should have a footer node without-degree zero that is reachable from every node. 3. The graph should have a maximum out-degree of two. 4. The graph should be reducible [12] because true Java code produces only reducible graphs. 5.The control structures represented by the graph should not be deeply nested, because real programs seldom nest deeply. Mr Gang Qu and Mr. Miodrag Potkonjak [7] analyzed ‘adding edge’ watermarking techniques for graph coloring problem. Suppose a random graph G(V,E) has been given. A randomly selected message MV is the vertex set of graph G where V = {v0, v1,…….., vn-1}. Convert the message M into binary (using ASCII) M=m0m1………….. for embedding. Now, the message M have to be embed into graph G by the following manner, lCopy the graph G(V,E) to G'(V,E’). lFor each bit mi , take two nearest bit of vi that are not connected, i.e. vi1 and vi2 in such a manner that  i2 > i1 > i (mod n) and (vi,vi2),(vi,vi1) Ï E and (vi,vj) ÎE for all i < j < i1 , i1 < j < i2 (mod n). lIf mi=0 then add edge (vi,vi1) else if mi=1 then add edge (vi,vi2) Those edges are extra edges and connected vertex has to be colored by different colors that may not be essential at all in the original graph. Mr Gang Qu and Mr. Miodrag Potkonjak [8] demonstrate technique to recover the hiding signatures from the colored graph, which is basically watermarked content. The approach for recovery from the colored graph, which is embedded by the “adding edge” technique, is as follows, lAfter embedding the message to the original graph several new edges have been created and the vertex pair may be colored by different colors. lFor each pair {vi, vj} we can receive one bit of information by counting how many nodes are in between (i.e. Nodes with indices between i and j) vi and vj are not connected with vi. lIf the difference is 0 then the hidden bit is 0 and if the difference is 1 then the hidden is 1. Otherwise if the difference is more than 1 then reverse the order of vi and vj and try the above steps. lBy this single binary information the whole binary string can be made. That string is basically the hidden message. Mr Gang Qu and Mr. Miodrag Potkonjak [7] analyzed ‘selecting MIS’ watermarking techniques for graph coloring problem. MIS means Maximal independent set. MIS of a graph means it is a subset of vertex set S such that the vertex in MIS are not connected and those are not in S is connected with at least one vertex of the graph. Main advantage of this technique is all the vertex of one MIS be labeled by one single color. Suppose a random graph G(V,E) has been given. A randomly selected message M for embedding. V is the vertex set of graph G where V={v0, v1…………, vn-1}.Convert the message M into binary (using ASCII) M=m0m1………….. Now, the message M have to be embed into graph G by the following manner, lThe idea of selecting no of MISes according to M. lTake vi as the first vertex of the MIS where the binary exponent of ilog2 n bits of M. coincides the first lCut the vi and it’s neighbors as they can’t be in same MIS as vi. lReorder the vertex and get the next MIS. lColor each MIS with separate colors. lConstruct the MIS till the M has not been completely embedded. Mr Gang Qu and Mr. Miodrag Potkonjak [8] demonstrate technique to recover the hiding signatures from the colored graph, which is basically watermarked content. The technique to recover the hidden signature from the colored graph embedded by “selecting MIS” approach is as follows, In this approach several number of MISes are generated. The selected MIS with a particular order of vertex represent the watermark. Binary string can be retrieved from that MIS by reconstruction. lSuppose the graph contain 11 vertex and 3 MISes have been generated colored by 3 different colors.The binary string needed is 11111. lTake a certain MIS like {v7, v4, v1, v10 }. lThe first vertex is v7. As (7)10=(111)2, then the binary string we retrieved is 111. Now remove v7 and it’s neighbors from the original graph. lThe second vertex in the MIS is v4 and the binary string we retrieved is 11. lRemove v4 and its neighbors from the original graph. Only v1 and v10 are remained lonely. lSo, desired result 11111 has been retrieved. The above technique shows the uniqueness of the selected MIS in determining the credibility. Mr Gang Qu and Mr. Miodrag Potkonjak [8] analyzed one more technique ‘adding edge and nodes’ along with previous two for hiding signature in a digital content. Suppose a random graph G(V,E) has been given. A randomly selected message M for embedding. V is the vertex set of graph G where V={v0, v1…………, vn-1}.Convert the message M into binary (using ASCII) M=m0m1………….. Now, the message M have to be embed into graph G by the following manner, lA new node v has been introduced. lTake the first ëlog nû bits from M. lFind the corresponding vertex v’ and connect it to v. lTake the next ëlog n-1û bits and locate the next vertex to which v is connected. lRepeat the previous step until np edges have been added and a new graph Gn+1, p have been generated. lIntroduce a new node until M has been totally embedded. lColor the new graph. Mr Gang Qu and Mr. Miodrag Potkonjak [8] demonstrate the approach in behind to recover the hidden signature from a embedded, colored graph generated by “adding edges and nodes” technique is as follows, as by this technique a new node and it’s associate edges has been created and they remain invisible. To exhibit the hidden signature we have to go through the signature embedding technique once again and encrypted signatures can be added to the colored graph as edges to the newly inserted vertex. Different binary strings can be generated in the same way from the same colored graph. So to get a valid result a comparison in between every result is necessary. Mr Gang Qu and Mr. Miodrag Potkonjak [7] discussed some technical aspects of  “adding edge” approach, The signature or message that have to embed can be anything but that should capable of protecting the ownership. Encrypt the message, which is converted to binary using cryptographic hash function or stream cipher. Assume that the final binary bit stream is random. A basic assumption on this approach, lTo color Gn, p, we need exactly X color where X is given by, X (Gn, p) = én/(2logbn) ù …(i) lAfter embedding k bits into the main graph Gn,p there are extra k edges added to the watermarked content, the resulting graph remains random with the same number of vertex and a new edge probability, p’ = p + (2k/n(n-1)) So, formula (i) for the chromatic number still holds and this number is denoted by X’. The overhead is defined by the X’-X (i.e. total no of extra color need to color the watermarked graph). Some theorem have been deduced on contrary of the technical concept of the above approach, Theorem 1.1: Adding k (n) edges to a random graph Gn,p , limn->µ X’-X=µ if and only if k (n) Î ?( n log n) Corollary 1.2: (1 color overhead) Adding k(n) edges to graph Gn,p , if limn->µ (k(n)/n log n) = l then limn->µ X’-X <= 1+él/(1-p)ù .  In particular k (n) Î o( n log n). The overhead is at most 1. Theorem 1.3: Adding k(n) edges to a random graph Gn,p, let & be the event that these edges are added randomly, then limn->µ Prob[?]=0 if k (n) Î ?( n/ log n). Mr Gang Qu and Mr. Miodrag Potkonjak [7] discussed some technical aspects of “selecting MIS” approach, Whenever a MIS has been removed from the original graph, the graph became a random one again with the same edge probability. ·To generate a random graph Gn+1,p , add one new vertex into a random graph Gn,p and add an new edge in between the new vertex and the old vertex in Gn,p with probability P. ·The first vertex of any MIS have been selected randomly and the next vertex’s choices are restricted to (1-p)n=qn as pn neighbors of the first vertex have been eliminated. Therefore following theorems have been deduced, Lemma 2.1: Given random graph Gn,p , almost all randomly selected MIS is of size logb n, where b = 1/(1-p). To create a convincible watermarking a large graph, we have to add w(n/log n) edges by the first technique. Theorem 2.2: Given a random graph Gn,p, Let ? be the event that in a random solution, all vertices in this MIS have the same color. By selecting one vertex from an n-vertex graph, we can embed ëlog2 nû bits. From lemma 2.1, at most log2 n log b n bits of information could be embedded into the MIS. To embed long messages, we have to construct more MISes, which may result in huge overhead. Theorem 2.3: Given a random graph Gn,p , if we select k(n) MISes ,assign each MIS one color and color the rest of the graph, then the overhead is at most k(n) and k(n/2) on average at least. Mr Gang Qu and Mr. Miodrag Potkonjak [7] discussed some technical aspects of “adding edges and nodes” 8 approach the technical aspects are, · k new edges have been added into the initial graph Gn,p and transformed into Gn+k,p. · The formula for implying overhead is, X`-X= é(n+k)/2logb(n+k)ù. – én/2logbnù where b=1/1-p · The watermarked potential for the graph Gn,p as WP(Gn,p) = X(Gn,p) – n/(2logbn) The theorems deduced for this technique’s aspect are, Theorem 3.1: Given a random graph Gn,p, we introduce k (n) new vertex and associate edges based on the signature, then for almost all Gn,p , the overhead is at most 1 if k(n) Î o(log n). Theorem 3.2: We build graph Gn+1,p from a given random graph Gn,pnp edges. A coloring scheme to the initial Gn,p is obtained by coloring by introducing one new vertex and Gn+1,p. Let ? be the event: ·Add a vertex v to the colored graph Gn,p . ·Connect v to np random vertex. 3. Conclusion Two most important criteria of any watermarked content is low overhead and high credibility. Previously discussed three techniques have been fully analyzed and we can conclude that, those techniques are totally capable to provide high credibility with at most 1 color overhead. For large graphs (i.e. Random graphs, DIMACS challenge graph, Graphs generated from real life problems). Numerical data from simulation have been analyzed and confirms the result. Fingerprinting is one of the most efficient method to provide security to any content. The technique  can be implemented in NP complete GC(graph coloring)problem. Fingerprinting in the random graphs introduces overhead while for graphs generated from real life register problem. In the QP algorithm which is basically a algorithm for constraint based watermarking algorithm, the graph coloring problem can be used to embed a watermark (edge adding) in the registry allocation. 5. References [1]  Mohanty S. P. . “Digital Watermarking : A tutorial View “ .A technical report. Indian Institute of Science,1999 [2]  Sahoo T. R., Collberg C. . “Software Watermarking in the Frequency Domain: Implementation, Analysis, and Attacks”. Technical report TR04-07, Dept.of computer science, University of Arizona, March, 2004 [3]  Collberg C., Kobourov S. , Carter E. .“Error-Correcting Graphs for Software Watermarking” . In workshop on graph theoretic concepts in computer science. 2003 [4]   Collberg C., Huntwork A. , Carter E., Townsend G. . ”Graph Theoretic Software Watermarks: Implementation, Analysis, and Attacks” . In 6th International Information Hiding Workshop, March, 2004 [5]   Collberg C. , Carter E. , Devroy S. , Huntwork A. , Linn C., Stepp. M. “Dynamic Path Based Software Watermarking”. In proceedings of the conference on programming language and implementation, 2004 [6] http://en.wikipedia.org/wiki/Graph_coloring [7] Qu G. and Potkonjak M. . “Analysis of Watermarking Techniques for Graph Coloring Problem”. In IEEE/ACM International Conference of Computer aided design, Pages 190-193, November, 1998. [8] Stern J. P., Hachez G., Koeune F., and Quisquater J.J.. Robust object watermarking: Application to code. In Information Hiding, pages368-378,1999. http://citeseer.nj.nec.com/stern00robust.html. [9] Davidson R.L. and  Myhrvold N. Method and system for generating and auditing a signature for a computer program. US Patent 5,559,884, Assignee: Microsoft Corporation,1996. http://www.delphion.com/details?pn=US05559884__ [10] Venkatesan R., Vazirani V. , and Sinha   S. A graph theoretic approach to software watermarking. In 4th International Information Hiding Workshop, Pittsburgh, PA, Apr. 2001. [11] Collberg C. and Thomborson C.. Software watermarking: Models and dynamic embeddings. In Principles of Programming Languages 1999, POPL99, San Antonio, TX, Jan. 1999. [12] Aho V.A., Sethi R., and Ullman D.J. Compilers,Priciples,Techniques and tools. Addison-Wisely,1986.ISBN0-201-10088-6.

graphing software

Previous post:

Next post: