Optimal affine image normalization approach for optical character recognition
I.A. Konovalenko 1,2, V.V. Kokhan 1,2, D.P. Nikolaev 1,2
1 Institute for Information Transmission Problems RAS, 127051, Moscow, Russia, Bolshoy Karetny per. 19, bld. 1,
2 Smart Engines, 117312, Moscow, Russia, pr-t 60-letiya Oktyabrya, 9
Abstract
Optical character recognition (OCR) in images captured from arbitrary angles requires preliminary normalization, i.e. a geometric transformation resulting in an image as if it was captured at an angle suitable for OCR. In most cases, a surface containing characters can be considered flat, and a pinhole model can be adopted for a camera. Thus, in theory, the normalization should be projective. Usually, the camera optical axis is approximately perpendicular to the document surface, so the projective normalization can be replaced with an affine one without a significant loss of accuracy. An affine image transformation is performed significantly faster than a projective normalization, which is important for OCR on mobile devices. In this work, we propose a fast approach for image normalization. It utilizes an affine normalization instead of a projective one if there is no significant loss of accuracy. The approach is based on a proposed criterion for the normalization accuracy: root mean square (RMS) coordinate discrepancies over the region of interest (ROI). The problem of optimal affine normalization according to this criterion is considered. We have established that this unconstrained optimization is quadratic and can be reduced to a problem of fractional quadratic functions integration over the ROI. The latter was solved analytically in the case of OCR where the ROI consists of rectangles. The proposed approach is generalized for various cases when instead of the affine transform its special cases are used: scaling, translation, shearing, and their superposition, allowing the image normalization procedure to be further accelerated.
Keywords: optical character recognition, image registration, image normalization, coordinate discrepancy, projective transformation, affine transformation, approximation, optimization, symbolic computation.
Citation: Konovalenko IA, Kokhan VV, Nikolaev DP. Optimal affine image normalization approach for optical character recognition. Computer Optics 2021; 45(1): 90-100. DOI: 10.18287/2412-6179-CO-759.
Acknowledgments: This work was partially financially supported by the Russian Foundation for Basic Research, projects 18-29-26035 and 17-29-03370.
Introduction
Projective image normalization
Optical character recognition (OCR) in images captured from arbitrary angles requires preliminary normalization, i.e. a geometric transformation resulting in an image as if it was captured from an angle suitable for OCR. In most cases, a surface containing characters can be considered flat, and a pinhole model can be adopted for a camera. Thus, in theory, the normalization should be projective. The latter is commonly employed as a part of image preprocessing for various computer vision tasks, such as document OCR [1, 2, 3, 4, 5], vehicle license plate recognition [6], TV-stream recognition based on a picture of a TV screen [7], chessboard recognition [8], detection of artificial on-road obstacles [9], object detection using shape features (detection of the shape of an object within an image and matching that shape with an object from a database) [10, 11, 12, 13, 14, 15], monitoring of surface parameters from satellites (temporal variability of sea surface temperature, determining the velocity of cloud mass motion, etc.) [16], reconstruction of plans and maps from aerial photographs [17, 18], and many more. In addition, the projective normalization of photographs of documents helps human perception [19].
Affine approximation of projective normalization
Usually, the camera optical axis is approximately perpendicular to the document surface. In such cases, a projection model of the affine camera can be utilized [20], and a projective normalization can be replaced with a commonly used affine normalization without significant loss of accuracy [21, 22]. The affine image transformation is performed significantly faster than the projective normalization [22, 23], which is helpful for fast image normalization. The latter is important for the OCR on mobile devices [24].
The idea of replacing the projective transformation with an affine one in practice was mentioned in [25] back in 1985. This property was used in [26] to simplify the mathematical calculations. The affine approximation is commonly used in image completion [27] and rendering [23, 28, 29]. In [30] the projective transformation is replaced with the simpler affine transformation in order to avoid overfitting. A similar idea is utilized in «weak-perspective projection» [31, 32, 33], where the approximation is partial. The use of affine invariant methods instead of significantly more complicated projective invariant methods is common in keypoint technology [34, 35, 36], as well as in the related problem of salient region detection [37], and both of these methods are essentially camera angle invariant. There is also a division into affine and projective methods in stereo reconstruction [38]. The utilization of an affine transformation for image rendering and normalization results in a loss of accuracy [22, 39], but the accuracy was not formally introduced.
Affine approximation of the given projective normalization aiming to accelerate the latter is considered for the first time in this work.
Definitions and notation
Let $I_{\text{input}}$ be an input image (usually a photograph) to be normalized. Let its known projective normalization be a perfect normalization $H$. Let the image formed by applying $H$ to $I_{\text{input}}$ be the projectively normalized image $I_{\text{proj}}$ (see Fig. 1). An arbitrary affine approximation of the projective normalization $H$ is denoted as $A$: $A \approx H$. Thus $A$ is an affine normalization of the image $I_{\text{input}}$. The image $I_{\text{affin}}$ resulting from applying $A$ to $I_{\text{input}}$ is the affinely normalized image.
Fig. 1. The general scheme of transformations, where $I_{\text{input}}$ is an image of the document captured from an arbitrary angle, $I_{\text{proj}}$ is a projectively normalized image, $I_{\text{affin}}$ is an affinely normalized image, and the result of the OCR
Let $r \stackrel{\text{def}}{=} [x \;\; y]^T$ be the Cartesian coordinates of pixels on the plane of $I_{\text{proj}}$. We define the residual projective distortion as
$$V \stackrel{\text{def}}{=} A H^{-1}, \tag{1}$$
which for each point of the scene transforms the coordinates $r$ of its image on $I_{\text{proj}}$ into the coordinates $V(r)$ of its image on $I_{\text{affin}}$.
Ideally, the residual distortion $V$ is the identity transformation. To formalize the pointwise error of affine normalization, we define the coordinate discrepancies [40] (see Fig. 2) as
$$d(r) \stackrel{\text{def}}{=} \| r - V(r) \|_2. \tag{2}$$
In some cases, it is possible to evaluate beforehand which part of the projectively normalized image $I_{\text{proj}}$ is of interest. Such a region of interest (ROI) is denoted as $R \subset \mathbb{R}^2$. Otherwise, $R$ denotes the entire $I_{\text{proj}}$.
Fig. 2. The coordinate discrepancies (both axes in pixels): a) the affinely normalized image $I_{\text{affin}}$; black frames indicate the ideal positions of the text fields; b) the shift vector field $V(r) - r$, $r \in R$; the shades of grey illustrate the square root of the coordinate discrepancies $\sqrt{d(r)}$
1. Root mean square criterion of normalization accuracy
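As a criterion of normalization accuracy, we choose the widely used criterion of root mean square (RMS) coordinate discrepancies. For an ROI with finite non-zero area, $0 < S(R) < \infty$, and for a non-empty finite ROI, $0 < |R| < \infty$, the criterion is defined as follows:
$$L_2(d; R) \stackrel{\text{def}}{=} \begin{cases} \sqrt{\dfrac{1}{S(R)} \displaystyle\int_R d^2(r)\,dr} & \text{for } 0 < S(R) < \infty, \\[2ex] \sqrt{\dfrac{1}{|R|} \displaystyle\sum_{r \in R} d^2(r)} & \text{for } 0 < |R| < \infty. \end{cases} \tag{3}$$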
Such criterion was used, for example, for the automatic normalization of distortion caused by lens distortion and camera movement [41]. The same criterion was also employed for the calculation of the accuracy of the aligned image formation via projectors matrix [42]. Using definitions (1) and (2) we establish the dependence of criterion on the affine transformation A:
$$L_2(A, H; R) \stackrel{\text{def}}{=} L_2(d; R), \quad d(r) = \| r - A H^{-1}(r) \|_2. \tag{4}$$
2. Optimal affine image normalization
2.1. Problem formulation
Now, as the criterion of normalization accuracy is set, we can formulate a problem of search for the optimal affine approximation of the projective normalization H:
$$A^* = \arg\min_{A} L_2(A, H; R). \tag{5}$$
We will also refer to it as the optimal affine normalization. The corresponding optimum is denoted as
$$L_2^* = \min_{A} L_2(A, H; R) = L_2(A^*, H; R). \tag{6}$$
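The projective normalization $H$ is parametrized by the homography matrix $\mathbf{H} \stackrel{\text{def}}{=} (h_{ij}) \in \mathbb{R}^{3 \times 3}$:
$$r = H(r') \stackrel{\text{def}}{=} \begin{bmatrix} \dfrac{h_{11} x' + h_{12} y' + h_{13}}{h_{31} x' + h_{32} y' + h_{33}} \\[2ex] \dfrac{h_{21} x' + h_{22} y' + h_{23}}{h_{31} x' + h_{32} y' + h_{33}} \end{bmatrix},$$
where $r' = [x' \;\; y']^T$ are the Cartesian coordinates of pixels on the plane of the image $I_{\text{input}}$. Let the inverse of the transformation $H$ be $P \stackrel{\text{def}}{=} H^{-1}$, and let us parametrize it by the matrix $\mathbf{P} \stackrel{\text{def}}{=} (p_{ij}) \in \mathbb{R}^{3 \times 3}$:
$$r' = P(r) \stackrel{\text{def}}{=} \begin{bmatrix} \dfrac{p_{11} x + p_{12} y + p_{13}}{p_{31} x + p_{32} y + p_{33}} \\[2ex] \dfrac{p_{21} x + p_{22} y + p_{23}}{p_{31} x + p_{32} y + p_{33}} \end{bmatrix}, \tag{7}$$
then $\mathbf{P} \sim \mathbf{H}^{-1}$. Because the matrices $\mathbf{P}$ and $\mathbf{H}$ are homogeneous, we assume
$$\mathbf{P} = \mathbf{H}^{-1}. \tag{8}$$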
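The affine transformation $A$ is parametrized by the matrix $\mathbf{A} \stackrel{\text{def}}{=} (a_{ij}) \in \mathbb{R}^{2 \times 3}$:
$$A(r) \stackrel{\text{def}}{=} \mathbf{A} \, [x \;\; y \;\; 1]^T. \tag{9}$$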
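Thus, problem (5) of the optimal transformation search can be formulated as the problem of the optimal matrix search:
$$\mathbf{A}^* = \arg\min_{\mathbf{A}} \begin{cases} \displaystyle\int_R \left\| r - \mathbf{A} \begin{bmatrix} P(r) \\ 1 \end{bmatrix} \right\|_2^2 dr & \text{for } 0 < S(R) < \infty, \\[2ex] \displaystyle\sum_{r \in R} \left\| r - \mathbf{A} \begin{bmatrix} P(r) \\ 1 \end{bmatrix} \right\|_2^2 & \text{for } 0 < |R| < \infty. \end{cases} \tag{10}$$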
Earlier in [43] we proved that this problem is convex.
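For a finite ROI, the quantity minimized in (10) and the criterion (4) can be evaluated directly from the two matrices. The following is only an illustrative sketch: the function names `project`, `affine` and `rms_discrepancy` are hypothetical, matrices are plain nested lists, and the ROI is assumed to be a finite list of points at which the denominator of (7) does not vanish.

```python
import math

def project(P, x, y):
    """Apply the projective map (7) with a 3x3 matrix P to the point (x, y)."""
    z = P[2][0] * x + P[2][1] * y + P[2][2]
    return ((P[0][0] * x + P[0][1] * y + P[0][2]) / z,
            (P[1][0] * x + P[1][1] * y + P[1][2]) / z)

def affine(A, x, y):
    """Apply the affine map (9) with a 2x3 matrix A to the point (x, y)."""
    return (A[0][0] * x + A[0][1] * y + A[0][2],
            A[1][0] * x + A[1][1] * y + A[1][2])

def rms_discrepancy(A, P, roi_points):
    """RMS criterion (4) over a finite ROI: sqrt(mean ||r - A(P(r))||^2)."""
    total = 0.0
    for x, y in roi_points:
        u, v = project(P, x, y)    # r' = P(r): back to the input image plane
        vx, vy = affine(A, u, v)   # V(r) = A(H^{-1}(r))
        total += (x - vx) ** 2 + (y - vy) ** 2
    return math.sqrt(total / len(roi_points))
```

With the identity homography and the identity affine matrix the criterion is zero; any residual distortion makes it positive.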
2.2. The applicability limits
Consider the function
$$Z(r) \stackrel{\text{def}}{=} p_{31} x + p_{32} y + p_{33}. \tag{11}$$
The line $Z(r) = 0$ on the plane of the image $I_{\text{proj}}$ is called the horizon. Let us first consider an ROI $R$ which does not lie strictly on one side of the horizon. Points on the horizon turn the denominator of the transformation (7) into zero, which corresponds to them being infinitely remote on the plane of the input image $I_{\text{input}}$. Hence these points cannot be present in the image $I_{\text{input}}$ because of its finite size. In reality, these points of the scene are situated at the $\pi/2$ angle of the camera view. Points that belong to different sides of the horizon cannot be simultaneously present in the image $I_{\text{input}}$, because points that belong to one of these sides are situated at an angle of the camera view greater than $\pi/2$, i.e. they are located behind the camera. Thus, at least a part of the ROI is absent from the input image $I_{\text{input}}$. In this case, the RMS criterion of accuracy (4) is meaningless. Hence we will consider only the cases when the ROI lies strictly on one side of the horizon:
$$Z(r \in R) < 0 \;\; \lor \;\; Z(r \in R) > 0. \tag{12}$$
This condition also guarantees the correctness of the RMS accuracy criterion definition (4).
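For a rectangular ROI, the one-sidedness condition is cheap to test, because $Z$ is a linear function of the coordinates and therefore takes its extreme values on a rectangle at the corners. The sketch below relies on that observation; the function name `roi_is_admissible` and the corner-list input format are assumptions made for illustration.

```python
def roi_is_admissible(P, rectangle_corners):
    """Check that a rectangle lies strictly on one side of the horizon.

    Z(r) = p31*x + p32*y + p33 is linear, so its sign is constant on the
    rectangle exactly when it is the same (and non-zero) at all four corners.
    """
    p31, p32, p33 = P[2]
    z = [p31 * x + p32 * y + p33 for x, y in rectangle_corners]
    return all(v > 0 for v in z) or all(v < 0 for v in z)
```

A rectangle that touches or crosses the horizon is rejected, since at least one corner then gives a zero or an opposite-sign value of $Z$.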
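2.3. ROI of non-zero finite area
Let us consider the ROI with non-zero finite area; then from (10) it follows that
$$\mathbf{A}^* = \arg\min_{\mathbf{A}} \int_R \left\| r - \mathbf{A} \begin{bmatrix} P(r) \\ 1 \end{bmatrix} \right\|_2^2 dr. \tag{13}$$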
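We will express the affine transformation matrix $\mathbf{A}$ as the vector $a \stackrel{\text{def}}{=} (a_i) \in \mathbb{R}^6$:
$$a(\mathbf{A}) \stackrel{\text{def}}{=} [a_{11} \;\; a_{12} \;\; a_{13} \;\; a_{21} \;\; a_{22} \;\; a_{23}]^T, \qquad \mathbf{A}(a) \stackrel{\text{def}}{=} \begin{bmatrix} a_1 & a_2 & a_3 \\ a_4 & a_5 & a_6 \end{bmatrix}. \tag{14}$$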
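Let us specify the transformation $P$ through its components:
$$P \stackrel{\text{def}}{=} [P_x \;\; P_y]^T, \tag{15}$$
and introduce a matrix function $Q$:
$$Q \stackrel{\text{def}}{=} \begin{bmatrix} P_x & P_y & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & P_x & P_y & 1 \end{bmatrix}. \tag{16}$$
Then
$$\mathbf{A} \begin{bmatrix} P \\ 1 \end{bmatrix} = Q a,$$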
which allows problem (13) to be written in vectorized form:
$$\mathbf{A}^* = \mathbf{A}(a^*), \quad a^* = \arg\min_{a} \int_R \| r - Q(r) a \|_2^2 \, dr. \tag{17}$$
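Note that the target function of problem (17) is quadratic:
$$\int_R \| r - Q(r) a \|_2^2 \, dr = K^{\{0\}} - 2 K^{\{1\}} a + a^T K^{\{2\}} a, \tag{18}$$
where
$$K^{\{k\}} \stackrel{\text{def}}{=} \int_R f^{\{k\}}(r) \, dr, \tag{19}$$
and
$$f^{\{0\}}(r) \stackrel{\text{def}}{=} r^T r, \quad f^{\{1\}}(r) \stackrel{\text{def}}{=} r^T Q(r), \quad f^{\{2\}}(r) \stackrel{\text{def}}{=} Q^T(r) Q(r). \tag{20}$$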
We will refer to the coefficients K as the target coefficients. As was shown above, the target coefficients are defined by the homography matrix and the ROI:
$$K = K(\mathbf{H}, R). \tag{21}$$
If the target coefficients are calculated, problem (17) can be presented as
$$a^* = \arg\min_{a} \left( K^{\{0\}} - 2 K^{\{1\}} a + a^T K^{\{2\}} a \right),$$
and can be solved analytically:
$$a^* = \left( K^{\{2\}} \right)^{-1} \left( K^{\{1\}} \right)^T. \tag{22}$$
Thus, the problem of the unconstrained normalization (13) is quadratic and can be reduced to the problem of the fractional quadratic functions integration over the ROI. Obviously, for an arbitrary ROI this integration can be performed only numerically.
2.4. Non-empty finite ROI
Similar reasoning applies to the non-empty finite ROI $R$. In this case, according to (10):
$$K^{\{k\}} = \sum_{r \in R} f^{\{k\}}(r), \tag{23}$$
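while the definition (20) of the functions $f$ and the expression (22) for the analytical calculation of $a^*$ are preserved. Hence the RMS criterion (3) can be calculated as
$$L_2(A, H; R) = \sqrt{D^{-1} \left( K^{\{0\}} - 2 K^{\{1\}} a + a^T K^{\{2\}} a \right)}, \tag{24}$$
where
$$D \stackrel{\text{def}}{=} \begin{cases} S(R) & \text{for } 0 < S(R) < \infty, \\ |R| & \text{for } 0 < |R| < \infty. \end{cases}$$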
Thus, in all considered cases (3) the optimal affine normalization is calculated according to the general Algorithm 1.
Notes on Step 1 regarding the calculation of the target coefficients K = K(H, R). The cases of the non-empty finite ROI and the ROI of non-zero finite areas are discussed above. Let us specify the corresponding Algorithms (2 and 3) for the target coefficients calculation:
Algorithm 1. Optimal affine image normalization search
Input:
• matrix $\mathbf{H} \in \mathbb{R}^{3 \times 3}$ of the projective normalization $H$,
• ROI $R \subset \mathbb{R}^2$: $0 < S(R) < \infty$ or $0 < |R| < \infty$.
Output:
• matrix $\mathbf{A}^* \in \mathbb{R}^{2 \times 3}$ of the optimal affine approximation of $H$ on $R$: (9),
• the optimal value of the RMS accuracy criterion $L_2^*$: (6).
Step 1. Based on $\mathbf{H}$ and $R$, the target coefficients $K = K(\mathbf{H}, R)$ are calculated.
Step 2. $a^*$ is calculated: (22).
Step 3. $\mathbf{A}^* = \mathbf{A}(a^*)$ is calculated: (14).
Step 4. $L_2^* = L_2(A^*, H; R)$ is calculated: (24).
Algorithm 2. Calculation of the target coefficients for the non-empty finite ROI
Input:
• matrix $\mathbf{H} \in \mathbb{R}^{3 \times 3}$ of the projective normalization $H$,
• non-empty finite ROI $R$: $0 < |R| < \infty$.
Output: Target coefficients $K = K(\mathbf{H}, R)$.
Step 1. Matrix $\mathbf{P} = (p_{ij})$ is calculated: (8).
Step 2. $P_x$ and $P_y$ are defined: (7), (15).
Step 3. Function $Q$ is defined: (16).
Step 4. Functions $f$ are defined: (20).
Step 5. Target coefficients $K$ are calculated: (23).
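Algorithms 1 and 2 can be sketched together for a finite ROI. The code below is a minimal illustration, not the authors' implementation: the names `solve_linear` and `optimal_affine` are hypothetical, the matrix $\mathbf{P} = \mathbf{H}^{-1}$ is assumed to be supplied already inverted, and a small Gaussian elimination stands in for whatever linear solver a real implementation would use. The ROI points must be in general position, otherwise $K^{\{2\}}$ is singular.

```python
import math

def solve_linear(M, b):
    """Solve M x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(M[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= f * aug[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x

def optimal_affine(P, roi_points):
    """Return (A*, L2*) for a finite ROI, given the 3x3 matrix P = H^{-1}."""
    K0, K1 = 0.0, [0.0] * 6
    K2 = [[0.0] * 6 for _ in range(6)]
    for x, y in roi_points:
        z = P[2][0] * x + P[2][1] * y + P[2][2]
        px = (P[0][0] * x + P[0][1] * y + P[0][2]) / z
        py = (P[1][0] * x + P[1][1] * y + P[1][2]) / z
        Q = [[px, py, 1.0, 0.0, 0.0, 0.0],
             [0.0, 0.0, 0.0, px, py, 1.0]]          # matrix function (16)
        K0 += x * x + y * y                          # f0 = r^T r
        for j in range(6):
            K1[j] += x * Q[0][j] + y * Q[1][j]       # f1 = r^T Q
            for k in range(6):
                K2[j][k] += Q[0][j] * Q[0][k] + Q[1][j] * Q[1][k]  # f2 = Q^T Q
    a = solve_linear(K2, K1)                         # closed form (22)
    quad = (K0 - 2.0 * sum(K1[j] * a[j] for j in range(6))
            + sum(a[j] * K2[j][k] * a[k] for j in range(6) for k in range(6)))
    l2 = math.sqrt(max(quad, 0.0) / len(roi_points))  # (24); clamp round-off
    return [a[0:3], a[3:6]], l2
```

When $P$ is itself affine, the returned matrix is its exact inverse and the criterion vanishes up to round-off; for a genuinely projective $P$ the returned value is the smallest RMS discrepancy attainable by any affine map on the given points.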
Algorithm 3. Numerical estimation of the target coefficients for the ROI of non-zero finite area
Input:
• matrix $\mathbf{H} \in \mathbb{R}^{3 \times 3}$ of the projective normalization $H$,
• ROI $R \subset \mathbb{R}^2$ of non-zero finite area: $0 < S(R) < \infty$.
Output: Numerical estimation of the target coefficients $K = K(\mathbf{H}, R)$.
Step 1. Matrix $\mathbf{P} = (p_{ij})$ is calculated: (8).
Step 2. $P_x$ and $P_y$ are defined: (7), (15).
Step 3. Function $Q$ is defined: (16).
Step 4. Functions $f$ are defined: (20).
Step 5. A set $\{r_i\}_{i=1}^{n}$ of points uniformly distributed on $R$ is generated.
Step 6. Assignment $R := \{r_i\}_{i=1}^{n}$.
Step 7. Target coefficients $K$ are computationally evaluated: (23).
In order to get the conventional statistical estimation, result of (23) should be multiplied by S (R)/n at the final step of Algorithm 3. This multiplication is skipped intentionally, because on the one hand, it does not change the output of the Algorithm 1 (see expression (22)), and on the other hand, the accurate calculation of the area S (R) in special cases complicates the Algorithm 3, and generally might not be even possible.
Further we will analytically calculate the target coefficients K for some special cases of the ROI R with non-zero finite area.
2.5. Orthotropic rectangular ROI
In computer vision applications there is a particularly important case of the orthotropic rectangular ROI R:
$$R = [x_1, x_2] \times [y_1, y_2], \quad x_1 < x_2, \quad y_1 < y_2.$$
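Let us introduce the antiderivatives
$$F^{\{k\}}(r) \stackrel{\text{def}}{=} \iint f^{\{k\}}(r) \, dx \, dy,$$
then by the Newton-Leibniz formula, expressions (19) can be written as
$$K^{\{k\}} = F^{\{k\}}(x, y) \Big|_{x_1}^{x_2} \Big|_{y_1}^{y_2}, \tag{25}$$
where
$$F(x, y) \Big|_{x_1}^{x_2} \Big|_{y_1}^{y_2} \stackrel{\text{def}}{=} F(x_1, y_1) + F(x_2, y_2) - F(x_1, y_2) - F(x_2, y_1).$$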
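Now in order to calculate (19) we have to find the antiderivatives $F$. Let us introduce the following notation:
$$\begin{aligned}
c_1 &= 2p_{11}p_{32}^2 - 2p_{12}p_{31}p_{32}, \\
c_2 &= 2p_{12}p_{31}^2 - 2p_{11}p_{31}p_{32}, \\
c_3 &= 2p_{11}p_{32}p_{33} - 2p_{13}p_{31}p_{32} + 2p_{11}p_{32}p_{33} - 2p_{12}p_{31}p_{33}, \\
c_4 &= p_{11}p_{31}^2, \\
c_5 &= 2p_{13}p_{31}^2 - 2p_{11}p_{31}p_{33}, \\
c_6 &= 2p_{11}p_{33}^2 - 2p_{13}p_{31}p_{33}, \\
c_7 &= 2p_{21}p_{32}^2 - 2p_{22}p_{31}p_{32}, \\
c_8 &= 2p_{22}p_{31}^2 - 2p_{21}p_{31}p_{32}, \\
c_9 &= 2p_{21}p_{32}p_{33} - 2p_{23}p_{31}p_{32} + 2p_{21}p_{32}p_{33} - 2p_{22}p_{31}p_{33}, \\
c_{10} &= p_{21}p_{31}^2, \\
c_{11} &= 2p_{23}p_{31}^2 - 2p_{21}p_{31}p_{33}, \\
c_{12} &= 2p_{21}p_{33}^2 - 2p_{23}p_{31}p_{33}, \\
c_{13} &= 2p_{12}p_{31}^2 - 2p_{11}p_{31}p_{32}, \\
c_{14} &= 2p_{11}p_{32}^2 - 2p_{12}p_{31}p_{32}, \\
c_{15} &= 2p_{12}p_{31}p_{33} - 2p_{13}p_{32}p_{31} + 2p_{12}p_{31}p_{33} - 2p_{11}p_{32}p_{33}, \\
c_{16} &= p_{12}p_{32}^2, \\
c_{17} &= 2p_{13}p_{32}^2 - 2p_{12}p_{32}p_{33}, \\
c_{18} &= 2p_{12}p_{33}^2 - 2p_{13}p_{32}p_{33}, \\
c_{19} &= 2p_{22}p_{31}^2 - 2p_{21}p_{31}p_{32}, \\
c_{20} &= 2p_{21}p_{32}^2 - 2p_{22}p_{31}p_{32}, \\
c_{21} &= 2p_{22}p_{31}p_{33} - 2p_{23}p_{32}p_{31} + 2p_{22}p_{31}p_{33} - 2p_{21}p_{32}p_{33}, \\
c_{22} &= p_{22}p_{32}^2, \\
c_{23} &= 2p_{23}p_{32}^2 - 2p_{22}p_{32}p_{33}, \\
c_{24} &= 2p_{22}p_{33}^2 - 2p_{23}p_{32}p_{33}, \\
c_{25} &= p_{12}p_{31} - p_{11}p_{32}, \\
c_{26} &= p_{13}p_{31} - p_{11}p_{33}, \\
c_{27} &= p_{11}p_{31}, \\
c_{28} &= p_{22}p_{31} - p_{21}p_{32}, \\
c_{29} &= p_{23}p_{31} - p_{21}p_{33}, \\
c_{30} &= p_{21}p_{31}, \\
c_{31} &= p_{11}^2 p_{31}, \\
c_{32} &= p_{12}^2 p_{31}^2 + p_{11}^2 p_{32}^2 - 2p_{11}p_{12}p_{31}p_{32}, \\
c_{33} &= 2p_{12}p_{13}p_{31}^2 + 2p_{11}^2 p_{32}p_{33} - 2p_{11}p_{12}p_{31}p_{33} - 2p_{11}p_{13}p_{31}p_{32}, \\
c_{34} &= p_{13}^2 p_{31}^2 + p_{11}^2 p_{33}^2 - 2p_{11}p_{13}p_{31}p_{33}, \\
c_{35} &= 2p_{11}p_{12}p_{31} - 2p_{11}^2 p_{32}, \\
c_{36} &= 2p_{11}p_{13}p_{31} - 2p_{11}^2 p_{33}, \\
c_{37} &= p_{21}^2 p_{31}, \\
c_{38} &= p_{22}^2 p_{31}^2 + p_{21}^2 p_{32}^2 - 2p_{21}p_{22}p_{31}p_{32}, \\
c_{39} &= 2p_{22}p_{23}p_{31}^2 + 2p_{21}^2 p_{32}p_{33} - 2p_{21}p_{22}p_{31}p_{33} - 2p_{21}p_{23}p_{31}p_{32}, \\
c_{40} &= p_{23}^2 p_{31}^2 + p_{21}^2 p_{33}^2 - 2p_{21}p_{23}p_{31}p_{33}, \\
c_{41} &= 2p_{21}p_{22}p_{31} - 2p_{21}^2 p_{32}, \\
c_{42} &= 2p_{21}p_{23}p_{31} - 2p_{21}^2 p_{33}, \\
c_{43} &= p_{12}p_{22}p_{31}^2 - p_{12}p_{21}p_{31}p_{32} - p_{11}p_{22}p_{31}p_{32} + p_{11}p_{21}p_{32}^2, \\
c_{44} &= p_{12}p_{23}p_{31}^2 - p_{12}p_{21}p_{31}p_{33} + p_{13}p_{22}p_{31}^2 - p_{13}p_{21}p_{31}p_{32} \\
&\quad - p_{11}p_{23}p_{31}p_{32} + p_{11}p_{21}p_{32}p_{33} - p_{11}p_{22}p_{31}p_{33} + p_{11}p_{21}p_{32}p_{33}, \\
c_{45} &= p_{13}p_{23}p_{31}^2 - p_{13}p_{21}p_{31}p_{33} - p_{11}p_{23}p_{31}p_{33} + p_{11}p_{21}p_{33}^2, \\
c_{46} &= p_{11}p_{22}p_{31} - 2p_{11}p_{21}p_{32} + p_{12}p_{21}p_{31}, \\
c_{47} &= p_{11}p_{23}p_{31} - 2p_{11}p_{21}p_{33} + p_{13}p_{21}p_{31}, \\
c_{48} &= p_{11}p_{21}p_{31},
\end{aligned} \tag{26-30}$$
$$Z_x(x) = p_{31} x + p_{33}, \quad Z_y(y) = p_{32} y + p_{33}, \quad l(r) = \log Z(r). \tag{31}$$
The antiderivative of $r^T r$ is
$$F^{\{0\}}(r) = \iint r^T r \, dx \, dy = \frac{1}{3} x (x^2 + y^2) y. \tag{32}$$
The antiderivatives of $r^T Q(r)$ are
$$F^{\{1\}}(r) = \iint r^T Q(r) \, dx \, dy = \iint [\, x P_x(r) \;\; x P_y(r) \;\; x \;\; y P_x(r) \;\; y P_y(r) \;\; y \,] \, dx \, dy,$$
with the following components.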
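Writing $Z_x = Z_x(x)$, $Z_y = Z_y(y)$ and $l = l(r)$ for brevity:
$$\begin{aligned}
F_1^{\{1\}}(r) &= \frac{1}{2p_{31}^3} \Big( \frac{c_1}{18 p_{32}^3} \big( 6(p_{32}^3 y^3 + Z_x^3)\, l + p_{32} y (-2p_{32}^2 y^2 + 3p_{32} y Z_x - 6 Z_x^2) \big) + \frac{c_2}{2} y^2 x \\
&\quad + \frac{c_3}{4p_{32}^2} \big( p_{32} y (2Z_x - p_{32} y) - 2(Z_x^2 - p_{32}^2 y^2)\, l \big) + c_4\, y x^2 + c_5\, y x + \frac{c_6}{p_{32}} (p_{32} y + Z_x)\, l - c_6\, y \Big), \\
F_2^{\{1\}}(r) &= \frac{1}{2p_{31}^3} \Big( \frac{c_7}{18 p_{32}^3} \big( 6(p_{32}^3 y^3 + Z_x^3)\, l + p_{32} y (-2p_{32}^2 y^2 + 3p_{32} y Z_x - 6 Z_x^2) \big) + \frac{c_8}{2} y^2 x \\
&\quad + \frac{c_9}{4p_{32}^2} \big( p_{32} y (2Z_x - p_{32} y) - 2(Z_x^2 - p_{32}^2 y^2)\, l \big) + c_{10}\, y x^2 + c_{11}\, y x + \frac{c_{12}}{p_{32}} (p_{32} y + Z_x)\, l - c_{12}\, y \Big), \\
F_3^{\{1\}}(r) &= \frac{x^2 y}{2},
\end{aligned} \tag{33}$$
$$\begin{aligned}
F_4^{\{1\}}(r) &= \frac{1}{2p_{32}^3} \Big( \frac{c_{13}}{18 p_{31}^3} \big( 6(p_{31}^3 x^3 + Z_y^3)\, l + p_{31} x (-2p_{31}^2 x^2 + 3p_{31} x Z_y - 6 Z_y^2) \big) + \frac{c_{14}}{2} x^2 y \\
&\quad + \frac{c_{15}}{4p_{31}^2} \big( p_{31} x (2Z_y - p_{31} x) - 2(Z_y^2 - p_{31}^2 x^2)\, l \big) + c_{16}\, x y^2 + c_{17}\, x y + \frac{c_{18}}{p_{31}} (p_{31} x + Z_y)\, l - c_{18}\, x \Big), \\
F_5^{\{1\}}(r) &= \frac{1}{2p_{32}^3} \Big( \frac{c_{19}}{18 p_{31}^3} \big( 6(p_{31}^3 x^3 + Z_y^3)\, l + p_{31} x (-2p_{31}^2 x^2 + 3p_{31} x Z_y - 6 Z_y^2) \big) + \frac{c_{20}}{2} x^2 y \\
&\quad + \frac{c_{21}}{4p_{31}^2} \big( p_{31} x (2Z_y - p_{31} x) - 2(Z_y^2 - p_{31}^2 x^2)\, l \big) + c_{22}\, x y^2 + c_{23}\, x y + \frac{c_{24}}{p_{31}} (p_{31} x + Z_y)\, l - c_{24}\, x \Big), \\
F_6^{\{1\}}(r) &= \frac{y^2 x}{2}.
\end{aligned} \tag{34}$$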
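The antiderivatives of $Q^T(r) Q(r)$ are
$$F^{\{2\}}(r) = \iint Q^T(r) Q(r) \, dx \, dy = \iint \begin{bmatrix}
P_x^2 & P_x P_y & P_x & 0 & 0 & 0 \\
P_x P_y & P_y^2 & P_y & 0 & 0 & 0 \\
P_x & P_y & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & P_x^2 & P_x P_y & P_x \\
0 & 0 & 0 & P_x P_y & P_y^2 & P_y \\
0 & 0 & 0 & P_x & P_y & 1
\end{bmatrix} dx \, dy,$$
where $P_x = P_x(r)$ and $P_y = P_y(r)$. Writing $Z_x = Z_x(x)$, $Z = Z(r)$ and $l = l(r)$ for brevity, the components are
$$\begin{aligned}
F_{11}^{\{2\}}(r) &= \frac{1}{p_{31}^3} \Big( c_{31}\, x y - \frac{c_{32}}{2p_{32}^3} \big( 2 Z_x^2\, l + y p_{32} (y p_{32} - 2 Z_x) \big) - \frac{c_{33}}{p_{32}^2} (p_{32} y - Z_x\, l) - \frac{c_{34}}{p_{32}}\, l \\
&\quad + \frac{c_{35}}{4p_{32}^2} \big( p_{32} y (2Z_x - p_{32} y) - 2(Z_x^2 - p_{32}^2 y^2)\, l \big) + \frac{c_{36}}{p_{32}} Z\, l - c_{36}\, y \Big), \\
F_{22}^{\{2\}}(r) &= \frac{1}{p_{31}^3} \Big( c_{37}\, x y - \frac{c_{38}}{2p_{32}^3} \big( 2 Z_x^2\, l + y p_{32} (y p_{32} - 2 Z_x) \big) - \frac{c_{39}}{p_{32}^2} (p_{32} y - Z_x\, l) - \frac{c_{40}}{p_{32}}\, l \\
&\quad + \frac{c_{41}}{4p_{32}^2} \big( p_{32} y (2Z_x - p_{32} y) - 2(Z_x^2 - p_{32}^2 y^2)\, l \big) + \frac{c_{42}}{p_{32}} Z\, l - c_{42}\, y \Big), \\
F_{33}^{\{2\}}(r) &= x y,
\end{aligned} \tag{35}$$
$$\begin{aligned}
F_{12}^{\{2\}}(r) &= \frac{1}{p_{31}^3} \Big( -\frac{c_{43}}{2p_{32}^3} \big( 2 Z_x^2\, l + y p_{32} (y p_{32} - 2 Z_x) \big) - \frac{c_{44}}{p_{32}^2} (p_{32} y - Z_x\, l) - \frac{c_{45}}{p_{32}}\, l \\
&\quad + \frac{c_{46}}{4p_{32}^2} \big( p_{32} y (2Z_x - p_{32} y) - 2(Z_x^2 - p_{32}^2 y^2)\, l \big) + \frac{c_{47}}{p_{32}} Z\, l - c_{47}\, y + c_{48}\, x y \Big), \\
F_{13}^{\{2\}}(r) &= \frac{1}{p_{31}^2} \Big( \frac{c_{25}}{4p_{32}^2} \big( p_{32} y (2Z_x - p_{32} y) - 2(Z_x^2 - p_{32}^2 y^2)\, l \big) + \frac{c_{26}}{p_{32}} Z\, l - c_{26}\, y + c_{27}\, x y \Big), \\
F_{23}^{\{2\}}(r) &= \frac{1}{p_{31}^2} \Big( \frac{c_{28}}{4p_{32}^2} \big( p_{32} y (2Z_x - p_{32} y) - 2(Z_x^2 - p_{32}^2 y^2)\, l \big) + \frac{c_{29}}{p_{32}} Z\, l - c_{29}\, y + c_{30}\, x y \Big),
\end{aligned} \tag{36}$$
$$\begin{aligned}
& F_{44}^{\{2\}} = F_{11}^{\{2\}}, \quad F_{55}^{\{2\}} = F_{22}^{\{2\}}, \quad F_{66}^{\{2\}} = F_{33}^{\{2\}}, \\
& F_{21}^{\{2\}} = F_{54}^{\{2\}} = F_{45}^{\{2\}} = F_{12}^{\{2\}}, \\
& F_{31}^{\{2\}} = F_{64}^{\{2\}} = F_{46}^{\{2\}} = F_{13}^{\{2\}}, \\
& F_{32}^{\{2\}} = F_{65}^{\{2\}} = F_{56}^{\{2\}} = F_{23}^{\{2\}},
\end{aligned} \tag{37}$$
$$\begin{aligned}
& F_{14}^{\{2\}} = F_{15}^{\{2\}} = F_{16}^{\{2\}} = 0, \quad F_{41}^{\{2\}} = F_{42}^{\{2\}} = F_{43}^{\{2\}} = 0, \\
& F_{24}^{\{2\}} = F_{25}^{\{2\}} = F_{26}^{\{2\}} = 0, \quad F_{51}^{\{2\}} = F_{52}^{\{2\}} = F_{53}^{\{2\}} = 0, \\
& F_{34}^{\{2\}} = F_{35}^{\{2\}} = F_{36}^{\{2\}} = 0, \quad F_{61}^{\{2\}} = F_{62}^{\{2\}} = F_{63}^{\{2\}} = 0.
\end{aligned} \tag{38}$$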
Although the coefficients $K$ are real numbers, complex numbers may show up during the calculations, which is a serious inconvenience, especially for a software implementation. These complex numbers arise because the function $Z$ may be negative while its logarithm appears in (31). However, according to constraint (12), the function $Z$ is either strictly positive or strictly negative on the ROI. Thus, in order to get rid of any complex numbers, we can replace the matrix $\mathbf{P}$ with the matrix $-\mathbf{P}$. This change is indeed possible since the matrix $\mathbf{P}$ is homogeneous: according to (7), it defines a projective transformation which does not change if $\mathbf{P}$ is multiplied by any non-zero value.
Algorithm 4. Calculation of the target coefficients for the orthotropic rectangular ROI
Input:
• matrix $\mathbf{H} \in \mathbb{R}^{3 \times 3}$ of the projective normalization $H$,
• orthotropic rectangular ROI $R = [x_1, x_2] \times [y_1, y_2]$, $x_1 < x_2$, $y_1 < y_2$.
Output: Target coefficients $K = K(\mathbf{H}, R)$.
Step 1. Matrix $\mathbf{P} = (p_{ij})$ is calculated: (8).
Step 2. Function $Z$ is defined: (11).
Step 3. If $Z([x_1 \;\; y_1]^T) < 0$, then $\mathbf{P} := -\mathbf{P}$ and $Z$ is redefined.
Step 4. Coefficients $c$ are calculated: (26), (27), (28), (29), (30).
Step 5. Functions $Z_x$, $Z_y$, $l$ are defined: (31).
Step 6. Antiderivative $F^{\{0\}}$ is defined: (32).
Step 7. Antiderivatives $F^{\{1\}}$ are defined: (33), (34).
Step 8. Antiderivatives $F^{\{2\}}$ are defined: (35), (36), (37), (38).
Step 9. Target coefficients $K$ are calculated: (25).
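The building blocks of Algorithm 4 can be spot-checked numerically. The sketch below evaluates the double difference (25) for two antiderivatives: $F^{\{0\}}(r) = \frac{1}{3}xy(x^2 + y^2)$ and the component $F_{13}^{\{2\}}$, the antiderivative of $P_x(r)$, written out from its closed form in terms of $c_{25}$, $c_{26}$, $c_{27}$. It then compares the latter with a plain midpoint-rule integral. All function names are hypothetical, and the closed form is assumed to require $p_{31} \neq 0$, $p_{32} \neq 0$ and $Z > 0$ on the rectangle.

```python
import math

def double_difference(F, x1, x2, y1, y2):
    """Right-hand side of (25): F(x1,y1) + F(x2,y2) - F(x1,y2) - F(x2,y1)."""
    return F(x1, y1) + F(x2, y2) - F(x1, y2) - F(x2, y1)

def F0(x, y):
    """Antiderivative of r^T r = x^2 + y^2."""
    return x * y * (x * x + y * y) / 3.0

def make_F13(P):
    """Antiderivative of P_x(r); assumes p31 != 0, p32 != 0 and Z > 0."""
    (p11, p12, p13), _, (p31, p32, p33) = P
    c25 = p12 * p31 - p11 * p32
    c26 = p13 * p31 - p11 * p33
    c27 = p11 * p31

    def F13(x, y):
        zx = p31 * x + p33
        z = zx + p32 * y
        l = math.log(z)
        u = p32 * y * (2 * zx - p32 * y) - 2 * (zx * zx - (p32 * y) ** 2) * l
        return (c25 / (4 * p32 ** 2) * u + c26 / p32 * z * l
                - c26 * y + c27 * x * y) / p31 ** 2
    return F13

def midpoint_integral(f, x1, x2, y1, y2, n=200):
    """Reference value: midpoint rule on an n-by-n grid."""
    hx, hy = (x2 - x1) / n, (y2 - y1) / n
    total = 0.0
    for i in range(n):
        for j in range(n):
            total += f(x1 + (i + 0.5) * hx, y1 + (j + 0.5) * hy)
    return total * hx * hy
```

Agreement between the two values on a few rectangles is a quick sanity test for an implementation of the remaining, longer antiderivatives as well.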
2.6. Rectangular ROI
We have analytically calculated the target coefficients $K$ for the orthotropic rectangular ROI above. Now we will generalize this solution to an arbitrarily oriented rectangular ROI. Let us introduce the latter as the image of an orthotropic rectangle $R_0$ under a rotation $U$:
$$R = U(R_0), \quad R_0 = [x_1, x_2] \times [y_1, y_2], \quad x_1 < x_2, \quad y_1 < y_2.$$
Let us use (19) and (20):
$$K^{\{0\}} = \int_R r^T r \, dr, \quad K^{\{1\}} = \int_R r^T Q(r) \, dr, \quad K^{\{2\}} = \int_R Q^T(r) Q(r) \, dr.$$
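Let us introduce new coordinates $\rho \stackrel{\text{def}}{=} U^{-1}(r) = U_2^T r$, where
$$U_2 \stackrel{\text{def}}{=} \begin{bmatrix} c & -s \\ +s & c \end{bmatrix}, \quad c \stackrel{\text{def}}{=} \cos(\alpha), \quad s \stackrel{\text{def}}{=} \sin(\alpha),$$
then $r = U_2 \rho$, which means
$$\begin{aligned}
K^{\{0\}} &= \int_{R_0} \rho^T U_2^T U_2 \rho \, d\rho = \int_{R_0} \rho^T \rho \, d\rho, \\
K^{\{1\}} &= \int_{R_0} \rho^T U_2^T Q(U_2 \rho) \, d\rho, \\
K^{\{2\}} &= \int_{R_0} Q^T(U_2 \rho) Q(U_2 \rho) \, d\rho.
\end{aligned}$$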
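Note that
$$U_2^T Q = Q U_6^T, \quad \text{where} \quad U_6 \stackrel{\text{def}}{=} \begin{bmatrix}
c & 0 & 0 & -s & 0 & 0 \\
0 & c & 0 & 0 & -s & 0 \\
0 & 0 & c & 0 & 0 & -s \\
+s & 0 & 0 & c & 0 & 0 \\
0 & +s & 0 & 0 & c & 0 \\
0 & 0 & +s & 0 & 0 & c
\end{bmatrix}. \tag{39}$$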
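Which means
$$K^{\{1\}} = \left( \int_{R_0} \rho^T Q(U_2 \rho) \, d\rho \right) U_6^T.$$
Thus
$$K^{\{0\}} = K_0^{\{0\}}, \quad K^{\{1\}} = K_0^{\{1\}} U_6^T, \quad K^{\{2\}} = K_0^{\{2\}}, \tag{40}$$
where
$$K_0^{\{0\}} = \int_{R_0} \rho^T \rho \, d\rho, \quad K_0^{\{1\}} = \int_{R_0} \rho^T Q(U_2 \rho) \, d\rho, \quad K_0^{\{2\}} = \int_{R_0} Q^T(U_2 \rho) Q(U_2 \rho) \, d\rho.$$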
However, the coefficients $K_0$ are equal to the coefficients calculated for the matrix of the projective normalization $U_3^T \mathbf{H}$ and the ROI $R_0$:
$$K_0 = K(U_3^T \mathbf{H}, R_0), \tag{41}$$
where
$$U_3 \stackrel{\text{def}}{=} \begin{bmatrix} U_2 & 0 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} c & -s & 0 \\ +s & c & 0 \\ 0 & 0 & 1 \end{bmatrix}$$
is a homogeneous rotation matrix. Thus, the problem is reduced to the previously solved problem of the target coefficients calculation for the orthotropic rectangular ROI.
2.7. ROI consisting of rectangles
Consider the ROI which consists of rectangles:
$$R = \bigcup_{i=1}^{n} R_i : \quad R_i \cap R_j = \emptyset, \quad i \neq j.$$
Using (19):
$$K^{\{k\}} = \int_R f^{\{k\}}(r) \, dr = \sum_{i=1}^{n} \int_{R_i} f^{\{k\}}(r) \, dr,$$
which means
$$K^{\{k\}} = \sum_{i=1}^{n} K_i^{\{k\}}, \tag{42}$$
where the coefficients
$$K_i^{\{k\}} = \int_{R_i} f^{\{k\}}(r) \, dr$$
can be analytically calculated via Algorithm 5:
$$K_i = K(\mathbf{H}, R_i).$$
3. Special cases of the affine image normalization
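Aside from the general affine transformation, its special cases are widely used for image normalization [44]; these are usually even more computationally efficient. Let us consider sets of transformations whose matrices $\mathbf{A}$ form a linear manifold:
$$\mathcal{A}[S] \stackrel{\text{def}}{=} \left\{ \mathbf{A} : \mathbf{A}(a), \; a = S p, \; p = \begin{bmatrix} t \\ 1 \end{bmatrix}, \; t \in \mathbb{R}^d \right\}, \tag{43}$$
where the matrix $S \in \mathbb{R}^{6 \times (d+1)}$ is a parameter defining the manifold, or in compact notation:
$$\mathcal{A}[S] = \left\{ \mathbf{A} : \mathbf{A}\!\left( S \begin{bmatrix} t \\ 1 \end{bmatrix} \right), \; t \in \mathbb{R}^d \right\}.$$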
Algorithm 5. Calculation of the target coefficients for the rectangular ROI
Input:
• matrix $\mathbf{H} \in \mathbb{R}^{3 \times 3}$ of the projective normalization $H$,
• rectangular ROI $R = U(R_0)$, where $R_0 = [x_1, x_2] \times [y_1, y_2]$, $x_1 < x_2$, $y_1 < y_2$ is an orthotropic rectangle, and $U(r) = \begin{bmatrix} c & -s \\ +s & c \end{bmatrix} r$, $c = \cos(\alpha)$, $s = \sin(\alpha)$ is its rotation.
Output: Target coefficients $K = K(\mathbf{H}, R)$.
Step 1. Matrix $U_3$ is calculated: (41).
Step 2. Matrix $U_6$ is calculated: (39).
Step 3. $K_0 = K(U_3^T \mathbf{H}, R_0)$ is calculated via Algorithm 4.
Step 4. Target coefficients $K$ are calculated: (40).
Algorithm 6. Calculation of the target coefficients for the ROI consisting of rectangles
Input:
• matrix $\mathbf{H} \in \mathbb{R}^{3 \times 3}$ of the projective normalization $H$,
• ROI consisting of rectangles $R = \bigcup_{i=1}^{n} R_i$, $R_i \cap R_j = \emptyset$, $i \neq j$.
Output: Target coefficients $K = K(\mathbf{H}, R)$.
Step 1. $K_i = K(\mathbf{H}, R_i)$ are calculated via Algorithm 5.
Step 2. Target coefficients $K$ are calculated: (42).
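Steps 1 and 2 of Algorithm 5 only require building the rotation matrices $U_3$ and $U_6$. A minimal sketch follows, under the assumption that $U_6$ is the Kronecker product of $U_2$ with the 3x3 identity matrix, which is the form for which the identity $U_2^T Q = Q U_6^T$ holds; the helper names are hypothetical.

```python
import math

def rotation_matrices(alpha):
    """Return (U2, U3, U6) for the rotation angle alpha (radians)."""
    c, s = math.cos(alpha), math.sin(alpha)
    U2 = [[c, -s], [s, c]]
    U3 = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
    # U6 is the Kronecker product of U2 with the 3x3 identity matrix.
    U6 = [[U2[i // 3][j // 3] if i % 3 == j % 3 else 0.0 for j in range(6)]
          for i in range(6)]
    return U2, U3, U6

def matmul(X, Y):
    """Product of two matrices stored as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    """Transpose of a matrix stored as nested lists."""
    return [list(row) for row in zip(*X)]
```

The identity can be checked for any sample values of $P_x$ and $P_y$ by comparing `matmul(transpose(U2), Q)` with `matmul(Q, transpose(U6))`.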
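Thus we can define sets of scaling, translation, and shearing matrices, as well as of their superpositions. But we cannot introduce, for example, a set of rotation matrices. Let us provide some examples. The isotropic scaling:
$$S = \begin{bmatrix} 1 & 0 \\ 0 & 0 \\ 0 & 0 \\ 0 & 0 \\ 1 & 0 \\ 0 & 0 \end{bmatrix}, \quad \mathcal{A}[S] = \left\{ \begin{bmatrix} t_1 & 0 & 0 \\ 0 & t_1 & 0 \end{bmatrix} \right\}. \tag{44}$$
The superposition of translation and shearing:
$$S = \begin{bmatrix} 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{bmatrix}, \quad \mathcal{A}[S] = \left\{ \begin{bmatrix} 1 & t_1 & t_2 \\ 0 & 1 & t_3 \end{bmatrix} \right\}. \tag{45}$$
The superposition of translation and anisotropic scaling:
$$S = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \end{bmatrix}, \quad \mathcal{A}[S] = \left\{ \begin{bmatrix} t_1 & 0 & t_2 \\ 0 & t_3 & t_4 \end{bmatrix} \right\}. \tag{46}$$
The full affine transformation:
$$S = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 & 0 \end{bmatrix}, \quad \mathcal{A}[S] = \left\{ \begin{bmatrix} t_1 & t_2 & t_3 \\ t_4 & t_5 & t_6 \end{bmatrix} \right\}. \tag{47}$$
Now let us find the matrix from the given manifold $\mathcal{A}[S]$ which corresponds to the most accurate image normalization according to the RMS criterion:
$$\mathbf{A}^* = \arg\min_{\mathbf{A} \in \mathcal{A}[S]} \int_R \left\| r - \mathbf{A} \begin{bmatrix} P(r) \\ 1 \end{bmatrix} \right\|_2^2 dr.$$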
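Following the reasoning from subsection 2.3:
$$\mathbf{A}^* = \mathbf{A}(a^*), \quad a^* = \arg\min_{a : \, \mathbf{A}(a) \in \mathcal{A}[S]} \left( K^{\{0\}} - 2 K^{\{1\}} a + a^T K^{\{2\}} a \right),$$
where $K = K(\mathbf{H}, R)$ is calculated via Algorithms 2, 3, or 6. Then
$$a^* = S \begin{bmatrix} t^* \\ 1 \end{bmatrix}, \tag{48}$$
where
$$t^* = \arg\min_{t} \left( K_*^{\{0\}} - 2 K_*^{\{1\}} \begin{bmatrix} t \\ 1 \end{bmatrix} + [\, t^T \;\; 1 \,] \, K_*^{\{2\}} \begin{bmatrix} t \\ 1 \end{bmatrix} \right),$$
where
$$K_*^{\{0\}} \stackrel{\text{def}}{=} K^{\{0\}}, \quad K_*^{\{1\}} \stackrel{\text{def}}{=} K^{\{1\}} S, \quad K_*^{\{2\}} \stackrel{\text{def}}{=} S^T K^{\{2\}} S. \tag{49}$$
But
$$\begin{bmatrix} t \\ 1 \end{bmatrix} = I t + i, \quad \text{where} \quad I \stackrel{\text{def}}{=} \begin{bmatrix} 1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & 1 \\ 0 & \cdots & 0 \end{bmatrix}, \quad i \stackrel{\text{def}}{=} \begin{bmatrix} 0 \\ \vdots \\ 0 \\ 1 \end{bmatrix},$$
thus
$$\begin{aligned}
t^* &= \arg\min_{t} \left( K_*^{\{0\}} - 2 K_*^{\{1\}} (I t + i) + (I t + i)^T K_*^{\{2\}} (I t + i) \right) \\
&= \arg\min_{t} \left( K_{**}^{\{0\}} - 2 K_{**}^{\{1\}} t + t^T K_{**}^{\{2\}} t \right),
\end{aligned}$$
where
$$K_{**}^{\{0\}} \stackrel{\text{def}}{=} K_*^{\{0\}} - 2 K_*^{\{1\}} i + i^T K_*^{\{2\}} i, \quad K_{**}^{\{1\}} \stackrel{\text{def}}{=} \left( K_*^{\{1\}} - i^T K_*^{\{2\}} \right) I, \quad K_{**}^{\{2\}} \stackrel{\text{def}}{=} I^T K_*^{\{2\}} I, \tag{50}$$
from which follows the analytical solution:
$$t^* = \left( K_{**}^{\{2\}} \right)^{-1} \left( K_{**}^{\{1\}} \right)^T. \tag{51}$$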
Algorithm 7. Optimal special affine image normalization search
Input:
• matrix $\mathbf{H} \in \mathbb{R}^{3 \times 3}$ of the projective normalization $H$,
• ROI $R \subset \mathbb{R}^2$: $0 < S(R) < \infty$ or $0 < |R| < \infty$,
• matrix $S$, which defines the linear manifold $\mathcal{A}[S]$ of matrices.
Output:
• matrix $\mathbf{A}^* \in \mathcal{A}[S]$ of the optimal affine approximation of $H$ on $R$: (9),
• the corresponding value of the RMS criterion of accuracy $L_2^*$: (6).
Step 1. $K = K(\mathbf{H}, R)$ is calculated via Algorithms 2, 3, or 6.
Step 2. $K_*$ is calculated: (49).
Step 3. $K_{**}$ is calculated: (50).
Step 4. $t^*$ is defined: (51).
Step 5. $a^*$ is defined: (48).
Step 6. $\mathbf{A}^* = \mathbf{A}(a^*)$ is calculated: (14).
Step 7. $L_2^* = L_2(A^*, H; R)$ is calculated: (24).
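Steps 1 to 6 of Algorithm 7 become particularly short for the isotropic scaling (44), where $d = 1$ and the final linear system reduces to a single division. The sketch below assumes a finite ROI and an already inverted matrix $\mathbf{P} = \mathbf{H}^{-1}$; the function name `optimal_isotropic_scale` is hypothetical.

```python
def optimal_isotropic_scale(P, roi_points):
    """Return (t*, A*) with A* = [[t*, 0, 0], [0, t*, 0]] for a finite ROI."""
    K1 = [0.0] * 6
    K2 = [[0.0] * 6 for _ in range(6)]
    for x, y in roi_points:
        z = P[2][0] * x + P[2][1] * y + P[2][2]
        px = (P[0][0] * x + P[0][1] * y + P[0][2]) / z
        py = (P[1][0] * x + P[1][1] * y + P[1][2]) / z
        Q = [[px, py, 1.0, 0.0, 0.0, 0.0],
             [0.0, 0.0, 0.0, px, py, 1.0]]
        for j in range(6):
            K1[j] += x * Q[0][j] + y * Q[1][j]
            for k in range(6):
                K2[j][k] += Q[0][j] * Q[0][k] + Q[1][j] * Q[1][k]
    # Matrix S of the isotropic scaling example (44).
    S = [[1.0, 0.0], [0.0, 0.0], [0.0, 0.0],
         [0.0, 0.0], [1.0, 0.0], [0.0, 0.0]]
    # (49): K*1 = K1 S and K*2 = S^T K2 S.
    Ks1 = [sum(K1[j] * S[j][m] for j in range(6)) for m in range(2)]
    Ks2 = [[sum(S[j][m] * K2[j][k] * S[k][n]
                for j in range(6) for k in range(6))
            for n in range(2)] for m in range(2)]
    # (50) with d = 1, so that I = [1, 0]^T and i = [0, 1]^T.
    Kss1 = Ks1[0] - Ks2[1][0]
    Kss2 = Ks2[0][0]
    t = Kss1 / Kss2                                   # (51), scalar case
    a = [S[j][0] * t + S[j][1] for j in range(6)]     # (48)
    return t, [a[0:3], a[3:6]]
```

For this manifold the result coincides with the direct least-squares scale $\sum (x P_x + y P_y) / \sum (P_x^2 + P_y^2)$, which gives an easy independent check.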
In view of example (47), Algorithm 7 is a generalization of Algorithm 1. Its implementation in MATLAB is available at https://github.com/konovalenko-iitp/optimal-affine-image-normalization.
4. Accelerated approach to image normalization
After the problem of optimal (special) affine image normalization was solved analytically for many cases, we can propose an accelerated approach to image normalization. This approach is based on the replacement of the projective normalization with the (special) affine one if there is no significant loss of accuracy.
Algorithm 8. Accelerated image normalization
Input:
• input image $I_{\text{input}}$,
• matrix $\mathbf{H} \in \mathbb{R}^{3 \times 3}$ of the projective normalization $H$,
• ROI $R \subset \mathbb{R}^2$: $0 < S(R) < \infty$ or $0 < |R| < \infty$,
• matrix $S$, which defines the linear manifold $\mathcal{A}[S]$ of matrices,
• accuracy threshold $L_2^{\max}$.
Output: Projectively or optimally affinely normalized image: $I_{\text{proj}}$ or $I_{\text{affin}}$.
Step 1. The (special) affine approximation $A^*$ of the normalization $H$ and the corresponding value $L_2^*$ of the RMS criterion of accuracy are calculated via Algorithm 7.
Step 2. If $L_2^* < L_2^{\max}$, then $I_{\text{affin}}$ is calculated by applying the transformation $A^*$ to the image $I_{\text{input}}$. Otherwise, $I_{\text{proj}}$ is calculated by applying the transformation $H$ to the image $I_{\text{input}}$.
An example of the optimal affine normalization is illustrated in Fig. 1. Each of the three images $I_{\text{input}}$, $I_{\text{proj}}$ and $I_{\text{affin}}$ has three channels of 1434 × 966 pixels. Computations were performed on a computer with an Intel Core i3 4030U processor. The OpenCV library was utilized for image normalization. As the ROI $R$ we have chosen the composition of three rectangles of text fields on a credit card. The analytical search for the optimal affine normalization (Algorithm 7) in this case, averaged over $10^4$ repetitions, took $t_c = 0.191$ milliseconds. The application of the resulting affine normalization took $t_a = 5.90$ milliseconds, while the projective normalization took $t_p = 9.91$ milliseconds. Thus, Algorithm 8 performed $t_p / (t_c + t_a) \approx 1.63$ times faster.
Fig. 1 also shows that even though the camera optical axis is oriented significantly off the perpendicular to the document surface, the text fields of the credit card were normalized with high accuracy.
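The decision rule of Algorithm 8 and the speed-up estimate can be written down in a few lines. In the sketch below the function names and the string return values are illustrative assumptions; only the timings are taken from the experiment described above.

```python
def choose_normalization(l2_optimal, l2_threshold):
    """Step 2 of Algorithm 8: use the affine map only if it is accurate enough."""
    return "affine" if l2_optimal < l2_threshold else "projective"

def speedup(t_projective, t_search, t_affine):
    """Projective warp time divided by search time plus affine warp time."""
    return t_projective / (t_search + t_affine)

# Timings reported above, in milliseconds.
ratio = speedup(9.91, 0.191, 5.90)
```

The search time is included in the denominator because the optimal affine matrix has to be recomputed for every new homography.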
Conclusion
In this work, we propose a fast approach for image normalization. It utilizes the affine normalization instead of projective if there is no significant loss of accuracy. The approach is based on a proposed criterion for the normalization accuracy: root mean square (RMS) coordinate discrepancies over the region of interest (ROI). The problem of optimal affine normalization according to this criterion is considered. We have established that this unconstrained optimization is quadratic and can be reduced to the problem of fractional quadratic functions integration over the ROI. The latter was solved analytically in the case of OCR where the ROI consists of rectangles. The proposed approach is generalized for various cases when instead of an affine transform its special cases are used: scaling, translation, shearing, and their superposition, allowing the image normalization process to be further accelerated.
References
[1] Zeynalov R, Velizhev A, Konushin A. Vosstanovlenie formy stranicy teksta dlya korrekcii geometricheskih is-kazhenij [In Russian]. Proc of the i9 International Conference GraphiCon-2009 2009: i^-m.
[2] Zhukovskiy AE, Nikolaev DP, Arlazarov VV, et al. Segments graph-based approach for document capture in a smartphone video stream. ICDAR 2017; i: 337-342. DOI: 10.1109/ICDAR.2017.63.
[3] Bolotova YuA, Spitsyn VG, Osina PM. A review of algorithms for text detection in images and videos. Computer Optics 2017; 4i(3): 441-452. DOI: 10.18-87/-41--6179--017-41-3-441-45-.
[4] Shemiakina JA, Faradjev IA, Zhukovsky AE. Research on algorithms for calculation of projective transformation in the problem of planar-object targeting by feature points. Sci Tech Inf Process 2018; 45(5): 346-351.
[5] Skoryukina N, Shemyakina J, Arlazarov VL, Faradzhev I. Document localization algorithms based on feature points and straight lines. Proc SPIE 2018; 10696: 106961H. DOI: 10.1117/1-.-311478.
[6] Povolotskiy MA, Kuznetsova EG, Khanipov TM. Russian license plate segmentation based on dynamic time warping. Proc ECMS 2017: ^-Mi.
[7] Skoryukina NS, Chernov TS, Bulatov KB, et al. Snap-screen: TV-stream frame search with projectively distorted and noisy query. Proc SPIE 2017; 10341; 103410Y. DOI: 10.1117/1-.--68735.
[8] Xie Y, Tang G, Hoff W. Geometry-based populated chessboard recognition. Proc SPIE 2018; 10696: 1069603.
[9] Arvind CS, Ritesh Mishra, Kumar Vishal, Venugopal Gundimeda. Vision based speed breaker detection for autonomous vehicle. Proc SPIE 2018; 10696: 106960E.
[10] Dubuisson M-P, Jain AAK. A modified Hausdorff distance for object matching. Proc 12th International Conference on Pattern Recognition 1994; 1: 566-568.
[11] Sim D-G, Kwon O-K, Park R-H. Object matching algorithms using robust Hausdorff distance measures. IEEE Trans Image Process 1999; 8(3): 4-5-4-9.
[12] Orrite C, Herrero JE. Shape matching of partially occluded curves invariant under projective transformation. Comput Vis Image Underst 2004; 93(1): 34-64.
[13] Nikolayev PP. Projectively invariant description of non-planar smooth figures. 1. Preliminary analysis of the problem [In Russian]. Sensornye Sistemy 2016; 30(4): 290311.
[14] Balitskii AM, Savchik AV, Konovalenko IA, Gafarov RF. On projectively invariant points of an oval with a distinguished exterior line. Probl Inf Transm 2017; 53(3): 279283.
[15] Savchik AV, Nikolaev PP. Metod proektivnogo sopostavleniya dlya ovalov s dvumya otmechennymi tochkami [In Russian]. Informacionnye Tekhnologii i vychislitel'nye Sistemy 2018; 2018(1): 60-67.
[16] Katamanov SN. Avtomaticheskaya privyazka izobrazhenij geostacionarnogo sputnika MTSAT-1R [In Russian]. Sovremennye Problemy Distancionnogo Zondirovaniya Zemli iz Kosmosa 2007; 1(4): 63-68.
[17] Karpenko S, Konovalenko I, Miller A, et al. UAV control on the basis of 3D landmark bearing-only observations. Sensors 2015; 15(12): 29802-29820. DOI: 10.3390/s151229768.
[18] Kholopov IS. Projective distortion correction algorithm at low altitude photographing. Computer Optics 2017; 41(2): 284-290. DOI: 10.18287/0134-2452-2017-41-2-284-290.
[19] Legge GE, Pelli DG, Rubin GS, et al. Psychophysics of reading. I. Normal vision. Vision Res 1985; 25(2): 239-252.
[20] Forsyth DA, Ponce J. Computer vision: a modern approach. Prentice Hall Professional Technical Reference; 2002.
[21] Triputen V, Gorohovatskij V. Algoritm parallel'noj normalizacii affinnyh preobrazovanij dlya cvetnyh izobrazhenij [In Russian]. Radioelektronika i Informatika 1997; 1: 97-98.
[22] Putyatin EP, Prokopenko DO, Pechenaya EM. Voprosy normalizacii izobrazhenij pri proektivnyh preobrazovaniyah [In Russian]. Radioelektronika i Informatika 1998; 2(3): 82-86.
[23] Wolberg G. Digital image warping. Los Alamitos, CA: IEEE Computer Society Press; 1990.
[24] Trusov A, Limonova E. The analysis of projective transformation algorithms for image recognition on mobile devices. Proc SPIE 2020; 11433: 114330Y.
[25] Gruen A. Adaptive least squares correlation: a powerful image matching technique. South African Journal of Photogrammetry, Remote Sensing and Cartography 1985; 14(3): 175-187.
[26] Ohta T-i, Maenobu K, Sakai T. Obtaining surface orientation from texels under perspective projection. IJCAI'81 1981; 2: 746-751.
[27] Pavić D, Schönefeld V, Kobbelt L. Interactive image completion with perspective correction. Visual Comput 2006; 22(9-11): 671-681.
[28] Heckbert PS. Fundamentals of texture mapping and image warping. Technical Report. Berkeley: University of California, 1989.
[29] Lorenz H, Döllner J. Real-time piecewise perspective projections. GRAPP 2009: 147-155.
[30] Huang J-B, Singh A, Ahuja N. Single image super-resolution from transformed self-exemplars. Proc IEEE Conf CVPR 2015: 5197-5206.
[31] Alter TD. 3D pose from three corresponding points under weak-perspective projection. Technical Report. Cambridge, MA: Massachusetts Institute of Technology; 1992.
[32] Kutulakos KN, Vallino J. Affine object representations for calibration-free augmented reality. Proc IEEE Virtual Reality Annual International Symposium 1996: 25-36.
[33] Aradhye H, Myers GK. Method and apparatus for recognition of symbols in images of three-dimensional scenes. US Patent 7,738,706 of June 15, 2010.
[34] Mikolajczyk K, Schmid C. An affine invariant interest point detector. In Book: Heyden A, Sparr G, Nielsen M, Johansen P, eds. Computer vision - ECCV 2002. Berlin, Heidelberg, New York: Springer-Verlag; 2002: 128-142.
[35] Mikolajczyk K, Schmid C. Scale & affine invariant interest point detectors. Int J Comput Vis 2004; 60(1): 63-86.
[36] Morel J-M, Yu G. ASIFT: A new framework for fully affine invariant image comparison. SIAM J Imaging Sci 2009; 2(2): 438-469.
[37] Kadir T, Zisserman A, Brady M. An affine invariant salient region detector. In Book: Pajdla T, Matas J, eds. Computer vision - ECCV 2004. Berlin, Heidelberg, New York: Springer-Verlag; 2004: 228-241.
[38] Faugeras OD. What can be seen in three dimensions with an uncalibrated stereo rig? In Book: Sandini G, ed. Computer vision - ECCV'92. Berlin, Heidelberg, New York: Springer-Verlag; 1992: 563-578.
[39] Zwicker M, Räsänen J, Botsch M, et al. Perspective accurate splatting. Proceedings of Graphics Interface 2004: 247-254.
[40] Kunina IA, Gladilin SA, Nikolaev DP. Blind compensation of radial distortion in a single image using fast Hough transform. Computer Optics 2016; 40(3): 395-403. DOI: 10.18287/2412-6179-2016-40-3-395-403.
[41] Hsu SC, Sawhney HS. Influence of global constraints and lens distortion on pose and appearance recovery from a purely rotating camera. Proc 4th IEEE Workshop on Applications of Computer Vision (WACV'98) 1998: 154-159.
[42] Chen H, Sukthankar R, Wallace G, Li K. Scalable alignment of large-format multi-projector displays using camera homography trees. IEEE Visualization (VIS 2002) 2002: 339-346.
[43] Konovalenko IA, Kokhan VV, Nikolaev DP. Optimal affine approximation of image projective transformation [In Russian]. Sensornye Sistemy 2019; 33(1): 7-14.
[44] Vanichev AY. Normalizaciya siluetov ob"ektov v sistemah tekhnicheskogo zreniya [In Russian]. Programmnye Produkty i Sistemy 2007; 3: 86-88.
Authors' information
Ivan Andreevich Konovalenko is a researcher at the IITP RAS and a researcher-programmer at Smart Engines Service LLC. He graduated from MIPT in 2014. His major research interests include Computer Vision, Applied Mathematics, and Mathematical Analysis. E-mail: konovalenko@smartengines.com.
Information about the author Vladislav Vladimirovich Kokhan can be found on page 77 of this issue.
Dmitry Petrovich Nikolaev, Ph.D. in Physics and Mathematics, is the head of a laboratory at the IITP RAS and the technical director of Smart Engines Service LLC. He graduated from Lomonosov MSU in 2000. His major research interests include Machine Vision, Algorithms for Fast Image Processing, and Pattern Recognition. E-mail: [email protected].
Received May 25, 2020. The final version received September 28, 2020.