Applied Mathematics of Ray Tracing and Perspective Projection in Fiducial-Based Registration of X-Ray Images

Ray tracing (RT) and perspective projection (PP) using fiducial-based registration can be used to determine points of interest in biplanar X-ray imaging. We sought to investigate the implementation of these techniques as they pertain to X-ray imaging geometry. The mathematical solutions are presented and then implemented in a phantom case and an actual case with numerical tables and imaging. The X-ray imaging is treated as a Cartesian system in millimeters (mm) with a standard frame-based stereotactic system. In this space, the point source is the X-ray emitter (focal spot), the plane is the X-ray detector, and fiducials lie between the source and the plane. In the phantom case, RT was able to predict the locations of fiducials after moving the point source. Also, a scaled PP matrix could be used to determine the imaging geometry, which could then be used in RT. Automated identification of spherical fiducials in 3D was possible using a center-of-mass computation, with an average Euclidean error relative to manual measurement of 0.23 mm. For PP, RT projection or a combinatorial approach could be used to facilitate matching 3D to 2D points. Although used herein for deep brain stimulation (DBS), this kind of imaging analysis has wide medical and non-medical applications.


Introduction
In 1895, Wilhelm Roentgen discovered X-rays, which subsequently had an important impact on medicine for over 100 years [1]. As a major diagnostic tool, X-rays revolutionized medicine by empowering clinicians to diagnose various conditions such as pneumonia, bony fractures, and dental cavities. Despite the long history of X-ray use, developments have primarily included advances in speed, quality, and, more recently, digitization. Computational X-ray analysis has not been the standard despite the wide use of X-rays around the world. Rather, advancements in computed tomography (CT) and magnetic resonance imaging (MRI) have been utilized for 3D (three-dimensional) localization. Nevertheless, X-rays represent a simple, low-cost, low-radiation-dose technique whose utility can be optimized.
In ray tracing (RT), rays emitted from a point source pass through objects and are presented on a display plane (the X-ray image). Computationally, RT requires knowledge of the entire imaging geometry, whereas PP requires knowledge of the 3D points and their associated projection points in 2D (two-dimensional). As such, RT and PP represent important tools that enable critical analysis of X-rays.
In this paper, we discuss the mathematics involved with RT and PP and provide numerical samples and imaging of each. We analyze a phantom case and a real case, and explore automations and error propagation. These methods effectively compute 3D to 2D (3D-2D) and, for STiL, 2D to 3D (2D-3D). We explore these techniques using point analysis, but similar techniques can be extended to volumes.

Technical Report
In stereotactic neurosurgery, a Cartesian coordinate system can be generated using anatomical structures in the brain or using a frame-based apparatus. Anatomical points include the anterior commissure (AC), posterior commissure (PC), and a midline structure to generate the coordinate system, generally centered at the middle of AC-PC (MidACPC). Frame-based coordinate systems typically use an N-localizer apparatus calibrated on a CT or MRI [3][4][5]. Generally, these coordinate systems are considered linear and are scaled in millimeters. In the following, we implement mathematical solutions for point-based registration from 3D (x, y, z) to 2D (U, V) using RT and PP. These techniques assume that there are no geometrical distortions (scattering or diffraction), that the focal spot of the X-ray emitter is a point source, that the X-ray detector can be treated as a plane, and that the centers of spheres or pin tips/bases can be treated as point objects in the system (fiducials). The centers of spheres are used in the majority of cases, as the center is geometrically ideal, but we also utilize pin tips and pin bases in addition to other stable objects when available, which depends on the X-ray exposure. All objects are mathematically utilized as points. For the display plane (X-ray image), pixel spacing may be available from DICOM metadata, but it is often absent in fluoroscopic images. In some instances, calibration may be needed to determine this scaling and ensure no inhomogeneity in the image. Also, to maintain a single frame-based coordinate system, the subsequent tables utilize the CRW/BRWLF (Cosman-Roberts-Wells/Brown-Roberts-Wells Localizer Frame, Radionics CRW Stereotactic System, Integra LifeSciences Corporation, Plainsboro, New Jersey). Fiducial positions were calculated using CT with the incorporation of important points fused from MRI in iPlan 3.0 Stereotaxy (Brainlab Inc, Feldkirchen, Germany).
Also, the numerical tables utilize AP (Antero-Posterior), LAT (Lateral), and VERT (Vertical) rather than x, y, z. Finally, many of the solutions provided assume biplanar imaging (two views or poses), but the same mathematical solutions can accommodate many more.

Background mathematical solutions
In RT, the geometry of the point source (focal spot) (P0) to a fiducial (P1 or P2) forms a ray that intersects a detector plane (Pp1 or Pp2), which is seen on a display plane (Puv1 or Puv2) (Figure 1) [6]. If the stereotactic coordinate system utilized contains an axis orthogonal to the detector plane, the system is coherent on-axis (COA). Having a COA system makes the determination of display points simpler, such that the two variable coordinates are translated, rotated, and scaled to the display plane. The display plane can also be rotated separately. If the detector plane is not COA, then a change of basis transformation is needed to normalize an axis. For example, in a COA antero-posterior (AP) view, the AP value of all plane intersections with the AP plane is the same, which allows straightforward utilization of the other dimensions, lateral (LAT) and vertical (VERT), where the intersections occur. However, a system that is not COA yields 3D coordinates for which all values have variation, requiring a normalization prior to projection to the display plane.
When the ray-plane intersections are projected on the display plane, they are 2D (U, V).
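As a minimal sketch of this normalization idea (equation 9), the following code (function names and toy geometry are ours, purely illustrative) builds an orthonormal in-plane basis for an arbitrary detector plane and expresses 3D ray-plane intersection points as 2D (U, V) coordinates via dot products:

```python
import numpy as np

def plane_basis_2d(points3d, plane_normal, origin3d):
    """Normalization sketch: build an orthonormal in-plane basis for a
    non-COA detector plane and express ray-plane intersection points as
    2D (U, V) coordinates via dot products (illustrative names)."""
    n = plane_normal / np.linalg.norm(plane_normal)
    # pick any vector not parallel to n to seed the first in-plane axis
    seed = np.array([1.0, 0.0, 0.0])
    if abs(np.dot(seed, n)) > 0.9:
        seed = np.array([0.0, 1.0, 0.0])
    u_axis = seed - np.dot(seed, n) * n       # remove the out-of-plane part
    u_axis /= np.linalg.norm(u_axis)
    v_axis = np.cross(n, u_axis)              # completes the orthonormal basis
    return np.array([[np.dot(p - origin3d, u_axis),
                      np.dot(p - origin3d, v_axis)] for p in points3d])
```

Because the basis is orthonormal, in-plane Euclidean distances are preserved when passing from 3D to (U, V), which is what allows the display plane to be treated as a faithful 2D copy of the detector plane.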
The PP matrix can be used to determine the location of the point source. From the intrinsic matrix of PP, the distance between the point source and the detector plane (focal length) can be calculated, thereby allowing calculation of the detector plane location. The point on the detector plane that is closest to the focal spot can be utilized to calculate the principal point on the display plane. Finally, in the display plane, the top left corner of the image is assigned as (0,0). The X-ray detector can be considered as a mathematical plane (equation 1). The X-ray emitter then functions as a point source that generates rays which intersect fiducials, generating lines in 3D parametric form (equations 2-4). Where these lines intersect the X-ray detector (Step 1, Figure 1), the points in the detector plane are obtained (equations 5-7). Because the display plane can vary in presentation, an added translation/rotation is required for pixel location, which can be computed in matrix form incorporating this 2D rotation and translation (equation 8). Here, the 2D presentation of the planar image expects one dimension of the 3D intersection to be a constant (COA). When not COA, the normalization procedure (Step 2, Figure 1) is simplified by the use of unit vectors (equation 9) and dot products between the two coordinate systems, thereby identifying the matrix components.
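The ray-plane intersection of Step 1 (equations 1-7) can be sketched as follows. The helper names are ours, and the toy geometry is only loosely modeled on a COA AP view with the detector plane at AP = -515; it is a sketch, not the system in the tables:

```python
import numpy as np

def ray_plane_intersection(source, fiducial, plane_point, plane_normal):
    """Intersect the ray from the X-ray point source through a fiducial
    with the detector plane (given by a point on it and its normal)."""
    d = fiducial - source                      # ray direction (parametric form)
    denom = np.dot(plane_normal, d)
    if abs(denom) < 1e-12:
        raise ValueError("ray is parallel to the detector plane")
    t = np.dot(plane_normal, plane_point - source) / denom
    return source + t * d                      # 3D point on the detector plane

# toy COA geometry: source on the +AP axis, detector plane at AP = -515
source = np.array([0.0, 653.0, 0.0])
fid = np.array([10.0, 20.0, 30.0])
p = ray_plane_intersection(source, fid,
                           np.array([0.0, -515.0, 0.0]),
                           np.array([0.0, 1.0, 0.0]))
```

In a COA setup like this one, the AP component of every intersection is the plane constant (-515 here), so only the LAT and VERT components carry information to the display plane.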
In perspective projection (PP), a 3D point in space is associated with a 2D point on the display plane (equation 10) [7]. Importantly, a 2D point is generated (Step 2, Figure 1) by dividing u and v by the third component (t) (equations 11-12). A 4x3 conversion matrix is used wherein 11 of the 12 elements are initially unknown. The three columns represent the projection matrix components (equations 13-15). Each of these three column components has four numerical values, but the last value (C43) is scaled to 1 in this homogeneous formulation (equations 16-19). The system is then generally arranged as a set of overdetermined equations, such that each instance of 3D-2D points generates two equations (equation 20), one for U and another for V. Therefore, 5.5 3D-2D point pairs (six points in practice) are needed to solve the 11 matrix elements, but generally this system would be solved via a least squares formulation to optimize the matrix and minimize errors. Lastly, the complete PP matrix is 4x3 (equation 21), but it will subsequently be utilized as its transpose in 3x4 form (equation 22).
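The least squares solution of the 11 unknown matrix elements (equation 20) can be sketched as below; this is the standard direct-linear-transform arrangement with C43 fixed to 1, and the function names are ours:

```python
import numpy as np

def solve_pp_matrix(pts3d, pts2d):
    """Least-squares solution of the 11 unknown PP-matrix elements (the
    last element is scaled to 1); each 3D-2D pair contributes two rows,
    one for U and one for V, so six or more pairs give an
    overdetermined system."""
    A, b = [], []
    for (x, y, z), (u, v) in zip(pts3d, pts2d):
        A.append([x, y, z, 1, 0, 0, 0, 0, -u*x, -u*y, -u*z]); b.append(u)
        A.append([0, 0, 0, 0, x, y, z, 1, -v*x, -v*y, -v*z]); b.append(v)
    m, *_ = np.linalg.lstsq(np.array(A), np.array(b), rcond=None)
    return np.append(m, 1.0).reshape(3, 4)     # 3x4 (transposed) form

def project(M, pt):
    """Apply PP: homogeneous multiply, then divide u and v by the third
    component (t)."""
    u, v, t = M @ np.append(pt, 1.0)
    return np.array([u / t, v / t])
```

With exact, non-degenerate data the recovered matrix reproduces the generating projection; with noisy data the same code returns the least-squares optimum, and the residual can serve as a quality metric.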
Interpretation of the PP model is widespread in computer vision [7]. Next, we utilize this technique to gain insight into the geometric interpretation of the X-ray system. The essential elements of the complete PP matrix, here now interpreted as 3x4, include a 3x3 intrinsic parameter matrix, while the extrinsic parameters include a 3x3 rotation and a 3x1 translation (equation 23). The components of the intrinsic parameter matrix are important to note; they include scalar dimensions, shear, and translation (equation 24). The five intrinsic components are alpha and beta (which should be equal), theta, and the principal point (U0, V0). Of note, a skew angle is computed as the absolute value of theta minus pi/2. Again, to calculate these values for geometric interpretation, the scale of the imaging system (detector plane and display plane) should be the same or undergo a conversion. For example, the imaging plane may generate uncalibrated pixels while the stereotactic frame (or anatomic frame) is in millimeters. If the U and V coordinates on the imaging plane are converted to a millimeter system, interpretation of the components of the intrinsic parameters is facilitated. Notably, this scaling is not required for STiL using PP. Next, the vector components (equations 25-27) of the PP matrix are utilized to calculate the components of these intrinsic parameters (equations 28-31). Of note, the expected result of equation 31 is 0, as in an ideal setting the imaging system would be non-skew (theta = pi/2 (approximately 1.57) radians, or 90 degrees). The next elements calculated are utilized to generate the point source and the imaging plane. To formulate the relevant information, the negative inverse (-a^-1) of the vector component (3x3) of the PP matrix is computed and multiplied by a translation (equations 32-35).
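The intrinsic components (alpha, beta, theta, U0, V0) can be recovered from the 3x4 matrix with the standard computer-vision decomposition; the sketch below uses our own names, and sign conventions can differ between implementations:

```python
import numpy as np

def intrinsics_from_pp(M):
    """Extract the five intrinsic parameters (alpha, beta, theta, U0,
    V0) from a 3x4 PP matrix using the standard row-wise decomposition
    (a sketch; scale/sign conventions may vary)."""
    A = M[:, :3]
    a1, a2, a3 = A                               # the three rows
    rho = 1.0 / np.linalg.norm(a3)               # overall scale factor
    u0 = rho**2 * np.dot(a1, a3)                 # principal point U0
    v0 = rho**2 * np.dot(a2, a3)                 # principal point V0
    c13, c23 = np.cross(a1, a3), np.cross(a2, a3)
    cos_t = -np.dot(c13, c23) / (np.linalg.norm(c13) * np.linalg.norm(c23))
    theta = np.arccos(np.clip(cos_t, -1.0, 1.0)) # skew angle; ideally pi/2
    alpha = rho**2 * np.linalg.norm(c13) * np.sin(theta)
    beta = rho**2 * np.linalg.norm(c23) * np.sin(theta)
    return alpha, beta, theta, u0, v0
```

For a non-skew system, theta comes out at pi/2 and alpha equals beta, matching the expectations stated above.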
Visually, we see from Figure 1 that a 2D point on the display plane can be projected back to a plane that intersects the point source and is parallel to the detector plane. First, by multiplying the translation component of the original PP matrix (b) by the negative inverse of the vector component, the point source (Omega) is determined (equation 36). Picking two more unique 2D points (c, d) in the display plane generates two more points in the plane that intersects the point source (equations 37-38). Using these three points in the image source plane, two vectors can be created (equations 39-40). Forward or reverse cross-products of these vectors (equations 41-42) yield the normal of the image source plane, which is parallel to the detector plane (equation 43). This result may be expressed as a plane equation wherein the last element (sigma) is not yet known (equation 44). However, sigma can be determined using components of the intrinsic matrix for focal length, which in X-ray imaging is equivalent to the Source Image Distance (SID) (equations 45-46). It is possible, and even likely, under many circumstances that the imaging system may be slightly skew, so taking the average of the scalar components (alpha and beta) may yield a good approximation (equation 47). Furthermore, the relationship between focal length, the point source, and the detector plane ties all these elements together (equation 48). If the focal length is known, then one can readily calculate the missing component (sigma) of the detector plane equation.
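A sketch of this back-projection geometry (equations 36-43) follows; the particular display points chosen and the function name are our own illustrative assumptions:

```python
import numpy as np

def source_and_detector_normal(M):
    """Recover the X-ray point source (Omega) and the detector-plane
    normal direction from a 3x4 PP matrix (sketch of equations 36-43;
    the sign of the returned normal is not fixed)."""
    A, b = M[:, :3], M[:, 3]
    A_inv = np.linalg.inv(A)
    omega = -A_inv @ b                       # point source (eq. 36 form)
    # back-project two distinct display points into the plane that
    # contains the source and is parallel to the detector
    p_c = A_inv @ (np.array([1.0, 1.0, 0.0]) - b)
    p_d = A_inv @ (np.array([100.0, 50.0, 0.0]) - b)
    # two in-plane vectors; their cross product is the plane normal
    n = np.cross(p_c - omega, p_d - omega)
    return omega, n / np.linalg.norm(n)
```

Because the detector plane is parallel to this image source plane, the same normal, together with sigma from the focal length relationship, completes the detector plane equation.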
Further evaluation of the geometry reveals that the relationship required for solving the sigma of the detector plane equation can alternatively be solved using a quadratic solution (Figure 1), as explained below. First, consider two sets of parametric equations of 3D lines that emerge from the point source through 3D points (equations 49-54). These lines then intersect the plane at some points (equations 55-60). Interestingly, PP allows one to determine the 2D scalar (Euclidean) distance on the display plane using a projection of two 3D points (we used {1,1,1} and {100,100,100}) via equation 10. This scalar distance is equal to zeta in equation 61 and could alternatively be measured manually on the display plane. The final formulation, when squared, reveals an equation with a single unknown, sigma (equation 62). Finally, sigma can be solved via the quadratic method (equations 63-64).

Fiducial matching from 3D to 2D
A time-consuming and potentially error-prone task is matching 3D to 2D fiducials for PP. One method is to utilize knowledge of the geometric setup and perform RT-assisted-matching (RTAM) for PP. Other options include applying identifiable characteristics or geometry to the fiducials. Lastly, after matching approximately six points (minimum 5.5), the PP matrix itself can be utilized to identify further matches.
A fully automatic matching for PP represents an important improvement in the workflow as it does not require precise projection knowledge between the 3D and the 2D points. It does require 3D knowledge of the fiducials and identification of the fiducials on 2D, but their precise correlation between 3D to 2D is not needed. Rather, a combinatorial optimization matching method (COMM) was implemented for this step to eliminate matching errors.
Mathematically, for a case comprising N 3D fiducials and K 2D fiducials, where K <= N, the number of possible combinations of K in N is N!/(K!(N-K)!). For each combination, there are K! permutations that must be searched to reach the best match. Therefore, because the order of the match between 3D and 2D points matters, the resulting number of candidate matches to investigate is N!/(K!(N-K)!) x K! = N!/(N-K)!. As mentioned above, the solution of the PP transformation matrix derives from an overdetermined system that should be solved as a least squares computation. Without additional information, the sum of the squares of the residuals of each candidate match's solution can be used as a metric of matching quality. In principle, the automatic matching selection reduces to computing all possible combinations and permutations and keeping the one with the minimum least-squares error from its PP matrix solution. This crude approach produces very long computation times, especially as K and N grow.
Singular value decomposition (SVD) is the most robust linear solution for PP matrix computation, but the suitability of QR factorization, Gaussian elimination, and Cholesky factorization was also explored. The methods were used in several test cases, and each selected the same optimum match when minimizing mean square error (MSE). Cholesky factorization achieved convergence in less than a quarter of the time needed by SVD. Additional measures of the quality of the PP matrix can be utilized. For example, the matrix condition number (CN) represents the ratio of the largest to smallest singular values (via SVD), which can eliminate singular or ill-conditioned systems. Theta, which should be 90 degrees (~1.57 radians), determines whether the resultant vectors of the PP matrix are distorted, as a measure of skew. Further, geometric constraints can be added if the estimated point source or detector plane is known. Generally, the fastest metric is MSE, which is not dependent on other factors or knowledge, but the addition of other thresholds ensures a consistent optimum result. Due to the nature of this approach and the prevalence of multicore CPUs/GPUs, a parallel implementation of the search process can also be implemented. For fiducials of N=8 and K=6, we get 8!/2! = 20,160 possible matches, and in combination with multiple thresholds, this represents a reasonable workload for most CPUs.
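The combinatorial search just described can be sketched as a brute-force loop over combinations and permutations, keeping the assignment with the minimum least-squares residual (our own minimal version, using a generic least-squares solver rather than the Cholesky variant discussed above, and with only a rank threshold in place of the CN/theta thresholds):

```python
import numpy as np
from itertools import combinations, permutations

def comm_match(pts3d, pts2d):
    """COMM sketch: try every ordered assignment of the K 2D fiducials
    to K of the N 3D fiducials, solve the PP matrix by least squares,
    and keep the match with the minimum sum of squared residuals."""
    K = len(pts2d)
    best_err, best_perm = np.inf, None
    for combo in combinations(range(len(pts3d)), K):
        for perm in permutations(combo):
            A, b = [], []
            for i, (u, v) in zip(perm, pts2d):
                x, y, z = pts3d[i]
                A.append([x, y, z, 1, 0, 0, 0, 0, -u*x, -u*y, -u*z]); b.append(u)
                A.append([0, 0, 0, 0, x, y, z, 1, -v*x, -v*y, -v*z]); b.append(v)
            A, b = np.array(A), np.array(b)
            m, _, rank, _ = np.linalg.lstsq(A, b, rcond=None)
            if rank < 11:                # skip singular/degenerate candidates
                continue
            err = float(np.sum((A @ m - b) ** 2))
            if err < best_err:
                best_err, best_perm = err, perm
    return best_err, best_perm
```

In practice the additional CN and theta thresholds, and a parallel split of the outer loop across cores, drop in naturally at the point where each candidate is scored.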

RT fiducial tracking by moving focal spot location
As discussed above, the solution for RT is determined by the geometric properties of X-ray acquisition. If only some elements are known, a solution might be accomplished iteratively by changing the point source location and a tentative magnification multiplier until good overlap with the fiducials is achieved. If the geometry is largely unknown but PP is possible, then PP can be used to compute the point source and image plane locations, which can subsequently be used in RT. This could be particularly useful when measurements are difficult to establish but a relatively fixed setup is present.
If the X-ray setup is fixed (source, object(s), and detector), then known changes in geometry can be compensated for quickly using RT. For example, we demonstrate a case of a fixed X-ray setup with a phantom skull and widely distributed geometric fiducials. In this example, PP is used to determine the origin point for RT in both AP and LAT views.

PP and orthogonal and non-orthogonal RT
The following tables demonstrate the mathematical solutions for RT and PP. Starting with orthogonal imaging (COA) in RT, point source and planar data can be combined with objects (fiducials) and line-detector plane intersections (Table 1). In this case, the source image distance (SID) is 1168 mm and the pixel spacing is 388 micrometers (um)/pixel in 2D fluoroscopy mode using the O-arm2 (Medtronic, Dublin, Ireland). Computing the ray-plane intersections for all the relevant objects reveals the AP dimension to be constant at -515 (COA), which allows the LAT and VERT coordinates to be translated directly to the image (Table 2). Similarly, the LAT image has a constant LAT dimension of 513 (Table 3).
Using knowledge of the point source and the detector plane locations in Table 1, the following tables (Tables 4-8) demonstrate the PP solution using 3D coordinates combined with scaled (in millimeters) U and V screen coordinates for analysis in both AP and LAT views. The complete PP matrices are computed using SVD with 22 and 26 equations in AP and LAT, respectively (Table 6). Applying the matrices to the images for all the hyperdense objects (fiducials, pins, electrode tips) shows overlap in the AP and LAT images (Figures 6, 7). The next step is to use PP to develop the geometry and compare the results to RT (Table 7).
Finally, we apply biplanar imaging to localize the 3D position (STiL) in both RT and PP. The solution in RT can be derived from a line-line intersection problem using parametric equations for the lines to solve for the parameters r, which describe the closest points of intersection (equations 67-69). After r is solved, the values for x, y, and z can be readily calculated by substituting the value back into each parametric equation. A skew ray (line-line) intersection computation generating the closest points of intersection on each ray allows the Euclidean distance between those points to serve as a quality check between the two images.
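The skew-ray computation just described (equations 67-69) can be sketched with the standard closed-form closest-points solution for two 3D lines; the function name and toy rays below are ours:

```python
import numpy as np

def stil_triangulate(s1, d1, s2, d2):
    """Closest points between the two rays of a biplanar setup: each ray
    runs from a point source s along direction d (source to
    back-projected display point).  Returns the midpoint as the 3D
    estimate and the gap between the rays as a quality check."""
    d1 = d1 / np.linalg.norm(d1)
    d2 = d2 / np.linalg.norm(d2)
    w = s1 - s2
    a, b, c = np.dot(d1, d1), np.dot(d1, d2), np.dot(d2, d2)
    d, e = np.dot(d1, w), np.dot(d2, w)
    denom = a * c - b * b                     # zero only for parallel rays
    r1 = (b * e - c * d) / denom              # parameter on ray 1
    r2 = (a * e - b * d) / denom              # parameter on ray 2
    p1, p2 = s1 + r1 * d1, s2 + r2 * d2       # closest point on each ray
    return (p1 + p2) / 2.0, np.linalg.norm(p1 - p2)
```

A small gap indicates consistent biplanar geometry; a large gap flags registration or identification errors in one of the two views.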
Table 6: PP matrices. Distance from Point Source to Plane = Focal Length, calculated using the quadratic solution; AP/LAT Normal-to-Normal Angle (degrees) = normal of the computed detector plane relative to the associated normal of a normalized plane.

Next, we consider the situation of uncertain correlation between 3D points and 2D points. For this, the same patient case discussed above now includes an initial oblique image with the N-localizer positioned (Figure 8). We start the solution by using a combinatorial approach. Various thresholds can be analyzed, but here we assume no specific knowledge and use MSE and the matrix condition number (CN), but include the theta angle result (Table 9). The combination of low MSE, low CN, and theta closest to 1.57 radians identifies the result. The next step is to use the geometric analysis of the point source and plane location (Table 10). These data are then placed in RT, but require normalization along the AP axis to make the system COA (Table 11). The final result produced by RT shows excellent overlap (Figure 9). Note that the projection can also be performed using PP. Because both methods project the scalar distance on the detector plane, data from RT and PP should be similar. Here, we randomly chose the AC point to the Midline point, yielding a distance of 89.604 mm in RT versus 89.694 mm in PP. Finally, in RT, the result between the detector plane intersection and the display plane (normalized) is exactly the same (89.604 mm).

Table 10: Distance from Point Source to Plane = Focal Length, calculated using the quadratic solution; Normal-to-Normal Angle (degrees) = detector plane normal compared to the normalized plane.

Error propagation with PP
While RT is computationally deterministic, PP utilizes a least squares optimization. Therefore, error propagation with PP is an important consideration. First, we evaluate automatic sphere detection on 3D images (CT) using a "center of mass" calculation and then evaluate potential errors. Hounsfield units and spherical size were used as thresholds for automatic sphere detection. This method would expedite acquisition of 3D points from CT imaging. Here, a comparison between this automation and manual selection is presented (Table 12). While this is an important step, error propagation in PP becomes relevant. We therefore introduced a Monte Carlo simulation evaluating root mean square (RMS) error as a function of the number of references when introducing random 1 mm errors into 1 million simulations (Figure 10). This reveals that error propagation is reduced by adding more data, which facilitates the least squares computation. In our physical setup, the vertical (VERT) distance between spheres had the least geometric variance. Fixing the number of points at eight but increasing this VERT distance between them causes a significant reduction in the worst-case calculation of a 3D point when a 2-mm artificial error is introduced (Figure 11). This suggests that a wider geometric spread of fiducials reduces 3D error propagation when using PP.

Discussion
X-ray imaging is widespread in various aspects of medicine. Single or multiple-plane computations, however, have not been widely utilized in standard X-ray imaging. While some efforts have been reported previously, herein we have attempted a more comprehensive, usable, and illustrative approach [8][9][10][11][12]. We also explored various automations and error propagation. Further, enhanced computing power may facilitate these computations. Coincidentally, computer vision for 3D-2D rendering applies many principles similar to those used in the X-ray imaging described above [7]. In computer vision, the task is generally to produce a 2D screen image from a dynamically changing 3D pose, but many modern applications include 3D-3D, such as with virtual reality. Augmented reality also applies similar projection techniques superimposed on real-time analysis of images. Unlike our description for X-rays, the use of perspective projection with camera lenses may yield radial inhomogeneity (pincushion or barrel distortions) that requires other steps or non-linear solutions.
Herein, we describe some of the mathematical principles of RT and PP applied to X-ray images and implement them in phantom and actual cases. Importantly, these methods have wide applications whenever planar images are obtained. Depending on the use case, simply increasing the number of reference points and their geometric spread ensures a quality result. However, RT depends more on knowledge of the imaging geometry. Also, application of scaling on the display image allows an enhanced understanding of the exposed geometry using PP. Three-dimensional imaging using CT or MRI has largely overtaken 2D planar imaging when 3D positions are needed. However, CT/MRI incurs more expense and complexity, in addition to an increased radiation dose (for CT), without a significant improvement in resolution for a single target point. Further, the pixel sizes of the X-ray and fluoroscopy apparatus in our case were 0.125 mm and 0.388 mm, respectively, whereas the pixel size for 3D imaging is at best about 0.625-0.8 mm for most current CT or MRI systems. Moreover, 3D imaging requires image fusion and introduces some artifacts by the nature of the acquisition and reconstruction techniques. Even though potentially millions of pixels are processed in 3D on CT/MRI, a single pixel displacement can result in a 0.625 mm error. In 2D imaging, for RT and PP, image point analysis is also required, but the 2D determination is at a higher resolution, such that a single pixel displacement renders a 0.125 mm error. Therefore, this kind of point resolution to determine points in space, such as for deep brain stimulation (DBS) leads, can be sufficiently optimized using these simple X-ray techniques, which correlate well with preoperative CT/MRI imaging. In addition, as previously published, DBS lead rotation may be computed from projection X-ray data [2].
Important use cases for X-ray/fluoroscopy imaging include DBS, spine surgery, vascular interventions, non-vascular interventions, orthopedic interventions, radiosurgery, and dental procedures, as well as numerous non-medical applications and basic science laboratories, all of which can be enhanced with improved computing capacity [13-16]. Because these systems can be designed using cost-effective materials, computational enhancements for these use cases could continue to grow. In addition, current X-ray and fluoroscopy technologies perform well with RT and PP, even though they were not designed for this use. Considering the mathematics and the physical nature of X-ray systems, some assumptions should be considered carefully before applying RT or PP. Assumptions include that the system behaves like a Cartesian system, that the focal spot of the X-ray is a point source, that the X-ray detector is a plane, and that there is no significant scatter or inhomogeneity in the image. We also assume X-ray and fluoroscopic imaging behave similarly. Some of these assumptions may not be completely accurate. For example, the focal spot of an X-ray can vary in size between different machines. Also, some fluoroscopic detectors may produce inhomogeneity. We also observed 2D images on some CT scanners that were not a projection but rather a reconstruction. Therefore, care must be taken when applying these techniques to any given imaging system.