Divulgaciones Matemáticas Vol. 23-24, No. 1-2 (2022-2023), pp. 82–106
https://produccioncientificaluz.org/index.php/divulgaciones/
DOI: https://doi.org/10.5281/zenodo.11540455
(CC BY-NC-SA 4.0) © Autor(s)
e-ISSN 2731-2437
p-ISSN 1315-2068
Boundary Estimation with the Fuzzy Set Regression Estimator
Estimación Frontera con el Estimador de Regresión con Conjunto Difuso
Jesús A. Fajardo
(jfajardogonzalez@gmail.com; jfajardo@udo.edu.ve)
Departamento de Matemática del Núcleo de Sucre
Universidad de Oriente
Cumaná 6101, República Bolivariana de Venezuela
Abstract

In order to extend the properties of the fuzzy set regression estimation method and provide new results related to nonparametric regression estimation problems not based on kernels, this paper analyzes the possible boundary effects, if any, of the fuzzy set regression estimator and presents a criterion to remove them. Moreover, a boundary fuzzy set estimator is proposed, defined as a particular class of fuzzy set regression estimators, for which the bias, variance, mean squared error and the function that minimizes the mean squared error are given. Finally, these theoretical findings are illustrated through some numerical examples and one real data example. Simulations show that the proposed estimator performs better at points near zero in a spread neighborhood of the smoothing parameter, when compared with a general boundary kernel regression estimator, for the two regression models and the two density functions considered. The above represents the natural extension of the recent results obtained for the boundary fuzzy set density estimator.

Key words and phrases: Fuzzy set regression estimator, boundary estimation.
Resumen

Con el fin de ampliar las propiedades del método de estimación de regresión con conjunto difuso y proporcionar nuevos resultados relacionados con los problemas de estimación no paramétrica de la regresión no basados en núcleos, este artículo analiza los posibles efectos frontera, si los hay, del estimador de regresión con conjunto difuso y presenta un criterio para eliminarlos. Además, se propone un estimador frontera con conjunto difuso, el cual se define como una clase particular de estimadores de regresión con conjunto difuso, donde el sesgo, la varianza, el error cuadrático medio y la función que minimiza el error cuadrático medio del estimador propuesto son presentados. Finalmente, estos resultados teóricos se ilustran a través de algunos ejemplos numéricos y con un ejemplo de datos reales. Las simulaciones muestran que el estimador propuesto tiene un mejor desempeño en los puntos cercanos a cero en un entorno disperso del parámetro de suavizado, cuando se compara con un estimador general frontera de la regresión con núcleo, para los dos modelos de regresión y las dos funciones de densidad consideradas. Lo expuesto anteriormente representa la extensión natural de los resultados recientes al caso del estimador frontera de la densidad con conjunto difuso.

Palabras y frases clave: Estimador de regresión con conjunto difuso, estimación frontera.

Recibido 04/05/2023. Revisado 14/07/2023. Aceptado 12/12/2023.
MSC (2010): Primary 62G99; Secondary 62G05.
Autor de correspondencia: Jesús Fajardo.
1 Introduction
Let $(X_1, Y_1), \ldots, (X_n, Y_n)$ be $n$ independent copies of a random vector $(X, Y)$. The regression model is given by

$$Y_i = r(X_i) + \varepsilon_i, \quad i = 1, \ldots, n,$$

where the observation errors $\varepsilon_1, \ldots, \varepsilon_n$ are random variables such that

$$E[\varepsilon_i \mid X] = 0, \quad \operatorname{Var}[\varepsilon_i \mid X] = \sigma^2 < \infty, \quad i = 1, \ldots, n,$$

and the regression function $r(t) = E[Y \mid X = t]$, $t \in \mathbb{R}$, is unknown. The main goal of this paper is to introduce a new method to estimate the regression function $r$ at points near zero in a spread neighborhood of the smoothing parameter.
Numerous nonparametric methods have been developed in the literature to estimate $r$ from independent pairs of data. Nevertheless, most of those estimation methods are not consistent when the support of $r$ has finite endpoints, which seriously affects the overall performance of the implemented estimators. The lack of consistency is a consequence of the so-called "boundary effects": the connection between an estimation method and the boundary effects is reflected in the performance of the estimators it produces, whose behavior at boundary points usually differs from their behavior at interior points. Theoretically, the convergence rates of the estimators at boundary points are slower than the convergence rates at interior points of the support of $r$. Strictly speaking, the bias of the estimators is of order $O(b_n)$ instead of $O(b_n^2)$ at boundary points, where $b_n$ is the smoothing parameter or bandwidth ($b_n \to 0$ as $n \to \infty$). This imposes the need to carry out a study to detect whether boundary effects are present in the estimator of any function, since it is not obvious that the behavior of the estimator is the same at the boundary as at the interior points. To remove those boundary effects in kernel regression estimation, a variety of methods have been developed in the literature. An excellent summary of some well-known methods is given in [24]. Finally, it is important to remark here that the above phenomenon, called "boundary effects" in estimation theory, also affects the fuzzy set regression estimator introduced in [9].
In this paper, a criterion to remove the boundary effects, without boundary modifications, in the nonparametric fuzzy set regression estimator is proposed, obtaining a natural extension of the approach introduced in [8] to the regression case. To this end, at each point near 0 in a $b_n$ spread neighborhood, the proposed boundary estimator is defined as a particular class of fuzzy set regression estimators, for which the bias, variance and optimal mean squared error (MSE) are given. Moreover, the function that minimizes the MSE is obtained. Simultaneously, extensive simulations are carried out to compare the local MSE of the proposed boundary estimator with the local MSE of the general boundary estimator given in [24] at points near 0 in a $b_n$ spread neighborhood, observing that the local MSE of the proposed boundary estimator is the smallest for the two regression models and the two density functions considered. This reduction shows that the boundary fuzzy set regression estimator has better performance than the estimator studied in [24]. Besides, it is appropriate to note that the above results extend the properties of the fuzzy set regression estimation method, providing new properties related to nonparametric regression estimation problems not based on kernels.
The particular choice above was based mainly on the results of the simulations obtained in [24], for the two regression models and the two density functions considered in this work, which showed that the general boundary kernel regression estimator defined in that paper performed quite well when compared with both local linear and classical kernel regression estimators. Among other reasons that supported this choice, the theoretical properties shared by the boundary estimator defined in [24] and the proposed boundary estimator stand out: non-negativity, "natural boundary continuation," and improved bias while holding on to low variances. Moreover, it is worth pointing out that the paper [24] extends the approach introduced in [17] to the regression case, by defining the popular Nadaraya-Watson estimator, [20, 27], in terms of the boundary kernel density estimator given in [17]. It is worth noting that the results of the simulations presented in [17] for the four shapes of densities considered showed that the boundary kernel density estimator introduced in that work performed quite well when compared with the estimators considered in [16, 29] and its simple modification, which yields the local linear fitting estimator [13, 30]. Nonetheless, the results of the simulations obtained in [8], for the four shapes of densities considered in [17], showed that the boundary fuzzy set density estimator performed quite well when compared with the boundary kernel estimator defined in [17]. Now, combining this last result with the idea established in [24], it is reasonable to define an estimator of the Nadaraya-Watson type regression function in terms of the boundary fuzzy set density estimator, in order to achieve the objectives emphasized in this paper and to solve the problem proposed in [8]. On the other hand, a literature review on the proposed topic revealed no evidence of publications comparing the performance of other methods with the method introduced in [24]. Besides, the author anticipates a conclusion analogous to the previous one for the fuzzy set regression estimator case. Finally, it is necessary to point out that in the recent works [1–4, 15, 18, 19, 25] the problems of nonparametric regression estimation are studied under specific conditions, and new regression estimators are introduced through the approach of each of those works. It should be noted that the method introduced in [15] combines smoothing spline and kernel functions, while in the papers [1, 3, 4, 18] and [2, 19, 25] the Nadaraya-Watson and local linear estimators, respectively, are the main actors. This last point suggests combining the approaches of [2, 19, 25] and [7] in future research, since in [7] it was shown that the fuzzy set regression estimator has better performance than local linear regression smoothers.
This paper is organized as follows. In Section 2, the boundary effects in the fuzzy set regression estimator are studied and the criterion to remove such effects is presented. Moreover, the boundary fuzzy set regression estimator is defined and its asymptotic properties are introduced. Besides, the function that minimizes the MSE of the proposed boundary estimator is calculated. The simulation studies and data analysis are introduced in Sections 3 and 4, respectively. Final comments are given in Section 5.
2 Fuzzy set regression estimator and boundary effects
A study to detect the presence or absence of boundary effects in the estimator of any function is necessary, since it is not obvious that the behavior of the estimator is the same at the boundary points as at the interior points. Consequently, this section analyzes the possible boundary effects, if any, of the fuzzy set regression estimator given in [9], and introduces a criterion to remove them, without boundary modifications. Moreover, the definition of the boundary fuzzy set regression estimator and its asymptotic properties are given. Also, the function that minimizes the MSE of the proposed boundary estimator is obtained.
2.1 Fuzzy set estimator of the regression function
It is important to emphasize that the fuzzy set regression estimation method introduced in [9] is based on defining an estimator of the Nadaraya-Watson type, for independent pairs of data, in terms of the fuzzy set density estimator given in [10]. In order to familiarize the reader with this method, a general summary of the details that led to the estimator introduced in [10] is presented next.
For independent copies $X_1, \ldots, X_n$ of a random variable $X$ in $\mathbb{R}$, whose distribution $\mathcal{L}(X)$ has a Lebesgue density $f_X$ near some fixed point $x_0 \in \mathbb{R}$, a fuzzy set estimator of $f_X$ at the point $x_0 \in \mathbb{R}$ is defined in [11], by means of thinned point processes $N_n^{\varphi_n}$, a notion framed inside the theory of point processes, as follows:

$$\hat\theta_n(x_0) = \frac{N_n^{\varphi_n}(\mathbb{R})}{n\,a_n} = \frac{1}{n\,a_n}\sum_{i=1}^{n}U_i,$$

where

$$N_n^{\varphi_n} = \sum_{i=1}^{n}U_i\,\varepsilon_{X_i},$$

$\varphi_n(t) = P(U_i = 1 \mid X_i = t)$, $\varepsilon_X$ is the random Dirac measure, $a_n > 0$ is a smoothing parameter or bandwidth such that $a_n \to 0$ as $n \to \infty$, and the random variables $U_i$, $1 \le i \le n$, are independent with values in $\{0, 1\}$, which determine whether $X_i$ belongs to the neighborhood of $x_0$ or not. See e.g. [21] for more details on the theory of point processes and thinned point processes. In [11], only the asymptotic efficiency within the class of all estimators that are based on randomly selected points from the sample $X_1, \ldots, X_n$ was proved. Efficiency was established using LeCam's LAN theory. Although the almost sure and uniform convergence properties over a compact subset of $\mathbb{R}$ are not studied, the pointwise convergence in law, whose limit distribution has an asymptotic variance that depends only on $f_X(x_0)$, is obtained. On the other hand, it is opportune to point out that the random variables that define the estimator $\hat\theta_n$ do not possess, for example, precise functional characteristics with regard to the point of estimation. This absence of functional characteristics complicates the evaluation of the estimator using a sample; thus, simulations to estimate the density function become more complicated. However, to overcome the difficulties presented by the estimator introduced in [11], a particular case of the above estimator was defined in [10].
Let $X$ be a real random variable whose distribution $\mathcal{L}(X)$ has a Lebesgue density $f_X$ near some fixed point $x_0 \in \mathbb{R}$. For each measurable Borel function $\varphi : \mathbb{R} \to [0, 1]$ and each random variable $V$, uniformly distributed on $[0, 1]$ and independent of $X$, the random variable $\mathbf{1}_{[0,\,\varphi(X)]}(V)$ satisfies $\varphi(t) = P\left(\mathbf{1}_{[0,\,\varphi(X)]}(V) = 1 \mid X = t\right)$. This simple observation allows us to construct a fuzzy set estimator of $f_X$ to estimate $f_X(x_0)$ which satisfies the conditions required in [11]. Moreover, as the local behavior of the distribution of $X$ will be evaluated, it is obvious that only observations $X_i$ in a neighborhood of $x_0$ can reasonably contribute to the estimation of $f_X(x_0)$.

Let $X_1, \ldots, X_n$ be an independent random sample of $X$. Let $V_1, \ldots, V_n$ be independent random variables uniformly distributed on $[0, 1]$ and independent of $X_1, \ldots, X_n$. Let $\mathbf{1}_{\left[0,\,\varphi\left(\frac{X_i - x_0}{b_n}\right)\right]}(V_i)$ be random variables, where $b_n > 0$ is a smoothing parameter or bandwidth such that $b_n \to 0$ as $n \to \infty$. For each $t \in \mathbb{R}$, one obtains that

$$\varphi\!\left(\frac{t - x_0}{b_n}\right) = P\!\left(\mathbf{1}_{\left[0,\,\varphi\left(\frac{X_i - x_0}{b_n}\right)\right]}(V_i) = 1 \,\Big|\, X_i = t\right),$$

where $\varphi_n(t) = \varphi\!\left(\frac{t - x_0}{b_n}\right)$ is a Markov kernel (see [21], Section 1.4). Thus, for independent copies $(X_i, V_i)$, $1 \le i \le n$, of $(X, V)$, the thinned point process is defined as follows:

$$N_n^{\varphi_n}(\cdot) = \sum_{i=1}^{n}\mathbf{1}_{\left[0,\,\varphi\left(\frac{X_i - x_0}{b_n}\right)\right]}(V_i)\,\varepsilon_{X_i}(\cdot),$$

with underlying point process $N_n(\cdot) = \sum_{i=1}^{n}\varepsilon_{X_i}(\cdot)$ and thinning function $\varphi_n$ (see [21], Section 2.4), where $\varepsilon_X$ is the random Dirac measure.

On the other hand, observe that the set of observations, or the events $\{X_i = t\}$, $t \in \mathbb{R}$, in a neighborhood of $x_0$ can now be described by the thinned point process $N_n^{\varphi_n}$, where $\mathbf{1}_{\left[0,\,\varphi\left(\frac{X_i - x_0}{b_n}\right)\right]}(V_i)$ decides whether $X_i$ belongs to the neighborhood of $x_0$ or not. Precisely, $\varphi_n(t)$ is the probability that the observation $X_i = t$ belongs to the neighborhood of $x_0$. Note that this neighborhood is not explicitly defined; it is actually a fuzzy set in the sense of [28], given its membership function $\varphi_n$. The thinned process $N_n^{\varphi_n}$ is therefore a fuzzy set representation of the data (see [11], Section 2).
Next, the fuzzy set density estimator introduced in [10] is presented, which is a particular
case of the estimator proposed in [11].
Definition 1. Let $X_1, \ldots, X_n$ be an independent random sample of $X$. Let $V_1, \ldots, V_n$ be independent random variables uniformly distributed on $[0, 1]$ and independent of $X_1, \ldots, X_n$. Let $\varphi$ be such that $a_n = b_n\int\varphi(u)\,du$ and $0 < \int\varphi(u)\,du < \infty$. Then, the fuzzy set estimator of the density function $f_X$ at the point $x_0 \in \mathbb{R}$ is defined as

$$\hat\vartheta_n(x_0) = \frac{1}{na_n}\sum_{i=1}^{n}\mathbf{1}_{\left[0,\,\varphi\left(\frac{X_i - x_0}{b_n}\right)\right]}(V_i) = \frac{1}{na_n}\,\tau_n(x_0). \tag{1}$$
Observe that estimator (1) can be written in terms of a fuzzy set representation of the data, since $\hat\vartheta_n(x_0) = (na_n)^{-1}N_n^{\varphi_n}(\mathbb{R})$. This equality justifies the fuzzy set term of estimator (1), where the thinning function $\varphi_n$ can be used to select points of the sample with different probabilities, in contrast to the kernel estimator, which assigns equal weight to all points of the sample. Moreover, it is important to highlight that (1) is easy to implement in practice and the random variable $\tau_n(x_0)$ is binomial $B(n, \alpha_n(x_0))$ distributed with

$$\alpha_n(x_0) = E\left[\mathbf{1}_{\left[0,\,\varphi\left(\frac{X - x_0}{b_n}\right)\right]}(V)\right] = P\left(\mathbf{1}_{\left[0,\,\varphi\left(\frac{X - x_0}{b_n}\right)\right]}(V) = 1\right) = E\left[\varphi\!\left(\frac{X - x_0}{b_n}\right)\right]. \tag{2}$$

In what follows, it will be assumed that $\alpha_n(x_0) \in (0, 1)$. Besides, throughout this article $\varphi$ is the thinning function that characterizes estimator (1), and the abbreviation "w.t.f." will be used for the phrase "with thinning function." See [6] and [10] for more details on the asymptotic and convergence properties of estimator (1).
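Estimator (1) is straightforward to evaluate on a sample. The sketch below is purely illustrative and is not the author's code: the parabolic thinning function `phi` is a hypothetical choice that satisfies the requirements (values in $[0,1]$, compact support, $\varphi(0) > 0$), and the data are simulated.

```python
import numpy as np

def phi(u):
    # Hypothetical thinning function: symmetric, compact support [-1, 1],
    # continuous at 0 with phi(0) = 1 > 0, values in [0, 1].
    return np.where(np.abs(u) <= 1.0, 1.0 - u * u, 0.0)

def fuzzy_density(x0, X, b_n, rng):
    """Fuzzy set density estimator (1) at the point x0.

    Each X_i is kept with probability phi((X_i - x0)/b_n), decided by an
    auxiliary uniform V_i; the estimate is the kept count over n * a_n.
    """
    a_n = b_n * (4.0 / 3.0)           # a_n = b_n * integral of phi = b_n * 4/3
    V = rng.uniform(size=X.shape)     # V_i ~ U[0, 1], independent of X_i
    U = V <= phi((X - x0) / b_n)      # thinning indicators, values in {0, 1}
    return U.sum() / (len(X) * a_n)

rng = np.random.default_rng(0)
X = rng.standard_normal(200_000)
est = fuzzy_density(0.0, X, b_n=0.3, rng=rng)  # N(0,1) density at 0 is ~0.3989
```

Unlike a kernel estimator, which weights every observation, the thinning step keeps each observation at random with a probability that decays with its distance to the estimation point.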
Next, the regression estimator of the Nadaraya-Watson type introduced in [9] for independent pairs of data is presented, which is defined in terms of estimator (1).
Definition 2. Let $((X_1, Y_1), V_1), \ldots, ((X_n, Y_n), V_n)$ be independent copies of a random vector $((X, Y), V)$, where $V_1, \ldots, V_n$ are independent random variables uniformly distributed on $[0, 1]$ and independent of $(X_1, Y_1), \ldots, (X_n, Y_n)$. Let $\varphi$ be such that $a_n = b_n\int\varphi(u)\,du$ and $0 < \int\varphi(u)\,du < \infty$. Then, the fuzzy set estimator of the regression function $r$ at the point $x_0 \in \mathbb{R}$ is defined as

$$\hat r_n(x_0) = \begin{cases} \dfrac{\sum_{i=1}^{n}Y_i\,\mathbf{1}_{\left[0,\,\varphi\left(\frac{X_i - x_0}{b_n}\right)\right]}(V_i)}{\tau_n(x_0)} & \text{if } \tau_n(x_0) \neq 0, \\[2mm] 0 & \text{if } \tau_n(x_0) = 0, \end{cases} \tag{3}$$

where $\tau_n$ is given in (1).
As in the case of estimator (1), the function ϕ characterizes estimator (3). See [5] and [9], for
more details on the asymptotic and convergence properties of estimator (3).
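Estimator (3) reuses the same thinning indicators as estimator (1): it is the ratio of the thinned sum of the $Y_i$ to the number of thinned points. A minimal self-contained sketch (illustrative only; the parabolic `phi` and the simulated linear model are hypothetical choices, not from the paper):

```python
import numpy as np

def phi(u):
    # Hypothetical thinning function: symmetric, compact support [-1, 1].
    return np.where(np.abs(u) <= 1.0, 1.0 - u * u, 0.0)

def fuzzy_regression(x0, X, Y, b_n, rng):
    """Fuzzy set regression estimator (3) at x0: the thinned sum of the Y_i
    divided by tau_n(x0), the number of thinned points (0 if none kept)."""
    V = rng.uniform(size=X.shape)
    U = V <= phi((X - x0) / b_n)   # same indicators as in estimator (1)
    tau = U.sum()                  # tau_n(x0)
    return Y[U].sum() / tau if tau > 0 else 0.0

rng = np.random.default_rng(1)
X = rng.uniform(0.0, 3.0, size=100_000)
Y = 2.0 * X + 1.0 + 0.1 * rng.standard_normal(X.shape)  # model r(x) = 2x + 1
est = fuzzy_regression(1.5, X, Y, b_n=0.1, rng=rng)     # r(1.5) = 4
```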
2.2 The boundary problem of the fuzzy set regression estimator and
its boundary estimator
In order to provide new results related to nonparametric regression estimation problems not based on kernels, and to ensure the natural extension of the approach to the boundary fuzzy set density estimation case, this section studies the behavior of estimator (3) at the particular points $0$ and $x \in (0, b_n]$, under the following conditions:

C1 The functions $f_X$ and $r$ are at least twice continuously differentiable in a neighborhood of $z \in [0, \infty)$.

C2 $f_X(z) > 0$.

C3 The sequence $b_n$ satisfies: $b_n \to 0$, $nb_n \to \infty$, as $n \to \infty$.

C4 The function $\varphi$ is symmetric about zero, has compact support $[-B, B]$, $B > 0$, and is continuous at $0$ with $\varphi(0) > 0$.

C5 There exists $M > 0$ such that $|Y| \le M$ a.s.

C6 The function $\phi(z) = E[Y^2 \mid X = z]$ is at least twice continuously differentiable in a neighborhood of $z \in [0, \infty)$.
Next, the results associated with the behavior of the estimator

$$\hat g_n(z) = \frac{1}{na_n}\sum_{i=1}^{n}Y_i\,\mathbf{1}_{\left[0,\,\varphi\left(\frac{X_i - z}{b_n}\right)\right]}(V_i), \quad z \in [0, \infty), \tag{4}$$

at the particular points $0$ and $x \in (0, b_n]$ are presented. Here, $g(z) = r(z)f_X(z)$, $a_n = b_n\int\varphi(u)\,du$ and $0 < \int\varphi(u)\,du < \infty$. Moreover, the function $\psi$ will be defined as $\psi(u) = \frac{\varphi(u)}{\int\varphi(u)\,du}\,\mathbf{1}_{[-B,\,B]}(u)$, and as each $x \in (0, b_n]$ has the form $x = c\,b_n$, $0 < c \le 1$, in what follows, to simplify the notation, $x$ will be written instead of $c\,b_n$.
Theorem 1. Under conditions (C1)-(C4), we have

$$E[\hat g_n(0)] = g(0) + \frac{b_n^2}{2}\,g''(0)\int u^2\psi(u)\,du + o(b_n^2).$$
Proof. Replacing $z$ with $0$ in (4), given that $V$ and $(X, Y)$ are independent, and taking the conditional expectation with respect to $X$ and $V$, we obtain

$$E[\hat g_n(0)] = a_n^{-1}\int_{-\infty}^{\infty}\varphi\!\left(\frac{u}{b_n}\right)g(u)\,du.$$

Combining (C1) and (C4), we can write the above equality as

$$E[\hat g_n(0)] = \int_{-B}^{B}\psi(u)\,g(ub_n)\,du.$$

As (C1) allows us to make a Taylor expansion of the function $g$ in the neighborhood of $0$, we can rewrite the above equality as

$$E[\hat g_n(0)] = \int_{-B}^{B}\left[g(0) + b_n g'(0)u + \frac{b_n^2}{2}u^2 g''(\lambda ub_n)\right]\psi(u)\,du, \tag{5}$$

with $\lambda \in (0, 1)$. Moreover, (C3) allows us to suppose, without loss of generality, that $b_n < 1$. Now, we can guarantee that for $B > 0$

$$\frac{B}{b_n} > B. \tag{6}$$

Combining (C3), (5) and (6), together with the symmetry of $\psi$ from (C4), which makes the first-order term vanish, we have

$$E[\hat g_n(0)] = g(0) + \frac{b_n^2}{2}\int u^2\left[g''(0) + \left(g''(\lambda ub_n) - g''(0)\right)\right]\psi(u)\,du. \tag{7}$$

Now combining (C1), (C3) and (C4), we obtain

$$\int u^2\left[g''(\lambda ub_n) - g''(0)\right]\psi(u)\,du = o(1). \tag{8}$$

The result now follows by combining (7) and (8).
Corollary 1. Under conditions (C1)-(C4), we have

$$E\left[\hat\vartheta_n(0)\right] = f_X(0) + \frac{b_n^2}{2}\,f_X''(0)\int u^2\psi(u)\,du + o(b_n^2).$$

Proof. Setting $Y = 1$, we have $g(0) = f_X(0)$ and $E[\hat g_n(0)] = E[\hat\vartheta_n(0)]$. The result then follows from Theorem 1.
Theorem 2. Under conditions (C1)-(C4), we have

$$E[\hat g_n(x)] = \int_{-c}^{B}\left[g(0) + b_n g'(0)u + \frac{b_n^2}{2}g''(0)u^2\right]\psi(u)\,du + o(b_n^2).$$

Proof. Replacing $z$ with $x$ in (4), using that $V$ and $(X, Y)$ are independent, and taking the conditional expectation with respect to $X$ and $V$, we obtain

$$E[\hat g_n(x)] = a_n^{-1}\int_{-\infty}^{\infty}\varphi\!\left(\frac{u - x}{b_n}\right)g(u)\,du.$$

Combining (C1) and (C4), we can write the above equality as

$$E[\hat g_n(x)] = \int_{-x/b_n}^{B}\psi(u)\,g(x + ub_n)\,du.$$

As (C1) allows us to make a Taylor expansion of the function $g$ in the neighborhood of $x$, we can rewrite the above equality as

$$E[\hat g_n(x)] = \int_{-c}^{B}\left[g(x) + b_n g'(x)u + \frac{b_n^2}{2}u^2 g''(x + \lambda ub_n)\right]\psi(u)\,du. \tag{9}$$

Now combining (C1) with (C3), and (C3) with (C4), we obtain

$$h(x) = h(0) + o(1), \quad \text{for } h = g, g', \tag{10}$$

and

$$\int_{-c}^{B}u^2\left[g''(x + \lambda ub_n) - g''(0)\right]\psi(u)\,du = o(1). \tag{11}$$

The result now follows by combining (9), (10) and (11).
Corollary 2. Under conditions (C1)-(C4), we have

$$E\left[\hat\vartheta_n(x)\right] = \int_{-c}^{B}\left[f_X(0) + b_n f_X'(0)u + \frac{b_n^2}{2}f_X''(0)u^2\right]\psi(u)\,du + o(b_n^2).$$

Proof. Setting $Y = 1$, we have $g(x) = f_X(x)$ and $E[\hat g_n(x)] = E[\hat\vartheta_n(x)]$. The result then follows from Theorem 2.
The above results allow us to guarantee that estimators (1) and (4) do not present boundary effects at $0$, while they will present boundary effects at $x$ when $h'(0)\int_{-c}^{B}u\,\psi(u)\,du \neq 0$, for $h = f_X, g$ and for each $c \in (0, 1]$. Moreover, it is important to remark here that Corollaries 1 and 2 are Theorems 1 and 2 in [8].

On the other hand, it is quite likely that estimator (3) presents the same behavior at the particular points $0$ and $x \in (0, b_n]$: estimator (3) does not present boundary effects at $0$, and it will present boundary effects at $x$ when $r'(0)\int_{-c}^{B}u\,\psi(u)\,du \neq 0$, for each $c \in (0, 1]$. The following two theorems confirm this suspicion.
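The condition driving the boundary effect can be seen numerically: for a symmetric $\psi$ with support $[-B, B]$, the truncated first moment $\int_{-c}^{B}u\,\psi(u)\,du$ is strictly positive whenever $c < B$, so the $O(b_n)$ term survives, while it vanishes once the full support is covered. A quick check with a hypothetical parabolic $\varphi$ ($B = 1$, not the paper's choice):

```python
import numpy as np

# psi = phi / integral(phi) for the hypothetical choice phi(u) = 1 - u^2 on [-1, 1]
u = np.linspace(-1.0, 1.0, 2_000_001)
du = u[1] - u[0]
psi = np.where(np.abs(u) <= 1.0, 1.0 - u * u, 0.0) / (4.0 / 3.0)

def truncated_first_moment(c):
    # Riemann sum of u * psi(u) over [-c, B]; zero only when c >= B = 1.
    mask = u >= -c
    return (u[mask] * psi[mask]).sum() * du

m_half = truncated_first_moment(0.5)  # positive: boundary effect present
m_full = truncated_first_moment(1.0)  # ~0 by symmetry: no boundary effect
```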
Theorem 3. Under conditions (C1)-(C5), we have

$$E[\hat r_n(0)] = r(0) + \frac{b_n^2}{2}\left[r''(0) + 2\,\frac{r'(0)f_X'(0)}{f_X(0)}\right]\int u^2\psi(u)\,du + o(b_n^2).$$
Proof. Let us consider the following decomposition (see, e.g., [12])

$$\hat r_n(t) = \frac{\hat g_n(t)}{E[\hat\vartheta_n(t)]}\left(1 - \frac{\hat\vartheta_n(t) - E[\hat\vartheta_n(t)]}{E[\hat\vartheta_n(t)]}\right) + \frac{\left(\hat\vartheta_n(t) - E[\hat\vartheta_n(t)]\right)^2}{\left(E[\hat\vartheta_n(t)]\right)^2}\,\hat r_n(t). \tag{12}$$

Replacing $t$ with $0$ in (12) and taking the expectation, we obtain

$$E[\hat r_n(0)] = \frac{E[\hat g_n(0)]}{E[\hat\vartheta_n(0)]} - \frac{A_1}{\left(E[\hat\vartheta_n(0)]\right)^2} + \frac{A_2}{\left(E[\hat\vartheta_n(0)]\right)^2},$$

where

$$A_1 = E\left[\left(\hat\vartheta_n(0) - E[\hat\vartheta_n(0)]\right)\hat g_n(0)\right] \quad \text{and} \quad A_2 = E\left[\left(\hat\vartheta_n(0) - E[\hat\vartheta_n(0)]\right)^2\hat r_n(0)\right].$$

Combining (C3), (7) and (8), we have

$$E[\hat g_n(0)] = g(0) + o(1). \tag{13}$$

Now setting $Y = 1$ in (4), we obtain

$$E[\hat\vartheta_n(0)] = f_X(0) + o(1). \tag{14}$$

As the random vectors $((X_i, Y_i), V_i)$, $1 \le i \le n$, are identically distributed, we have

$$A_1 = \operatorname{Cov}\left[\hat g_n(0), \hat\vartheta_n(0)\right] = \frac{1}{(na_n)^2}\sum_{i=1}^{n}\operatorname{Cov}[Y_iU_i, U_i] = \frac{1}{na_n}E\left[\frac{YU}{a_n}\right] - \frac{1}{n}E\left[\frac{YU}{a_n}\right]E\left[\frac{U}{a_n}\right] = \frac{1}{na_n}E[\hat g_n(0)] - \frac{1}{n}E[\hat g_n(0)]\,E[\hat\vartheta_n(0)],$$

where $U = \mathbf{1}_{\left[0,\,\varphi\left(\frac{X}{b_n}\right)\right]}(V) = U^2$. Combining (C3), (13) and (14), we obtain

$$A_1 = \frac{1}{na_n}\left[g(0) + o(1)\right] - \frac{1}{n}\left[g(0) + o(1)\right]\left[f_X(0) + o(1)\right] = \frac{1}{na_n}\,g(0) + o\!\left(\frac{1}{nb_n}\right).$$

On the other hand, by (C5) there exists $C > 0$ such that $|\hat r_n(0)| \le C$. Thus, we can write

$$|A_2| \le C\,E\left[\hat\vartheta_n(0) - E[\hat\vartheta_n(0)]\right]^2 = \frac{C}{na_n^2}\left(E[U^2] - (E[U])^2\right) = \frac{C}{na_n}E\left[\frac{U}{a_n}\right]\left(1 - E[U]\right).$$

Now setting $Y = 1$ and combining with (7), we can write

$$E\left[\frac{U}{a_n}\right] = E[\hat\vartheta_n(0)] = f_X(0) + \frac{b_n^2}{2}\int u^2\left[f_X''(0) + \left(f_X''(\lambda ub_n) - f_X''(0)\right)\right]\psi(u)\,du.$$

Moreover, (C1) implies that $f_X''$ is bounded in a neighborhood of $0$, and (C3) allows us to suppose that $b_n < 1$. Then we can bound $E[U/a_n]$. Besides, by (2) we can bound $(1 - E[U])$. Therefore, $A_2 = O\!\left(\frac{1}{nb_n}\right)$. Now, we can write

$$\frac{A_1}{\left(E[\hat\vartheta_n(0)]\right)^2} = \left(\frac{1}{(f_X(0))^2} + o(1)\right)\left(\frac{1}{nb_n}\,g(0) + o\!\left(\frac{1}{nb_n}\right)\right) = o(1), \quad \text{by (C3)},$$

and

$$\frac{A_2}{\left(E[\hat\vartheta_n(0)]\right)^2} = \left(\frac{1}{(f_X(0))^2} + o(1)\right)O\!\left(\frac{1}{nb_n}\right) = O\!\left(\frac{1}{nb_n}\right).$$

The last two equalities imply that

$$E[\hat r_n(0)] = \frac{E[\hat g_n(0)]}{E[\hat\vartheta_n(0)]} + O\!\left(\frac{1}{nb_n}\right).$$

Now combining (C1), (C3) and (C4), we obtain

$$\frac{b_n^2}{2}\int u^2\left[g''(\lambda ub_n) - g''(0)\right]\psi(u)\,du = o(b_n^2). \tag{15}$$

Moreover, combining (7) and (15) we can write

$$E[\hat g_n(0)] = g(0) + \frac{b_n^2}{2}\,g''(0)\int u^2\psi(u)\,du + o(b_n^2),$$

whence

$$E[\hat\vartheta_n(0)] = f_X(0) + \frac{b_n^2}{2}\,f_X''(0)\int u^2\psi(u)\,du + o(b_n^2).$$

Then

$$E[\hat r_n(0)] = \frac{g(0) + \frac{b_n^2}{2}g''(0)\int u^2\psi(u)\,du}{f_X(0) + \frac{b_n^2}{2}f_X''(0)\int u^2\psi(u)\,du} + O\!\left(\frac{1}{nb_n}\right) = H_n(0) + O\!\left(\frac{1}{nb_n}\right).$$

Next, we will obtain an equivalent expression for $H_n(0)$. Multiplying numerator and denominator by the conjugate of the denominator, we have

$$H_n(0) = \left[g(0)f_X(0) + \frac{b_n^2\int u^2\psi(u)\,du}{2}\left(g''(0)f_X(0) - f_X''(0)g(0)\right) - \left(\frac{b_n^2}{2}\right)^2 f_X''(0)g''(0)\left(\int u^2\psi(u)\,du\right)^2\right]\left[D_n(0)\right]^{-1},$$

where $D_n(0) = (f_X(0))^2 - \left(\frac{b_n^2}{2}f_X''(0)\int u^2\psi(u)\,du\right)^2$. From (C2) and (C3), we obtain $[D_n(0)]^{-1} = (f_X(0))^{-2} + o(1)$. Thus,

$$H_n(0) = r(0) + \frac{b_n^2}{2}\left[\frac{g''(0)f_X(0) - f_X''(0)g(0)}{(f_X(0))^2}\right]\int u^2\psi(u)\,du + o(b_n^2).$$

Then

$$E[\hat r_n(0)] = r(0) + \frac{b_n^2}{2}\left[r''(0) + 2\,\frac{f_X'(0)r'(0)}{f_X(0)}\right]\int u^2\psi(u)\,du + o(b_n^2) + O\!\left(\frac{1}{nb_n}\right).$$

By (C3), we have $(nb_n)^{-1} = o(1)$. The result now follows by combining the last two equalities.
Theorem 4. Under conditions (C1)-(C5), we have

$$\begin{aligned}
E[\hat r_n(x)] ={}& r(0) + r'(0)\,b_n\int_{-c}^{B}u\,\psi(u)\,du + \frac{b_n^2}{2}\left[r''(0) + 2\,\frac{r'(0)f_X'(0)}{f_X(0)}\right]\int_{-c}^{B}u^2\psi(u)\,du \\
&- \frac{g'(0)f_X'(0)\,b_n^2}{[f_X(0)]^2}\left[\int_{-c}^{B}u\,\psi(u)\,du\right]^2 + o(b_n^2).
\end{aligned} \tag{16}$$
Proof. Replacing $t$ with $x$ in (12) and taking the expectation, we obtain

$$E[\hat r_n(x)] = \frac{E[\hat g_n(x)]}{E[\hat\vartheta_n(x)]} - \frac{\dot A_1}{\left(E[\hat\vartheta_n(x)]\right)^2} + \frac{\dot A_2}{\left(E[\hat\vartheta_n(x)]\right)^2},$$

where

$$\dot A_1 = E\left[\left(\hat\vartheta_n(x) - E[\hat\vartheta_n(x)]\right)\hat g_n(x)\right] \quad \text{and} \quad \dot A_2 = E\left[\left(\hat\vartheta_n(x) - E[\hat\vartheta_n(x)]\right)^2\hat r_n(x)\right].$$

Combining (9) and (10), both of which hold under the hypotheses of Theorem 4, we can write

$$E[\hat g_n(x)] = \int_{-c}^{B}\left[g(0) + b_n g'(0)u + \frac{(b_nu)^2}{2}\left(g''(0) + \left[g''(x + \lambda ub_n) - g''(0)\right]\right)\right]\psi(u)\,du. \tag{17}$$

Recall that (C3) allows us to suppose that $b_n < 1$. Now, combining (C1), (C3) and (C4) with (17), we have that $E[\hat g_n(x)] = g(0)\int_{-c}^{B}\psi(u)\,du + o(1)$, whence $E[\hat\vartheta_n(x)] = f_X(0)\int_{-c}^{B}\psi(u)\,du + o(1)$. Moreover, combining the fact that $((X_i, Y_i), V_i)$, $1 \le i \le n$, are identically distributed with the previous equalities and (C3), we have

$$\dot A_1 = \frac{1}{(na_n)^2}\sum_{i=1}^{n}\operatorname{Cov}[Y_iU_i, U_i] = \frac{1}{na_n}E[\hat g_n(x)] - \frac{1}{n}E[\hat g_n(x)]\,E[\hat\vartheta_n(x)] = \frac{1}{na_n}\,g(0)\int_{-c}^{B}\psi(u)\,du + o\!\left(\frac{1}{nb_n}\right),$$

where in this case $U = \mathbf{1}_{\left[0,\,\varphi\left(\frac{X - x}{b_n}\right)\right]}(V) = U^2$. On the other hand, by (C5) there exists $C > 0$ such that $|\hat r_n(x)| \le C$. Thus, we can write

$$|\dot A_2| \le C\,E\left[\hat\vartheta_n(x) - E[\hat\vartheta_n(x)]\right]^2 = \frac{C}{na_n^2}\left(E[U^2] - (E[U])^2\right) = \frac{C}{na_n}E\left[\frac{U}{a_n}\right]\left(1 - E[U]\right).$$

Now setting $Y = 1$ and combining with (17), we can write

$$E[\hat\vartheta_n(x)] = \int_{-c}^{B}\left[f_X(0) + b_n f_X'(0)u + \frac{b_n^2}{2}u^2 f_X''(x + \lambda ub_n)\right]\psi(u)\,du.$$

Moreover, (C1) implies that $f_X''$ is bounded in a neighborhood of $x$, and (C3) allows us to suppose that $b_n < 1$. Thus, we can bound $E[\hat\vartheta_n(x)]$. Besides, by (2) we can bound $(1 - E[U])$. Then $\dot A_2 = O\!\left(\frac{1}{nb_n}\right)$. Therefore

$$\frac{\dot A_1}{\left(E[\hat\vartheta_n(x)]\right)^2} = o(1) \quad \text{and} \quad \frac{\dot A_2}{\left(E[\hat\vartheta_n(x)]\right)^2} = O\!\left(\frac{1}{nb_n}\right).$$

In consequence,

$$E[\hat r_n(x)] = \frac{E[\hat g_n(x)]}{E[\hat\vartheta_n(x)]} + O\!\left(\frac{1}{na_n}\right).$$

Now combining (C1), (C3) and (C4), we obtain

$$\frac{b_n^2}{2}\int_{-c}^{B}u^2\left[g''(x + \lambda ub_n) - g''(0)\right]\psi(u)\,du = o(b_n^2).$$

The previous equality allows us to rewrite (17) as

$$E[\hat g_n(x)] = \int_{-c}^{B}\left[g(0) + b_n g'(0)u + \frac{b_n^2}{2}g''(0)u^2\right]\psi(u)\,du + o(b_n^2),$$

whence

$$E[\hat\vartheta_n(x)] = \int_{-c}^{B}\left[f_X(0) + b_n f_X'(0)u + \frac{b_n^2}{2}f_X''(0)u^2\right]\psi(u)\,du + o(b_n^2).$$

Then

$$E[\hat r_n(x)] = \frac{g(0)C_c + C_{n,g}(0)}{f_X(0)C_c + C_{n,f_X}(0)} + O\!\left(\frac{1}{nb_n}\right) = \dot H_n(0) + O\!\left(\frac{1}{nb_n}\right), \tag{18}$$

where

$$C_c = \int_{-c}^{B}\psi(u)\,du$$

and

$$C_{n,q}(0) = b_n q'(0)\int_{-c}^{B}u\,\psi(u)\,du + \frac{b_n^2}{2}q''(0)\int_{-c}^{B}u^2\psi(u)\,du, \quad \text{for } q = g, f_X.$$

Next, an equivalent expression for $\dot H_n(0)$ will be obtained. Multiplying numerator and denominator by the conjugate of the denominator and combining the following equalities:

(i) $C_{n,f_X}(0) = o(1)$, by (C3),

(ii) $\dfrac{1}{(f_X(0)\,C_c)^2 - \left(C_{n,f_X}(0)\right)^2} = \dfrac{1}{(f_X(0)\,C_c)^2} + o(1)$, by (i),

(iii) $C_{n,g}(0)\,C_{n,f_X}(0) = b_n^2\,g'(0)f_X'(0)\left[\displaystyle\int_{-c}^{B}u\,\psi(u)\,du\right]^2 + o(b_n^2)$, by (C3),

(iv) $g(0)f_X(0) = r(0)\left[f_X(0)\right]^2$, since $g(x) = r(x)f_X(x)$,

(v) $f_X(0)g'(0) - g(0)f_X'(0) = r'(0)\left[f_X(0)\right]^2$, given that $r(x) = \frac{g(x)}{f_X(x)}$,

(vi) $g''(0)f_X(0) - f_X''(0)g(0) = r''(0)\left[f_X(0)\right]^2 + 2f_X(0)r'(0)f_X'(0)$,

we can write

$$\begin{aligned}
\dot H_n(0) ={}& r(0) + r'(0)\,b_n\int_{-c}^{B}u\,\psi(u)\,du + \frac{b_n^2}{2}\left[r''(0) + 2\,\frac{r'(0)f_X'(0)}{f_X(0)}\right]\int_{-c}^{B}u^2\psi(u)\,du \\
&- \frac{g'(0)f_X'(0)\,b_n^2}{[f_X(0)]^2}\left[\int_{-c}^{B}u\,\psi(u)\,du\right]^2 + o(b_n^2).
\end{aligned}$$

By (C3), we have $(nb_n)^{-1} = o(1)$. The result now follows by combining (18) and the last two equalities.
Next, the results that will allow us to eliminate the boundary effects in estimator (3) are presented; they were used in [8], Theorems 3 and 4, to eliminate the boundary effects in estimator (1). In particular, the following theorem will allow us to suitably control the constants that define the bias of estimator (3), and it justifies a condition in the criterion introduced to eliminate the terms with coefficient $b_n$ in all the expressions above.
Theorem 5. Under condition (C4), we have that for each $M > 0$ there exists $B_0 > 0$ such that

$$v = \int_{-B_0}^{B_0}u^2\psi(u)\,du \le M.$$

Proof. Similar to the proof of Theorem 3 in [8].
Observe that combining (C4) and Theorem 5 we obtain

$$v = \int_{-B_0}^{B_0}u^2\psi(u)\,du = \int u^2\psi(u)\,du \le M,$$

for $B_0 > B$. Now, we can redefine $\psi$ as

$$\psi(u) = \frac{\varphi(u)}{\int\varphi(u)\,du}\,\mathbf{1}_{[-B_0,\,B_0]}(u), \quad B_0 \ge B. \tag{19}$$
To give a simple and effective solution to the boundary problem, without boundary corrections, a criterion will be implemented which allows us to eliminate the terms with coefficient $b_n$ in all the above expressions, by making $\int_{-c}^{B}u\,\psi(u)\,du = 0$ for each $c \in (0, b_n]$. Such criterion is based on deriving a thinning function $\varphi$ as the solution to the following variational problem:

$$\begin{aligned}
\text{Maximizing:}\quad & \int\varphi(u)\,du. \\
\text{Subject to:}\quad & \int\varphi^2(u)\,du = k, \\
& \int u\,\varphi(u)\,du = 0, \qquad\qquad \text{(VP)} \\
& \int\left(u^2 - v\right)\varphi(u)\,du = 0, \\
& k > 0,\ \ \varphi(u) = 0 \text{ for } u \in (-B, B)^c,\ \ \varphi(0) > 0,\ \ \varphi(u) \in [0, 1],\ \ v \in (0, M].
\end{aligned}$$
Theorem 6. The solution of (VP) is given by

$$\varphi_k(u) = \left[1 - \left(\frac{16}{15k}\right)^2u^2\right]\mathbf{1}_{\left[-\frac{15}{16}k,\ \frac{15}{16}k\right]}(u), \quad k > 0. \tag{20}$$

In particular, for each $c \in (0, b_n]$ we have

$$\varphi_c(u) = \left[1 - \left(\frac{16}{15c}\right)^2u^2\right]\mathbf{1}_{\left[-\frac{15}{16}c,\ \frac{15}{16}c\right]}(u). \tag{21}$$

Proof. Similar to the proof of Theorem 4 in [8].
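The closed form (20) can be checked numerically against the constraints of (VP): $\int\varphi_k^2(u)\,du = k$ and the first moment vanishes by symmetry. A small sketch (illustrative only, using a plain Riemann sum):

```python
import numpy as np

def phi_k(u, k):
    # Solution (20) of (VP): 1 - (16/(15k))^2 u^2 on the support |u| <= 15k/16.
    a = 16.0 / (15.0 * k)
    return np.where(np.abs(u) <= 1.0 / a, 1.0 - (a * u) ** 2, 0.0)

k = 0.5
u = np.linspace(-1.0, 1.0, 2_000_001)  # grid covering the support for k = 0.5
du = u[1] - u[0]
sq_norm = (phi_k(u, k) ** 2).sum() * du        # should equal k = 0.5
first_moment = (u * phi_k(u, k)).sum() * du    # should vanish by symmetry
```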
From (16) w.t.f. $\varphi_c$ we obtain

$$E[\hat r_n(x)] = r(0) + \frac{b_n^2}{2}\left[r''(0) + 2\,\frac{r'(0)f_X'(0)}{f_X(0)}\right]\int_{-B_0}^{B_0}u^2\psi_c(u)\,du + o(b_n^2), \tag{22}$$

where $0 < c \le 1$, $0 < B_0 \le \frac{15}{16}c$, $\psi_c$ is given by (19) w.t.f. $\varphi_c$, and $\varphi_c$ is given by (21). Thus, estimator (3) does not present boundary effects at $x$ when the thinning function is $\varphi_c$.
Moreover, combining Theorem 3 with Theorem 3.1 in [5], we obtain

$$E[\hat r_n(z)] = r(z) + \frac{b_n^2}{2}\left[r''(z) + 2\,\frac{r'(z)f_X'(z)}{f_X(z)}\right]\int_{-B_0}^{B_0}u^2\psi_k(u)\,du + o(b_n^2),$$

for each $k > 1$ and $z \in \{0\} \cup (b_n, \infty)$, where $B_0 \le \frac{15}{16}k$, $\psi_k$ is given by (19) w.t.f. $\varphi_k$, and $\varphi_k$ is given by (20). Thus, estimator (3) does not present boundary effects at $z \in \{0\} \cup (b_n, \infty)$ when the thinning function is $\varphi_k$, for each $k > 1$.
On the other hand, denoting the Epanechnikov kernel by $K_E$, and replacing $k$ with $\frac{5}{3}$ and $M$ with $M_E = \int u^2 K_E(u)\, du$ in (VP), we have that estimator (3) w.t.f. $\varphi_{5/3}$ is the estimator studied in [5]. That is, the estimator introduced in [5] is a particular case of the class of estimators defined by (3) w.t.f. $\varphi_k$, for each $k > 1$. Moreover, the results obtained in [5] guarantee that $\varphi_k$ minimizes $MSE[\hat{r}_n(z)]$, for each $k > 1$ and $z \in \{0\} \cup (b_n, \infty)$. In particular, it was shown in [5] that $MSE[\hat{r}_n(t)] \leq MSE\!\left[\hat{r}_n^{NW}(t)\right]$ for each $t \in \mathbb{R}$, where $\hat{r}_n$ is given by (3) w.t.f. $\varphi_{5/3}$ and $\hat{r}_n^{NW}$ is the classical kernel regression estimator.
Next, the boundary estimator introduced in [8] and the main result of that article are presented. These results are included for two reasons: first, the boundary regression estimator will be defined in terms of the estimator proposed in [8]; second, they highlight two asymptotic properties of the boundary estimator defined in [8].
Definition 3. Let $X_1, \ldots, X_n$ be an independent random sample of $X$. Let $V_1, \ldots, V_n$ be independent random variables uniformly distributed on $[0, 1]$ and independent of $X_1, \ldots, X_n$. Then, the boundary fuzzy set estimator of the density function $f_X$ at the point $x \in (0, b_n]$ is defined as
\[
\tilde{\vartheta}_n(x) = \frac{1}{n\, a_n} \sum_{i=1}^{n} \mathbb{1}_{\left[0,\, \varphi_c\left(\frac{X_i - x}{b_n}\right)\right]}(V_i) = \frac{1}{n\, a_n}\, \tau_n^{(c)}(x), \tag{23}
\]
where $0 < c \leq 1$, $a_n = b_n \int \varphi_c(u)\, du$, and $\varphi_c$ is given by (21).
Theorem 7. Under conditions (C1)-(C3), we have
\[
E\left[\tilde{\vartheta}_n(x) - f_X(0)\right] = \frac{b_n^2}{2}\, f_X''(0) \int_{-B_0}^{B_0} u^2\, \psi_c(u)\, du + o\!\left(b_n^2\right),
\]
and
\[
Var\left[\tilde{\vartheta}_n(x)\right] = \frac{1}{n\, b_n \int \varphi_c(u)\, du}\, f_X(0) + o\!\left(\frac{1}{n\, b_n}\right),
\]
where $0 < c \leq 1$, $0 < B_0 \leq \frac{15}{16} c$, $\psi_c$ is given by (19) w.t.f. $\varphi_c$, and $\varphi_c$ is given by (21).
Next, the boundary fuzzy set estimator of r and the main result of this paper are presented.
Definition 4. Let $((X_1, Y_1), V_1), \ldots, ((X_n, Y_n), V_n)$ be independent copies of a random vector $((X, Y), V)$, where $V_1, \ldots, V_n$ are independent random variables uniformly distributed on $[0, 1]$, and independent of $(X_1, Y_1), \ldots, (X_n, Y_n)$. Then, the boundary fuzzy set estimator of the regression function $r$ at the point $x \in (0, b_n]$ is defined as
\[
\tilde{r}_n(x) =
\begin{cases}
\dfrac{\sum_{i=1}^{n} Y_i\, \mathbb{1}_{\left[0,\, \varphi_c\left(\frac{X_i - x}{b_n}\right)\right]}(V_i)}{\tau_n^{(c)}(x)} & \text{if } \tau_n^{(c)}(x) \neq 0,\\[2mm]
0 & \text{if } \tau_n^{(c)}(x) = 0,
\end{cases} \tag{24}
\]
where $0 < c \leq 1$, $0 < B_0 \leq \frac{15}{16} c$, $\tau_n^{(c)}$ is given in (23), and $\varphi_c$ is given by (21).
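The regression estimator (24) admits an equally short sketch: the thinned indicator selects the retained observations, their $Y$-values are summed, and the count $\tau_n^{(c)}(x)$ normalizes. The demo below draws data from Model 1 of Section 3 ($r(z) = 2z + 1$, $X \sim$ Exp(1), standard normal errors); all numeric settings are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(7)

def phi_c(u, c):
    """Thinning function phi_c of (21)."""
    a = 15.0 * c / 16.0
    return np.where(np.abs(u) <= a, 1.0 - (16.0 / (15.0 * c)) ** 2 * u ** 2, 0.0)

def r_tilde(x, X, Y, V, b_n, c):
    """Boundary fuzzy set regression estimator (24) at x in (0, b_n]."""
    keep = V <= phi_c((X - x) / b_n, c)   # thinned indicator 1I_{[0, phi_c(.)]}(V_i)
    tau = np.count_nonzero(keep)          # tau_n^{(c)}(x)
    return Y[keep].sum() / tau if tau else 0.0

# Model 1 with Density 1: r(z) = 2z + 1, X ~ Exp(1), eps ~ N(0, 1).
n, b_n, c = 100_000, 0.2, 0.5
X = rng.exponential(1.0, n)
Y = 2.0 * X + 1.0 + rng.normal(0.0, 1.0, n)
V = rng.uniform(0.0, 1.0, n)
est = r_tilde(c * b_n, X, Y, V, b_n, c)   # lands near r evaluated inside the window
```

With these settings the window is centered near $x = 0.1$, where $r(x) = 1.2$, so the estimate should fall in that vicinity.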
Lemma 1. Under the conditions (C1)-(C6), we have
\[
E\left[\tilde{r}_n(x) - r(0)\right] = \frac{b_n^2}{2} \left[ r''(0) + 2\, \frac{f_X'(0)\, r'(0)}{f_X(0)} \right] \int_{-B_0}^{B_0} u^2\, \psi_c(u)\, du + o\!\left(b_n^2\right)
\]
and
\[
Var\left[\tilde{r}_n(x)\right] = \frac{1}{n\, b_n \int \varphi_c(u)\, du}\, \frac{\phi(0) - r^2(0)}{f_X(0)} + o\!\left(\frac{1}{n\, b_n}\right),
\]
where $0 < c \leq 1$, $0 < B_0 \leq \frac{15}{16} c$, $\psi_c$ is given by (19) w.t.f. $\varphi_c$, and $\varphi_c$ is given by (21).
Proof. From (3) w.t.f. $\varphi_c$, we have $\tilde{r}_n(x) = \hat{r}_n(x)$. Now, combining this equality with (22), we obtain the expression for $E[\tilde{r}_n(x) - r(0)]$. For the proof of $Var[\tilde{r}_n(x)]$, follow [5] considering $\varphi_c$ as thinning function, keeping in mind that $q(x) = q(0) + o(1)$, for $q = f_X, r, \phi$.
For $z \geq b_n$, $\tilde{r}_n(z)$ reduces to the fuzzy set regression estimator $\hat{r}_n(z)$ given by (3) w.t.f. $\varphi_1$. Thus, $\tilde{r}_n(z)$ is a natural boundary continuation of estimator (3) w.t.f. $\varphi_1$. Moreover, the results obtained in [5] allow us to guarantee that the thinning function $\varphi_c$ locally minimizes $MSE[\tilde{r}_n(x)]$, for each $c \in (0, 1]$.
Next, asymptotic formulae for the optimal smoothing parameter and optimal mean squared error of (24), $b_n^{\ast}$ and $MSE^{\ast}$, are presented. They are an immediate consequence of Lemma 1, and they guarantee that estimator (24) is locally optimal in terms of rates of convergence (see [26]).
Corollary 3. Under the conditions (C1)-(C6), we have
\[
b_n^{\ast} = n^{-1/5} \left[ \frac{\left( \int \varphi_c(u)\, du \right)^{-1} \dfrac{\phi(0) - r^2(0)}{f_X(0)}}{\left[ r''(0) + 2\, \dfrac{f_X'(0)\, r'(0)}{f_X(0)} \right]^2 \left( \int_{-B_0}^{B_0} u^2\, \psi_c(u)\, du \right)^2} \right]^{1/5}
\]
and
\[
MSE^{\ast}\left[\tilde{r}_n(\dot{x})\right] = \frac{5}{4}\, n^{-4/5} \left[ \left( \frac{\phi(0) - r^2(0)}{f_X(0)} \right)^{4} \frac{\left[ r''(0) + 2\, \dfrac{f_X'(0)\, r'(0)}{f_X(0)} \right]^2 \left( \int_{-B_0}^{B_0} u^2\, \psi_c(u)\, du \right)^2}{\left( \int \varphi_c(u)\, du \right)^{4}} \right]^{1/5},
\]
where $0 < c \leq 1$, $\dot{x} = c\, b_n$, $0 < B_0 \leq \frac{15}{16} c$, $\psi_c$ is given by (19) w.t.f. $\varphi_c$, and $\varphi_c$ is given by (21).
3 Simulations results
In this section some simulation results are presented, which are only designed to illustrate the
performance of (24) at points near 0 in a b
n
spread neighborhood. For purposes of comparison,
the estimator introduced in [24] was also considered. It is important to remark here that, the
particular choice above was based mainly on the results of the simulations obtained in [24], for
the two regression models and two densities considered in this section, which showed that the
boundary kernel regression estimator performed quite well when it was compared with the local
linear and ˆr
NW
n
estimators. Among other reasons that sustained the above particular choice,
the theoretical properties shared by the boundary kernel regression estimator defined in [24] and the proposed boundary fuzzy set regression estimator are highlighted: non-negativity, "natural boundary continuation," and improvement of the bias while holding on to low variance. Properties not yet justified will be verified in this section. The author considers that the previous comments, and the discussion motivated by the literature review, justify considering only the estimator given by [24] to develop the simulations.
The general boundary kernel estimator introduced in [24] is defined as
\[
\tilde{r}_n^{K}(l) = \frac{\sum_{i=1}^{n} Y_i \left\{ K\!\left( \frac{l + w_1(X_i)}{h_n} \right) + K\!\left( \frac{l - w_1(X_i)}{h_n} \right) \right\}}{\sum_{i=1}^{n} \left\{ K\!\left( \frac{l + w_2(X_i)}{h_n} \right) + K\!\left( \frac{l - w_2(X_i)}{h_n} \right) \right\}}, \tag{25}
\]
where $l = s\, h_n$, $0 \leq s \leq 1$, $h_n$ is the smoothing parameter and $K$ is a kernel function of order 2.
Moreover, $w_k$ is a transformation defined as
\[
w_k(y) = y + \frac{1}{2}\, d_k\, y^2 + \lambda_0\, (d_k)^2\, y^3, \qquad k = 1, 2,
\]
where
\[
d_1 = w_1''(0) = \frac{g'(0)}{g(0)}\, D_{K,s}, \qquad d_2 = w_2''(0) = \frac{f_X'(0)}{f_X(0)}\, D_{K,s},
\]
and $\lambda_0$ is a constant such that $12 \lambda_0 > 1$, with
\[
D_{K,s} = \frac{2 \int_s^1 (u - s)\, K(u)\, du}{2 \int_s^1 (u - s)\, K(u)\, du + s}.
\]
To assess the effect of the boundary estimators at points near 0 in a $b_n$-spread neighborhood, the following models are studied:
\[
\text{Model 1: } r_1(z) = 2 z + 1 \qquad \text{and} \qquad \text{Model 2: } r_2(z) = 2 z^2 + 3 z + 1,
\]
where $z \in [0, \infty)$ and the errors $\varepsilon_j$ are assumed to be independent standard normally distributed random variables. Likewise, consider two cases of density $f_X$ with support $[0, \infty)$:
\[
\text{Density 1: } f_X^{1}(z) = \exp(-z) \qquad \text{and} \qquad \text{Density 2: } f_X^{2}(z) = \frac{2}{\pi (1 + z^2)}.
\]
It is important to emphasize that, for $z \geq h_n$, the estimator (25) reduces to the classical kernel estimator $\hat{r}_n^{NW}$. Thus, (25) is a natural boundary continuation of $\hat{r}_n^{NW}$ (see [24], Section 2). On the other hand, the following optimal global smoothing parameters were implemented as the smoothing parameters of both the (24) and (25) estimators (see Theorems 3.1 and 2.4.1 in [5] and [12], respectively):
\[
b_n = \left[ \frac{C}{n}\, \frac{\left( \int \varphi_{5/3}(u)\, du \right)^{-1}}{\left( \int u^2\, \psi_{5/3}(u)\, du \right)^2} \right]^{1/5} \tag{26}
\]
and
\[
h_n = \left[ \frac{C}{n}\, \frac{\int K_E^2(u)\, du}{\left( \int u^2\, K_E(u)\, du \right)^2} \right]^{1/5}, \tag{27}
\]
where
\[
C = \frac{\int \dfrac{\phi(u) - r^2(u)}{f_X(u)}\, du}{\int \left[ r''(u) + 2\, \dfrac{f_X'(u)\, r'(u)}{f_X(u)} \right]^2 du}, \tag{28}
\]
$\varphi_{5/3}$ is given by (20), $\psi_{5/3}$ is given by (19) w.t.f. $\varphi_{5/3}$, and $v = \int u^2\, \psi_{5/3}(u)\, du \leq \int u^2\, K_E(u)\, du = M_E$ (see [5], Section 3).
It is important to emphasize that the estimator (3) w.t.f. $\varphi_{5/3}$ has better performance than the estimator $\hat{r}_n^{NW}$ (see [5]). Moreover, the reason for using an optimal global smoothing parameter is that comparisons based on the optimal smoothing parameter are more convincing than comparisons based on approximated smoothing parameters, which might be misleading because of the quality, or otherwise, of the smoothing parameter selection method. Also, a global rather than local smoothing parameter choice is made, because this is much more likely to be used in applications.
The following formulas and values are used in all simulations: $K_E(u) = \frac{3}{4} (1 - u^2)\, \mathbb{1}_{[-1, 1]}(u)$, $\lambda_0 = 0.0933$, $v = M_E/4$, $s = 0.1, 0.2, 0.3, 0.4, 0.5$ and $c = (h_n / b_n)\, s$, where the smoothing parameters $b_n$ and $h_n$ are given by (26) and (27), respectively. Moreover, the simulated sample sizes are $n = 50$ and $n = 500$, and all results are calculated by averaging over 1000 trials. Simultaneously, for each regression model and each density, the absolute bias (absolute value of the estimated value minus the true value) and the MSE of both estimators are calculated at the points $x = c\, b_n$ and $l = s\, h_n$ of the boundary regions $(0, b_n)$ and $(0, h_n)$. Nonetheless, the comparison of the (24) and (25) estimators through the mean integrated squared error is not convenient, since for the two regression models and two densities considered the boundary regions satisfy $(0, h_n] \subset (0, b_n]$. The results are shown in Tables 1 and 2. A close examination of Tables 1 and 2 shows that, for the two regression models and two densities considered, the proposed boundary estimator is the best in terms of MSE at points near 0 in a $b_n$-spread neighborhood, since it has low bias and extremely low variance, guaranteeing that $MSE[\tilde{r}_n(x)] \leq MSE[\tilde{r}_n^{K}(x)]$ at the points $x$ near 0 in a $b_n$-spread neighborhood. Additionally, these properties are apparently preserved over the rest of the boundary regions. Thus, the proposed boundary estimator outperforms the estimator introduced in [24].
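The reported $b_n$ and $h_n$ values can be cross-checked: since (26) and (27) share the factor $C/n$, the ratio $b_n / h_n$ is a constant depending only on the kernel and thinning-function constants. Using $\int K_E^2 = 3/5$, $M_E = 1/5$, $\int \varphi_{5/3} = 25/12$ (our own closed-form evaluation of (20), not a value quoted in the paper) and the choice $v = M_E/4$ above, this ratio reproduces the $(h_n, b_n)$ pairs of Tables 1 and 2 to four decimals.

```python
# Constants of the Epanechnikov kernel K_E(u) = (3/4)(1 - u^2) on [-1, 1]
int_KE_sq = 3.0 / 5.0    # integral of K_E(u)^2
M_E = 1.0 / 5.0          # integral of u^2 K_E(u)

# Fuzzy set counterparts for phi_{5/3}, with the choice v = M_E / 4
int_phi = 25.0 / 12.0    # integral of phi_{5/3}(u) du (closed form, our computation)
v = M_E / 4.0            # plays the role of the integral of u^2 psi_{5/3}(u) du

# (26) and (27) share C/n, so b_n / h_n is free of both C and n:
ratio = ((1.0 / int_phi) / v ** 2 * M_E ** 2 / int_KE_sq) ** 0.2

# (h_n, b_n) pairs reported in Tables 1 and 2:
pairs = [(2.1047, 3.5045), (1.3280, 2.2112), (0.7794, 1.2978), (0.4918, 0.8189),
         (1.2980, 2.1614), (0.8190, 1.3637), (0.6285, 1.0466), (0.3966, 0.6603)]
```

The ratio comes out near 1.6651, and multiplying each tabulated $h_n$ by it recovers the corresponding $b_n$.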
Table 1: Bias and MSE of the indicated regression estimators at the boundary, case of Density 1.

Model 1, n = 50 ($b_n$ = 3.5045, $h_n$ = 2.1047)
  $\tilde{r}_n$:    c      0.0601  0.1201  0.1802  0.2402  0.3003
                    |Bias|  0.0011  0.0560  0.1332  0.2454  0.3812
                    MSE     0.0000  0.0031  0.0177  0.0602  0.1453
  $\tilde{r}_n^K$:  s      0.1000  0.2000  0.3000  0.4000  0.5000
                    |Bias|  0.1726  0.4102  0.5823  0.7552  0.9440
                    MSE     0.0298  0.1682  0.3391  0.5703  0.8911

Model 1, n = 500 ($b_n$ = 2.2112, $h_n$ = 1.3280)
  $\tilde{r}_n$:    c      0.0601  0.1201  0.1802  0.2402  0.3003
                    |Bias|  0.0101  0.0240  0.0528  0.0963  0.1515
                    MSE     0.0001  0.0006  0.0028  0.0093  0.0230
  $\tilde{r}_n^K$:  s      0.1000  0.2000  0.3000  0.4000  0.5000
                    |Bias|  0.2347  0.2842  0.3228  0.3807  0.4496
                    MSE     0.0551  0.0807  0.1042  0.1449  0.2022

Model 2, n = 50 ($b_n$ = 1.2978, $h_n$ = 0.7794)
  $\tilde{r}_n$:    c      0.0601  0.1201  0.1802  0.2402  0.3003
                    |Bias|  0.0383  0.0200  0.0225  0.0408  0.0579
                    MSE     0.0015  0.0004  0.0005  0.0017  0.0034
  $\tilde{r}_n^K$:  s      0.1000  0.2000  0.3000  0.4000  0.5000
                    |Bias|  0.1667  0.2058  0.2433  0.2762  0.3045
                    MSE     0.0278  0.0423  0.0592  0.0763  0.0927

Model 2, n = 500 ($b_n$ = 0.8189, $h_n$ = 0.4918)
  $\tilde{r}_n$:    c      0.0601  0.1201  0.1802  0.2402  0.3003
                    |Bias|  0.0036  0.0012  0.0062  0.0111  0.0230
                    MSE     0.0000  0.0000  0.0000  0.0001  0.0005
  $\tilde{r}_n^K$:  s      0.1000  0.2000  0.3000  0.4000  0.5000
                    |Bias|  0.0727  0.0907  0.1047  0.1133  0.1162
                    MSE     0.0053  0.0082  0.0110  0.0128  0.0135
Table 2: Bias and MSE of the indicated regression estimators at the boundary, case of Density 2.

Model 1, n = 50 ($b_n$ = 2.1614, $h_n$ = 1.2980)
  $\tilde{r}_n$:    c      0.0601  0.1201  0.1802  0.2402  0.3003
                    |Bias|  0.0201  0.0148  0.0359  0.0734  0.1233
                    MSE     0.0004  0.0002  0.0013  0.0054  0.0152
  $\tilde{r}_n^K$:  s      0.1000  0.2000  0.3000  0.4000  0.5000
                    |Bias|  0.0996  0.2173  0.3078  0.3800  0.4398
                    MSE     0.0099  0.0472  0.0947  0.1444  0.1934

Model 1, n = 500 ($b_n$ = 1.3637, $h_n$ = 0.8190)
  $\tilde{r}_n$:    c      0.0601  0.1201  0.1802  0.2402  0.3003
                    |Bias|  0.0050  0.0017  0.0042  0.0179  0.0378
                    MSE     0.0000  0.0000  0.0000  0.0003  0.0014
  $\tilde{r}_n^K$:  s      0.1000  0.2000  0.3000  0.4000  0.5000
                    |Bias|  0.0652  0.1161  0.1502  0.1702  0.1801
                    MSE     0.0042  0.0135  0.0226  0.0290  0.0324

Model 2, n = 50 ($b_n$ = 1.0466, $h_n$ = 0.6285)
  $\tilde{r}_n$:    c      0.0601  0.1201  0.1802  0.2402  0.3003
                    |Bias|  0.0710  0.0309  0.0284  0.0088  0.0132
                    MSE     0.0050  0.0010  0.0008  0.0001  0.0002
  $\tilde{r}_n^K$:  s      0.1000  0.2000  0.3000  0.4000  0.5000
                    |Bias|  0.0720  0.1305  0.1670  0.1806  0.1708
                    MSE     0.0052  0.0170  0.0279  0.0326  0.0292

Model 2, n = 500 ($b_n$ = 0.6603, $h_n$ = 0.3966)
  $\tilde{r}_n$:    c      0.0601  0.1201  0.1802  0.2402  0.3003
                    |Bias|  0.0064  0.0040  0.0013  0.0028  0.0088
                    MSE     0.0000  0.0000  0.0000  0.0000  0.0001
  $\tilde{r}_n^K$:  s      0.1000  0.2000  0.3000  0.4000  0.5000
                    |Bias|  0.0333  0.0579  0.0697  0.0678  0.0544
                    MSE     0.0011  0.0033  0.0049  0.0046  0.0030
4 Data analysis
The proposed estimator was tested over one well-known real dataset to demonstrate its usefulness
in practical applications. For purposes of comparison, the particular classical kernel regression
estimator introduced in [22] was considered. With respect to this last point, a brief discussion
will be developed at the end of this section. The real dataset is the Motorcycle data found in
Appendix 2-Table 1 of [14], where the experiment, a simulated motorcycle crash, is described in
detail in [23]. It has n = 133 observations of X, where the X-values denote time (in milliseconds)
after a simulated impact with motorcycles and the response variable Y is the head acceleration
(in g) of a post mortem human test object. To facilitate the comparison of the effectiveness of
estimators (24), ˆr
NW
n
, (3) w.t.f. ϕ
5
3
, and (3) w.t.f. ϕ
1
, the performance of each estimator will
be graphed over the dataset through an average of 1000 trials. Moreover, for the fixed random
sample, X
1
, . . . , X
n
, a sample of independent random variables V
(d)
1
, . . . , V
(d)
n
will be used, where
V
(d)
i
is uniformly distributed on [0, 1] for each i, d, 1 i n and 1 d 1000.
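The averaging protocol just described (fixed $(X_i, Y_i)$, fresh uniform samples $V^{(d)}$ for $d = 1, \ldots, 1000$) can be sketched as follows. The motorcycle data file is not reproduced here, so a synthetic stand-in signal on $[0, 57.6]$ is used; the sample size $n = 133$ and $b_n = 2.92$ follow the text, while everything else is an arbitrary illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(42)

def phi_c(u, c):
    """Thinning function phi_c of (21)."""
    a = 15.0 * c / 16.0
    return np.where(np.abs(u) <= a, 1.0 - (16.0 / (15.0 * c)) ** 2 * u ** 2, 0.0)

def r_tilde(x, X, Y, V, b_n, c=1.0):
    """Fuzzy set regression estimate at x for one thinning sample V."""
    keep = V <= phi_c((X - x) / b_n, c)
    tau = np.count_nonzero(keep)
    return Y[keep].sum() / tau if tau else 0.0

# Synthetic stand-in for the motorcycle data (n = 133 observations on [0, 57.6]).
n = 133
X = np.sort(rng.uniform(0.0, 57.6, n))
Y = 50.0 * np.sin(X / 8.0) + rng.normal(0.0, 10.0, n)

# Average the estimate at a fixed point over D = 1000 fresh samples V^(d),
# with the (X_i, Y_i) held fixed throughout.
D, b_n, x0 = 1000, 2.92, 20.0
est = np.mean([r_tilde(x0, X, Y, rng.uniform(0.0, 1.0, n), b_n) for _ in range(D)])
```

The only randomness averaged out is that of the thinning variables, which is exactly the role of the $V^{(d)}$ samples in the figures below.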
In order to simulate the performance of the estimator $\hat{r}_n^{NW}$, the idea in [22] will be adopted with smoothing parameter $h_n = 2.40$ and quartic kernel $K_Q(u) = \frac{15}{16} (1 - u^2)^2\, \mathbb{1}_{[-1, 1]}(u)$ instead of $K_E$. Simultaneously, for the fuzzy set estimation the smoothing parameter (26) will be implemented, with $v = M_Q/2$, where the approximate value of the constant (28) was calculated through (27) using $h_n = 2.40$ and $K_Q$ instead of $K_E$. Here $M_Q = \int u^2\, K_Q(u)\, du$. Figures 1 and 2
show the simulation results together with the data points on [0, 57.60] and [0, 6.50], respectively.
From Figures 1 and 2, the estimator (3) w.t.f. $\varphi_{5/3}$ does not appear to offer an appropriate approximation on a large part of the region $[2.92, 57.60]$, when compared with the similar results produced by both the (3) w.t.f. $\varphi_1$ and $\hat{r}_n^{NW}$ estimators. At the same time, Figure 2 shows that the boundary problem present in (3) w.t.f. $\varphi_{5/3}$ was removed by (24): both the (3) w.t.f. $\varphi_{5/3}$ and $\hat{r}_n^{NW}$ estimators fail miserably over their respective boundary regions, while only the estimator (24) follows the data very closely on the left side of the picture, by making the curve constant, equal to zero, over the region $[0, 1.305]$.
Finally, it is necessary to point out that [24] does not present a simulation study with real data, and within the context of that paper an appropriate criterion for the estimation of the parameters $d_1$ and $d_2$ was not proposed. This explains the absence, in this section, of a graphical comparison of the estimators (25) and (24). Remember that, within the objectives proposed in this paper, only the estimation of the parameters associated with the fuzzy set estimation method is considered. Consequently, a subjective choice of the values of the parameters $d_1$ and $d_2$ would be incorrect, since it could benefit the behavior of (24) and impair the behavior of (25), which would produce unreliable results.
[Figure: Nueva_Completa-eps-converted-to.pdf. Legend: solid line, $\tilde{r}_n$; dash-dot line, $\hat{r}_n^{NW}$; thick dotted line, $\hat{r}_n$ w.t.f. $\varphi_{5/3}$; sparse dotted line, $\hat{r}_n$ w.t.f. $\varphi_1$.]

Figure 1: Regression estimates for the motorcycle data at the points inside the region $[0, 57.60]$, using smoothing parameters $h_n = 2.40$ and $b_n = 2.92$, where $v = \frac{M_Q}{2}$. The circles indicate the raw data.
[Figure: Nueva_Boundary-eps-converted-to.pdf. Legend: solid line, $\tilde{r}_n$; dash-dot line, $\hat{r}_n^{NW}$; thick dotted line, $\hat{r}_n$ w.t.f. $\varphi_{5/3}$; sparse dotted line, $\hat{r}_n$ w.t.f. $\varphi_1$.]

Figure 2: Regression estimates for the motorcycle data at the points inside the region $[0, 6.5]$, using smoothing parameters $h_n = 2.40$ and $b_n = 2.92$, where $v = \frac{M_Q}{2}$. The circles indicate the raw data.
5 Final remarks
In this paper, a new contribution in the area of regression estimation not based on kernels is presented, obtaining a natural extension of the results introduced in [8]. In particular, the boundary effects of the fuzzy set regression estimator are studied and removed, without requiring boundary modifications to eliminate such effects, unlike most well-known kernel regression estimators. Moreover, among the desirable properties of the boundary fuzzy set estimator, its non-negativity is highlighted, as well as its performance, which is generally very robust with respect to various shapes of regression and density functions, since it allows important reductions of the bias while maintaining low variance. It is clear that no single existing estimator in the literature dominates all the others for all shapes of regression and density functions. Each estimator has certain advantages and works well in certain settings. However, for the two regression models and the two density functions considered, the boundary fuzzy set regression estimator performs better than the boundary regression estimator introduced in [24] at points near zero in a spread neighborhood of the smoothing parameter.
On the other hand, it is appropriate to highlight the important role that the thinning function plays in the results obtained, since its adequate construction allowed us to eliminate the boundary effects of the fuzzy set regression estimator, giving at each boundary point the best approximation between the boundary fuzzy set estimator and the regression function. Finally, it is necessary to point out that, through the thinning point process that describes the fuzzy set density estimation method, the set of observations considered can be characterized in a neighborhood of the estimation point: the indicator functions that define the fuzzy set density estimator decide whether an observation belongs to the neighborhood of the estimation point or not, and the thinning functions that define each indicator function are used to select the sample points with different
probabilities, in contrast to kernel estimators, which assign equal weight to all points of the sample.
6 Acknowledgements
This research has been supported by a grant from the Academia de Ciencias de América Latina (ACAL).
References
[1] T. H. Ali, H. A. A.-M. Hayawi and D. S. I. Botani (2021). Estimation of the bandwidth pa-
rameter in Nadaraya-Watson kernel non-parametric regression based on universal threshold
level, Commun. Stat. - Simul. Comput., 1–34. DOI: 10.1080/03610918.2021.1884719
[2] L. R. Cheruiyot (2020). Local Linear Regression Estimator on the Boundary
Correction in Nonparametric Regression Estimation, JSTA, 19 (3), 460–471. DOI:
10.2991/jsta.d.201016.001
[3] L. R. Cheruiyot, O. R. Otieno and G. O. Orwa (2019). A Boundary Corrected Non-Parametric Regression Estimator for Finite Population Total, Int. J. Probab. Stat., 8 (3), 1–83. DOI: 10.5539/ijsp.v8n3p83
[4] F. Comte and N. Marie (2021). On a Nadaraya-Watson estimator with two bandwidths, Electron. J. Statist., 15 (1), 2566–2607. DOI: 10.1214/21-EJS1849
[5] J. Fajardo (2012). A Criterion for the Fuzzy Set Estimation of the Regression Function, J.
Prob. Stat., 2012. DOI: 10.1155/2012/593036
[6] J. Fajardo (2014). A criterion for the fuzzy set estimation of the density function, Braz. J.
Prob. Stat., 28 (3), 301–312. DOI: 10.1214/12-BJPS208
[7] J. Fajardo (2018). Comparison of the Mean Square Errors of the Fuzzy Set Regression
Estimator and Local Linear Regression Smoothers, Int. J. Math. Stat., 19 (1), 74–89.
http://www.ceser.in/ceserp/index.php/ijms/article/view/5355
[8] J. Fajardo and P. Harmath (2021). Boundary estimation with the fuzzy set density estimator,
METRON, 79, 285–302. DOI: 10.1007/s40300-021-00210-z
[9] J. Fajardo, R. Ríos and L. Rodríguez (2010). Properties of convergence of a fuzzy set estimator of the regression function, J. Stat. Adv. Theory Appl., 3 (2), 79–112. Zbl 05800441
[10] J. Fajardo, R. Ríos and L. Rodríguez (2012). Properties of convergence of a fuzzy set estimator of the density function, Braz. J. Prob. Stat., 26 (2), 208–217. DOI: 10.1214/10-BJPS130
[11] M. Falk and F. Liese (1998). LAN of thinned empirical processes with an application to fuzzy set density estimation, Extremes, 1 (3), 323–349. DOI: 10.1023/A:1009981817526
[12] F. Ferraty, V. Núñez-Antón and P. Vieu (2001). Regresión No Paramétrica: Desde la Dimensión uno Hasta la Dimensión Infinita, Servicio Editorial de la Universidad del País Vasco, Bilbao.
[13] P. Hall and B. U. Park (2002). New methods for bias correction at endpoints and boundaries,
Ann. Statist., 30 (5), 1460–1479. DOI: 10.1214/aos/1035844983
[14] W. Härdle (1990). Applied nonparametric regression, Cambridge University Press.
[15] R. Hidayat, I. N. Budiantara, B. W. Otok and V. Ratnasari (2021). The regression curve
estimation by using mixed smoothing spline and kernel (MsS-K) model, Commun. Stat.
Theory Methods, 50 (17), 3942–3953. DOI: 10.1080/03610926.2019.1710201
[16] M. C. Jones and P. J. Foster (1996). A simple nonnegative boundary correction method for kernel density estimation, Stat. Sinica, 6 (4), 1005–1013. http://www3.stat.sinica.edu.tw/statistica/j6n4/j6n414/j6n414.htm
[17] R. J. Karunamuni and T. Alberts (2005). On boundary correction in kernel density estima-
tion, Stat. Methodol., 2 (3), 191–212. DOI: 10.1016/j.stamet.2005.04.001
[18] B. C. Kouassi, O. Hili and E. Katchekpele (2021). Nadaraya-Watson estimation of a nonparametric autoregressive model, MJM, 9 (4), 251–258. DOI: 10.26637/mjm904/009
[19] Y. Linke, I. Borisov, P. Ruzankin, V. Kutsenko, E. Yarovaya and S. Shalnova (2022). Uni-
versal Local Linear Kernel Estimators in Nonparametric Regression, Mathematics, 10 (15),
2693. DOI: 10.3390/math10152693
[20] E. A. Nadaraya (1964). On estimating regression, Theory Probab. Appl., 9, 141–142. DOI:
10.1137/1109020
[21] R. D. Reiss (1993). A Course on Point Processes, Springer Series in Statistics, New York.
[22] I. Salgado-Ugarte, M. Shimizu and T. Taniuchi (1996). Nonparametric regression: Kernel, WARP, and k-NN estimators, S.T.B., 5. https://EconPapers.repec.org/RePEc:tsj:stbull:y:1996:v:5:i:30:snp10
[23] G. Schmidt, R. Mattern and F. Schüler (1981). Biomechanical investigation to determine physical and traumatological differentiation criteria for the maximum load capacity of head and vertebral column with and without protective helmet under effects of impact. EEC Research Program on Biomechanics of Impact, Final Report Phase III, Project 65, Institut für Rechtsmedizin, Universität Heidelberg.
[24] K. Souraya, A. Sayah and Y. Djabrane (2015). General method of boundary correction in
kernel regression estimation, Afr. Stat., 10, 739–750. DOI: 10.16929/as/2015.729.66
[25] I. Sriliana, I. N. Budiantara and V. Ratnasari (2022). A Truncated Spline and Local Linear Mixed Estimator in Nonparametric Regression for Longitudinal Data and Its Application, Symmetry, 14 (12), 2687. DOI: 10.3390/sym14122687
[26] C. J. Stone (1980). Optimal rates of convergence for nonparametric estimators, Ann. Statist.,
8, 1348–1360. DOI: 10.1214/aos/1176345206
[27] G. S. Watson (1964). Smooth regression analysis, Sankhyā, Ser. A, 26, 359–372. https://www.jstor.org/stable/25049340
[28] L. A. Zadeh (1965). Fuzzy sets, Inform. Control, 8, 338–353.
[29] S. Zhang and R. J. Karunamuni (1998). On kernel density estimation near endpoints, J.
Stat. Plan. Infer., 70 (2), 301–316. DOI: 10.1016/S0378-3758(97)00187-0
[30] S. Zhang, R. J. Karunamuni and M. C. Jones (1999). An Improved Estimator of the
Density Function at the Boundary, J. Am. Stat. Assoc. 94 (448), 1231–1240. DOI:
10.1080/01621459.1999.10473876