Convolution and Differentiation of Distributions

If F is a distribution and g is a function, why is (DF)*g equal to F*Dg?

Steve Trettel

| Analysis

I’ll review (extremely rapidly!) to fix notation. We equip the vector space $C_c^\infty$ of smooth compactly supported real-valued functions on $\mathbb{R}^n$ with the following topology: a sequence $\phi_n$ converges to $\phi$ if the $\phi_n$ and all their derivatives converge to $\phi$ and all of its derivatives with respect to the norm $\|\phi\|=\int_{\mathbb{R}^n}|\phi|\,d\mathrm{vol}$. The topological dual $\mathcal{D}$ of $C_c^\infty$ is the set of distributions on $\mathbb{R}^n$, or continuous linear functionals $C_c^\infty\to\mathbb{R}$.

Every smooth function $f\in C_c^\infty$ naturally gives rise to a distribution $F\in\mathcal{D}$ via integration, $f\mapsto F$ where $F(\phi)=\int_{\mathbb{R}^n}f\phi\,d\mathrm{vol}$, but not all distributions are of this form. Important examples are given by the delta distributions: for any $p\in\mathbb{R}^n$ we define $\delta_p\in\mathcal{D}$ by $\delta_p(\phi):=\phi(p)$, and we write $\delta$ for $\delta_0$.
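If it helps to see these objects concretely, here is a minimal numerical sketch (in Python with scipy, both my own choices rather than anything from the post): a distribution is modeled as a callable that eats a test function and returns a number, with function-induced distributions given by quadrature and deltas given by evaluation.

```python
# Minimal model: a distribution is a callable taking a test function to a number.
import math
from scipy.integrate import quad

def bump(x):
    """A smooth test function supported on (-1, 1)."""
    return math.exp(-1.0 / (1.0 - x * x)) if abs(x) < 1 else 0.0

def distribution_from_function(f, a=-2.0, b=2.0):
    """The distribution phi -> integral of f*phi; (a, b) should contain supp(phi)."""
    return lambda phi: quad(lambda x: f(x) * phi(x), a, b)[0]

def delta(p=0.0):
    """The delta distribution at p: evaluate the test function there."""
    return lambda phi: phi(p)

F = distribution_from_function(math.cos)
print(F(bump))           # integral of cos(x) * bump(x) dx
print(delta(0.5)(bump))  # bump(0.5)
```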

The set of distributions is closed under differentiation, where the derivative of a distribution $F$ is defined by its action on a function $\phi$ in analogy to integration by parts. On $\mathbb{R}$ this lets us define the first derivative $F'$ of $F$ by $F'(\phi):=-F(\phi')$. More generally, if $\partial$ is some differential operator on $C_c^\infty$ we define $\partial$ on $\mathcal{D}$ by $\partial F(\phi):=F(\partial^\ast\phi)$, where $\partial^\ast$ is the formal adjoint of $\partial$ coming from integration by parts.
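As a quick sanity check of this definition, the sketch below (again Python with scipy; the helper `num_deriv` and the bump function are hypothetical conveniences of mine) verifies numerically that when $F$ is induced by a smooth $f$, the distributional derivative $F'(\phi)=-F(\phi')$ returns the same number as the distribution induced by $f'$, exactly as integration by parts predicts.

```python
# Check: for F induced by smooth f, F'(phi) := -F(phi') matches the
# distribution induced by f' (integration by parts in disguise).
import math
from scipy.integrate import quad

def bump(x):
    return math.exp(-1.0 / (1.0 - x * x)) if abs(x) < 1 else 0.0

def num_deriv(phi, h=1e-5):
    """Central-difference derivative of a test function (an approximation)."""
    return lambda x: (phi(x + h) - phi(x - h)) / (2 * h)

def distribution_from_function(f, a=-2.0, b=2.0):
    return lambda phi: quad(lambda x: f(x) * phi(x), a, b)[0]

def derivative(F):
    """The distributional derivative F'(phi) = -F(phi')."""
    return lambda phi: -F(num_deriv(phi))

F = distribution_from_function(math.sin)
print(derivative(F)(bump))                         # -integral of sin(x) * bump'(x) dx
print(distribution_from_function(math.cos)(bump))  # integral of cos(x) * bump(x) dx -- same value
```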

Distributions can be multiplied by smooth functions: if $\psi\colon\mathbb{R}^n\to\mathbb{R}$ is smooth and $F\in\mathcal{D}$, we define $\psi F$ to be the distribution such that $(\psi F)(\phi)=F(\psi\phi)$ for all $\phi\in C_c^\infty$. Distributions can also be convolved with functions in $C_c^\infty$, but this operation now yields smooth functions rather than distributions: if $F\in\mathcal{D}$ and $g\in C_c^\infty$, their convolutional product $F\star g$ is defined below, where $g^{(p)}$ is the function $x\mapsto g(p-x)$:
$$F\star g\colon p\mapsto F\left(g^{(p)}\right)$$

As an example, if $\phi\in C_c^\infty$, we compute the value of $\delta\star\phi$ at $p\in\mathbb{R}^n$:
$$(\delta\star\phi)(p):=\delta\left(\phi^{(p)}\right)=\phi^{(p)}(0)=\phi(p-0)=\phi(p)$$
Thus $\delta\star\phi=\phi$, and convolution with $\delta$ realizes the identity operator on $C_c^\infty$.
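The same computation is easy to watch numerically; in the sketch below (my own illustration, not part of the original argument), convolving with $\delta$ returns the test function and convolving with $\delta_p$ translates it.

```python
# Convolution of a distribution with a test function, and delta as the identity.
import math

def bump(x):
    return math.exp(-1.0 / (1.0 - x * x)) if abs(x) < 1 else 0.0

def delta(p=0.0):
    return lambda phi: phi(p)

def convolve(F, g):
    """(F * g)(x) = F(g^(x)), where g^(x)(p) = g(x - p)."""
    return lambda x: F(lambda p: g(x - p))

print(convolve(delta(), bump)(0.3), bump(0.3))      # delta * g = g
print(convolve(delta(2.0), bump)(2.3), bump(0.3))   # delta_p * g translates g by p
```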

Differentiation and convolution interact in a particularly simple way: if $\partial$ is a constant-coefficient linear differential operator and $F\in\mathcal{D}$, $g\in C_c^\infty$, then we may differentiate the function $F\star g$ by
$$\partial(F\star g)=(\partial F)\star g=F\star(\partial g)$$

Once you know this line of equalities, the motivation for introducing distributions to solve PDEs becomes clear! The general goal is to convert (hopefully easier to find) distributional solutions of a differential equation into actual, real-valued function solutions through convolution. If $\partial$ is such an operator, we say a distribution $F$ is a fundamental solution for $\partial$ if
$$\partial F=\delta$$
Such a fundamental solution lets us find a real solution to the differential equation $\partial u=g$, with $g\in C_c^\infty$, by simply taking $u=F\star g$, as we easily confirm:

$$\partial(F\star g)=(\partial F)\star g=\delta\star g=g$$
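To see this machinery run on a concrete example, recall the classical fact that on $\mathbb{R}$ the distribution $F$ induced by $f(x)=\tfrac12|x|$ is a fundamental solution of $\partial=\frac{d^2}{dx^2}$. The sketch below (the quadrature scheme and the finite-difference second derivative are ad hoc numerical choices of mine) checks that $u=F\star g$ really does satisfy $u''=g$ at a sample point.

```python
# On R, F induced by f(x) = |x|/2 is a fundamental solution of d^2/dx^2,
# so u = F * g should satisfy u'' = g.
import math
from scipy.integrate import quad

def bump(x):
    return math.exp(-1.0 / (1.0 - x * x)) if abs(x) < 1 else 0.0

def u(x):
    """(F * g)(x) = integral of (|p|/2) * g(x - p) dp, with g = bump."""
    lo, hi = x - 1.0, x + 1.0                      # support of p -> bump(x - p)
    integrand = lambda p: 0.5 * abs(p) * bump(x - p)
    if lo < 0.0 < hi:                              # split at the kink of |p|
        return quad(integrand, lo, 0.0)[0] + quad(integrand, 0.0, hi)[0]
    return quad(integrand, lo, hi)[0]

x, h = 0.4, 1e-3
u_second = (u(x + h) - 2.0 * u(x) + u(x - h)) / h**2   # central second difference
print(u_second, bump(x))                               # agree to a few digits
```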

Of course, this (crucially!) relies on the fact that $\partial(F\star g)=(\partial F)\star g$, and recently I realized I had completely forgotten how to prove this fact! Luckily, shortly afterwards Daniel O’Connor showed me how it works, and so I want to write it down for the next time that I forget.

Proving $\partial(F\star g)=(\partial F)\star g=F\star(\partial g)$

To prove this, we start small and build up to the general case. The interesting part is actually this small beginning, however; the rest is just packaging.

Theorem 1: On $\mathbb{R}$, suppose $F\in\mathcal{D}$ and $g\in C_c^\infty$. Then $F\star g$ is differentiable and $(F\star g)'=F'\star g=F\star g'$.

Proof: Here’s Daniel’s argument. Let $x\in\mathbb{R}$, $h\neq 0$, and consider the difference quotient
$$\psi_h(x):=\frac{(F\star g)(x+h)-(F\star g)(x)}{h}$$
If the limit $\lim_{h\to 0}\psi_h(x)$ exists, then $F\star g$ is differentiable at $x$. Using the definition of $F\star g$ and the linearity of $F$, we may evaluate this as
$$\psi_h(x)=\frac{F\left(g^{(x+h)}\right)-F\left(g^{(x)}\right)}{h}=F\left(\frac{g^{(x+h)}-g^{(x)}}{h}\right)$$

Using the continuity of $F$, we may take the limit inside, and so
$$\lim_{h\to 0}\psi_h(x)=F\left(\lim_{h\to 0}\frac{g^{(x+h)}-g^{(x)}}{h}\right).$$

The quantity inside of $F$ attempts to assign to each $p\in\mathbb{R}$ the value
$$p\mapsto\lim_{h\to 0}\frac{g(x+h-p)-g(x-p)}{h}=g'(x-p)$$
so in our notation, this is the function $(g')^{(x)}$, which itself is in $C_c^\infty$ as $g$ was. Thus $(F\star g)'(x)$ exists, and $(F\star g)'(x)=F\left((g')^{(x)}\right)$. But this new term is exactly the definition of $F$ convolved with $g'$, when evaluated at $x$! Thus as functions, we have shown
$$(F\star g)'=F\star g'$$

This is half of what we want, but the rest is just a straightforward application of the definition of the distributional derivative. By definition, $F'$ is the linear functional such that $F'(\phi)=-F(\phi')$ for all $\phi\in C_c^\infty$, so computing $F'\star g$, we see for $x\in\mathbb{R}$
$$(F'\star g)(x)=F'\left(g^{(x)}\right):=-F\left(\left(g^{(x)}\right)'\right)$$

where $\left(g^{(x)}\right)'$ is the function sending $p\mapsto\frac{d}{dp}g(x-p)$. Computing this derivative with the chain rule shows $\left(g^{(x)}\right)'=-(g')^{(x)}$, and so
$$-F\left(\left(g^{(x)}\right)'\right)=-F\left(-(g')^{(x)}\right)=F\left((g')^{(x)}\right)$$
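This sign bookkeeping is the one step I always second-guess, so here is a tiny symbolic check (using sympy, purely my own addition) that $\frac{d}{dp}g(x-p)=-g'(x-p)$: the two derivatives sum to zero.

```python
# d/dp g(x - p) = -g'(x - p): the p- and x-derivatives of g(x - p) cancel.
import sympy as sp

x, p = sp.symbols('x p')
g = sp.Function('g')
print(sp.simplify(sp.diff(g(x - p), p) + sp.diff(g(x - p), x)))  # prints 0
```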

Stringing all this together, we see $(F'\star g)(x)=F\left((g')^{(x)}\right)$, where we recognize this second term as defining the convolution $F\star g'$ evaluated at $x$. As this equality holds for all $x\in\mathbb{R}$ we have equality between functions:
$$F'\star g=F\star g'$$

Combining with our earlier result proves the theorem, as we have shown both $(F\star g)'$ and $F'\star g$ are equal to $F\star g'$.
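Here is a numerical illustration of Theorem 1 (a sketch of my own, with scipy quadrature and finite differences standing in for the exact derivatives): for $F$ induced by $f(x)=e^{-x^2}$ and $g$ a bump function, the three quantities $(F\star g)'(x)$, $(F'\star g)(x)$, and $(F\star g')(x)$ agree to several digits.

```python
# Theorem 1 numerically: (F*g)'(x), (F'*g)(x) and (F*g')(x) agree,
# here with F induced by f(t) = exp(-t^2) and g a bump function.
import math
from scipy.integrate import quad

def bump(x):
    return math.exp(-1.0 / (1.0 - x * x)) if abs(x) < 1 else 0.0

def num_deriv(phi, h=1e-5):
    return lambda t: (phi(t + h) - phi(t - h)) / (2 * h)

f = lambda t: math.exp(-t * t)

def F(phi, a, b):
    """The distribution induced by f, tested against phi on [a, b]."""
    return quad(lambda q: f(q) * phi(q), a, b)[0]

def Fg(x):        # (F*g)(x) = F(p -> g(x - p))
    return F(lambda q: bump(x - q), x - 1, x + 1)

def Fprime_g(x):  # (F'*g)(x) = -F((g^(x))')
    return -F(num_deriv(lambda q: bump(x - q)), x - 1, x + 1)

def F_gprime(x):  # (F*g')(x) = F(p -> g'(x - p))
    return F(lambda q: num_deriv(bump)(x - q), x - 1, x + 1)

x = 0.4
print(num_deriv(Fg)(x), Fprime_g(x), F_gprime(x))   # three nearly equal numbers
```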

Corollary 2: Let $D^k$ be the $k$th derivative operator on $C_c^\infty(\mathbb{R})$. Then for any $F\in\mathcal{D}$ and $g\in C_c^\infty$, the convolution $F\star g$ is $k$ times differentiable and
$$D^k(F\star g)=(D^kF)\star g=F\star(D^kg)$$

Proof: We can proceed inductively using Theorem 1, as $D^k=D\circ D^{k-1}$ is the $k$-fold composition of the first derivative operator.

Lemma 3: On $\mathbb{R}^n$, let $\partial_x$ denote the partial derivative with respect to the first coordinate. Then for any $F\in\mathcal{D}$ and $g\in C_c^\infty$, the partial derivative $\partial_x(F\star g)$ exists, and $\partial_x(F\star g)=(\partial_xF)\star g=F\star(\partial_xg)$.

Proof: The proof is exactly analogous to the one-dimensional case in Theorem 1, so we can proceed rather quickly. It suffices to check this equality holds at an arbitrary fixed $p\in\mathbb{R}^n$, where
$$\partial_x(F\star g)(p)=\lim_{h\to 0}\frac{(F\star g)(p+he_1)-(F\star g)(p)}{h}$$
Evaluating the convolutions and using the linearity and continuity of $F$ shows this to be $F\left((\partial_xg)^{(p)}\right)$, which is the convolution of $F$ with $\partial_xg$ evaluated at $p$. Thus, $\partial_x(F\star g)=F\star(\partial_xg)$. The second equality again follows simply by using the definition of $\partial_xF$ to compute $(\partial_xF)\star g$ at $p$, resulting in $(\partial_xF)\star g=F\star(\partial_xg)$.
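A quick check of Lemma 3 in $\mathbb{R}^2$ (an illustration of mine; taking $F$ to be a delta keeps the quadrature out of it): all three expressions reduce to $(\partial_xg)(p-q)$.

```python
# Lemma 3 in R^2 with F = delta at q: all three quantities equal (d_x g)(p - q).
import math

def bump2(x, y):
    """A smooth test function on R^2 supported in the unit disk."""
    r2 = x * x + y * y
    return math.exp(-1.0 / (1.0 - r2)) if r2 < 1 else 0.0

def dx(phi, h=1e-5):
    """Finite-difference partial derivative in the first coordinate."""
    return lambda x, y: (phi(x + h, y) - phi(x - h, y)) / (2 * h)

def convolve2(F, g):
    """(F * g)(p) = F(g^(p)), with g^(p)(y) = g(p - y)."""
    return lambda px, py: F(lambda yx, yy: g(px - yx, py - yy))

qx, qy = 0.5, -0.2
F   = lambda phi: phi(qx, qy)          # delta at q
dxF = lambda phi: -dx(phi)(qx, qy)     # its distributional x-derivative

px, py = 0.7, 0.1
print(dx(convolve2(F, bump2))(px, py))   # d_x(F*g) at p
print(convolve2(dxF, bump2)(px, py))     # (d_x F)*g at p
print(convolve2(F, dx(bump2))(px, py))   # F*(d_x g) at p -- all agree
```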

Corollary 4: If $L=\partial_x^a\partial_y^b\partial_z^c\cdots$ is any monomial in the coordinate partial derivative operators on $\mathbb{R}^n$, then for any $F\in\mathcal{D}$ and $g\in C_c^\infty$,
$$L(F\star g)=(LF)\star g=F\star(Lg)$$

Proof: As in Corollary 2, we inductively apply Lemma 3 to each partial derivative operator which shows up in $L$.

To build upwards from this, it’s useful to stop for a second and factor out a little argument about convolution:

Lemma 5: If $F,\Phi$ are distributions and $g,\gamma$ are smooth compactly supported functions, then $(F+\Phi)\star g=F\star g+\Phi\star g$ and $F\star(g+\gamma)=F\star g+F\star\gamma$.

Proof: Let $x\in\mathbb{R}^n$. First consider $F\star(g+\gamma)$ evaluated at $x$. This is by definition $F\left((g+\gamma)^{(x)}\right)$, that is, $F\left(g^{(x)}+\gamma^{(x)}\right)$. Using the linearity of $F$, we see this to be $F\left(g^{(x)}\right)+F\left(\gamma^{(x)}\right)$, which is by definition $(F\star g)(x)+(F\star\gamma)(x)$. Thus, $F\star(g+\gamma)=F\star g+F\star\gamma$.

Next, consider $(F+\Phi)\star g$ evaluated at $x$. By the definition of convolution, $((F+\Phi)\star g)(x)=(F+\Phi)\left(g^{(x)}\right)$. Using the definition of $+$ in $\mathcal{D}$, we distribute as $(F+\Phi)\left(g^{(x)}\right)=F\left(g^{(x)}\right)+\Phi\left(g^{(x)}\right)$, where the last terms are each by definition equal to $(F\star g)(x)$ and $(\Phi\star g)(x)$ respectively. Thus, $(F+\Phi)\star g=F\star g+\Phi\star g$.
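Both identities are easy to confirm numerically; the sketch below (my own, with deltas as the distributions and bump functions as the test functions) checks them at a sample point.

```python
# Both halves of Lemma 5, checked at a point with deltas as the distributions.
import math

def bump(x):
    return math.exp(-1.0 / (1.0 - x * x)) if abs(x) < 1 else 0.0

def convolve(F, g):
    return lambda x: F(lambda p: g(x - p))

F, Phi = (lambda phi: phi(0.0)), (lambda phi: phi(1.0))        # delta_0, delta_1
add_dist = lambda F1, F2: (lambda phi: F1(phi) + F2(phi))      # sum of distributions
g, gamma = bump, (lambda x: 3.0 * bump(x - 0.5))               # two test functions
add_fn = lambda g1, g2: (lambda x: g1(x) + g2(x))

x = 0.7
print(convolve(add_dist(F, Phi), g)(x), convolve(F, g)(x) + convolve(Phi, g)(x))
print(convolve(F, add_fn(g, gamma))(x), convolve(F, g)(x) + convolve(F, gamma)(x))
```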

Lemma 6: Let $L_1,L_2$ be differential operators on $\mathbb{R}^n$ such that $L_i(F\star g)=(L_iF)\star g=F\star(L_ig)$ for $i\in\{1,2\}$ and any $F\in\mathcal{D}$, $g\in C_c^\infty$. Then $L=L_1+L_2$ also satisfies $L(F\star g)=(LF)\star g=F\star(Lg)$ for all $F,g$.

Proof: The differential operator $L=L_1+L_2$ acts on functions by $L\phi=L_1\phi+L_2\phi$. Thus if $F\in\mathcal{D}$, $g\in C_c^\infty$,
$$(L_1+L_2)(F\star g)=L_1(F\star g)+L_2(F\star g).$$

To get the first of the two claimed equalities, we can use half our hypothesis on the $L_i$ to rewrite this as $(L_1F)\star g+(L_2F)\star g$, and then use Lemma 5 to factor out the convolution, giving $(L_1+L_2)(F\star g)=(L_1F+L_2F)\star g$. Factoring out the $F$ gives what we wanted:
$$(L_1+L_2)(F\star g)=\big((L_1+L_2)F\big)\star g$$

To get the second equality, we use the other half of our assumption on the $L_i$ to rewrite $L_1(F\star g)+L_2(F\star g)$ as $F\star(L_1g)+F\star(L_2g)$. We use Lemma 5 to factor out the distribution from this convolution, followed by the further factoring $L_1g+L_2g=(L_1+L_2)g$. All together, this gives what we wanted:
$$(L_1+L_2)(F\star g)=F\star\big((L_1+L_2)g\big)$$

Lemma 7: Let $L$ be a differential operator on $\mathbb{R}^n$ such that $L(F\star g)=(LF)\star g=F\star(Lg)$ for any $F\in\mathcal{D}$ and $g\in C_c^\infty$. Then if $c\in\mathbb{R}$ is any constant, the differential operator $K=cL$ defined by $K\phi=c\,L(\phi)$ also satisfies $K(F\star g)=(KF)\star g=F\star(Kg)$ for all $F,g$.

Proof: Evaluating $K(F\star g)=c\,L(F\star g)$, we can use that $L$ satisfies our hypothesis to conclude this is equal to both $c\,\big((LF)\star g\big)$ and $c\,\big(F\star(Lg)\big)$. Taking the former and evaluating at $x\in\mathbb{R}^n$, we see it to equal
$$c\,\big((LF)\star g\big)(x)=c\,(LF)\left(g^{(x)}\right)$$
Since $LF$ is linear, multiplying its output by the constant $c$ is the same as evaluating the distribution $cLF$ on $g^{(x)}$. But this is the definition of the convolution of $cLF$ with the function $g$, evaluated at $x$! Thus, all together, $c\,\big((LF)\star g\big)$ is the function $x\mapsto\big((cLF)\star g\big)(x)$, which is the first half of what we want:
$$K(F\star g)=c\,\big((LF)\star g\big)=(cLF)\star g=(KF)\star g$$

The other case is similar, considering $c\,\big(F\star(Lg)\big)$ evaluated at $x$. This yields $c\,F\left((Lg)^{(x)}\right)$, and by the linearity of $F$ we may pull the constant inside to get $F\left(c\,(Lg)^{(x)}\right)=F\left((c\,Lg)^{(x)}\right)$. That is, $c\,\big(F\star(Lg)\big)$ sends $x$ to the result of convolving the function $c\,Lg$ with $F$, so
$$K(F\star g)=c\,\big(F\star(Lg)\big)=F\star(c\,Lg)=F\star(Kg)$$

Finally, we need a description of the class of constant-coefficient linear differential operators on $\mathbb{R}^n$ which is amenable to our start-small-and-build-upwards approach:

Lemma 8: Any constant-coefficient linear differential operator on $\mathbb{R}^n$ is a multinomial in the partial derivative operators $\partial_x,\partial_y,\partial_z,\ldots$, with real coefficients.

All the hard work is done; now it’s just a matter of putting the pieces together to state the main result:

Theorem 9: Let $\partial$ be any constant-coefficient linear differential operator on $\mathbb{R}^n$. Then $\partial(F\star g)=(\partial F)\star g=F\star(\partial g)$ for all $F\in\mathcal{D}$, $g\in C_c^\infty$.

Proof: We write $\partial$ as a multinomial in the partial derivatives,
$$\partial=\sum_{[\alpha]}c_{[\alpha]}\,\partial^{[\alpha]}$$
where $[\alpha]=[a,b,c,\ldots]$ ranges over some finite subset of all multi-indices, $\partial^{[\alpha]}=\partial_x^a\partial_y^b\partial_z^c\cdots$, and each $c_{[\alpha]}$ is a real constant. But as each $\partial^{[\alpha]}$ satisfies the desired property by Corollary 4, we can apply Lemmas 6 and 7 finitely many times to conclude that $\partial$ does as well.
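As a final sanity check (my own illustration, not part of the proof), here is the constant-coefficient operator $L=D^2+2D$ on $\mathbb{R}$ applied to $F=\delta_q$: the three quantities in the theorem agree numerically.

```python
# The constant-coefficient operator L = D^2 + 2D, applied to F = delta_q:
# L(F*g), (LF)*g and F*(Lg) agree at a sample point.
import math

def bump(x):
    return math.exp(-1.0 / (1.0 - x * x)) if abs(x) < 1 else 0.0

def d1(phi, h=1e-4):
    return lambda x: (phi(x + h) - phi(x - h)) / (2 * h)

def d2(phi, h=1e-4):
    return lambda x: (phi(x + h) - 2.0 * phi(x) + phi(x - h)) / (h * h)

def L(phi):
    """L = D^2 + 2D acting on functions."""
    return lambda x: d2(phi)(x) + 2.0 * d1(phi)(x)

def convolve(F, g):
    return lambda x: F(lambda p: g(x - p))

q = 0.5
F  = lambda phi: phi(q)                            # delta at q
LF = lambda phi: d2(phi)(q) - 2.0 * d1(phi)(q)     # L(delta_q): note the sign flip on D from F'(phi) = -F(phi')

g, x = bump, 0.8
print(L(convolve(F, g))(x))      # L(F*g)(x)
print(convolve(LF, g)(x))        # (LF)*g(x)
print(convolve(F, L(g))(x))      # F*(Lg)(x) -- all three agree
```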