PyLops-distributed

Note

This library is under early development.

Expect things to constantly change until version v1.0.0.

This library is an extension of PyLops for distributed operators.

Just as numpy and scipy lie at the core of the parent project PyLops, PyLops-distributed builds heavily on top of Dask, and more specifically on Dask arrays.

This allows linear operators to be parallelized across several processes on a single node or across multiple nodes. Their forward and adjoint passes are first lazily built as directed acyclic graphs and evaluated only when requested by the user (or automatically within one of our solvers).

Most of the operators and solvers in PyLops-distributed mirror their equivalents in PyLops, and users can seamlessly switch between PyLops and PyLops-distributed, or even combine operators acting locally with distributed operators.

Here is a simple example showing how a diagonal operator can be created, applied and inverted using PyLops:

import numpy as np
from pylops import Diagonal

n = 10
x = np.ones(n)
d = np.arange(n) + 1

Dop = Diagonal(d)

# y = Dx
y = Dop*x
# x = D'y
xadj = Dop.H*y
# xinv = D^-1 y
xinv = Dop / y

and similarly using PyLops-distributed:

import numpy as np
import dask.array as da
import pylops_distributed
from pylops_distributed import Diagonal

# set-up client
client = pylops_distributed.utils.backend.dask()

n = 10
x = da.ones(n, chunks=(n//2,))
d = da.from_array(np.arange(n) + 1, chunks=(n//2, n//2))

Dop = Diagonal(d)

# y = Dx
y = Dop*x
# x = D'y
xadj = Dop.H*y
# xinv = D^-1 y
xinv = Dop / y

da.compute((y, xadj, xinv))
client.close()

It is worth noticing two things at this point:

  • In this specific case we did not even need to reimplement the Diagonal operator. Calling numpy operations as methods (e.g., x.sum()) instead of functions (e.g., np.sum(x)) makes our operator automatically act as a distributed operator when a dask array is provided instead. Unfortunately, not all numpy functions are also implemented as methods: in those cases we reimplement the operator directly within PyLops-distributed.
  • Using * and .H* is still possible within PyLops-distributed; however, when initializing an operator we need to decide whether we want to simply build the dask graph or also trigger its evaluation. This gives flexibility, as we can decide if and when to evaluate, using the compute method on a dask array of choice (see the sketch after this list).
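To illustrate the second point, here is a minimal sketch, assuming the compute keyword shown in the API listing below accepts a pair of booleans (one for the forward, one for the adjoint):

import numpy as np
import dask.array as da
from pylops_distributed import Diagonal

n = 10
x = da.ones(n, chunks=(n//2,))
d = da.from_array(np.arange(n) + 1., chunks=(n//2,))

# lazy: forward and adjoint only build the dask graph
Dop_lazy = Diagonal(d, compute=(False, False))
y = Dop_lazy * x   # still a dask array, nothing evaluated yet
y = y.compute()    # evaluation happens here, on demand

# eager: forward and adjoint trigger evaluation immediately
Dop_eager = Diagonal(d, compute=(True, True))
y = Dop_eager * x  # returns an already-evaluated array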

History

PyLops-distributed was initially written, and is currently maintained, by Equinor. It is an extension of PyLops for large-scale optimization with distributed linear operators that can be tailored to our needs, and a contribution to the free software community.

Installation

You will need Python 3.5 or greater to get started.

Dependencies

Our mandatory dependencies are limited to the packages the library builds upon: numpy, scipy, dask and pylops.

We advise using the Anaconda Python distribution to ensure that these dependencies are installed via the Conda package manager.

Step-by-step installation for users

Python environment

Stable releases on PyPI and Conda coming soon…

To install the latest source from github:

>> pip install git+https://github.com/equinor/pylops-distributed.git@master

or just clone the repository

>> git clone https://github.com/equinor/pylops-distributed.git

or download the zip file from the repository (green button in the top right corner of the main github repo page) and install PyLops-distributed from terminal using the command:

>> make install

Step-by-step installation for developers

Fork and clone the repository by executing the following in your terminal:

>> git clone https://github.com/your_name_here/pylops-distributed.git

The first time you clone the repository run the following command:

>> make dev-install

If you prefer to build a new Conda environment just for PyLops-distributed, run the following command:

>> make dev-install_conda

To ensure that everything has been set up correctly, run the tests:

>> make tests

Make sure no tests fail; this guarantees that the installation has been successful.

If using a Conda environment, remember to activate it every time you open a new bash shell by typing:

>> source activate pylops-distributed

Tutorials

Marchenko redatuming by inversion

This example shows how to set-up and run the pylops_distributed.waveeqprocessing.Marchenko inversion using synthetic data.

Data are first converted to the frequency domain and stored in the high-performance Zarr format. This allows lazy loading into a Dask array and distributing the computation of the various Fredholm integrals involved in the forward model over frequencies.
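For reference, here is a hypothetical sketch of how such a Zarr file could be produced from a time-domain reflection response (the input array, its shape and the number of retained frequencies are assumptions, chosen to match the dataset printed below):

import numpy as np
import dask.array as da

# hypothetical time-domain reflection response R(t, s, r)
R = np.random.normal(0., 1., (1000, 101, 101)).astype(np.float32)
nfmax = 500  # number of positive frequencies to retain (assumption)

# FFT over the time axis, keep only the first nfmax frequencies
R_fft = np.fft.rfft(R, axis=0)[:nfmax].astype(np.complex64)

# chunk over the frequency axis only, so that the Fredholm integrals
# can later be distributed frequency by frequency
da.from_array(R_fft, chunks=(nfmax//4, 101, 101)).to_zarr('input.zarr')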

# sphinx_gallery_thumbnail_number = 3
import warnings
import numpy as np
import dask.array as da
import matplotlib.pyplot as plt

from scipy.signal import convolve
from pylops_distributed.waveeqprocessing import Marchenko

warnings.filterwarnings('ignore')
plt.close('all')

Let’s start by defining some input parameters and loading the test data

# Input parameters
inputfile = '../testdata/marchenko/input.npz'
inputzarr = '../testdata/marchenko/input.zarr'

vel = 2400.0         # velocity
toff = 0.045         # direct arrival time shift
nsmooth = 10         # time window smoothing
nfmax = 1000         # max frequency for MDC (#samples)
niter = 10           # iterations

inputdata = np.load(inputfile)

# Receivers
r = inputdata['r']
nr = r.shape[1]
dr = r[0, 1]-r[0, 0]

# Sources
s = inputdata['s']
ns = s.shape[1]
ds = s[0, 1]-s[0, 0]

# Virtual points
vs = inputdata['vs']

# Density model
rho = inputdata['rho']
z, x = inputdata['z'], inputdata['x']

# Reflection response in frequency domain (R[f, s, r])
R_fft = da.from_zarr(inputzarr)
print('R_fft:', R_fft)

# Subsurface fields
Gsub = inputdata['Gsub']
G0sub = inputdata['G0sub']
wav = inputdata['wav']
wav_c = np.argmax(wav)

t = inputdata['t']
ot, dt, nt = t[0], t[1]-t[0], len(t)

Gsub = np.apply_along_axis(convolve, 0, Gsub, wav, mode='full')
Gsub = Gsub[wav_c:][:nt]
G0sub = np.apply_along_axis(convolve, 0, G0sub, wav, mode='full')
G0sub = G0sub[wav_c:][:nt]

plt.figure(figsize=(10, 5))
plt.imshow(rho, cmap='gray', extent=(x[0], x[-1], z[-1], z[0]))
plt.scatter(s[0, 5::10], s[1, 5::10], marker='*', s=150, c='r', edgecolors='k')
plt.scatter(r[0, ::10], r[1, ::10], marker='v', s=150, c='b', edgecolors='k')
plt.scatter(vs[0], vs[1], marker='.', s=250, c='m', edgecolors='k')
plt.axis('tight')
plt.xlabel('x [m]')
plt.ylabel('z [m]')
plt.title('Model and Geometry')
plt.xlim(x[0], x[-1])

fig, axs = plt.subplots(1, 2, sharey=True, figsize=(12, 9))
axs[0].imshow(Gsub, cmap='gray', vmin=-1e6, vmax=1e6,
              extent=(r[0, 0], r[0, -1], t[-1], t[0]))
axs[0].set_title('G')
axs[0].set_xlabel(r'$x_R$')
axs[0].set_ylabel(r'$t$')
axs[0].axis('tight')
axs[0].set_ylim(1.5, 0)
axs[1].imshow(G0sub, cmap='gray', vmin=-1e6, vmax=1e6,
              extent=(r[0, 0], r[0, -1], t[-1], t[0]))
axs[1].set_title('G0')
axs[1].set_xlabel(r'$x_R$')
axs[1].set_ylabel(r'$t$')
axs[1].axis('tight')
axs[1].set_ylim(1.5, 0)
[Figures: Model and Geometry; G, G0]

Out:

R_fft: dask.array<from-zarr, shape=(500, 101, 101), dtype=complex64, chunksize=(125, 101, 101), chunktype=numpy.ndarray>

(1.5, 0.0)

Let’s now create an object of the pylops_distributed.waveeqprocessing.Marchenko class and apply redatuming for a single subsurface point vs.

# direct arrival window
trav = np.sqrt((vs[0]-r[0])**2+(vs[1]-r[1])**2)/vel

MarchenkoWM = Marchenko(R_fft, nt=nt, dt=dt, dr=dr, wav=wav,
                        toff=toff, nsmooth=nsmooth)

f1_inv_minus, f1_inv_plus, p0_minus, g_inv_minus, g_inv_plus = \
    MarchenkoWM.apply_onepoint(trav, G0=G0sub.T, rtm=True, greens=True,
                               dottest=False, niter=niter, compute=True)
g_inv_tot = g_inv_minus + g_inv_plus

We can now compare the result of Marchenko redatuming by inversion with standard redatuming:

fig, axs = plt.subplots(1, 3, sharey=True, figsize=(16, 9))
axs[0].imshow(p0_minus.T, cmap='gray', vmin=-5e5, vmax=5e5,
              extent=(r[0, 0], r[0, -1], t[-1], -t[-1]))
axs[0].set_title(r'$p_0^-$')
axs[0].set_xlabel(r'$x_R$')
axs[0].set_ylabel(r'$t$')
axs[0].axis('tight')
axs[0].set_ylim(1.2, 0)
axs[1].imshow(g_inv_minus.T, cmap='gray', vmin=-5e5, vmax=5e5,
              extent=(r[0, 0], r[0, -1], t[-1], -t[-1]))
axs[1].set_title(r'$g^-$')
axs[1].set_xlabel(r'$x_R$')
axs[1].set_ylabel(r'$t$')
axs[1].axis('tight')
axs[1].set_ylim(1.2, 0)
axs[2].imshow(g_inv_plus.T, cmap='gray', vmin=-5e5, vmax=5e5,
              extent=(r[0, 0], r[0, -1], t[-1], -t[-1]))
axs[2].set_title(r'$g^+$')
axs[2].set_xlabel(r'$x_R$')
axs[2].set_ylabel(r'$t$')
axs[2].axis('tight')
axs[2].set_ylim(1.2, 0)

fig = plt.figure(figsize=(15, 9))
ax1 = plt.subplot2grid((1, 5), (0, 0), colspan=2)
ax2 = plt.subplot2grid((1, 5), (0, 2), colspan=2)
ax3 = plt.subplot2grid((1, 5), (0, 4))
ax1.imshow(Gsub, cmap='gray', vmin=-5e5, vmax=5e5,
           extent=(r[0, 0], r[0, -1], t[-1], t[0]))
ax1.set_title(r'$G_{true}$')
ax1.set_xlabel(r'$x_R$')
ax1.set_ylabel(r'$t$')
ax1.axis('tight')
ax1.set_ylim(1.2, 0)
ax2.imshow(g_inv_tot.T, cmap='gray', vmin=-5e5, vmax=5e5,
           extent=(r[0, 0], r[0, -1], t[-1], -t[-1]))
ax2.set_title(r'$G_{est}$')
ax2.set_xlabel(r'$x_R$')
ax2.set_ylabel(r'$t$')
ax2.axis('tight')
ax2.set_ylim(1.2, 0)
ax3.plot(Gsub[:, nr//2]/Gsub.max(), t, 'r', lw=5)
ax3.plot(g_inv_tot[nr//2, nt-1:]/g_inv_tot.max(), t, 'k', lw=3)
ax3.set_ylim(1.2, 0)
[Figures: $p_0^-$, $g^-$, $g^+$; $G_{true}$, $G_{est}$]

Out:

(1.2, 0.0)

Total running time of the script: ( 0 minutes 6.286 seconds)

Gallery generated by Sphinx-Gallery

PyLops-distributed API

Linear operators

Basic operators
MatrixMult(A[, dims, compute, todask, dtype]) Matrix multiplication.
Identity(N[, M, inplace, compute, todask, dtype]) Identity operator.
Diagonal(diag[, dims, dir, compute, todask, …]) Diagonal operator.
Transpose(dims, axes[, compute, todask, dtype]) Transpose operator.
Roll(N[, dims, dir, shift, compute, todask, …]) Roll along an axis.
Restriction(M, iava[, dims, dir, inplace, …]) Restriction (or sampling) operator.
Spread(dims, dimsd[, table, dtable, …]) Spread operator.
VStack(ops[, chunks, compute, todask, …]) Vertical stacking.
HStack(ops[, chunks, compute, todask, dtype]) Horizontal stacking.
BlockDiag(ops[, chunks, compute, todask, dtype]) Block-diagonal operator.
Smoothing and derivatives
Smoothing1D(nsmooth, dims[, dir, compute, …]) 1D Smoothing.
FirstDerivative(N[, dims, dir, sampling, …]) First derivative.
SecondDerivative(N[, dims, dir, sampling, …]) Second derivative.
Laplacian(dims[, dirs, weights, sampling, …]) Laplacian.
Signal processing
FFT(dims[, dir, nfft, sampling, real, …]) One dimensional Fast-Fourier Transform.
Convolve1D(N, h[, offset, dims, dir, …]) 1D convolution operator.
Fredholm1(G[, nz, saveGt, compute, chunks, …]) Fredholm integral of first kind.
Wave-Equation processing
MDC(G, nt, nv[, dt, dr, twosided, saveGt, …]) Multi-dimensional convolution.
Marchenko(R, nt[, dt, dr, wav, toff, …]) Marchenko redatuming.
Demigration(z, x, t, srcs, recs, vel, wav, …) Demigration operator.

Solvers

Low-level solvers
cg(A, y[, x, niter, tol, compute, client]) Conjugate gradient.
cgls(A, y[, x, niter, damp, tol, compute, …]) Conjugate gradient least squares.
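
A minimal sketch of their usage, assuming the solvers live in the pylops_distributed.optimization.cg module (the import path is an assumption) and that cgls returns the estimated model as its first output:

import numpy as np
import dask.array as da
from pylops_distributed import MatrixMult
from pylops_distributed.optimization.cg import cgls

n = 10
A = da.from_array(np.random.normal(0., 1., (n, n)), chunks=(n//2, n//2))
x = da.ones(n, chunks=(n//2,))

Aop = MatrixMult(A)
y = Aop * x

# solve y = Ax in a least-squares sense; compute=True is assumed
# to trigger evaluation of the final estimate
xinv = cgls(Aop, y, niter=2*n, compute=True)[0]
print(xinv)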

Applications

Wave-Equation processing
LSM(z, x, t, srcs, recs, vel, wav, wavcenter) Least-squares Migration (LSM).

PyLops-distributed Utilities

Alongside its Linear Operators and Solvers, PyLops-distributed also contains a number of auxiliary routines performing universal tasks that are used by several operators, or simply within one or more Tutorials for the preparation of input data and subsequent visualization of results.

Shared

Backends
backend.dask([hardware, client, processes, …]) Dask backend initialization.
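
For instance, a minimal sketch of initializing and closing a single-machine client (processes=True is taken from the listing above; its exact semantics are an assumption):

import pylops_distributed

# start a local Dask client with process-based workers (assumption)
client = pylops_distributed.utils.backend.dask(processes=True)
print(client)
client.close()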
Dot-test
dottest(Op, nr, nc, chunks[, tol, …]) Dot test.
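
As a minimal sketch, the dot test verifies that the forward and adjoint are consistent, i.e. that uᴴ(Op v) ≈ (Opᴴ u)ᴴ v for random vectors u and v (the import path, and the meaning of chunks as the chunking of the random test vectors, are assumptions):

import numpy as np
import dask.array as da
from pylops_distributed import Diagonal
from pylops_distributed.utils.dottest import dottest

n = 10
d = da.from_array(np.arange(n) + 1., chunks=(n//2,))
Dop = Diagonal(d)

# fails if the forward and adjoint implementations disagree
dottest(Dop, n, n, chunks=(n//2, n//2))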

Contributing

Contributions are welcome and greatly appreciated!

Follow the instructions in our main repository.

Changelog

Version 0.2.0

Released on: 06/06/2020

Version 0.1.0

Released on: 09/02/2020

Version 0.0.0

Released on: 01/09/2019

  • First official release.

Roadmap

Coming soon…

Contributors