PyLops-distributed¶
Note
This library is under early development.
Expect things to constantly change until version v1.0.0.
This library is an extension of PyLops for distributed operators.
As much as numpy and scipy lie at the core of the parent project PyLops, PyLops-distributed heavily builds on top of Dask, and more specifically Dask arrays.
Doing so, linear operators can be parallelized across several processes on a single node or across multiple nodes. Their forward and adjoint are first lazily built as directed acyclic graphs and evaluated only when requested by the user (or automatically within one of our solvers).
Most of the operators and solvers in PyLops-distributed mirror their equivalents in PyLops and users can seamlessly switch between PyLops and PyLops-distributed or even combine operators acting locally with distributed operators.
Here is a simple example showing how a diagonal operator can be created, applied and inverted using PyLops:
import numpy as np
from pylops import Diagonal
n = 10
x = np.ones(n)
d = np.arange(n) + 1
Dop = Diagonal(d)
# y = Dx
y = Dop*x
# x = D'y
xadj = Dop.H*y
# xinv = D^-1 y
xinv = Dop / y
and similarly using PyLops-distributed:
import numpy as np
import dask.array as da
import pylops_distributed
from pylops_distributed import Diagonal
# set-up client
client = pylops_distributed.utils.backend.dask()
n = 10
x = da.ones(n, chunks=(n//2,))
d = da.from_array(np.arange(n) + 1, chunks=(n//2, n//2))
Dop = Diagonal(d)
# y = Dx
y = Dop*x
# x = D'y
xadj = Dop.H*y
# xinv = D^-1 y
xinv = Dop / y
da.compute((y, xadj, xinv))
client.close()
It is worth noticing two things at this point:
- In this specific case we did not even need to reimplement the
Diagonal
operator. Calling numpy operations as methods (e.g.,x.sum()
) instead of functions (e.g.,np.sum(x)
) makes it automatic for our operator to act as a distributed operator when a dask array is provided instead. Unfortunately not all numpy functions are also implemented as methods: in those cases we reimplement the o perator directly within PyLops-distributed. - Using
*
and.H*
is still possible also within PyLops-distributed, however when initializing an operator we will need to decide whether we want to simply create dask graph or also evaluation. This gives flexibility as we can decide if and when apply evaluation using thecompute
method on a dask array of choice.
History¶
PyLops-Distributed was initially written and it is currently maintained by Equinor It is an extension of PyLops for large-scale optimization with distributed linear operators that can be tailored to our needs, and as contribution to the free software community.
Installation¶
You will need Python 3.5 or greater to get started.
Dependencies¶
Our mandatory dependencies are limited to:
We advise using the Anaconda Python distribution
to ensure that these dependencies are installed via the Conda
package manager.
Step-by-step installation for users¶
Python environment¶
Stable releases on PyPI and Conda coming soon…
To install the latest source from github:
>> pip install https://git@github.com/equinor/pylops-distributed.git@master
or just clone the repository
>> git clone https://github.com/equinor/pylops-distributed.git
or download the zip file from the repository (green button in the top right corner of the main github repo page) and install PyLops from terminal using the command:
>> make install
Step-by-step installation for developers¶
Fork and clone the repository by executing the following in your terminal:
>> git clone https://github.com/your_name_here/pylops-distributed.git
The first time you clone the repository run the following command:
>> make dev-install
If you prefer to build a new Conda enviroment just for PyLops, run the following command:
>> make dev-install_conda
To ensure that everything has been setup correctly, run tests:
>> make tests
Make sure no tests fail, this guarantees that the installation has been successfull.
If using Conda environment, always remember to activate the conda environment every time you open a new bash shell by typing:
>> source activate pylops-distributed
Tutorials¶
Note
Click here to download the full example code
Marchenko redatuming by inversion¶
This example shows how to set-up and run the
pylops_distributed.waveeqprocessing.Marchenko
inversion using
synthetic data.
Data are first converted to frequency domain and stored in the high-performance format Zarr. This allows lazy loading using a Dask array and distributing over frequencies the computation of the various Fredholm integrals involved in the forward model.
# sphinx_gallery_thumbnail_number = 3
import warnings
import numpy as np
import dask.array as da
import matplotlib.pyplot as plt
from scipy.signal import convolve
from pylops_distributed.waveeqprocessing import Marchenko
warnings.filterwarnings('ignore')
plt.close('all')
Let’s start by defining some input parameters and loading the test data
# Input parameters
inputfile = '../testdata/marchenko/input.npz'
inputzarr = '../testdata/marchenko/input.zarr'
vel = 2400.0 # velocity
toff = 0.045 # direct arrival time shift
nsmooth = 10 # time window smoothing
nfmax = 1000 # max frequency for MDC (#samples)
niter = 10 # iterations
inputdata = np.load(inputfile)
# Receivers
r = inputdata['r']
nr = r.shape[1]
dr = r[0, 1]-r[0, 0]
# Sources
s = inputdata['s']
ns = s.shape[1]
ds = s[0, 1]-s[0, 0]
# Virtual points
vs = inputdata['vs']
# Density model
rho = inputdata['rho']
z, x = inputdata['z'], inputdata['x']
# Reflection response in frequency domain (R[f, s, r])
R_fft = da.from_zarr(inputzarr)
print('R_fft:', R_fft)
# Subsurface fields
Gsub = inputdata['Gsub']
G0sub = inputdata['G0sub']
wav = inputdata['wav']
wav_c = np.argmax(wav)
t = inputdata['t']
ot, dt, nt = t[0], t[1]-t[0], len(t)
Gsub = np.apply_along_axis(convolve, 0, Gsub, wav, mode='full')
Gsub = Gsub[wav_c:][:nt]
G0sub = np.apply_along_axis(convolve, 0, G0sub, wav, mode='full')
G0sub = G0sub[wav_c:][:nt]
plt.figure(figsize=(10, 5))
plt.imshow(rho, cmap='gray', extent=(x[0], x[-1], z[-1], z[0]))
plt.scatter(s[0, 5::10], s[1, 5::10], marker='*', s=150, c='r', edgecolors='k')
plt.scatter(r[0, ::10], r[1, ::10], marker='v', s=150, c='b', edgecolors='k')
plt.scatter(vs[0], vs[1], marker='.', s=250, c='m', edgecolors='k')
plt.axis('tight')
plt.xlabel('x [m]')
plt.ylabel('y [m]')
plt.title('Model and Geometry')
plt.xlim(x[0], x[-1])
fig, axs = plt.subplots(1, 2, sharey=True, figsize=(12, 9))
axs[0].imshow(Gsub, cmap='gray', vmin=-1e6, vmax=1e6,
extent=(r[0, 0], r[0, -1], t[-1], t[0]))
axs[0].set_title('G')
axs[0].set_xlabel(r'$x_R$')
axs[0].set_ylabel(r'$t$')
axs[0].axis('tight')
axs[0].set_ylim(1.5, 0)
axs[1].imshow(G0sub, cmap='gray', vmin=-1e6, vmax=1e6,
extent=(r[0, 0], r[0, -1], t[-1], t[0]))
axs[1].set_title('G0')
axs[1].set_xlabel(r'$x_R$')
axs[1].set_ylabel(r'$t$')
axs[1].axis('tight')
axs[1].set_ylim(1.5, 0)
Out:
R_fft: dask.array<from-zarr, shape=(500, 101, 101), dtype=complex64, chunksize=(125, 101, 101), chunktype=numpy.ndarray>
(1.5, 0.0)
Let’s now create an object of the
pylops_distributed.waveeqprocessing.Marchenko
class and
apply redatuming for a single subsurface point vs
.
# direct arrival window
trav = np.sqrt((vs[0]-r[0])**2+(vs[1]-r[1])**2)/vel
MarchenkoWM = Marchenko(R_fft, nt=nt, dt=dt, dr=dr, wav=wav,
toff=toff, nsmooth=nsmooth)
f1_inv_minus, f1_inv_plus, p0_minus, g_inv_minus, g_inv_plus = \
MarchenkoWM.apply_onepoint(trav, G0=G0sub.T, rtm=True, greens=True,
dottest=False, **dict(niter=niter, compute=True))
g_inv_tot = g_inv_minus + g_inv_plus
We can now compare the result of Marchenko redatuming via LSQR with standard redatuming
fig, axs = plt.subplots(1, 3, sharey=True, figsize=(16, 9))
axs[0].imshow(p0_minus.T, cmap='gray', vmin=-5e5, vmax=5e5,
extent=(r[0, 0], r[0, -1], t[-1], -t[-1]))
axs[0].set_title(r'$p_0^-$')
axs[0].set_xlabel(r'$x_R$')
axs[0].set_ylabel(r'$t$')
axs[0].axis('tight')
axs[0].set_ylim(1.2, 0)
axs[1].imshow(g_inv_minus.T, cmap='gray', vmin=-5e5, vmax=5e5,
extent=(r[0, 0], r[0, -1], t[-1], -t[-1]))
axs[1].set_title(r'$g^-$')
axs[1].set_xlabel(r'$x_R$')
axs[1].set_ylabel(r'$t$')
axs[1].axis('tight')
axs[1].set_ylim(1.2, 0)
axs[2].imshow(g_inv_plus.T, cmap='gray', vmin=-5e5, vmax=5e5,
extent=(r[0, 0], r[0, -1], t[-1], -t[-1]))
axs[2].set_title(r'$g^+$')
axs[2].set_xlabel(r'$x_R$')
axs[2].set_ylabel(r'$t$')
axs[2].axis('tight')
axs[2].set_ylim(1.2, 0)
fig = plt.figure(figsize=(15, 9))
ax1 = plt.subplot2grid((1, 5), (0, 0), colspan=2)
ax2 = plt.subplot2grid((1, 5), (0, 2), colspan=2)
ax3 = plt.subplot2grid((1, 5), (0, 4))
ax1.imshow(Gsub, cmap='gray', vmin=-5e5, vmax=5e5,
extent=(r[0, 0], r[0, -1], t[-1], t[0]))
ax1.set_title(r'$G_{true}$')
axs[0].set_xlabel(r'$x_R$')
axs[0].set_ylabel(r'$t$')
ax1.axis('tight')
ax1.set_ylim(1.2, 0)
ax2.imshow(g_inv_tot.T, cmap='gray', vmin=-5e5, vmax=5e5,
extent=(r[0, 0], r[0, -1], t[-1], -t[-1]))
ax2.set_title(r'$G_{est}$')
axs[1].set_xlabel(r'$x_R$')
axs[1].set_ylabel(r'$t$')
ax2.axis('tight')
ax2.set_ylim(1.2, 0)
ax3.plot(Gsub[:, nr//2]/Gsub.max(), t, 'r', lw=5)
ax3.plot(g_inv_tot[nr//2, nt-1:]/g_inv_tot.max(), t, 'k', lw=3)
ax3.set_ylim(1.2, 0)
Out:
(1.2, 0.0)
Total running time of the script: ( 0 minutes 6.286 seconds)
PyLops-distributed API¶
Linear operators¶
Basic operators¶
MatrixMult (A[, dims, compute, todask, dtype]) |
Matrix multiplication. |
Identity (N[, M, inplace, compute, todask, dtype]) |
Identity operator. |
Diagonal (diag[, dims, dir, compute, todask, …]) |
Diagonal operator. |
Transpose (dims, axes[, compute, todask, dtype]) |
Transpose operator. |
Roll (N[, dims, dir, shift, compute, todask, …]) |
Roll along an axis. |
Restriction (M, iava[, dims, dir, inplace, …]) |
Restriction (or sampling) operator. |
Spread (dims, dimsd[, table, dtable, …]) |
Spread operator. |
VStack (ops[, chunks, compute, todask, …]) |
Vertical stacking. |
HStack (ops[, chunks, compute, todask, dtype]) |
Horizontal stacking. |
BlockDiag (ops[, chunks, compute, todask, dtype]) |
Block-diagonal operator. |
Smoothing and derivatives¶
Smoothing1D (nsmooth, dims[, dir, compute, …]) |
1D Smoothing. |
FirstDerivative (N[, dims, dir, sampling, …]) |
First derivative. |
SecondDerivative (N[, dims, dir, sampling, …]) |
Second derivative. |
Laplacian (dims[, dirs, weights, sampling, …]) |
Laplacian. |
Signal processing¶
FFT (dims[, dir, nfft, sampling, real, …]) |
One dimensional Fast-Fourier Transform. |
Convolve1D (N, h[, offset, dims, dir, …]) |
1D convolution operator. |
Fredholm1 (G[, nz, saveGt, compute, chunks, …]) |
Fredholm integral of first kind. |
Wave-Equation processing¶
MDC (G, nt, nv[, dt, dr, twosided, saveGt, …]) |
Multi-dimensional convolution. |
Marchenko (R, nt[, dt, dr, wav, toff, …]) |
Marchenko redatuming |
Demigration (z, x, t, srcs, recs, vel, wav, …) |
Demigration operator. |
Solvers¶
PyLops-distributed Utilities¶
Alongside with its Linear Operators and Solvers, PyLops contains also a number of auxiliary routines performing universal tasks that are used by several operators or simply within one or more Tutorials for the preparation of input data and subsequent visualization of results.
Contributing¶
Contributions are welcome and greatly appreciated!
Follow the instructions in our main repository
Changelog¶
Version 0.2.0¶
Released on: 06/06/2020
- Added
prescaled
input parameter topylops_distributed.waveeqprocessing.MDC
andpylops_distributed.waveeqprocessing.Marchenko
- Added
dtype
parameter to theFFT
calls in the definition of thepylops_distributed.waveeqprocessing.MDD
operation. This ensure that the type of the real part ofG
input is enforced to the output vectors of the forward and adjoint operations. - Changed handling of
dtype
inpylops_distributed.signalprocessing.FFT
to ensure that the type of the input vector is retained when applying forward and adjoint. - Added
PBS
backend topylops_distributed.utils.backend.dask
Version 0.1.0¶
Released on: 09/02/2020
- Added
pylops_distributed.Restriction
operator - Added
pylops_distributed.signalprocessing.Convolve1D
andpylops_distributed.signalprocessing.FFT2D
operators - Improved efficiency of
pylops_distributed.signalprocessing.Fredholm1
whensaveGt=False
- Adapted
pylops_distributed.optimization.cg.cg
andpylops_distributed.optimization.cg.cgls
solvers for complex numbers
Roadmap¶
Coming soon…
Contributors¶
- Matteo Ravasi, mrava87