Least mean squares (LMS) algorithms represent the simplest and most easily applied adaptive algorithms. The recursive least squares (RLS) algorithms, on the other hand, are known for their excellent performance and greater fidelity, but they come with increased complexity and computational cost. In performance, RLS approaches the Kalman filter in adaptive filtering applications with somewhat reduced required throughput in the signal processor. Compared to the LMS algorithm, the RLS approach offers faster convergence and smaller error with respect to the unknown system at the expense of requiring more computations.
Note that the signal paths and identifications are the same whether the filter uses RLS or LMS. The difference lies in the adapting portion.
The LMS filters adapt their coefficients until the difference between the desired signal and the actual signal is minimized (least mean squares of the error signal). This is the state when the filter weights converge to optimal values, that is, they converge close enough to the actual coefficients of the unknown system. This class of algorithms adapt based on the error at the current time. The RLS adaptive filter is an algorithm that recursively finds the filter coefficients that minimize a weighted linear least squares cost function relating to the input signals. These filters adapt based on the total error computed from the beginning.
The LMS filters use a gradientbased approach to perform the adaptation. The initial weights are assumed to be small, in most cases very close to zero. At each step, the filter weights are updated based on the gradient of the mean square error. If the gradient is positive, the filter weights are reduced, so that the error does not increase positively. If the gradient is negative, the filter weights are increased. The step size with which the weights change must be chosen appropriately. If the step size is very small, the algorithm converges very slowly. If the step size is very large, the algorithm converges very fast, and the system might not be stable at the minimum error value. To have a stable system, the step size μ must be within these limits:
$$0<\mu <\frac{2}{{\lambda}_{\mathrm{max}}},$$
where λ_{max} is the largest eigenvalue of the input autocorrelation matrix.
The RLS filters minimize the cost function, C by appropriately selecting the filter coefficients w(n) and updating the filter as the new data arrives. The cost function is given by this equation:
$$C({w}_{n})={\displaystyle \sum _{i=0}^{n}{\lambda}^{ni}{e}^{2}(i)},$$
where
w_{n} — RLS adaptive filter coefficients.
e(i) — Error between the desired signal d and the estimate of the desired signal dest at the current time index. The signal dest is the output of the RLS filter, and so implicitly depends on the current filter coefficients.
λ — Forgetting factor that gives exponentially less weight to older samples, specified in the range 0 < λ ≤ 1. When λ = 1, all previous errors are considered of equal weight in the total error. As λ approaches zero, the past errors play a smaller role in the total. For example, when λ = 0.1, the RLS algorithm multiplies an error value from 50 samples in the past by an attenuation factor of 0.1^{50} = 1 x 10^{−50}, considerably deemphasizing the influence of the past errors on the current total error.
In cases where the error value might come from a spurious input data point or points, the forgetting factor lets the RLS algorithm reduce the significance of older error data by multiplying the old data by the forgetting factor.
This table summarizes the key differences between the two types of algorithms:
LMS Algorithm  RLS Algorithm 

Simple and can be easily applied.  Increased complexity and computational cost. 
Takes longer to converge.  Faster convergence. 
Adaptation is based on the gradientbased approach that updates filter weights to converge to the optimum filter weights.  Adaptation is based on the recursive approach that finds the filter coefficients that minimize a weighted linear least squares cost function relating to the input signals. 
Larger steady state error with respect to the unknown system.  Smaller steady state error with respect to unknown system. 
Does not account for past data.  Accounts for past data from the beginning to the current data point. 
Objective is to minimize the current mean square error between the desired signal and the output.  Objective is to minimize the total weighted squared error between the desired signal and the output. 
No memory involved. Older error values play no role in the total error considered.  Has infinite memory. All error data is considered in the total error. Using the forgetting factor, the older data can be deemphasized compared to the newer data. Since 0 ≤ λ < 1, applying the factor is equivalent to weighting the older error. 
LMS based FIR adaptive filters in DSP System Toolbox™:
 RLS based FIR adaptive filters in DSP System Toolbox:

Within limits, you can use any of the adaptive filter algorithms to solve an adaptive filter problem by replacing the adaptive portion of the application with a new algorithm.
[1] Hayes, Monson H., Statistical Digital Signal Processing and Modeling. Hoboken, NJ: John Wiley & Sons, 1996, pp.493–552.
[2] Haykin, Simon, Adaptive Filter Theory. Upper Saddle River, NJ: PrenticeHall, Inc., 1996.