Modern Javascript Recap - Part IV

Joris Verbogt
Jun 23 2023
Posted in Engineering & Technology

New Array and String methods

As the JavaScript language evolves, new features are introduced all the time. Some of them are not fundamental changes, but simply useful variations on existing data types and methods.

In the previous blog post on this subject, we took a look at fundamental properties of numbers in JavaScript. In this blog post, I want to highlight some of the latest additions to the Array and String data types.

New Array methods

Reverse lookup with findLast and findLastIndex

These work like their existing counterparts find and findIndex, but search from the end of the Array:

const inputArray = ['apple', 'banana', 'orange'];
console.log(inputArray.find((element) => element.endsWith('e'))) // 'apple'
console.log(inputArray.findIndex((element) => element.endsWith('e'))) // 0
console.log(inputArray.findLast((element) => element.endsWith('e')))  // 'orange'
console.log(inputArray.findLastIndex((element) => element.endsWith('e'))) // 2
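
A typical use is picking the most recent matching entry from a chronologically ordered list. The sketch below uses a made-up `events` Array (purely illustrative, not from any real API) and also shows what both methods return when nothing matches:

```javascript
// Hypothetical sample data, ordered oldest to newest
const events = [
  { id: 1, type: 'login' },
  { id: 2, type: 'purchase' },
  { id: 3, type: 'login' },
];

// Most recent login, without reversing or copying the Array
const lastLogin = events.findLast((event) => event.type === 'login');
console.log(lastLogin.id); // 3

// Position of that same entry
const lastLoginIndex = events.findLastIndex((event) => event.type === 'login');
console.log(lastLoginIndex); // 2

// No match: findLast returns undefined, findLastIndex returns -1
const lastLogout = events.findLast((event) => event.type === 'logout');
const lastLogoutIndex = events.findLastIndex((event) => event.type === 'logout');
console.log(lastLogout, lastLogoutIndex); // undefined -1
```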

Non-modifying variants of element ordering methods

Whenever you're dealing with methods that act upon Arrays, you run the risk of inadvertently modifying your source Array.

Notably, this is the case for reverse, sort and splice, which all change the Array they are called on. These now have variants that return a new Array, leaving the source intact:

const source = [0, 2, 1, 3]
console.log(source.toReversed()) // [3, 1, 2, 0]
console.log(source.toSorted()) // [0, 1, 2, 3]
console.log(source.toSpliced(1, 1)) // [0, 1, 3]
console.log(source) // [0, 2, 1, 3]

Furthermore, there is now also a with method that allows you to set a value at a given index without changing the source:

const source = [0, 1, 2, 3]
console.log(source.with(0, 1)) // [1, 1, 2, 3]
console.log(source) // [0, 1, 2, 3]

New String methods

Detect and fix ill-formed UTF-16 strings with isWellFormed() and toWellFormed()

UTF-16 extends its limit of 65,536 characters by introducing so-called 'surrogate pairs', i.e., combinations of two 16-bit code units that together encode a single Unicode code point.

The code units reserved for these pairs cannot be used on their own. By manipulating strings (e.g., splitting) you run the risk of ending up with a so-called 'lone surrogate'. A 'lone surrogate' is a 16-bit code unit satisfying one of the descriptions below:

  • It is in the range 0xD800–0xDBFF, inclusive (i.e. is a high surrogate), but it is the last code unit in the string, or the next code unit is not a low surrogate.
  • It is in the range 0xDC00–0xDFFF, inclusive (i.e. is a low surrogate), but it is the first code unit in the string, or the previous code unit is not a high surrogate.
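
To see how such a lone surrogate can arise in practice, here is a minimal sketch with a made-up `greeting` string. Index-based methods like slice count UTF-16 code units, while the string iterator walks whole code points:

```javascript
const greeting = 'ab😄';

// The emoji occupies two UTF-16 code units, so the length is 4, not 3
console.log(greeting.length); // 4

// Cutting by code unit index splits the pair, leaving a lone high surrogate
const cutByCodeUnit = greeting.slice(0, 3);
console.log(cutByCodeUnit.charCodeAt(2).toString(16)); // 'd83d'

// The string iterator walks code points, so the pair stays together
const codePoints = [...greeting];
console.log(codePoints.length); // 3
const cutByCodePoint = codePoints.slice(0, 3).join('');
console.log(cutByCodePoint === greeting); // true
```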

Most JavaScript built-in methods handle lone surrogates without complaint, because they operate on UTF-16 code units. Other methods do not: encodeURI, for example, throws a URIError for lone surrogates, because URI encoding is based on UTF-8, which has no encoding for them.

Strings that do not contain any lone surrogates are called well-formed strings, and the String data type now has two methods for dealing with them.

To check if a String is well-formed:

// Lone high surrogate
console.log('ab\uD800'.isWellFormed()) // false
console.log('ab\uD800c'.isWellFormed()) // false
// Lone low surrogate
console.log('\uDFFFab'.isWellFormed()) // false
console.log('c\uDFFFab'.isWellFormed()) // false
// Well-formed
console.log('abc'.isWellFormed()) // true
console.log('ab\uD83D\uDE04c'.isWellFormed()) // true

To make sure a String is well-formed, replacing any lone surrogates with the Unicode replacement character U+FFFD:

// Lone high surrogate
console.log('ab\uD800'.toWellFormed()) // 'ab�'
console.log('ab\uD800c'.toWellFormed()) // 'ab�c'
// Lone low surrogate
console.log('\uDFFFab'.toWellFormed()) // '�ab'
console.log('c\uDFFFab'.toWellFormed()) // 'c�ab'
// Well-formed
console.log('abc'.toWellFormed()) // 'abc'
console.log('ab\uD83D\uDE04c'.toWellFormed()) // 'ab😄c'

So, in order to prevent URI encoding from failing when these lone surrogates are present, you could do something like this:

const illFormedURIString = 'https://example.com/search?q=\uD800'

try {
  encodeURI(illFormedURIString)
} catch (err) {
  console.log(err) // URIError: URI malformed
}

if (illFormedURIString.isWellFormed()) {
  console.log(encodeURI(illFormedURIString))
} else {
  console.warn("Ill-formed URI string.") // 'Ill-formed URI string.'
  console.log(encodeURI(illFormedURIString.toWellFormed())) // 'https://example.com/search?q=%EF%BF%BD'
}

Conclusion

As new features are added to the JavaScript language, it's always good to keep an eye on features actually being implemented in the latest engines.

If you have control over the runtime engine, as is the case with Node.js, you can easily benefit from these new methods by upgrading to the latest stable release. The examples above all work in the current Node.js version 20, which is set to become the LTS version in October 2023.
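
When you cannot be sure which engine your code runs on, a feature check is one possible approach. The sketch below is an assumption-laden illustration, not an official polyfill: `toWellFormedCompat` and `LONE_SURROGATE` are hypothetical names, and the regular-expression fallback relies on lookbehind support in the engine:

```javascript
// Matches a high surrogate not followed by a low surrogate,
// or a low surrogate not preceded by a high surrogate
const LONE_SURROGATE =
  /[\uD800-\uDBFF](?![\uDC00-\uDFFF])|(?<![\uD800-\uDBFF])[\uDC00-\uDFFF]/g;

// Hypothetical helper: use the native method when present, else the fallback
function toWellFormedCompat(input) {
  if (typeof input.toWellFormed === 'function') {
    return input.toWellFormed();
  }
  return input.replace(LONE_SURROGATE, '\uFFFD');
}

console.log(toWellFormedCompat('ab\uD800c') === 'ab\uFFFDc'); // true

// The same kind of check works for the new Array methods
const supportsToSorted = typeof Array.prototype.toSorted === 'function';
console.log(typeof supportsToSorted); // 'boolean'
```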

As always, we hope you liked this article. If you have anything to add, maybe you are suited for a Developer position at Notificare. We are currently looking for a Core API Developer, so check out the job description. If modern JavaScript is your thing, don't hesitate to apply!
