
Strip a leading UTF-8 BOM from a string (string.strip_utf8_bom)

Declaration

out = string.strip_utf8_bom(text)

Parameters

  • text
    string. Text that may begin with a UTF-8 BOM.

Returns

  • out
    string. The text with the leading BOM removed.

Description

The UTF-8 BOM (byte order mark) is a sequence of three invisible bytes at the start of a file: "\xEF\xBB\xBF". (In Lua string literals, "\x" followed by two hex digits denotes a single byte; for example, "\x58" is 'X'. See the ASCII reference.)
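To make the byte-level behavior concrete, here is a minimal plain-Lua sketch of what stripping a leading BOM could look like. The helper name strip_utf8_bom_sketch is hypothetical and is not XXTouch's actual implementation; it only illustrates the idea of checking the first three bytes and dropping them when they match.

```lua
-- Hypothetical stand-in for illustration; not XXTouch's actual implementation.
-- string.char is used so the BOM bytes are spelled the same way on any Lua version.
local UTF8_BOM = string.char(0xEF, 0xBB, 0xBF)

local function strip_utf8_bom_sketch(text)
  -- Only a BOM at the very start of the string is removed.
  if text:sub(1, #UTF8_BOM) == UTF8_BOM then
    return text:sub(#UTF8_BOM + 1)
  end
  -- No leading BOM: return the text unchanged.
  return text
end

print(#strip_utf8_bom_sketch(UTF8_BOM .. "XXTouch"))  -- 7
print(#strip_utf8_bom_sketch("XXTouch"))              -- 7
```

In this sketch, text without a leading BOM is returned as is; whether string.strip_utf8_bom behaves identically in every edge case is an assumption.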

Example

txt = "\xEF\xBB\xBFXXTouch"
sys.alert(txt..', '..#txt) -- "XXTouch, 10"
--
txt = string.strip_utf8_bom(txt)
sys.alert(txt..', '..#txt) -- "XXTouch, 7"

Note: This example uses a function from outside this chapter: sys.alert

Tips

UTF-8 does not require a BOM, although the Unicode standard permits one. UTF-8 without a BOM is the norm; adding a BOM to UTF-8 files is mainly a Windows practice and may cause problems elsewhere.
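One way such a problem can show up is sketched below: the first key of a key=value line silently carries the BOM bytes, so a comparison against the expected key fails. The sample line and the parse_key_value helper are hypothetical. The strip_bom fallback is an assumption that lets the snippet run outside XXTouch; inside XXTouch it uses string.strip_utf8_bom directly.

```lua
-- Use the XXTouch function when available; otherwise fall back to a
-- plain-Lua stand-in (an assumption, not the real implementation).
local BOM = string.char(0xEF, 0xBB, 0xBF)
local strip_bom = string.strip_utf8_bom or function(s)
  if s:sub(1, 3) == BOM then return s:sub(4) end
  return s
end

-- Hypothetical first line of a config file that was saved with a BOM.
local line = BOM .. "name=XXTouch"

-- Split "key=value" at the first '='.
local function parse_key_value(s)
  return s:match("^([^=]+)=(.*)$")
end

local raw_key = parse_key_value(line)
local clean_key, clean_value = parse_key_value(strip_bom(line))

print(raw_key == "name")    -- false: the key still starts with the BOM bytes
print(clean_key == "name")  -- true
print(clean_value)          -- XXTouch
```

Stripping the BOM right after reading the text, before any parsing or comparison, avoids this class of mismatch.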