Component bdls_pathutil
[Package bdls]

Provide portable file path manipulation.

Namespaces

namespace  bdls

Detailed Description

Outline
Purpose:
Provide portable file path manipulation.
Classes:
bdls::PathUtil Portable utility methods for manipulating paths
See also:
Component bdls_filesystemutil
Description:
This component provides utility methods for manipulating strings that represent paths in the filesystem. Class methods of bdls::PathUtil include platform-independent operations to add or remove filenames or relative paths at the end of a path string (by "filenames" we are referring to the names of any filesystem item, including regular files and directories). There are also methods to parse the path to delimit the "root" as defined for the current platform; see Parsing and Performance (rootEnd argument) below.
Paths that have a root are called absolute paths, whereas paths that do not have a root are relative paths.
Note that this component does not perform filesystem operations. In particular, no effort is made to verify the existence or accessibility of any segment of any path.
Terminology:
To introduce the terminology explored in this section, let's start with a Unix example:
  "/foo/bar/myfile.txt"
The elements of this path would be:
            Path: "/foo/bar/myfile.txt"
            Root: "/"                       # the starting separator(s)
  Leaf(Basename): "myfile.txt"
       Extension: ".txt"
         Dirname: "/foo/bar/"
Separator:
A platform-dependent character that separates the elements of a path, such as directory names from each other and from file names. The separator character is / (slash) on Unix-like systems and \ (backslash) on Windows systems.
Path:
An optional root, followed by optional directories, followed by an optional filename.
Root:
The root, if present, is at the beginning of a path and its presence determines if a path is absolute (the root is present) or relative (the root is not present). The textual rules for what a root is are platform dependent. See Unix Root and Windows Root.
See also Parsing and Performance (rootEnd argument) for important notes about speeding up functions (especially on Windows) by not reparsing roots every time a function is called.
Unix Root:
The Unix root consists of the separator characters at the beginning of a path, so the root of "/one" is "/", the root of "//two" is "//", while the root of "somefile" is "" (there is no root, relative path).
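The Unix rule above is simple enough to sketch in a few lines. The following is a minimal standalone illustration, not this component's implementation: unixRootEnd and isAbsoluteUnix are hypothetical names, and std::string_view stands in for the bsl types.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical helper (not part of bdls::PathUtil): return the length of the
// Unix root of 'path', i.e., the number of leading '/' characters.
std::size_t unixRootEnd(std::string_view path)
{
    std::size_t end = 0;
    while (end < path.size() && path[end] == '/') {
        ++end;
    }
    return end;
}

// Hypothetical helper: a path is absolute exactly when its root is non-empty.
bool isAbsoluteUnix(std::string_view path)
{
    return unixRootEnd(path) > 0;
}
```

With these definitions, unixRootEnd("//two") yields 2 (the root is "//"), while unixRootEnd("somefile") yields 0, so "somefile" is a relative path.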
Windows Root:
The Windows root is much more complicated than the Unix root, because Windows has three different flavors of paths: local (LFS), UNC, and Long UNC (LUNC):
LFS: root consists of a drive letter followed by a colon (the name part) and then zero or more separators (the directory part). E.g., "c:\hello.txt" root is "c:\"; "c:tmp" root is "c:"
UNC: root consists of two separators followed by a hostname and separator (the name part), and then a shared folder followed by one or more separators (the directory part). E.g., "\\servername\sharefolder\output\test.t" root is "\\servername\sharefolder\"
LUNC: root starts with "\\?\". Then follows either "UNC" followed by a UNC root, or an LFS root. The "\\?\" is included as part of the root name. E.g., "\\?\UNC\servername\folder\hello" root is "\\?\UNC\servername\folder\" while "\\?\c:\windows\test" root is "\\?\c:\"
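As a partial illustration of these rules, the sketch below handles only the simplest flavor, LFS. It is a standalone approximation under stated assumptions, not the component's parser: lfsRootEnd is a hypothetical name, only the backslash is treated as a separator, and UNC and LUNC paths are deliberately not recognized.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string_view>

// Hypothetical helper (not part of bdls::PathUtil): return the length of the
// LFS root of 'path' (drive letter, colon, then zero or more backslashes),
// or 0 if 'path' does not start with a drive specification.  UNC and LUNC
// roots are out of scope for this sketch.
std::size_t lfsRootEnd(std::string_view path)
{
    if (path.size() < 2
     || !std::isalpha(static_cast<unsigned char>(path[0]))
     || path[1] != ':') {
        return 0;
    }
    std::size_t end = 2;                       // the name part, e.g. "c:"
    while (end < path.size() && path[end] == '\\') {
        ++end;                                 // the directory part
    }
    return end;
}
```

For "c:\hello.txt" this yields 3 (root "c:\"), and for "c:tmp" it yields 2 (root "c:"), matching the LFS examples above.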
Leaf (a.k.a. Basename):
The leaf is the rightmost name following the root, in other words, the last element of the path. Several methods in this utility (such as getDirname) require a leaf to be present in order to function. A relative path may consist of a leaf only. Examples:
  Path                            Leaf
  ----                            ----
  "/tmp/foo/bar.txt"              "bar.txt"
  "c:\tmp\foo\bar.txt"            "bar.txt"
  "\\server\share\tmp\foo.txt"    "foo.txt"
  "/tmp/foo/"                     "foo"
  "/tmp/"                         "tmp"
  "/"                             Not Present
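The Unix rows of the table above can be reproduced with a short sketch. This is an illustration only, not the component's getLeaf: unixLeaf is a hypothetical name, an empty result stands for "Not Present", and Windows roots are not handled.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical helper (not part of bdls::PathUtil): return the leaf of the
// Unix-style 'path', or an empty view if the path has no leaf.
std::string_view unixLeaf(std::string_view path)
{
    // The root is the run of leading separators; the leaf never overlaps it.
    std::size_t rootEnd = 0;
    while (rootEnd < path.size() && path[rootEnd] == '/') {
        ++rootEnd;
    }

    // Ignore trailing separators, so "/tmp/foo/" has the leaf "foo".
    std::size_t end = path.size();
    while (end > rootEnd && path[end - 1] == '/') {
        --end;
    }

    // Walk back to the previous separator (or to the end of the root).
    std::size_t begin = end;
    while (begin > rootEnd && path[begin - 1] != '/') {
        --begin;
    }
    return path.substr(begin, end - begin);
}
```

Note that trailing separators are skipped before the leaf is located, which is why "/tmp/" yields "tmp" rather than an empty leaf.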
Extension:
An extension is a suffix of a leaf that begins with a dot and that does not contain additional dots. There are a few caveats. The special leaf names "." and ".." are considered to not have extensions. Furthermore, if a leaf's name begins with a dot, that dot is not considered when determining the extension. For example, the leaf ".bashrc" does not have an extension, but ".bbprofile.log" does, and its extension is ".log". We say that a path has an extension if it has a leaf and its leaf has an extension. Note that, for consistency reasons, our implementation differs from other standard implementations in the same way getLeaf does: the path "/foo/bar.txt/" is considered to have an extension, and its extension is ".txt". Examples:
  Path                            Extension
  ----                            -------
  "/tmp/foo/bar.txt"              ".txt"
  "/tmp/foo/bar"                  Not Present
  "/tmp/foo/bar.longextension"    ".longextension"
  "/a/b.txt/"                     ".txt"
  "/a/b.txt/."                    Not present
  "/a.txt/b.txt/.."               Not present
  "/a/.profile"                   Not present
  "/a/.profile.backup"            ".backup"
  "foo.txt"                       ".txt"
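The extension rules can be sketched as a function of the leaf alone. The code below is a standalone illustration rather than the component's implementation: leafExtension is a hypothetical name, it expects a leaf that has already been extracted from the path, and an empty result stands for "Not present".

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical helper (not part of bdls::PathUtil): return the extension of
// the specified 'leaf', including its leading dot, or an empty view if the
// leaf has no extension.
std::string_view leafExtension(std::string_view leaf)
{
    // The special names "." and ".." never have an extension.
    if (leaf == "." || leaf == "..") {
        return std::string_view();
    }

    // The extension starts at the last dot.  A dot at position 0 only marks
    // a hidden file (e.g., ".profile") and does not start an extension.
    const std::size_t dot = leaf.rfind('.');
    if (dot == std::string_view::npos || dot == 0) {
        return std::string_view();
    }
    return leaf.substr(dot);
}
```

Applied to the leaves of the paths in the table, this gives ".txt" for "bar.txt", ".backup" for ".profile.backup", and no extension for "bar", ".profile", ".", and "..".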
Dirname:
Dirname is the part of the path that contains the root but not the leaf. Note that the getDirname utility method requires a leaf to be present to function. Examples:
  Path                            Dirname
  ----                            -------
  "/tmp/foo/bar.txt"              "/tmp/foo/"
  "c:\tmp\foo\bar.txt"            "c:\tmp\foo\"
  "\\server\share\tmp\foo.txt"    "\\server\share\tmp\"
  "/tmp/foo/"                     "/tmp"
  "/tmp/"                         "/"
  "/"                             no leaf -> error
  "foo.txt"                       empty
Parsing and Performance (rootEnd argument):
Most methods of this component will perform basic parsing of the beginning part of the path to determine what part of it is the "root" as defined for the current platform. This parsing is trivial on Unix platforms but is slightly more involved for the Windows operating system. To accommodate client code which is willing to store parsing results in order to maximize performance, all methods which parse the "root" of the path accept an optional argument delimiting the "root"; if this argument is specified, parsing is skipped.
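The calling pattern described above (parse the root once, then pass the result along) can be sketched as follows. This is a standalone illustration under stated assumptions, not the component's code: parseRootEnd and hasLeafSketch are hypothetical names, only the Unix root rule is implemented, and the use of a negative value to mean "not supplied" is an assumption of this sketch.

```cpp
#include <cassert>
#include <string_view>

// Hypothetical stand-in for the platform-specific root parser; only the
// Unix rule (count the leading separators) is implemented here.
int parseRootEnd(std::string_view path)
{
    int end = 0;
    while (end < static_cast<int>(path.size()) && path[end] == '/') {
        ++end;
    }
    return end;
}

// Hypothetical query taking an optional 'rootEnd'.  A negative value means
// the caller did not supply one, so the root is parsed here; otherwise the
// supplied value is trusted and parsing is skipped.
bool hasLeafSketch(std::string_view path, int rootEnd = -1)
{
    if (rootEnd < 0) {
        rootEnd = parseRootEnd(path);
    }
    // Anything beyond the root means that a leaf is present.
    return static_cast<int>(path.size()) > rootEnd;
}
```

A caller that issues many queries against the same path would call parseRootEnd once, keep the result, and pass it to each subsequent call. The saving is negligible for Unix roots but grows with the cost of root parsing, which is why it matters most on Windows.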
Usage:
This section illustrates intended use of this component.
Example 1: Basic Syntax:
We start with strings representing an absolute native path and a relative native path, respectively:
  #ifdef BSLS_PLATFORM_OS_WINDOWS
  bsl::string tempPath  = "c:\\windows\\temp";
  bsl::string otherPath = "22jan08\\log.txt";
  #else
  bsl::string tempPath  = "/var/tmp";
  bsl::string otherPath = "22jan08/log.txt";
  #endif
tempPath is an absolute path, since it has a root. It also has a leaf element ("temp"):
  assert(false == bdls::PathUtil::isRelative(tempPath));
  assert(true  == bdls::PathUtil::isAbsolute(tempPath));
  assert(true  == bdls::PathUtil::hasLeaf(tempPath));
We can add filenames to the path one at a time, or we can add another path if it is relative. We can also remove filenames from the end of the path one at a time:
  bdls::PathUtil::appendRaw(&tempPath, "myApp");
  bdls::PathUtil::appendRaw(&tempPath, "logs");

  assert(true == bdls::PathUtil::isRelative(otherPath));
  assert(0    == bdls::PathUtil::appendIfValid(&tempPath, otherPath));
  assert(true == bdls::PathUtil::hasLeaf(tempPath));

  bdls::PathUtil::popLeaf(&tempPath);
  bdls::PathUtil::appendRaw(&tempPath, "log2.txt");

  #ifdef BSLS_PLATFORM_OS_WINDOWS
  assert("c:\\windows\\temp\\myApp\\logs\\22jan08\\log2.txt" == tempPath);
  #else
  assert("/var/tmp/myApp/logs/22jan08/log2.txt"              == tempPath);
  #endif
A relative path may be appended to any other path, even itself. An absolute path may not be appended to any path, or undefined behavior will result:
  assert(0 == bdls::PathUtil::appendIfValid(&otherPath, otherPath));  // OK
  /* bdls::PathUtil::appendRaw(&otherPath, tempPath.c_str()); */ // UNDEFINED BEHAVIOR!
Note that there is no attempt to distinguish filenames that are regular files from filenames that are directories, or to verify the existence of paths in the filesystem.
  #ifdef BSLS_PLATFORM_OS_WINDOWS
  assert("c:\\windows\\temp\\myApp\\logs\\22jan08\\log2.txt" == tempPath);
  #else
  assert("/var/tmp/myApp/logs/22jan08/log2.txt"              == tempPath);
  #endif
Example 2: Parsing a path using splitFilename:
Suppose we need to obtain all filenames from the path.
First, we create a path to split and storage for the filenames:
  #ifdef BSLS_PLATFORM_OS_WINDOWS
  const char                     *splitPath = "c:\\one\\two\\three\\four";
  #else
  const char                     *splitPath = "//one/two/three/four";
  #endif
  bsl::vector<bsl::string_view>  filenames;
Then, we run a loop that splits filenames off the end of the path one at a time:
  bsl::string_view head;
  bsl::string_view tail;
  bsl::string_view path(splitPath);

  do {
      bdls::PathUtil::splitFilename(&head, &tail, path);
      filenames.push_back(tail);
      path = head;
  } while (!tail.empty());
Now, verify the resulting values:
  assert(5           == filenames.size());

  assert("four"      == filenames[0]);
  assert("three"     == filenames[1]);
  assert("two"       == filenames[2]);
  assert("one"       == filenames[3]);
  assert(""          == filenames[4]);
Finally, make sure that only the root remains of the original value:
  #ifdef BSLS_PLATFORM_OS_WINDOWS
  assert("c:\\"      == head);
  #else
  assert("//"        == head);
  #endif