Session 05: Vector and matrix arithmetic. Strings and text: {stringr}

Feedback should be sent to goran.milovanovic@datakolektiv.com. These notebooks accompany the Intro to Data Science: Non-Technical Background course 2020/21.


What do we want to do today?

We have learned a lot about vectors and matrices in R already. However, following the first four (intensive) sessions on R programming, covering everything from vectors and lists (which are also vectors in R) to iterations, decisions, and functions … that knowledge might be a bit scattered. Now we want to consolidate our knowledge of vectors and then introduce multidimensional arrays and some basic linear algebra. After all, understanding how vectors operate in a vectorized programming language is pretty much part of being in command… Following our overview of vectors, matrices, and arrays, we proceed to the super-important topic of strings and text processing in R. We introduce the {stringr} package and discuss the basics of Regular expressions (regex). While Regular expressions are a topic that deserves a course of its own, the basics are definitely an essential part of any Data Science and Analytics role.

0. Prerequisites.

Install the following packages:

install.packages('stringr')

Note. By now, many of you have probably already installed {tidyverse}. If that is the case, library(tidyverse) would do just fine: {stringr} is one of its core packages.

1. Vectors and matrices

1.1 Subsetting and recycling

A reminder. First of all: vectorization is always turned on; that is simply the nature of R…

a <- c(7, 1, 3, 9, 15)
b <- 5
a + b
[1] 12  6  8 14 20

… but recycling is also always on: the result that we have observed is a consequence of the fact that b, a numeric vector of length one, was recycled as many times as necessary to match the length of a, which is five. See:

a <- 1:10
b <- c(2, 3)
a ^ b
 [1]    1    8    9   64   25  216   49  512   81 1000

Square, then cube, then square, then cube… and so on. Because we are recycling b <- c(2, 3).

The same for matrices:

a <- matrix(1:9, 
            ncol = 3)
print(a)
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9

Now

a^2
     [,1] [,2] [,3]
[1,]    1   16   49
[2,]    4   25   64
[3,]    9   36   81

But

a^c(2, 3)
longer object length is not a multiple of shorter object length
     [,1] [,2] [,3]
[1,]    1   64   49
[2,]    8   25  512
[3,]    9  216   81

Again: how does R order the indices of a matrix? Mind the warning, by the way.
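A short sketch of the answer: R stores a matrix in column-major order, so matrix() fills it column by column, and a recycled vector such as c(2, 3) is laid out along that same order. The name exponents is just an illustrative helper, not something from the session above.

```r
# matrix() fills column by column (column-major order)
a <- matrix(1:9, ncol = 3)
as.vector(a)          # 1 2 3 4 5 6 7 8 9: the values, read down the columns

# c(2, 3) is recycled along the same column-major order, so the
# exponents actually applied to each cell of a are:
exponents <- matrix(rep_len(c(2, 3), length(a)), ncol = 3)
exponents

# same result as a^c(2, 3), but without the recycling warning
a^exponents

# byrow = TRUE fills row by row instead
matrix(1:9, ncol = 3, byrow = TRUE)
```

Comparing exponents with the output of a^c(2, 3) above shows that the square/cube pattern runs down the columns, not across the rows.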

The recycling rule:

If two vectors are of unequal length, the shorter one will be recycled in order to match the longer vector.

Now, as for subsetting vectors and matrices.

a <- seq(2, 100, 2)
print(a)
 [1]   2   4   6   8  10  12  14  16  18  20  22  24  26  28  30  32  34  36  38  40  42  44  46
[24]  48  50  52  54  56  58  60  62  64  66  68  70  72  74  76  78  80  82  84  86  88  90  92
[47]  94  96  98 100

We can subset by indices:

a[1:20]
 [1]  2  4  6  8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40

But we can also create a mask and subset by it:

a <- seq(2, 100, 2)
print(a)
 [1]   2   4   6   8  10  12  14  16  18  20  22  24  26  28  30  32  34  36  38  40  42  44  46
[24]  48  50  52  54  56  58  60  62  64  66  68  70  72  74  76  78  80  82  84  86  88  90  92
[47]  94  96  98 100
mask <- rep(c(T, F), times = length(a)/2)
print(mask)
 [1]  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE
[16] FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE
[31]  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE
[46] FALSE  TRUE FALSE  TRUE FALSE
length(a) == length(mask)
[1] TRUE
a_mask <- a[mask]
print(a_mask)
 [1]  2  6 10 14 18 22 26 30 34 38 42 46 50 54 58 62 66 70 74 78 82 86 90 94 98
length(a_mask)
[1] 25
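In practice, masks are rarely typed out by hand: they usually come from a logical condition. And since a logical index is recycled like any other vector, the 50-element mask above could have been written much more briefly. A sketch (mask4 is an illustrative name):

```r
a <- seq(2, 100, 2)

# a logical index is recycled: c(TRUE, FALSE) is repeated to length(a),
# which selects the same elements as the 50-element mask above
a[c(TRUE, FALSE)]

# masks typically come from a condition, e.g. keep the multiples of 4
mask4 <- a %% 4 == 0
a[mask4]
which(mask4)   # the positions where the mask is TRUE
```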

Reminder. Unidimensional vectors do not have a dimension (a dim attribute) in R:

print(a)
 [1]   2   4   6   8  10  12  14  16  18  20  22  24  26  28  30  32  34  36  38  40  42  44  46
[24]  48  50  52  54  56  58  60  62  64  66  68  70  72  74  76  78  80  82  84  86  88  90  92
[47]  94  96  98 100
dim(a)
NULL

They only have a length:

length(a)
[1] 50

Unlike matrices or dataframes:

a <- matrix(1:9, 
            ncol = 3)
dim(a)
[1] 3 3

Did you ever think about using negative indices?

a <- 1:10
a[-2]
[1]  1  3  4  5  6  7  8  9 10

So negative indices drop elements from a vector, just as FALSE drops them when used in a mask (the original vector itself is not modified). The same works for matrices. See:

a <- matrix(1:9, 
            nrow = 3)
print(a)
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9

Now:

a[-2, -2]
     [,1] [,2]
[1,]    1    7
[2,]    3    9

What has just happened? Well… [-2, -2] means: remove the 2nd row and the 2nd column. There are interesting combinations to remember, such as…

a[-2, ]
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    3    6    9

… which reads: remove the second row, but keep all columns. Remember how we used to subset dataframes? Or:

a[-2, 3]
[1] 7 9

This removed the 2nd row and then kept only the 3rd column of a. Mind the class: it is not a matrix anymore…

class(a[-2, 3])
[1] "integer"

… so dim(a[-2, 3]) is, of course:

dim(a[-2, 3])
NULL
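If you do want to keep the matrix structure, [ accepts a drop argument. By default it drops every dimension of extent one; drop = FALSE switches that off. A minimal sketch (v and m are illustrative names):

```r
a <- matrix(1:9, nrow = 3)

# by default, [ drops any dimension of extent one: we get a plain vector
v <- a[-2, 3]
is.matrix(v)   # FALSE
dim(v)         # NULL

# drop = FALSE keeps the result as a matrix with 2 rows and 1 column
m <- a[-2, 3, drop = FALSE]
is.matrix(m)   # TRUE
dim(m)         # 2 1
```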

1.2 Basic Linear Algebra

Let’s begin by creating two vectors, arr1 and arr2:

arr1 <- seq(2,20,2)
arr2 <- seq(1,19,2)
print("arr1: ")
[1] "arr1: "
print(arr1)
 [1]  2  4  6  8 10 12 14 16 18 20
print("arr2: ")
[1] "arr2: "
print(arr2)
 [1]  1  3  5  7  9 11 13 15 17 19

Vectorized, element-wise multiplication:

arr1 * arr2
 [1]   2  12  30  56  90 132 182 240 306 380

Now, let's introduce the scalar product (“dot product”, or “inner product”: the sum of the products of the corresponding entries of the two sequences of numbers), computed in R with %*%:

arr1 %*% arr2
     [,1]
[1,] 1430

which is, of course, the same as:

sum(arr1 * arr2)
[1] 1430

Now we introduce the transpose, t(). It is more intuitive to begin with a matrix:

mat <- matrix(1:9, 
              ncol = 3)
print(mat)
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9

And t(mat) is:

t(mat)
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9

It is easy to understand: the rows become columns, and the columns become rows. But what happens if we transpose a unidimensional array of numbers?

print(arr1)
 [1]  2  4  6  8 10 12 14 16 18 20
t(arr1)
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,]    2    4    6    8   10   12   14   16   18    20

No difference? Not quite. R treats a plain vector such as arr1 as a column vector, so only the second example (i.e. t(arr1)) is a row vector: a 1 x 10 matrix.
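Checking dim() makes the difference visible. A small sketch:

```r
arr1 <- seq(2, 20, 2)

dim(arr1)             # NULL: a plain vector has no dim attribute
dim(t(arr1))          # 1 10: t() returns a 1 x 10 matrix, a row vector
dim(as.matrix(arr1))  # 10 1: as.matrix() returns an explicit column vector
dim(t(t(arr1)))       # 10 1: transposing twice also gives a column vector
```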

Dot product, again:

# - arr1 will become a row vector after t();
# - arr2 will remain a column vector:
t(arr1) %*% arr2
     [,1]
[1,] 1430

But:

# - arr1 will be a column vector;
# - arr2 will become a row vector after t():
arr1 %*% t(arr2)
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    2    6   10   14   18   22   26   30   34    38
 [2,]    4   12   20   28   36   44   52   60   68    76
 [3,]    6   18   30   42   54   66   78   90  102   114
 [4,]    8   24   40   56   72   88  104  120  136   152
 [5,]   10   30   50   70   90  110  130  150  170   190
 [6,]   12   36   60   84  108  132  156  180  204   228
 [7,]   14   42   70   98  126  154  182  210  238   266
 [8,]   16   48   80  112  144  176  208  240  272   304
 [9,]   18   54   90  126  162  198  234  270  306   342
[10,]   20   60  100  140  180  220  260  300  340   380

A faster way to obtain a dot product of two vectors is to use crossprod():

crossprod(arr1,arr2)
     [,1]
[1,] 1430

But the class of crossprod(arr1,arr2) will be:

class(crossprod(arr1,arr2))
[1] "matrix" "array" 

drop() removes all dimensions of extent one, so it turns the 1 x 1 matrix into a plain numeric vector of length one, which is as close to a scalar as R gets:

# as scalar:
drop(crossprod(arr1, arr2))
[1] 1430

Also, a more efficient way to obtain arr1 %*% t(arr2) is to use tcrossprod():

tcrossprod(arr1, arr2)
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    2    6   10   14   18   22   26   30   34    38
 [2,]    4   12   20   28   36   44   52   60   68    76
 [3,]    6   18   30   42   54   66   78   90  102   114
 [4,]    8   24   40   56   72   88  104  120  136   152
 [5,]   10   30   50   70   90  110  130  150  170   190
 [6,]   12   36   60   84  108  132  156  180  204   228
 [7,]   14   42   70   98  126  154  182  210  238   266
 [8,]   16   48   80  112  144  176  208  240  272   304
 [9,]   18   54   90  126  162  198  234  270  306   342
[10,]   20   60  100  140  180  220  260  300  340   380

in place of the already seen, but slower:

arr1 %*% t(arr2)
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    2    6   10   14   18   22   26   30   34    38
 [2,]    4   12   20   28   36   44   52   60   68    76
 [3,]    6   18   30   42   54   66   78   90  102   114
 [4,]    8   24   40   56   72   88  104  120  136   152
 [5,]   10   30   50   70   90  110  130  150  170   190
 [6,]   12   36   60   84  108  132  156  180  204   228
 [7,]   14   42   70   98  126  154  182  210  238   266
 [8,]   16   48   80  112  144  176  208  240  272   304
 [9,]   18   54   90  126  162  198  234  270  306   342
[10,]   20   60  100  140  180  220  260  300  340   380

Note. From the crossprod() documentation: Vectors are promoted to single-column or single-row matrices, depending on the context.
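With a single argument, crossprod(x) computes t(x) %*% x. For a vector, this is the sum of its squared elements, and its square root is the Euclidean length (norm) of the vector. A sketch (ss is an illustrative name):

```r
arr1 <- seq(2, 20, 2)

# crossprod(x) with one argument is t(x) %*% x, returned as a 1 x 1 matrix
ss <- drop(crossprod(arr1))
ss                      # 1540, the same as sum(arr1^2)

# the Euclidean length (L2 norm) of arr1
sqrt(ss)
```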

Basic matrix algebra:

mat1 <- matrix(1:9, 
               nrow = 3)
mat1
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
mat2 <- matrix(seq(2, 18, 2), 
               nrow = 3)
mat2
     [,1] [,2] [,3]
[1,]    2    8   14
[2,]    4   10   16
[3,]    6   12   18

The vectorized * operator is, again, element-wise in R, for matrices too:

mat1 * mat2
     [,1] [,2] [,3]
[1,]    2   32   98
[2,]    8   50  128
[3,]   18   72  162

Real algebraic matrix multiplication is obtained by %*%:

mat1 %*% mat2
     [,1] [,2] [,3]
[1,]   60  132  204
[2,]   72  162  252
[3,]   84  192  300

And then X'Y, a cross product used very often in statistics (the familiar X'X being the special case where Y = X), is of course:

crossprod(mat1, mat2)
     [,1] [,2] [,3]
[1,]   28   64  100
[2,]   64  154  244
[3,]  100  244  388

which is the same as (less efficient):

t(mat1) %*% mat2
     [,1] [,2] [,3]
[1,]   28   64  100
[2,]   64  154  244
[3,]  100  244  388

While XY' is:

tcrossprod(mat1, mat2)
     [,1] [,2] [,3]
[1,]  132  156  180
[2,]  156  186  216
[3,]  180  216  252

the same as (less efficient):

mat1 %*% t(mat2)
     [,1] [,2] [,3]
[1,]  132  156  180
[2,]  156  186  216
[3,]  180  216  252
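For X'X proper, call crossprod() with a single argument; the result is always a symmetric matrix. A quick check (xtx is an illustrative name):

```r
mat1 <- matrix(1:9, nrow = 3)

# X'X: with a single argument, crossprod(X) computes t(X) %*% X
xtx <- crossprod(mat1)
xtx

# it agrees with the explicit (less efficient) form...
all.equal(xtx, t(mat1) %*% mat1)
# ...and X'X is always symmetric
isSymmetric(xtx)
```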

1.3 Multidimensional Arrays

Multidimensional arrays in R are created by array():

input <- c(5, 9, 3, 10, 11, 12, 13, 14, 15) 
length(input)
[1] 9
arr1 <- array(input, 
              dim = c(3, 3, 2)) 
print(arr1) 
, , 1

     [,1] [,2] [,3]
[1,]    5   10   13
[2,]    9   11   14
[3,]    3   12   15

, , 2

     [,1] [,2] [,3]
[1,]    5   10   13
[2,]    9   11   14
[3,]    3   12   15
arr1[, , 1]
     [,1] [,2] [,3]
[1,]    5   10   13
[2,]    9   11   14
[3,]    3   12   15
arr1[, , 2]
     [,1] [,2] [,3]
[1,]    5   10   13
[2,]    9   11   14
[3,]    3   12   15

Let’s check something:

prod(c(3, 3, 2)) == length(input)
[1] FALSE

So arr1 was produced by recycling: that is why arr1[ , , 1] and arr1[ , , 2] are identical():

identical(arr1[ , , 1], arr1[ , , 2])
[1] TRUE

Everything else works as expected:

apply(arr1, 1, sum)
[1] 56 68 60
apply(arr1, 2, sum)
[1] 34 66 84
apply(arr1, 3, sum)
[1] 92 92
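apply() also accepts several margins at once. With MARGIN = c(1, 2), the function runs over the third dimension for every (row, column) cell, which here sums the two slices element-wise. A sketch that rebuilds arr1 so it runs on its own (cell_sums is an illustrative name):

```r
input <- c(5, 9, 3, 10, 11, 12, 13, 14, 15)
arr1 <- array(input, dim = c(3, 3, 2))

# sum over the third dimension for each (row, column) cell: a 3 x 3 matrix
cell_sums <- apply(arr1, c(1, 2), sum)
cell_sums

# because both slices are identical here, this is just twice one slice
all.equal(cell_sums, 2 * arr1[, , 1])
```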

2. Strings and regex

We will now provide a very short and concise overview of some of R’s functionality for string processing. The latter is one of the most interesting and difficult topics in computer science. On the other hand, the work of a contemporary Data Scientist, a practitioner who needs to invest time and resources to get their data sets cleaned and properly formatted for mathematical modeling, is heavily loaded with text and string processing steps. Many data sources out there provide only unstructured or semi-structured data, and that’s where the skills of string handling, text processing, and, finally, data wrangling (next session) come into play. The caveat is that string processing is a huge domain in itself, which is why we can only provide an overview and an introduction here. It’s one of those things where a disciple becomes an expert by necessity, and where progress really means practice.

To go beyond this session: Gaston Sanchez’s “Handling and Processing Strings in R” is probably the best resource out there.

library(stringr)

On {stringr}, from Introduction to stringr, 2016-08-19: “Simplifies string operations by eliminating options that you don’t need 95% of the time (the other 5% of the time you can [use] functions from base R or stringi)”, and it really does.

Kick it! Strings in R are character vectors:

string_1 <- "Hello world"
string_2 <- "Sun shines!"
string_1
[1] "Hello world"
string_2
[1] "Sun shines!"
is.character(string_1) # TRUE
[1] TRUE
as.character(200*5)
[1] "1000"
as.numeric("1000")
[1] 1000
as.double("3.14")
[1] 3.14

Remember the character data type? Strings in R are nothing but instances of this data type. Character sits at the top of R’s coercion hierarchy, so integers and doubles are coerced to character whenever they are combined with strings. For example,

number <- 10
paste("Text", number)
[1] "Text 10"

We will discuss paste() later, but you can see from the example that it “puts things together into a character vector” (technically, it concatenates strings). However, the numeric 10 is lost in the new string, isn’t it… in R coercion, character eats everything.
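The same rule applies whenever values of different types are combined into one vector: R coerces them all to the most flexible type present, following logical < integer < double < character. A small sketch of that hierarchy:

```r
typeof(c(TRUE, 1L))       # "integer": logical is coerced to integer
typeof(c(1L, 2.5))        # "double": integer is coerced to double
typeof(c(2.5, "text"))    # "character": character eats everything
c(TRUE, 1L, 2.5, "text")  # all four values become character strings
```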

One needs to be careful when it comes to quoting string constants here (i.e. minding the occasion when the usage of ' and " is appropriate):

# Using " and '
# either:
string_1 <- "Hello 'World'"
string_1
[1] "Hello 'World'"
# or
string_1 <- 'Hello "World"'
string_1 # prints: "Hello \"World\"" - what is this: \ ?
[1] "Hello \"World\""

What is this: \?!! It was not in my string! Don’t worry: \ is R’s escape character, and it is not part of the string. Our character vector, 'Hello "World"', contains two instances of " enclosed by '. When print() displays a string, it always encloses it in double quotes, so it has to escape every " inside the string. That makes four instances of " in the output altogether: the first and the last delimit the string, while the escape character \ signals that the second and the third are not delimiters but characters that belong to the string itself.

If you care about this, take a look at the difference between writeLines() and print():

# try:
writeLines(string_1)
Hello "World"
print(string_1)
[1] "Hello \"World\""

You could also start experimenting with cat(). More on escapism in R:

# Escaping in R: use \, the R escape character
string_1 <- 'Hello \"World\"'
string_1
[1] "Hello \"World\""
writeLines(string_1)
Hello "World"
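To compare the three printing functions side by side, here is a small sketch with cat() included. Note that cat() separates its arguments with a space and does not add a trailing newline by itself.

```r
string_1 <- 'Hello "World"'

print(string_1)       # shows the quoted, escaped representation
writeLines(string_1)  # shows the raw content: Hello "World"
cat(string_1, "\n")   # also raw; cat() adds no newline, so we pass "\n"

# the backslashes shown by print() are not part of the string
nchar(string_1)       # 13
```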

Escaping the escape character:

writeLines("\\") # nice
\

Yes, that’s how you get to use the escape character as a printable character in R, if you were wondering. Wait until we get to regular expressions, where things in R really tend to get nasty.
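A first taste of why: in a regular expression, . means “any character”, so a literal dot must be written as \. in the regex, and since \ itself must be escaped inside an R string, we type "\\.". {stringr}’s fixed() switches regex matching off altogether. A sketch on a made-up vector x:

```r
library(stringr)

x <- c("3.14", "3x14")

str_detect(x, "3.14")         # TRUE TRUE: "." matches any character, including "x"
str_detect(x, "3\\.14")       # TRUE FALSE: "\\." in R is the regex \. , a literal dot
str_detect(x, fixed("3.14"))  # TRUE FALSE: fixed() matches the text literally
writeLines("3\\.14")          # prints 3\.14, the pattern the regex engine actually sees
```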


2.1 Elementary Functions on Strings in R

To obtain the length of a string in R…

# Length of strings
length(string_1) # of course
[1] 1

Of course it is: string_1 is a character vector with a single element. To count the characters in it, use nchar():

nchar(string_1) # base function
[1] 13
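nchar() is vectorized, and {stringr} offers str_length() for the same job. A sketch on a made-up vector of city names, contrasting both with length():

```r
library(stringr)

cities <- c("Belgrade", "Paris", "New York")

length(cities)      # 3: the number of elements
nchar(cities)       # 8 5 8: characters per element ({base})
str_length(cities)  # 8 5 8: the {stringr} equivalent
```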

Concatenating strings in R:

string_3 <- c(string_1, string_2) # a character vector of length == 2
writeLines(string_3)
Hello "World"
Sun shines!

No. No, no, no… that’s a character vector of length == 2, we need to use paste() here:

string_3 <- paste(string_1, string_2, sep = ", ") # length == 1, base function
writeLines(string_3)
Hello "World", Sun shines!

Where {base} has paste(), {stringr} has str_c():

strD <- c("First", "Second", "Third")
# both paste {base} and str_c {stringr} are vectorized
paste("Prefix-", strD, sep = "-") # - base R
[1] "Prefix--First"  "Prefix--Second" "Prefix--Third" 
str_c("Prefix-", strD, sep = "-") # {stringr}
[1] "Prefix--First"  "Prefix--Second" "Prefix--Third" 
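Note the double hyphen above: "Prefix-" already ends with a hyphen, and sep = "-" adds another one. To glue strings with no separator, use paste0() (or str_c(), whose default sep is ""); to collapse a whole vector into a single string, use the collapse argument. A sketch:

```r
library(stringr)

strD <- c("First", "Second", "Third")

paste0("Prefix-", strD)       # "Prefix-First" "Prefix-Second" "Prefix-Third"
str_c("Prefix-", strD)        # the same: str_c() uses sep = "" by default

paste(strD, collapse = ", ")  # "First, Second, Third": one string
str_c(strD, collapse = ", ")  # the same in {stringr}
```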

How do we split strings into subcomponents? In {base} it’s done by strsplit(), while {stringr} has str_split():

# Splitting strings in R
# with strsplit {base}
string_1 <- "The quick brown fox jumps over the lazy dog"
string_1
[1] "The quick brown fox jumps over the lazy dog"

Base R:

splitA <- strsplit(string_1, " ") # is.list(splitA) == T
splitA
[[1]]
[1] "The"   "quick" "brown" "fox"   "jumps" "over"  "the"   "lazy"  "dog"  

strsplit() returns a list; unlist() it to get to your result:

splitA <- unlist(strsplit(string_1, " "))
splitA
[1] "The"   "quick" "brown" "fox"   "jumps" "over"  "the"   "lazy"  "dog"  

Extracting a part of it by combining strsplit() and paste():

# "The quick brown" from "The quick brown fox jumps over the lazy dog"
splitA <- paste(unlist(strsplit(string_1," "))[1:3], collapse = " ")
splitA
[1] "The quick brown"
string_1
[1] "The quick brown fox jumps over the lazy dog"

There’s a fixed argument that you need to know about in strsplit():

splitA <- strsplit(string_1," ")
splitA
[[1]]
[1] "The"   "quick" "brown" "fox"   "jumps" "over"  "the"   "lazy"  "dog"  
splitA <- strsplit(string_1," ", fixed = T) 
# fixed = T says: match the split argument 
# exactly; otherwise, split is a regular expression (default: fixed = FALSE)
splitA
[[1]]
[1] "The"   "quick" "brown" "fox"   "jumps" "over"  "the"   "lazy"  "dog"  
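With a space as the separator, fixed makes no visible difference, because a space has no special meaning in a regex. It does matter for characters such as ., which in a regular expression matches any character. A sketch with a made-up dotted string x:

```r
x <- "data.science.serbia"

# regex "." matches every character, so every piece is an empty string
strsplit(x, ".")

# fixed = TRUE treats "." literally
strsplit(x, ".", fixed = TRUE)   # "data" "science" "serbia"

# or keep the regex and escape the dot
strsplit(x, "\\.")               # "data" "science" "serbia"
```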

The str_split() function in {stringr} has some very useful additional functionality in comparison to {base} strsplit(). For example:

string_11 <- "Above all, don't lie to yourself. The man who lies to himself and listens to his own lie comes to a point that he cannot distinguish the truth within him, or around him, and so loses all respect for himself and for others. And having no respect he ceases to love."
string_11
[1] "Above all, don't lie to yourself. The man who lies to himself and listens to his own lie comes to a point that he cannot distinguish the truth within him, or around him, and so loses all respect for himself and for others. And having no respect he ceases to love."
str_split(string_11, boundary("word"))
[[1]]
 [1] "Above"       "all"         "don't"       "lie"         "to"          "yourself"   
 [7] "The"         "man"         "who"         "lies"        "to"          "himself"    
[13] "and"         "listens"     "to"          "his"         "own"         "lie"        
[19] "comes"       "to"          "a"           "point"       "that"        "he"         
[25] "cannot"      "distinguish" "the"         "truth"       "within"      "him"        
[31] "or"          "around"      "him"         "and"         "so"          "loses"      
[37] "all"         "respect"     "for"         "himself"     "and"         "for"        
[43] "others"      "And"         "having"      "no"          "respect"     "he"         
[49] "ceases"      "to"          "love"       
# including punctuation and special characters
str_split(string_11, boundary("word", skip_word_none = F))
[[1]]
  [1] "Above"       " "           "all"         ","           " "           "don't"      
  [7] " "           "lie"         " "           "to"          " "           "yourself"   
 [13] "."           " "           "The"         " "           "man"         " "          
 [19] "who"         " "           "lies"        " "           "to"          " "          
 [25] "himself"     " "           "and"         " "           "listens"     " "          
 [31] "to"          " "           "his"         " "           "own"         " "          
 [37] "lie"         " "           "comes"       " "           "to"          " "          
 [43] "a"           " "           "point"       " "           "that"        " "          
 [49] "he"          " "           "cannot"      " "           "distinguish" " "          
 [55] "the"         " "           "truth"       " "           "within"      " "          
 [61] "him"         ","           " "           "or"          " "           "around"     
 [67] " "           "him"         ","           " "           "and"         " "          
 [73] "so"          " "           "loses"       " "           "all"         " "          
 [79] "respect"     " "           "for"         " "           "himself"     " "          
 [85] "and"         " "           "for"         " "           "others"      "."          
 [91] " "           "And"         " "           "having"      " "           "no"         
 [97] " "           "respect"     " "           "he"          " "           "ceases"     
[103] " "           "to"          " "           "love"        "."          
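Two more str_split() arguments are worth knowing: simplify = TRUE returns a character matrix instead of a list, and n limits the number of pieces. A sketch on made-up date strings:

```r
library(stringr)

dates <- c("2020-10-01", "2021-03-15")

# simplify = TRUE returns a character matrix, one row per input string
parts <- str_split(dates, "-", simplify = TRUE)
parts
parts[, 1]                    # "2020" "2021": the years

# n limits the number of pieces; the remainder stays unsplit
str_split(dates, "-", n = 2)  # list(c("2020", "10-01"), c("2021", "03-15"))
```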

2.2 Subsetting and transforming strings

See, I have a character vector, and I need only the first three characters from each component:

# Subsetting strings
string_1 <- c("Data", "Science", "Serbia")
# {base}
substr(string_1, 1, 3)
[1] "Dat" "Sci" "Ser"
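The {stringr} counterpart is str_sub(), which also accepts negative positions counted from the end of each string. A sketch:

```r
library(stringr)

string_1 <- c("Data", "Science", "Serbia")

str_sub(string_1, 1, 3)    # "Dat" "Sci" "Ser", like substr()
str_sub(string_1, -3, -1)  # "ata" "nce" "bia": negative positions count from the end
```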

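Let’s start transforming strings with the replacement form of substr(). Mind how it works: it overwrites characters in place and never changes the length of a string, so it replaces at most stop - start + 1 characters, and never more than the string (or the replacement value) provides: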

# {base}
string_2 <- string_1 # just a copy of string_1
substr(string_2, 1, 3) <- "WowWow" # check the result!
string_2
[1] "Wowa"    "Wowence" "Wowbia" 
substr(string_2, 1, 4) <- "WowWow" # check the result!
string_2
[1] "WowW"    "WowWnce" "WowWia" 
substr(string_2, 1, 6) <- "WowWow" # check the result!
string_2
[1] "WowW"    "WowWowe" "WowWow" 
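{stringr} behaves differently here: the replacement form of str_sub() substitutes the whole selected range and may change the length of the string. A sketch for comparison with substr() above:

```r
library(stringr)

string_2 <- c("Data", "Science", "Serbia")
str_sub(string_2, 1, 3) <- "WowWow"  # replace characters 1 to 3 by "WowWow"
string_2                             # "WowWowa" "WowWowence" "WowWowbia"
```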

UPPER CASE to lower case with tolower():

string_1 <- "Belgrade"
# {base}
tolower(string_1)
[1] "belgrade"

Now everything to UPPER CASE with {base} toupper():

string_1 <- tolower(string_1)
toupper(string_1)
[1] "BELGRADE"

A useful {stringr} function, str_to_title(), capitalizes the first letter of each word:

string_1 <- c("belgrade", "paris", "london", "moscow")
str_to_title(string_1)
[1] "Belgrade" "Paris"    "London"   "Moscow"  

Removing superfluous whitespace from strings is a notoriously common operation in text mining:

# Remove whitespace
string_1 <- c("  Remove whitespace  ")
string_1
[1] "  Remove whitespace  "

Here comes {stringr} str_trim() to clean up:

str_trim(string_1) # {stringr}
[1] "Remove whitespace"

The side argument lets us remove only the leading (side = ‘left’) or only the trailing (side = ‘right’) whitespace; the default, side = ‘both’, removes both:

# remove leading whitespace
str_trim(string_1, side = "left")
[1] "Remove whitespace  "
# remove trailing whitespace
str_trim(string_1, side = "right")
[1] "  Remove whitespace"

Using {base} gsub() to remove all whitespace:

# remove all whitespace?
string_1 <- c("  Remove    whitespace  ") # how about this one?
string_1
[1] "  Remove    whitespace  "
# there are different ways to do it. Try:
gsub(" ", "", string_1, fixed = T) # without fixed = T, the first (pattern) argument is a regex
[1] "Removewhitespace"
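Two alternatives, sketched below: a regex that matches any run of whitespace (\s+, written "\\s+" in an R string), and {stringr}’s str_squish(), which trims both ends and collapses inner runs to a single space. str_squish() is only available in reasonably recent {stringr} versions.

```r
library(stringr)

string_1 <- "  Remove    whitespace  "

gsub("\\s+", "", string_1)              # "Removewhitespace": every run of whitespace dropped
str_replace_all(string_1, "\\s+", " ")  # " Remove whitespace ": every run becomes one space
str_squish(string_1)                    # "Remove whitespace": ends trimmed, inner runs squished
```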

gsub() is definitely something you need to learn about:

# replacing, in general:
string_1 <- "The quick brown fox jumps over the lazy dog The quick brown"
gsub("The quick brown", "The slow red", string_1, fixed=T)
[1] "The slow red fox jumps over the lazy dog The slow red"

Again, mind the fixed argument - by default, gsub() likes regular expressions.
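gsub() has a sibling, sub(), which replaces only the first match; in {stringr} the pair is str_replace() and str_replace_all(), with fixed() playing the role of fixed = TRUE. A sketch:

```r
library(stringr)

string_1 <- "The quick brown fox jumps over the lazy dog The quick brown"

# sub() replaces only the first match, gsub() replaces all of them
sub("The quick brown", "The slow red", string_1, fixed = TRUE)
# "The slow red fox jumps over the lazy dog The quick brown"

# {stringr} equivalents
str_replace(string_1, fixed("The quick brown"), "The slow red")
str_replace_all(string_1, fixed("The quick brown"), "The slow red")
```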

2.3 Searching in strings

string_1
[1] "The quick brown fox jumps over the lazy dog The quick brown"

Does string_1 contain The quick brown?

# Searching for something in a string {stringr}
str_detect(string_1, "The quick brown") # T or F
[1] TRUE

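Where is it? Use str_locate() from {stringr}. It returns a matrix with the start and end positions of the first match; [[1]] extracts its first element, i.e. where the first match starts:

str_locate(string_1, "The quick brown")[[1]] # start of the first match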
[1] 1

And what if there is more than one match?

str_locate_all(string_1, "The quick brown")[[1]] # all matches
     start end
[1,]     1  15
[2,]    45  59

You might have heard that people in text mining use term-frequency matrices a lot. These matrices typically list all interesting terms from a set of documents in their rows, while the documents themselves are represented by columns; cell entries are counts of how many times a particular term has occurred in a particular document.

We will not build a full term-frequency matrix in R now (check the {tm} package for R’s functionality in text-mining), but only demonstrate how to use str_locate_all() to count the number of occurrences:

# term frequency, as we know, is very important in text-mining:
term1 <- str_locate_all(string_1, "The quick brown")[[1]] # all matches for term1 
# ie. "The quick brown"
term1
     start end
[1,]     1  15
[2,]    45  59

Hm, it’s easy now:

dim(term1)[1] # how many matches = how many rows in the str_locate_all output matrix
[1] 2
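{stringr} also has a dedicated counting function, str_count(), which is vectorized over strings, so it could be applied to a whole vector of documents at once. A sketch on the same string:

```r
library(stringr)

string_1 <- "The quick brown fox jumps over the lazy dog The quick brown"

str_count(string_1, fixed("The quick brown"))          # 2
str_count(string_1, fixed("the"))                      # 1: matching is case-sensitive
str_count(string_1, regex("the", ignore_case = TRUE))  # 3
```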

2.4 Sorting strings in R

# Sorting character vectors in R {base}
string_1 <- c("New York", "Paris", "London", "Moscow", "Tokyo")
string_1
[1] "New York" "Paris"    "London"   "Moscow"   "Tokyo"   

It’s really easy:

sort(string_1)
[1] "London"   "Moscow"   "New York" "Paris"    "Tokyo"   

And with decreasing=T:

sort(string_1, decreasing = T)
[1] "Tokyo"    "Paris"    "New York" "Moscow"   "London"  
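sort() compares strings character by character, which gives surprising results when strings contain numbers. Recent versions of {stringr} provide str_sort() with a numeric argument for exactly this case. A sketch on made-up file names:

```r
library(stringr)

files <- c("item10", "item2", "item1")

sort(files)                      # "item1" "item10" "item2": compared character by character
str_sort(files, numeric = TRUE)  # "item1" "item2" "item10": digits compared as numbers
```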

Further Readings

R Markdown

R Markdown is what I have used to produce this beautiful Notebook. We will learn more about it near the end of the course, but if you already feel ready to dive deep, here’s a book: R Markdown: The Definitive Guide, by Yihui Xie, J. J. Allaire, and Garrett Grolemund.

Exercises

A specialized R Markdown Notebook on Regular expressions will be shared soon. The exercises will be found there.


Goran S. Milovanović

DataKolektiv, 2020/21

contact:


License: GPLv3 This Notebook is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. This Notebook is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this Notebook. If not, see http://www.gnu.org/licenses/.


YHtyIGVjaG8gPSBUfQ0KY3Jvc3Nwcm9kKG1hdDEsIG1hdDIpDQpgYGANCg0Kd2hpY2ggaXMgdGhlIHNhbWUgYXMgKGxlc3MgZWZmaWNpZW50KToNCg0KYGBge3IgZWNobyA9IFR9DQp0KG1hdDEpICUqJSBtYXQyDQpgYGANCg0KDQpXaGlsZSBgWFgnYCBpczoNCg0KYGBge3IgZWNobyA9IFR9DQp0Y3Jvc3Nwcm9kKG1hdDEsIG1hdDIpDQpgYGANCnRoZSBzYW1lIGFzIChsZXNzIGVmZmljaWVudCk6DQoNCmBgYHtyIGVjaG8gPSBUfQ0KbWF0MSAlKiUgdChtYXQyKQ0KYGBgDQoNCiMjIyMgMS4zIE11bHRpZGltZW5zaW9uYWwgQXJyYXlzDQoNCk11bHRkaW1lbnNpb25hbCBhcnJheXMgaW4gUiBhcmUgY3JlYXRlZCBieSBgYXJyYXkoKWANCg0KYGBge3IgZWNobyA9IFR9DQppbnB1dCA8LSBjKDUsIDksIDMsIDEwLCAxMSwgMTIsIDEzLCAxNCwgMTUpIA0KbGVuZ3RoKGlucHV0KQ0KYGBgDQpgYGB7ciBlY2hvID0gVH0NCmFycjEgPC0gYXJyYXkodmVjdG9yMSwgDQogICAgICAgICAgICAgIGRpbSA9IGMoMywgMywgMikpIA0KcHJpbnQoYXJyMSkgDQpgYGANCg0KYGBge3IgZWNobyA9IFR9DQphcnIxWywgLCAxXQ0KYGBgDQpgYGB7ciBlY2hvID0gVH0NCmFycjFbLCAsIDJdDQpgYGANCkxldCdzIGNoZWNrIHNvbWV0aGluZzoNCg0KYGBge3IgZWNobyA9IFR9DQpwcm9kKGMoMywgMywgMikpID09IGxlbmd0aChpbnB1dCkNCmBgYA0KU28gYGFycjFgIHdhcyBwcm9kdWNlZCBieSByZWN5Y2xpbmcgLSB0aGF0IGlzIHdoeSBgYXJyMVsgLCAsIDFdYCBhbmQgYGFyclsgLCAsIDNdYCBhcmUgYGlkZW50aWNhbCgpYDoNCg0KYGBge3IgZWNobyA9IFR9DQppZGVudGljYWwoYXJyMVsgLCAsIDFdLCBhcnIxWyAsICwgMl0pDQpgYGANCkV2ZXJ5dGhpbmcgZWxzZSB3b3JrcyBhcyBleHBlY3RlZDoNCg0KYGBge3IgZWNobyA9IFR9DQphcHBseShhcnIxLCAxLCBzdW0pDQpgYGANCg0KYGBge3IgZWNobyA9IFR9DQphcHBseShhcnIxLCAyLCBzdW0pDQpgYGANCg0KYGBge3IgZWNobyA9IFR9DQphcHBseShhcnIxLCAzLCBzdW0pDQpgYGANCg0KIyMjIDIgU3RyaW5ncyBhbmQgcmVnZXgNCg0KV2Ugd2lsbCBub3cgcHJvdmlkZSBhIHZlcnkgc2hvcnQgYW5kIGNvbmNpc2Ugb3ZlcnZpZXcgb2Ygc29tZSBvZiB0aGUgUidzIGZ1bmN0aW9uYWxpdHkgZm9yIHN0cmluZyBwcm9jZXNzaW5nLiBUaGUgbGF0ZXIgaXMgZm91bmQgYW1vbmcgdGhlIG1vc3QgaW50ZXJlc3RpbmcgYW5kIGRpZmZpY3VsdCB0b3BpY3MgaW4gY29tcHV0ZXIgc2NpZW5jZS4gT24gdGhlIG90aGVyIGhhbmQsIHRoZSB3b3JrIG9mIGEgY29udGVtcG9yYXJ5IERhdGEgU2NpZW50aXN0IC0gYSBwcmFjdGl0aW9uZXIgd2hvIG5lZWRzIHRvIGludmVzdCB0aW1lIGFuZCByZXNvdXJjZXMgdG8gZ2V0IGl0cyBkYXRhIHNldHMgY2xlYW5lZCBhbmQgcHJvcGVybHkgZm9ybWF0dGVkIGZvciBtYXRoZW1hdGljYWwgbW9kZWxpbmcgLSBpcyBoZWF2aWx5IGxvYWRlZCB3aXRoIHRleHQgYW5kIHN0cmluZyBwcm9j
ZXNzaW5nIHN0ZXBzLiBNYW55IGRhdGEgc291cmNlcyB0aGF0IGFyZSBhdmFpbGFibGUgb3V0IHRoZXJlIHByb3ZpZGUgb25seSB1bnN0cnVjdHVyZWQsIG9yIHNlbWktc3RydWN0dXJlZCBkYXRhLCBhbmQgdGhhdCdzIHdlcmUgdGhlIHNraWxscyBvZiBzdHJpbmcgaGFuZGxpbmcsIHRleHQgcHJvY2Vzc2luZywgYW5kLCBmaW5hbGx5LCBkYXRhIHdyYW5nbGluZyAobmV4dCBzZXNzaW9uKSBjb21lIGludG8gcGxheS4gVGhlIGNhdmVhdCBoZXJlIGlzIHRoYXQgc3RyaW5nIHByb2Nlc3NpbmcgaXMgYSAqaHVnZSogZG9tYWluIGluIGl0c2VsZiwgYW5kIHRoYXQgaXMgd2h5IHdlIGNhbiBwcm92aWRlIGFuIG92ZXJ2aWV3IGFuZCBhbiBpbnRyb2R1Y3Rpb24gaGVyZS4gSXQncyBvbmUgb2YgdGhvc2UgdGhpbmdzIHdlcmUgYSBkaXNjaXBsZSBiZWNvbWVzIGFuIGV4cGVydCBieSBuZWNlc3NpdHksIGFuZCB3ZXJlIHByb2dyZXNzIHJlYWxseSBtZWFucyAqcHJhY3RpY2UqLiANCg0KVG8gZ28gYmV5b25kIHRoaXMgc2Vzc2lvbjogW0dhc3RvbiBTYW5jaGV6J3MgIkhhbmRsaW5nIGFuZCBQcm9jZXNzaW5nIFN0cmluZ3MgaW4gUiJdKGh0dHA6Ly9nYXN0b25zYW5jaGV6LmNvbS9IYW5kbGluZ19hbmRfUHJvY2Vzc2luZ19TdHJpbmdzX2luX1IucGRmKSBpcyBwcm9iYWJseSB0aGUgYmVzdCB0aGF0IGlzIG91dCB0aGVyZS4NCg0KYGBgIHtyIGVjaG89VH0NCmxpYnJhcnkoc3RyaW5ncikNCmBgYA0KDQpPbiB7c3RyaW5ncn0sIGZyb20gW0ludHJvZHVjdGlvbiB0byBzdHJpbmdyLCAyMDE2LTA4LTE5XShodHRwczovL2NyYW4uci1wcm9qZWN0Lm9yZy93ZWIvcGFja2FnZXMvc3RyaW5nci92aWduZXR0ZXMvc3RyaW5nci5odG1sKTogIipTaW1wbGlmaWVzIHN0cmluZyBvcGVyYXRpb25zIGJ5IGVsaW1pbmF0aW5nIG9wdGlvbnMgdGhhdCB5b3UgZG9u4oCZdCBuZWVkIDk1JSBvZiB0aGUgdGltZSAodGhlIG90aGVyIDUlIG9mIHRoZSB0aW1lIHlvdSBjYW4gZnVuY3Rpb25zIGZyb20gYmFzZSBSIG9yIHN0cmluZ2kpKiIgLSBhbmQgaXQgcmVhbGxseSBkb2VzLiBOb3csDQoNCktpY2sgaXQhIFN0cmluZ3MgaW4gUiBhcmUgY2hhcmFjdGVyIHZlY3RvcnM6DQoNCmBgYCB7ciBlY2hvPVR9DQpzdHJpbmdfMSA8LSAiSGVsbG8gd29ybGQiDQpzdHJpbmdfMiA8LSAiU3VuIHNoaW5lcyEiDQpgYGANCg0KYGBgIHtyIGVjaG89VH0NCnN0cmluZ18xDQpgYGANCg0KYGBgIHtyIGVjaG89VH0NCnN0cmluZ18yDQpgYGANCg0KYGBgIHtyIGVjaG89VH0NCmlzLmNoYXJhY3RlcihzdHJpbmdfMSkgIyBUUlVFDQpgYGANCmBgYCB7ciBlY2hvPVR9DQphcy5jaGFyYWN0ZXIoMjAwKjUpDQpgYGANCg0KYGBgIHtyIGVjaG89VH0NCmFzLm51bWVyaWMoIjEwMDAiKQ0KYGBgDQoNCmBgYCB7ciBlY2hvPVR9DQphcy5kb3VibGUoIjMuMTQiKQ0KYGBgDQoNClJlbWVtYmVyIHRoZSBgY2hhcmFjdGVyYCBkYXRhIHR5cGU/IFN0cmluZ3MgaW4gUiBhcmUgbm90aGluZyBidXQgaW5zdGFu
dGlhdGlvbnMgb2YgdGhpcyBkYXRhIHR5cGUuIEEgYGNoYXJhY3RlcmAgaXMgYSB2ZXJ5ICJvbGQiIGRhdGEgdHlwZSBpbiBSLCBzbyB0aGF0IGFsbCBpbnRlZ2VycyBhbmQgZG91YmxlZCBjb2VyY2UgdG8gY2hhcmFjdGVycyB3aGVuIGFwcHJvcHJpYXRlLiBGb3IgZXhhbXBsZSwNCg0KYGBgIHtyIGVjaG8gPSBUfQ0KbnVtYmVyIDwtIDEwDQpwYXN0ZSgiVGV4dCIsIG51bWJlcikNCmBgYA0KDQpXZSB3aWxsIGRpc2N1c3MgYHBhc3RlKClgIGxhdGVyLCBidXQgeW91IGNhbiBzZWUgZnJvbSB0aGUgZXhhbXBsZSB0aGF0IGlzICJwdXRzIHRoaW5ncyB0b2dldGhlciBpbnRvIGEgY2hhcmFjdGVyIHZlY3RvciIgKGl0ICpjb25jYXRlbmF0ZXMqIHN0cmluZ3MsIHRlY2huaWNhbGx5KS4gSG93ZXZlciwgdGhlIG51bWVyaWMgYDEwYCBpcyBsb3N0IGluIGEgbmV3IHN0cmluZywgaXNuJ3QgaXQuLi4gaW4gUiBjb2VyY2lvbiwgYGNoYXJhY3RlcmAgZWF0cyBldmVyeXRoaW5nLg0KDQpPbmUgbmVlZHMgdG8gYmUgY2FyZWZ1bCB3aGVuIGl0IGNvbWVzIHRvIHF1b3Rpbmcgc3RyaW5nIGNvbnN0YW50cyBoZXJlIChpLmUuIG1pbmRpbmcgdGhlIG9jY2FzaW9uIHdoZW4gdGhlIHVzYWdlIG9mIGAnYCBhbmQgYCJgIGlzIGFwcHJvcHJpYXRlKToNCg0KYGBgIHtyIGVjaG89VH0NCiMgVXNpbmcgIiBhbmQgJw0KIyBlaXRoZXI6DQpzdHJpbmdfMSA8LSAiSGVsbG8gJ1dvcmxkJyINCnN0cmluZ18xDQpgYGANCg0KYGBgIHtyIGVjaG89VH0NCiMgb3INCnN0cmluZ18xIDwtICdIZWxsbyAiV29ybGQiJw0Kc3RyaW5nXzEgIyBwcmludHM6ICJIZWxsbyBcIldvcmxkXCIiIC0gd2hhdCBpcyB0aGlzOiBcID8NCmBgYA0KDQpXaGF0IGlzIHRoaXM6IGBcYD8hISBJdCB3YXMgbm90IGluIG15IHN0cmluZz8gRG9uJ3Qgd29ycnksIGBcYCBpcyBSJ3MgZXNjYXBlIGNoYXJhY3Rlci4gSW4gdGhlIGNoYXJhY3RlciB2ZWN0b3IgYWJvdmUgLSBgJ0hlbGxvICJXb3JsZCInYCAtIHdlIGZpbmQgdHdvIGluc3RhbnRpYXRpb25zIG9mIGAiYCBlbmNsb3NlZCBieSBgJ2AuIE9uIHRoZSBvdXRwdXQsIFIgdHJhbnNmZXJyZWQgYWxsIGluc3RhbnRpYXRpb25zIG9mIGAnYCB0byBgImAsIG1ha2luZyBpdCBmb3VyIGluc3RhbnRpYXRpb25zIG9mIGAiYCBhbHRvZ2V0aGVyIG5vdy4gVGhlIGVzY2FwZSBjaGFyYWN0ZXIgYFxgIGlzIHVzZWQgdG8gc2lnbmFsIHRoYXQgdGhlICoqc2Vjb25kKiogaW5zdGFudGlhdGlvbiBvZiBgImAgaXMgbm90IGEgYmVnaW5uaW5nIG9mIGEgbmV3IHN0cmluZywgYnV0IGEgdG9rZW4gdG8gYmUgcHJpbnRlZCwgYW5kIHRoYXQgdGhlIHRoaXJkIGluc3RhbnRpYXRpb24gb2YgYCJgIGlzIG5vdCBhbiBlbmRpbmcgb2YgYSBzdHJpbmcsIGJ1dCBhbHNvIGEgdG9rZW4gdG8gYmUgcHJpbnRlZCB0byB0aGUgb3V0cHV0IGRldmljZS4NCg0KSWYgeW91IGNhcmUgYWJvdXQgdGhpcyBtdWNoLCB0YWtlIGEgbG9vayBhdCB0aGUgZGlmZmVyZW5jZSBiZXR3ZWVuIGB3
cml0ZUxpbmVzKClgIGFuZCBgcHJpbnQoKWA6DQoNCmBgYCB7ciBlY2hvPVR9DQojIHRyeToNCndyaXRlTGluZXMoc3RyaW5nXzEpDQpwcmludChzdHJpbmdfMSkNCmBgYA0KDQpZb3UgY291bGQgYWxzbyBzdGFydCBleHBlcmltZW50aW5nIHdpdGggYGNhdCgpYC4gTW9yZSBvbiBlc2NhcGlzbSBpbiBSOg0KDQpgYGAge3IgZWNobz1UfQ0KIyBFc2NhcGluZyBpbiBSOiB1c2UgXCwgdGhlIFIgZXNjYXBlIGNoYXJhY3Rlcg0Kc3RyaW5nXzEgPC0gJ0hlbGxvIFwiV29ybGRcIicNCnN0cmluZ18xDQpgYGANCmBgYCB7ciBlY2hvPVR9DQp3cml0ZUxpbmVzKHN0cmluZ18xKQ0KYGBgDQoNCkVzY2FwaW5nIHRoZSBlc2NhcGUgY2hhcmFjdGVyOg0KDQpgYGAge3IgZWNobz1UfQ0Kd3JpdGVMaW5lcygiXFwiKSAjIG5pY2UNCmBgYA0KDQpZZXMgdGhhdCdzIGhvdyB5b3UgZ2V0IHRvIHVzZSB0aGUgZXNjYXBlIGNoYXJhY3RlciBhcyBhIHByaW50YWJsZSBjaGFyYWN0ZXIgaW4gUiwgaWYgeW91IHdlcmUgd29uZGVyaW5nLiBXYWl0IHVudGlsIGl0IGNvbWVzIHRvIHJlZ3VsYXIgZXhwcmVzc2lvbnMgd2hlcmUgdGhpbmdzIGluIFIgcmVhbGx5IHRlbmQgdG8gZ2V0IG5hc3R5Lg0KDQoqKioNCg0KIyMjIyAyLjEgRWxlbWVudGFyeSBGdW5jdGlvbnMgb24gU3RyaW5ncyBpbiBSDQoNClRvIG9idGFpbiBhIGxlbmd0aCBvZiBhIHN0cmluZyBpbiBSLi4uDQoNCmBgYCB7ciBlY2hvPVR9DQojIExlbmd0aCBvZiBzdHJpbmdzDQpsZW5ndGgoc3RyaW5nXzEpICMgb2YgY291cnNlDQpgYGANCg0KQnV0IG9mIGNvdXJzZSBpdCBpcy4gTWF5YmUgYG5jaGFyKClgIHdvdWxkIGRvIGJldHRlcjoNCg0KYGBgIHtyIGVjaG89VH0NCm5jaGFyKHN0cmluZ18xKSAjIGJhc2UgZnVuY3Rpb24NCmBgYA0KDQpDb25jYXRlbmF0aW5nIHN0cmluZ3MgaW4gUjoNCg0KYGBgIHtyIGVjaG89VH0NCnN0cmluZ18zIDwtIGMoc3RyaW5nXzEsIHN0cmluZ18yKSAjIGEgY2hhcmFjdGVyIHZlY3RvciBvZiBsZW5ndGggPT0gMg0Kd3JpdGVMaW5lcyhzdHJpbmdfMykNCmBgYA0KDQpOby4gTm8sIG5vLCBuby4uLiB0aGF0J3MgYSBjaGFyYWN0ZXIgdmVjdG9yIG9mIGxlbmd0aCA9PSAyLCB3ZSBuZWVkIHRvIHVzZSBgcGFzdGUoKWAgaGVyZToNCg0KYGBgIHtyIGVjaG89VH0NCnN0cmluZ18zIDwtIHBhc3RlKHN0cmluZ18xLCBzdHJpbmdfMiwgc2VwID0gIiwgIikgIyBsZW5ndGggPT0gMSwgYmFzZSBmdW5jdGlvbg0Kd3JpdGVMaW5lcyhzdHJpbmdfMykNCmBgYA0KDQpXaGVyZSB7YmFzZX0gaGFzIGBwYXN0ZSgpYCwge3N0cmluZ3J9IGhhcyBgc3RyX2MoKWA6DQoNCmBgYCB7ciBlY2hvPVR9DQpzdHJEIDwtIGMoIkZpcnN0IiwgIlNlY29uZCIsICJUaGlyZCIpDQojIGJvdGggcGFzdGUge2Jhc2V9IGFuZCBzdHJfYyB7c3RyaW5ncn0gYXJlIHZlY3Rvcml6ZWQNCnBhc3RlKCJQcmVmaXgtIiwgc3RyRCwgc2VwID0gIi0iKSAjIC0gYmFzZSBSDQpzdHJfYygiUHJlZml4LSIsIHN0ckQsIHNlcCA9
ICItIikgIyB7c3RyaW5ncn0NCmBgYA0KDQpIb3cgdG8gc3BsaXQgc3RyaW5ncyBpbnRvIHN1YmNvbXBvbmVudHM/IEluIHtiYXNlfSBpdCdzIGRvbmUgYnkgYHN0cnNwbGl0KClgLCB3aGlsZSB7c3RyaW5ncn0gaGFzICdzdHJfc3BsaXQoKSc6DQoNCmBgYCB7ciBlY2hvID0gVH0NCiMgU3BsaXR0aW5nIHN0cmluZ3MgaW4gUg0KIyB3aXRoIHN0cnNwbGl0IHtiYXNlfQ0Kc3RyaW5nXzEgPC0gIlRoZSBxdWljayBicm93biBmb3gganVtcHMgb3ZlciB0aGUgbGF6eSBkb2ciDQpzdHJpbmdfMQ0KYGBgDQpCYXNlIFI6DQoNCmBgYCB7ciBlY2hvID0gVH0NCnNwbGl0QSA8LSBzdHJzcGxpdChzdHJpbmdfMSwgIiAiKSAjIGlzLmxpc3Qoc3BsaXRBKSA9PSBUDQpzcGxpdEENCmBgYA0KDQpgc3Ryc3BsaXQoKWAgcmV0dXJucyBhIGxpc3Q7IGB1bmxpc3QoKWAgaXQgdG8gZ2V0IHRvIHlvdXIgcmVzdWx0Og0KDQpgYGAge3IgZWNobyA9IFR9DQpzcGxpdEEgPC0gdW5saXN0KHN0cnNwbGl0KHN0cmluZ18xLCAiICIpKQ0Kc3BsaXRBDQpgYGANCg0KRXh0cmFjdGluZyBhIHBhcnQgb2YgaXQgYnkgY29tYmluaW5nIGBzdHJzcGxpdCgpYCBhbmQgYHBhc3RlKClgOg0KDQpgYGAge3IgZWNobyA9IFR9DQojICJUaGUgcXVpY2sgYnJvd24iIGZyb20gIlRoZSBxdWljayBicm93biBmb3gganVtcHMgb3ZlciB0aGUgbGF6eSBkb2ciDQpzcGxpdEEgPC0gcGFzdGUodW5saXN0KHN0cnNwbGl0KHN0cmluZ18xLCIgIikpWzE6M10sIGNvbGxhcHNlID0gIiAiKQ0Kc3BsaXRBDQpgYGANCg0KYGBgIHtyIGVjaG8gPSBUfQ0Kc3RyaW5nXzENCmBgYA0KDQpUaGVyZSdzIGEgYGZpeGVkYCBhcmd1bWVudCB0aGF0IHlvdSBuZWVkIHRvIGtub3cgYWJvdXQgaW4gYHN0cnNwbGl0KClgOg0KDQpgYGAge3IgZWNobyA9IFR9DQpzcGxpdEEgPC0gc3Ryc3BsaXQoc3RyaW5nXzEsIiAiKQ0Kc3BsaXRBDQpgYGANCg0KYGBgIHtyIGVjaG8gPSBUfQ0Kc3BsaXRBIDwtIHN0cnNwbGl0KHN0cmluZ18xLCIgIiwgZml4ZWQgPSBUKSANCiMgZml4ZWQ9VCBzYXlzOiBtYXRjaCB0aGUgc3BsaXQgYXJndW1lbnQgDQojIGV4YWN0bHksIG90aGVyd2lzZSwgc3BsaXQgaXMgYW4gcmVndWxhciBleHByZXNzaW9uOyBkZWZhdWx0IGlzOiBmaXhlZCA9IEZBTFNFDQpzcGxpdEENCg0KYGBgDQoNClRoZSBgc3RyX3NwbGl0KClgIGZ1bmN0aW9uIGluIHtzdHJpbmdyfSBoYXMgc29tZSB2ZXJ5IHVzZWZ1bCwgYWRkaXRpb25hbCBmdW5jdGlvbmFsaXR5IGluIGNvbXBhcmlzb24gdG8ge2Jhc2V9IGBzdHJwbGl0KClgLiBGb3IgZXhhbXBsZToNCg0KYGBgIHtyIGVjaG8gPSBUfQ0Kc3RyaW5nXzExIDwtICJBYm92ZSBhbGwsIGRvbid0IGxpZSB0byB5b3Vyc2VsZi4gVGhlIG1hbiB3aG8gbGllcyB0byBoaW1zZWxmIGFuZCBsaXN0ZW5zIHRvIGhpcyBvd24gbGllIGNvbWVzIHRvIGEgcG9pbnQgdGhhdCBoZSBjYW5ub3QgZGlzdGluZ3Vpc2ggdGhlIHRydXRoIHdpdGhpbiBoaW0sIG9yIGFyb3VuZCBoaW0sIGFu
ZCBzbyBsb3NlcyBhbGwgcmVzcGVjdCBmb3IgaGltc2VsZiBhbmQgZm9yIG90aGVycy4gQW5kIGhhdmluZyBubyByZXNwZWN0IGhlIGNlYXNlcyB0byBsb3ZlLiINCnN0cmluZ18xMQ0KYGBgDQoNCmBgYCB7ciBlY2hvID0gVH0NCnN0cl9zcGxpdChzdHJpbmdfMTEsIGJvdW5kYXJ5KCJ3b3JkIikpDQpgYGANCg0KYGBgIHtyIGVjaG8gPSBUfQ0KIyBpbmNsdWRpbmcgcHVuY3R1YXRpb24gYW5kIHNwZWNpYWwgY2hhcmFjdGVycw0Kc3RyX3NwbGl0KHN0cmluZ18xMSwgYm91bmRhcnkoIndvcmQiLCBza2lwX3dvcmRfbm9uZSA9IEYpKQ0KYGBgDQoNCg0KIyMjIyAyLjIgU3Vic2V0dGluZyBhbmQgdHJhbnNmb3JtaW5nIHN0cmluZ3MNCg0KU2VlLCBJIGhhdmUgYSBjaGFyYWN0ZXIgdmVjdG9yLCBhbmQgSSBuZWVkIG9ubHkgdGhlIGZpcnN0IHRocmVlIGNoYXJhY3RlcnMgZnJvbSBlYWNoIGNvbXBvbmVudDoNCg0KYGBgIHtyIGVjaG8gPSBUfQ0KIyBTdWJzZXR0aW5nIHN0cmluZ3MNCnN0cmluZ18xIDwtIGMoIkRhdGEiLCAiU2NpZW5jZSIsICJTZXJiaWEiKQ0KIyB7YmFzZX0NCnN1YnN0cihzdHJpbmdfMSwgMSwgMykNCmBgYA0KDQpMZXQncyBzdGFydCB0cmFuc2Zvcm1pbmcgc3RyaW5ncyB3aXRoIGBzdWJzdHIoKWA6DQoNCmBgYCB7ciBlY2hvID0gVH0NCiMge2Jhc2V9DQpzdHJpbmdfMiA8LSBzdHJpbmdfMSAjIGp1c3QgYSBjb3B5IG9mIHN0cmluZ18xDQpzdWJzdHIoc3RyaW5nXzIsIDEsIDMpIDwtICJXb3dXb3ciICMgY2hlY2sgdGhlIHJlc3VsdCENCnN0cmluZ18yDQpgYGANCg0KYGBgIHtyIGVjaG8gPSBUfQ0Kc3Vic3RyKHN0cmluZ18yLCAxLCA0KSA8LSAiV293V293IiAjIGNoZWNrIHRoZSByZXN1bHQhDQpzdHJpbmdfMg0KYGBgDQoNCmBgYCB7ciBlY2hvID0gVH0NCnN1YnN0cihzdHJpbmdfMiwgMSwgNikgPC0gIldvd1dvdyIgIyBjaGVjayB0aGUgcmVzdWx0IQ0Kc3RyaW5nXzINCmBgYA0KDQpVUFBFUiBDQVNFIHRvIGxvd2VyIGNhc2Ugdy4gYHRvbG93ZXIoKWA6DQoNCmBgYCB7ciBlY2hvID0gVH0NCnN0cmluZ18xIDwtICJCZWxncmFkZSINCiMge2Jhc2V9DQp0b2xvd2VyKHN0cmluZ18xKQ0KYGBgDQoNCk5vdyBldmVyeXRoaW5nIHRvIFVQUEVSIENBU0Ugd2l0aCB7YmFzZX0gYHRvdXBwZXIoKWA6DQoNCmBgYCB7ciBlY2hvID0gVH0NCnN0cmluZ18xIDwtIHRvbG93ZXIoc3RyaW5nXzEpDQp0b3VwcGVyKHN0cmluZ18xKQ0KYGBgDQoNCkEgdXNlZnVsIHtzdHJpbmdyfSBmdW5jdGlvbiBgc3RyX3RvX3RpdGxlKClgIGNhcGl0YWxpemVzIG9ubHkgdGhlIGZpcnN0IGNoYXJhY3RlcjoNCg0KYGBgIHtyIGVjaG8gPSBUfQ0Kc3RyaW5nXzEgPC0gYygiYmVsZ3JhZGUiLCAicGFyaXMiLCAibG9uZG9uIiwgIm1vc2NvdyIpDQpzdHJfdG9fdGl0bGUoc3RyaW5nXzEpDQpgYGANCg0KUmVtb3Zpbmcgb3ZlcmhlYWQgd2hpdGUgc3BhY2VzIGZyb20gc3RyaW5ncyBpcyBhIG5vdG9yaW91cyBvcGVyYXRpb24gaW4gdGV4dC1taW5pbmc6DQoN
CmBgYCB7ciBlY2hvID0gVH0NCiMgUmVtb3ZlIHdoaXRlc3BhY2UNCnN0cmluZ18xIDwtIGMoIiAgUmVtb3ZlIHdoaXRlc3BhY2UgICIpOw0Kc3RyaW5nXzENCmBgYA0KDQpUaGVyZSBnb2VzIHtzdHJpbmdyfSBgc3RyX3RyaW0oKWAgdG8gY2xlYW4tdXA6DQoNCmBgYCB7ciBlY2hvID0gVH0NCnN0cl90cmltKHN0cmluZ18xKSAjIHtzdHJpbmdyfQ0KYGBgDQoNClRoZXJlJ3MgYSBgc2lkZWAgYXJndW1lbnQgdGhhdCB3ZSB1c2UgdG8gcmVtb3ZlIHRoZSBsZWFkaW5nIChzaWRlID0gJ2xlZnQnKSBhbmQgdHJhaWxpbmcgKHNpZGUgPSAncmlnaHQnKSB3aGl0ZXNwYWNlczoNCg0KYGBgIHtyIGVjaG8gPSBUfQ0KIyByZW1vdmUgbGVhZGluZyB3aGl0ZXNwYWNlDQpzdHJfdHJpbShzdHJpbmdfMSwgc2lkZSA9ICJsZWZ0IikNCmBgYA0KDQpgYGAge3IgZWNobyA9IFR9DQojIHJlbW92ZSB0cmFpbGluZyB3aGl0ZXNwYWNlDQpzdHJfdHJpbShzdHJpbmdfMSwgc2lkZSA9ICJyaWdodCIpDQpgYGANCg0KVXNpbmcge2Jhc2V9IGBnc3ViKClgIHRvIHJlbW92ZSBhbGwgd2hpdGVzcGFjZToNCg0KYGBgIHtyIGVjaG8gPSBUfQ0KIyByZW1vdmUgYWxsIHdoaXRlc3BhY2U/DQpzdHJpbmdfMSA8LSBjKCIgIFJlbW92ZSAgICB3aGl0ZXNwYWNlICAiKSAjIGhvdyBhYm91dCB0aGlzIG9uZT8NCnN0cmluZ18xDQpgYGANCg0KYGBgIHtyIGVjaG8gPSBUfQ0KIyB0aGVyZSBhcmUgZGlmZmVyZW50IHdheXMgdG8gZG8gaXQuIFRyeToNCmdzdWIoIiAiLCAiIiwgc3RyaW5nXzEsIGZpeGVkID0gVCkgIyAoIShmaXhlZD09VCkpLCB0aGUgZmlyc3QgKHBhdHRlcm4pIGFyZ3VtZW50IGlzIHJlZ2V4DQpgYGANCg0KYGdzdWIoKWAgaXMgZGVmaW5pdGVseSBzb21ldGhpbmcgeW91IG5lZWQgdG8gbGVhcm4gYWJvdXQ6DQoNCmBgYCB7ciBlY2hvID0gVH0NCiMgcmVwbGFjaW5nLCBpbiBnZW5lcmFsOg0Kc3RyaW5nXzEgPC0gIlRoZSBxdWljayBicm93biBmb3gganVtcHMgb3ZlciB0aGUgbGF6eSBkb2cgVGhlIHF1aWNrIGJyb3duIg0KZ3N1YigiVGhlIHF1aWNrIGJyb3duIiwgIlRoZSBzbG93IHJlZCIsIHN0cmluZ18xLCBmaXhlZD1UKQ0KYGBgDQoNCkFnYWluLCBtaW5kIHRoZSBgZml4ZWRgIGFyZ3VtZW50IC0gYnkgZGVmYXVsdCwgYGdzdWIoKWAgbGlrZXMgcmVndWxhciBleHByZXNzaW9ucy4NCg0KIyMjIyAyLjMgU2VhcmNoaW5nIGluIHN0cmluZ3MNCg0KYGBgIHtyIGVjaG8gPSBUfQ0Kc3RyaW5nXzENCmBgYA0KDQpEb2VzIGBzdHJpbmdfMWAgY29udGFpbiBgVGhlIHF1aWNrIGJyb3duYD8NCg0KYGBgIHtyIGVjaG8gPSBUfQ0KIyBTZWFyY2hpbmcgZm9yIHNvbWV0aGluZyBpbiBhIHN0cmluZyB7c3RyaW5ncn0NCnN0cl9kZXRlY3Qoc3RyaW5nXzEsICJUaGUgcXVpY2sgYnJvd24iKSAjIFQgb3IgRg0KYGBgDQoNCldoZXJlIGlzIGl0PyBVc2UgYHN0cl9sb2NhdGVgIGZyb20ge3N0cmluZ3J9Og0KDQpgYGAge3IgZWNobyA9IFR9DQpzdHJfbG9jYXRlKHN0cmluZ18x
LCAiVGhlIHF1aWNrIGJyb3duIilbWzFdXSAjIGZpcnN0IG1hdGNoDQpgYGANCg0KQW5kIHdoYXQgaWYgdGhlcmUgaXMgbW9yZSB0aGFuIG9uZSBtYXRjaD8NCg0KYGBgIHtyIGVjaG8gPSBUfQ0Kc3RyX2xvY2F0ZV9hbGwoc3RyaW5nXzEsICJUaGUgcXVpY2sgYnJvd24iKVtbMV1dICMgYWxsIG1hdGNoZXMNCmBgYA0KDQpZb3UgbWlnaHQgaGF2ZSBoZWFyZCB0aGF0IHBlb3BsZSBpbiB0ZXh0LW1pbmluZyB1c2UgKnRlcm0tZnJlcXVlbmN5IG1hdHJpY2VzKiBhIGxvdC4gVGhlc2UgbWF0cmljZXMgdHlwaWNhbGx5IGxpc3QgYWxsIGludGVyZXN0aW5nIHRlcm1zIGZyb20gYSBzZXQgb2YgZG9jdW1lbnRzIGluIHRoZWlyIHJvd3MsIGFuZCB0aGUgZG9jdW1lbnRzIHRoZW1zZWx2ZXMgYXJlIHJlcHJlc2VudGVkIGJ5IGNvbHVtbnM7IGNlbGwgZW50cmllcyBhcmUgY291bnRzIHRoYXQgcHJvdmlkZSBhbiBpbmZvcm1hdGlvbiBvbiBob3cgbWFueSB0aW1lcyBhIHBhcnRpY3VsYXIgdGVybSBoYXZlIG9jY3VycmVkIGluIGEgcGFydGljdWxhciBkb2N1bWVudC4NCg0KV2Ugd2lsbCBub3QgYnVpbGQgYSBmdWxsIHRlcm0tZnJlcXVlbmN5IG1hdHJpeCBpbiBSIG5vdyAoY2hlY2sgdGhlIHt0bX0gcGFja2FnZSBmb3IgUidzIGZ1bmN0aW9uYWxpdHkgaW4gdGV4dC1taW5pbmcpLCBidXQgb25seSBkZW1vbnN0cmF0ZSBob3cgdG8gdXNlIGBzdHJfbG9jYXRlX2FsbCgpYCB0byBjb3VudCB0aGUgbnVtYmVyIG9mIG9jY3VycmVuY2VzOg0KDQpgYGAge3IgZWNobyA9IFR9DQojIHRlcm0gZnJlcXVlbmN5LCBhcyB3ZSBrbm93LCBpcyB2ZXJ5IGltcG9ydGFudCBpbiB0ZXh0LW1pbmluZzoNCnRlcm0xIDwtIHN0cl9sb2NhdGVfYWxsKHN0cmluZ18xLCAiVGhlIHF1aWNrIGJyb3duIilbWzFdXSAjIGFsbCBtYXRjaGVzIGZvciB0ZXJtMSANCiMgaWUuICJUaGUgcXVpY2sgYnJvd24iDQp0ZXJtMQ0KYGBgDQoNCkhtLCBpdCdzIGVhc3kgbm93Og0KDQpgYGAge3IgZWNobyA9IFR9DQpkaW0odGVybTEpWzFdICMgaG93IG1hbnkgbWF0Y2hlcyA9IGhvdyBtYW55IHJvd3MgaW4gdGhlIHN0cl9sb2NhdGVfYWxsIG91dHB1dCBtYXRyaXgNCmBgYA0KDQojIyMjIDIuMyBTb3J0aW5nIHN0cmluZ3MgaW4gUg0KDQpgYGAge3IgZWNobyA9IFR9DQojIFNvcnRpbmcgY2hhcmFjdGVyIHZlY3RvcnMgaW4gUiB7YmFzZX0NCnN0cmluZ18xIDwtIGMoIk5ldyBZb3JrIiwgIlBhcmlzIiwgIkxvbmRvbiIsICJNb3Njb3ciLCAiVG9reW8iKQ0Kc3RyaW5nXzENCmBgYA0KDQpJdCdzIHJlYWxseSBlYXN5Og0KDQpgYGAge3IgZWNobyA9IFR9DQpzb3J0KHN0cmluZ18xKQ0KYGBgDQoNCkFuZCB3aXRoIGBkZWNyZWFzaW5nPVRgOg0KDQpgYGAge3IgZWNobyA9IFR9DQpzb3J0KHN0cmluZ18xLCBkZWNyZWFzaW5nID0gVCkNCmBgYA0KDQoNCiMjIyBGdXJ0aGVyIFJlYWRpbmdzDQoNCi0gT25jZSBhZ2FpbjogW0dhc3RvbiBTYW5jaGV6J3MgIkhhbmRsaW5nIGFuZCBQcm9jZXNzaW5nIFN0
cmluZ3MgaW4gUiJdKGh0dHA6Ly9nYXN0b25zYW5jaGV6LmNvbS9IYW5kbGluZ19hbmRfUHJvY2Vzc2luZ19TdHJpbmdzX2luX1IucGRmKSAtIHRoZSBjaGFuY2VzIHlvdSB3aWxsIGV2ZXIgbmVlZCBtb3JlIHRoYW4gd2hhdCdzIGNvdmVyZWQgaW4gdGhpcyB0ZXh0LWJvb2sgYXJlIHNsaW0uDQoNCi0gKipSZWd1bGFyIEV4cHJlc3Npb25zKio6IGdvIHByby4gW1JlZ3VsYXItRXhwcmVzc2lvbnMuaW5mb10oaHR0cDovL3d3dy5yZWd1bGFyLWV4cHJlc3Npb25zLmluZm8vKSBpcyBhIHdlbGwga25vd24gbGVhcm5pbmcgcmVzb3VyY2UuIEluIG9yZGVyIHRvIGZpZ3VyZSBvdXQgdGhlIHNwZWNpZmljIHJlZ2V4IHN0YW5kYXJkIHVzZWQgaW4gUjogW1JlZ3VsYXIgRXhwcmVzc2lvbnMgYXMgdXNlZCBpbiBSXShodHRwczovL3N0YXQuZXRoei5jaC9SLW1hbnVhbC9SLWRldmVsL2xpYnJhcnkvYmFzZS9odG1sL3JlZ2V4Lmh0bWwpLiBbVGhpcyBzZWN0aW9uIG9mIFJlZ3VsYXItRXhwcmVzc2lvbnMuaW5mb10oaHR0cDovL3d3dy5yZWd1bGFyLWV4cHJlc3Npb25zLmluZm8vcmxhbmd1YWdlLmh0bWwpIGlzIG9uIHJlZ2V4IGluIFIgc3BlY2lmaWNhbGx5Lg0KDQoNCiMjIyBSIE1hcmtkb3duDQoNCltSIE1hcmtkb3duXShodHRwczovL3JtYXJrZG93bi5yc3R1ZGlvLmNvbS8pIGlzIHdoYXQgSSBoYXZlIHVzZWQgdG8gcHJvZHVjZSB0aGlzIGJlYXV0aWZ1bCBOb3RlYm9vay4gV2Ugd2lsbCBsZWFybiBtb3JlIGFib3V0IGl0IG5lYXIgdGhlIGVuZCBvZiB0aGUgY291cnNlLCBidXQgaWYgeW91IGFscmVhZHkgZmVlbCByZWFkeSB0byBkaXZlIGRlZXAsIGhlcmUncyBhIGJvb2s6IFtSIE1hcmtkb3duOiBUaGUgRGVmaW5pdGl2ZSBHdWlkZSwgWWlodWkgWGllLCBKLiBKLiBBbGxhaXJlLCBHYXJyZXR0IEdyb2xlbXVuZHMuXShodHRwczovL2Jvb2tkb3duLm9yZy95aWh1aS9ybWFya2Rvd24vKSANCg0KIyMjIEV4ZXJjaXNlcw0KDQpBIHNwZWNpYWxpemVkIFIgTWFya2Rvd24gTm90ZWJvb2sgb24gUmVndWxhciBleHByZXNzaW9ucyB3aWxsIGJlIHNoYXJlZCBzb29uLiBUaGUgZXhlcmNpc2VzIHdpbGwgYmUgZm91bmQgdGhlcmUuDQoNCioqKg0KR29yYW4gUy4gTWlsb3Zhbm92acSHDQoNCkRhdGFLb2xla3RpdiwgMjAyMC8yMQ0KDQpjb250YWN0OiBnb3Jhbi5taWxvdmFub3ZpY0BkYXRha29sZWt0aXYuY29tDQoNCiFbXSguLi9faW1nL0RLX0xvZ29fMTAwLnBuZykNCg0KKioqDQpMaWNlbnNlOiBbR1BMdjNdKGh0dHA6Ly93d3cuZ251Lm9yZy9saWNlbnNlcy9ncGwtMy4wLnR4dCkNClRoaXMgTm90ZWJvb2sgaXMgZnJlZSBzb2Z0d2FyZTogeW91IGNhbiByZWRpc3RyaWJ1dGUgaXQgYW5kL29yIG1vZGlmeSBpdCB1bmRlciB0aGUgdGVybXMgb2YgdGhlIEdOVSBHZW5lcmFsIFB1YmxpYyBMaWNlbnNlIGFzIHB1Ymxpc2hlZCBieSB0aGUgRnJlZSBTb2Z0d2FyZSBGb3VuZGF0aW9uLCBlaXRoZXIgdmVyc2lvbiAzIG9mIHRoZSBMaWNlbnNlLCBvciAoYXQg
eW91ciBvcHRpb24pIGFueSBsYXRlciB2ZXJzaW9uLg0KVGhpcyBOb3RlYm9vayBpcyBkaXN0cmlidXRlZCBpbiB0aGUgaG9wZSB0aGF0IGl0IHdpbGwgYmUgdXNlZnVsLCBidXQgV0lUSE9VVCBBTlkgV0FSUkFOVFk7IHdpdGhvdXQgZXZlbiB0aGUgaW1wbGllZCB3YXJyYW50eSBvZiBNRVJDSEFOVEFCSUxJVFkgb3IgRklUTkVTUyBGT1IgQSBQQVJUSUNVTEFSIFBVUlBPU0UuICBTZWUgdGhlIEdOVSBHZW5lcmFsIFB1YmxpYyBMaWNlbnNlIGZvciBtb3JlIGRldGFpbHMuDQpZb3Ugc2hvdWxkIGhhdmUgcmVjZWl2ZWQgYSBjb3B5IG9mIHRoZSBHTlUgR2VuZXJhbCBQdWJsaWMgTGljZW5zZSBhbG9uZyB3aXRoIHRoaXMgTm90ZWJvb2suIElmIG5vdCwgc2VlIDxodHRwOi8vd3d3LmdudS5vcmcvbGljZW5zZXMvPi4NCg0KKioqDQoNCg==