Getting Started with Go: Regular Expressions

Introduction

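In computing, we often need to find characters, or substrings, that match a particular pattern inside another string. This technique uses a special syntax to search a given string for a specific set of characters.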

If the pattern matches, that is, if the given substring is found in the target string, the search is said to succeed; otherwise it fails.

What is a regular expression

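A regular expression (regex) is a special sequence of characters that defines a search pattern for matching specific text. Go's standard library includes a regular expression package, regexp. It supports operations such as filtering, modifying, replacing, validating and extracting text.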

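Regular expressions are used for text searching and for more advanced text manipulation. They are built into:

- tools such as grep and sed;
- text editors such as vi and emacs;
- programming languages such as Go, Java and Python.

Go's regexp package accepts the RE2 syntax, which follows the general syntax used by Perl and Python. RE2 is roughly a subset of PCRE, with various caveats. Most notably, it has no backreferences or lookaround. In exchange, Go's implementation is guaranteed to run in time linear in the size of the input.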
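The sketch below makes the RE2 caveats concrete. It asks regexp.Compile to accept two PCRE features that RE2 leaves out, a backreference and a lookahead, alongside an ordinary group. The helper name compiles is made up for this illustration. We expect the first two patterns to be rejected with an error and the third to compile. The exact error wording is deliberately not shown.

```go
package main

import (
	"fmt"
	"regexp"
)

// compiles reports whether Go's regexp package accepts the pattern.
func compiles(pattern string) bool {
	_, err := regexp.Compile(pattern)
	return err == nil
}

func main() {
	patterns := []string{
		`(a)\1`,      // backreference: not supported by RE2
		`foo(?=bar)`, // lookahead: not supported by RE2
		`(a)b`,       // ordinary capture group: supported
	}
	for _, p := range patterns {
		fmt.Printf("%-12s accepted: %v\n", p, compiles(p))
	}
}
```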

The MatchString function

The MatchString() function reports whether the string passed as an argument contains any match of the regular expression pattern.

    package main

    import (
        "fmt"
        "log"
        "regexp"
    )

    func main() {
        words := [...]string{"Seven", "even", "Maven", "Amen", "eleven"}

        for _, word := range words {
            // "." matches any single character, so ".even" matches any
            // character followed by "even". MatchString compiles the
            // pattern again on every call.
            found, err := regexp.MatchString(".even", word)
            if err != nil {
                log.Fatal(err)
            }

            if found {
                fmt.Printf("%s matches\n", word)
            } else {
                fmt.Printf("%s does not match\n", word)
            }
        }
    }

Running the code:

Seven matches
even does not match
Maven does not match
Amen does not match
eleven matches

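However, the editor also flags this code. Calling regexp.MatchString in a loop performs poorly because the pattern is compiled again on every call, so it suggests using regexp.Compile instead. The hint comes from a static analysis check (for example staticcheck), not from the Go compiler itself, which does not emit warnings.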
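MatchString reports a match if the pattern occurs anywhere in the input, which is why "eleven" matched .even above. The minimal sketch below shows how the ^ and $ anchors turn this into a whole-string check. It uses a hypothetical helper named matches and the made-up input "Sevens".

```go
package main

import (
	"fmt"
	"log"
	"regexp"
)

// matches wraps regexp.MatchString and aborts on an invalid pattern.
func matches(pattern, s string) bool {
	ok, err := regexp.MatchString(pattern, s)
	if err != nil {
		log.Fatal(err)
	}
	return ok
}

func main() {
	// Unanchored: ".even" only has to occur somewhere in the input.
	fmt.Println(matches(".even", "Sevens")) // true
	// Anchored: ^ and $ force the whole string to match.
	fmt.Println(matches("^.even$", "Sevens")) // false
	fmt.Println(matches("^.even$", "Seven"))  // true
}
```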

The Compile function

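The Compile function parses a regular expression and, if successful, returns a Regexp object that can be used to match text. Compiling a pattern once and reusing the resulting Regexp is faster than recompiling it for every match.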

The MustCompile function is a convenience wrapper: it compiles the regular expression and panics if the expression cannot be parsed.

    package main

    import (
        "fmt"
        "log"
        "regexp"
    )

    func main() {
        words := [...]string{"Seven", "even", "Maven", "Amen", "eleven"}

        // Compile the pattern once, outside the loop.
        re, err := regexp.Compile(".even")
        if err != nil {
            log.Fatal(err)
        }

        for _, word := range words {
            found := re.MatchString(word)
            if found {
                fmt.Printf("%s matches\n", word)
            } else {
                fmt.Printf("%s does not match\n", word)
            }
        }
    }

In this example, we use a compiled regular expression.

    re, err := regexp.Compile(".even")

We compile the regular expression with Compile, then call the MatchString method on the returned Regexp object:

    found := re.MatchString(word)

Running the program produces the same output:

Seven matches
even does not match
Maven does not match
Amen does not match
eleven matches
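Unlike calling MatchString with a pattern string each time, Compile lets us detect a malformed pattern once, up front. The following sketch wraps the error with the offending pattern instead of calling log.Fatal. tryCompile is a hypothetical helper. ".even(" is used only as an example of a pattern with an unbalanced parenthesis, which Compile should reject.

```go
package main

import (
	"fmt"
	"regexp"
)

// tryCompile returns the compiled Regexp, or an error naming the pattern
// if Compile cannot parse it, instead of aborting the program.
func tryCompile(pattern string) (*regexp.Regexp, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return nil, fmt.Errorf("bad pattern %q: %w", pattern, err)
	}
	return re, nil
}

func main() {
	for _, p := range []string{".even", ".even("} {
		re, err := tryCompile(p)
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Println(p, "compiled; matches Seven:", re.MatchString("Seven"))
	}
}
```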

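The MustCompile function

The same program written with MustCompile needs no error check, because an invalid pattern makes the program panic instead: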

    package main

    import (
        "fmt"
        "regexp"
    )

    func main() {
        words := [...]string{"Seven", "even", "Maven", "Amen", "eleven"}

        // MustCompile panics if the pattern is invalid, so there is no error to check.
        re := regexp.MustCompile(".even")

        for _, word := range words {
            found := re.MatchString(word)
            if found {
                fmt.Printf("%s matches\n", word)
            } else {
                fmt.Printf("%s does not match\n", word)
            }
        }
    }
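Because MustCompile panics instead of returning an error, it is typically used for fixed patterns. These are often stored in a package-level variable, so the pattern is compiled only once, when the program starts. A Regexp is documented as safe for concurrent use by multiple goroutines, so one such variable can be shared. Here is a sketch with the hypothetical names dotEvenRe and matchesDotEven:

```go
package main

import (
	"fmt"
	"regexp"
)

// dotEvenRe is compiled once during package initialisation. The pattern is a
// constant, so a panic here would indicate a programming error, which is the
// situation MustCompile is meant for.
var dotEvenRe = regexp.MustCompile(`.even`)

// matchesDotEven reports whether s contains a match of dotEvenRe.
func matchesDotEven(s string) bool {
	return dotEvenRe.MatchString(s)
}

func main() {
	for _, w := range []string{"Seven", "even", "eleven"} {
		fmt.Printf("%s: %v\n", w, matchesDotEven(w))
	}
}
```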

The FindAllString function

The FindAllString function returns a slice of all successive matches of the regular expression, or nil if there is no match.

    package main

    import (
        "fmt"
        "os"
        "regexp"
    )

    func main() {
        var content = `Foxes are omnivorous mammals belonging to several genera
    of the family Canidae. Foxes have a flattened skull, upright triangular ears,
    a pointed, slightly upturned snout, and a long bushy tail. Foxes live on every
    continent except Antarctica. By far the most common and widespread species of
    fox is the red fox.`

        re := regexp.MustCompile("(?i)fox(es)?")

        found := re.FindAllString(content, -1)

        fmt.Printf("%q\n", found)

        if found == nil {
            fmt.Printf("no match found\n")
            os.Exit(1)
        }

        for _, word := range found {
            fmt.Printf("%s\n", word)
        }
    }

In this example, we find all occurrences of the word fox, including its plural form.

    re := regexp.MustCompile("(?i)fox(es)?")

The (?i) flag makes the regular expression case-insensitive. (es)? means that the characters "es" may appear zero times or once.

    found := re.FindAllString(content, -1)

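We use FindAllString to find all occurrences of the defined regular expression. The second parameter is the maximum number of matches to return. -1 (any negative value) means return all matches.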

Output:

["Foxes" "Foxes" "Foxes" "fox" "fox"]
Foxes
Foxes
Foxes
fox
fox
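The sketch below shows the effect of the second argument. It runs the same pattern with limits of -1 and 2, and also on text that contains no match, where FindAllString returns nil. A limit of 0 would likewise return nil. firstN is a hypothetical helper, and the sample sentence is made up.

```go
package main

import (
	"fmt"
	"regexp"
)

var foxRe = regexp.MustCompile(`(?i)fox(es)?`)

// firstN returns at most n matches of foxRe in s; a negative n means "all".
func firstN(s string, n int) []string {
	return foxRe.FindAllString(s, n)
}

func main() {
	text := "Foxes and a fox met another FOX."
	fmt.Printf("%q\n", firstN(text, -1))           // ["Foxes" "fox" "FOX"]
	fmt.Printf("%q\n", firstN(text, 2))            // ["Foxes" "fox"]
	fmt.Println(firstN("no match here", -1) == nil) // true
}
```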

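The FindAllStringIndex function

FindAllStringIndex works like FindAllString, but it returns the start and end positions of each match instead of the matched text.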

    package main

    import (
        "fmt"
        "regexp"
    )

    func main() {
        var content = `Foxes are omnivorous mammals belonging to several genera
    of the family Canidae. Foxes have a flattened skull, upright triangular ears,
    a pointed, slightly upturned snout, and a long bushy tail. Foxes live on every
    continent except Antarctica. By far the most common and widespread species of
    fox is the red fox.`

        re := regexp.MustCompile("(?i)fox(es)?")

        idx := re.FindAllStringIndex(content, -1)

        for _, j := range idx {
            match := content[j[0]:j[1]]
            fmt.Printf("%s at %d:%d\n", match, j[0], j[1])
        }
    }

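In this example, we find all occurrences of the word fox in the text, together with their indexes. Each index pair holds the start and end (exclusive) byte offsets of a match in content, so content[j[0]:j[1]] is the matched text. The exact numbers therefore depend on the exact whitespace in the string literal.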

Foxes at 0:5
Foxes at 81:86
Foxes at 196:201
fox at 296:299
fox at 311:314
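The indexes returned by FindAllStringIndex are byte offsets, not character positions. For ASCII text the two coincide, but a Chinese character usually takes three bytes in UTF-8. The sketch below reports both the byte offsets and the character offset, computed with utf8.RuneCountInString. locate and the sample string are made up for illustration.

```go
package main

import (
	"fmt"
	"regexp"
	"unicode/utf8"
)

var foxRe = regexp.MustCompile(`(?i)fox(es)?`)

// locate returns, for each match, its byte start, its byte end (exclusive)
// and the number of runes (characters) that precede it.
func locate(content string) [][3]int {
	var out [][3]int
	for _, loc := range foxRe.FindAllStringIndex(content, -1) {
		runeStart := utf8.RuneCountInString(content[:loc[0]])
		out = append(out, [3]int{loc[0], loc[1], runeStart})
	}
	return out
}

func main() {
	content := "狐狸 fox, 红狐 red fox"
	for _, m := range locate(content) {
		fmt.Printf("%s: bytes %d:%d, character offset %d\n",
			content[m[0]:m[1]], m[0], m[1], m[2])
	}
}
```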

The Split function

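The Split function slices a string into substrings separated by matches of the regular expression. It returns a slice of the substrings between those matches.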

    package main

    import (
        "fmt"
        "log"
        "regexp"
        "strconv"
    )

    func main() {
        var data = `22, 1, 3, 4, 5, 17, 4, 3, 21, 4, 5, 1, 48, 9, 42`

        sum := 0

        // A raw string literal is required: "\s" is not a valid escape
        // sequence in a double-quoted Go string.
        re := regexp.MustCompile(`,\s*`)

        vals := re.Split(data, -1)

        for _, val := range vals {
            n, err := strconv.Atoi(val)
            // Check the error before using n.
            if err != nil {
                log.Fatal(err)
            }
            sum += n
        }

        fmt.Println(sum)
    }

In this example, we have a comma-separated list of values. We split the values out of the string and compute their sum.

    re := regexp.MustCompile(`,\s*`)

The regular expression consists of a comma followed by any number of whitespace characters.

    vals := re.Split(data, -1)

We get a slice of the values.

    for _, val := range vals {
        n, err := strconv.Atoi(val)
        if err != nil {
            log.Fatal(err)
        }
        sum += n
    }

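We loop over the slice and compute the sum. The slice contains strings, so we use the strconv.Atoi function to convert each string to an integer. The error is checked before the value is added.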

Running the code:

189
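Split also takes a limit n:

- A negative n returns all substrings.
- A positive n returns at most n substrings, with the unsplit remainder in the last one.
- 0 returns nil.

The sketch below uses a hypothetical splitN helper. Note the third call: a separator at the very start produces an empty first element, which strconv.Atoi in the program above would reject.

```go
package main

import (
	"fmt"
	"regexp"
)

var sepRe = regexp.MustCompile(`,\s*`)

// splitN splits s on a comma plus any following whitespace, returning at
// most n substrings (n < 0: no limit; n == 0: nil).
func splitN(s string, n int) []string {
	return sepRe.Split(s, n)
}

func main() {
	fmt.Printf("%q\n", splitN("22, 1, 3", -1)) // ["22" "1" "3"]
	fmt.Printf("%q\n", splitN("22, 1, 3", 2))  // ["22" "1, 3"]
	fmt.Printf("%q\n", splitN(",22", -1))      // ["" "22"]
}
```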

Capture groups in Go regular expressions

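Parentheses () create capture groups. Groups let us apply a quantifier to a whole group, or restrict an alternation to part of a regular expression. To retrieve capture groups (Go calls them subexpressions), we use the FindStringSubmatch function.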

    package main

    import (
        "fmt"
        "regexp"
    )

    func main() {
        websites := [...]string{"webcode.me", "zetcode.com", "freebsd.org", "netbsd.org"}

        // Raw string literal: \w and \. are regexp escapes, not valid Go string escapes.
        re := regexp.MustCompile(`(\w+)\.(\w+)`)

        for _, website := range websites {
            parts := re.FindStringSubmatch(website)

            for _, part := range parts {
                fmt.Println(part)
            }

            fmt.Println("---------------------")
        }
    }

In this example, we use groups to split domain names into two parts.

    re := regexp.MustCompile(`(\w+)\.(\w+)`)

We define two groups with parentheses.

    parts := re.FindStringSubmatch(website)

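FindStringSubmatch returns a slice of strings for the match. Element 0 is the text of the whole match, followed by the text of each capture group in order. It returns nil if there is no match.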

Running the code:

$ go run capturegroups.go
webcode.me
webcode
me
---------------------
zetcode.com
zetcode
com
---------------------
freebsd.org
freebsd
org
---------------------
netbsd.org
netbsd
org
---------------------
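Groups can also confine an alternation to one part of a pattern. Go additionally supports named groups with the (?P<name>...) syntax. The sketch below assumes Go 1.15 or later, for Regexp.SubexpIndex. The pattern, the short list of top-level domains and the helper splitSite are illustrative only.

```go
package main

import (
	"fmt"
	"regexp"
)

// siteRe names its groups with (?P<name>...). The parentheses confine the
// alternation com|org|me to the second group.
var siteRe = regexp.MustCompile(`^(?P<name>\w+)\.(?P<tld>com|org|me)$`)

// splitSite returns the name and top-level domain of s, or ok == false if
// s does not match.
func splitSite(s string) (name, tld string, ok bool) {
	m := siteRe.FindStringSubmatch(s)
	if m == nil {
		return "", "", false
	}
	return m[siteRe.SubexpIndex("name")], m[siteRe.SubexpIndex("tld")], true
}

func main() {
	for _, s := range []string{"zetcode.com", "freebsd.org", "example.net"} {
		if name, tld, ok := splitSite(s); ok {
			fmt.Println(name, tld)
		} else {
			fmt.Println(s, "does not match")
		}
	}
}
```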

Replacing strings with regular expressions

ReplaceAllString replaces every match of the regular expression with a replacement string. The method returns the modified string.

    package main

    import (
        "fmt"
        "io"
        "log"
        "net/http"
        "regexp"
        "strings"
    )

    func main() {
        resp, err := http.Get("http://webcode.me")
        if err != nil {
            log.Fatal(err)
        }
        defer resp.Body.Close()

        // io.ReadAll replaces the deprecated ioutil.ReadAll (Go 1.16+).
        body, err := io.ReadAll(resp.Body)
        if err != nil {
            log.Fatal(err)
        }

        content := string(body)

        // Match "<", then any run of characters other than ">", then ">".
        re := regexp.MustCompile("<[^>]*>")
        replaced := re.ReplaceAllString(content, "")

        fmt.Println(strings.TrimSpace(replaced))
    }

The example reads the HTML of a web page and strips its HTML tags with a regular expression.

    resp, err := http.Get("http://webcode.me")

We send a GET request with the Get function from the net/http package.

    body, err := io.ReadAll(resp.Body)

We read the body of the response.

    re := regexp.MustCompile("<[^>]*>")

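This pattern matches an HTML tag: a <, then any characters other than >, then a >. It is a rough approach rather than a real HTML parser; for example, it cuts a tag short if an attribute value contains a >.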

    replaced := re.ReplaceAllString(content, "")

We remove all tags with the ReplaceAllString method.
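The replacement string passed to ReplaceAllString is not purely literal: $1, $2 or ${name} are expanded to the text of the corresponding capture group. The sketch below reuses the domain pattern from the capture-group section, with a hypothetical helper named rewrite. It also shows why braces are sometimes needed, and uses ReplaceAllLiteralString when no expansion is wanted.

```go
package main

import (
	"fmt"
	"regexp"
)

var domainRe = regexp.MustCompile(`(\w+)\.(\w+)`)

// rewrite replaces every match of domainRe in s with template, expanding
// $1/$2 to the text of the first/second capture group.
func rewrite(s, template string) string {
	return domainRe.ReplaceAllString(s, template)
}

func main() {
	s := "webcode.me"
	fmt.Println(rewrite(s, "$2.$1"))     // me.webcode
	fmt.Println(rewrite(s, "${2}_${1}")) // me_webcode
	// "$2_" is read as a group named "2_", which does not exist, so it
	// expands to nothing and only "$1" survives.
	fmt.Println(rewrite(s, "$2_$1")) // webcode
	// ReplaceAllLiteralString inserts the replacement verbatim.
	fmt.Println(domainRe.ReplaceAllLiteralString(s, "$1")) // $1
}
```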

The ReplaceAllStringFunc function

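ReplaceAllStringFunc returns a copy of a string in which every match of the regular expression has been replaced by the return value of the given function. The function receives the matched text.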

    package main

    import (
        "fmt"
        "regexp"
        "strings"
    )

    func main() {
        content := "an old eagle"

        re := regexp.MustCompile(`[^aeiou]`)

        fmt.Println(re.ReplaceAllStringFunc(content, strings.ToUpper))
    }

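In this example, we apply the strings.ToUpper function to every character that is not a lowercase vowel. The spaces match too, but upper-casing leaves them unchanged.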

$ go run replaceallfunc.go
aN oLD eaGLe
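The function passed to ReplaceAllStringFunc can be any func(string) string, not only a library function such as strings.ToUpper. The sketch below uses a hypothetical doubleNumbers helper. It parses each run of ASCII digits (\d in RE2) and replaces it with twice its value.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

var numRe = regexp.MustCompile(`\d+`)

// doubleNumbers replaces every run of digits in s with twice its value.
func doubleNumbers(s string) string {
	return numRe.ReplaceAllStringFunc(s, func(m string) string {
		n, err := strconv.Atoi(m)
		if err != nil {
			// Only possible if the number overflows int; keep the text as-is.
			return m
		}
		return strconv.Itoa(n * 2)
	})
}

func main() {
	fmt.Println(doubleNumbers("3 apples and 21 pears")) // 6 apples and 42 pears
}
```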

Summary

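Pattern matching plays an important role in searching a string for a particular set of characters, according to a search pattern written in regular expression syntax.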

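Matching lets us extract the data we need from a string and manipulate it however we like. Understanding and using regular expressions is key to text processing.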

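In practice, programmers keep a set of commonly used regular expressions for matching email addresses, phone numbers and so on, and reuse them when needed.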


Original article: https://blog.51cto.com/yuzhou1su/5252080
