Regular Expressions

Basics of the re module

import re

# match: matches only at the start of the string (returns None if there is no match there)
m = re.match(r'\d+', '123abc')
print(m.group())  # 123

# search: returns the first match found anywhere in the string
m = re.search(r'\d+', 'abc123def')
print(m.group())  # 123

# findall: returns all matches as a list
result = re.findall(r'\d+', 'a1b22c333')
print(result)  # ['1', '22', '333']

# sub: replaces every match with the given string
result = re.sub(r'\d+', 'X', 'a1b22c333')
print(result)  # aXbXcX
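match() and search() behave differently when the string does not begin with a match. Both return None when nothing is found, so the result should be checked before `.group()` is called. The sketch below shows this. The variable names are illustrative. `count` is the standard optional argument of `re.sub()` that limits how many replacements are made.

```python
import re

text = 'abc123def'

# re.match only tries position 0, so it fails here and returns None
match_result = re.match(r'\d+', text)
print(match_result)  # None

# re.search scans the whole string and finds '123'
search_result = re.search(r'\d+', text)
if search_result:  # check for None before calling .group()
    print(search_result.group(), search_result.start(), search_result.end())  # 123 3 6

# count=1 limits re.sub() to the first match only
first_only = re.sub(r'\d+', 'X', 'a1b22c333', count=1)
print(first_only)  # aXb22c333
```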

Basic patterns

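Pattern	Meaning
`.`	Any single character except a newline (newlines too with re.DOTALL)
`\d`	A digit; for str patterns any Unicode decimal digit (e.g. full-width digits), exactly [0-9] with re.ASCII
`\w`	A word character (letter, digit, or underscore); for str patterns this includes Unicode letters such as Japanese, exactly [a-zA-Z0-9_] with re.ASCII
`\s`	A whitespace character (space, tab, newline, and other Unicode whitespace)
`^`	Start of the string (start of each line with re.MULTILINE)
`$`	End of the string, or just before a trailing newline (end of each line with re.MULTILINE)
`*`	Zero or more repetitions
`+`	One or more repetitions
`?`	Zero or one occurrence
`{n,m}`	At least n and at most m repetitions
`[]`	Character class, e.g. [abc] or [0-9]
`|`	Alternation (matches either side)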
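Several entries in the table depend on flags. The short sketch below shows them in action. The sample strings are made up for illustration. `re.ASCII`, `re.MULTILINE`, and `re.DOTALL` are standard flags of the `re` module.

```python
import re

text = 'abc_12 日本語'

# For str patterns, \w also matches Unicode letters such as Japanese
unicode_words = re.findall(r'\w+', text)
print(unicode_words)  # ['abc_12', '日本語']

# re.ASCII restricts \w to [a-zA-Z0-9_]
ascii_words = re.findall(r'\w+', text, flags=re.ASCII)
print(ascii_words)  # ['abc_12']

lines = 'first line\nsecond line'

# Without MULTILINE, ^ matches only at the start of the whole string
starts = re.findall(r'^\w+', lines)
print(starts)  # ['first']

# With MULTILINE, ^ also matches right after each newline
line_starts = re.findall(r'^\w+', lines, flags=re.MULTILINE)
print(line_starts)  # ['first', 'second']

# . does not match the newline unless re.DOTALL is given
no_dotall = re.search(r'line.second', lines)
with_dotall = re.search(r'line.second', lines, flags=re.DOTALL)
print(no_dotall)                  # None
print(repr(with_dotall.group()))  # 'line\nsecond'

# | matches either alternative
animals = re.findall(r'cat|dog', 'cat, dog, bird')
print(animals)  # ['cat', 'dog']
```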

Groups

# Capturing groups
m = re.search(r'(\d{4})-(\d{2})-(\d{2})', 'Date: 2026-03-15')
if m:
    print(m.group())   # 2026-03-15
    print(m.group(1))  # 2026
    print(m.group(2))  # 03
    print(m.group(3))  # 15

# Named groups: (?P<name>...)
m = re.search(r'(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})', '2026-03-15')
if m:
    print(m.group('year'))   # 2026
    print(m.group('month'))  # 03
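Captured groups can also be read all at once, or reused in a replacement. Below is a minimal sketch that uses the standard match methods `groups()` and `groupdict()`. It also uses the `\g<name>` back-reference syntax of `re.sub()`. The input strings are illustrative. The non-capturing form `(?:...)` is shown for comparison.

```python
import re

date_re = r'(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})'

m = re.search(date_re, 'Date: 2026-03-15')
if m:
    print(m.groups())     # ('2026', '03', '15')
    print(m.groupdict())  # {'year': '2026', 'month': '03', 'day': '15'}

# In the replacement string of re.sub(), \g<name> refers to a named group
# (\1, \2, ... refer to numbered groups)
swapped = re.sub(date_re, r'\g<day>/\g<month>/\g<year>', 'Due 2026-03-15')
print(swapped)  # Due 15/03/2026

# (?:...) groups without capturing, so it does not get a group number
m2 = re.search(r'(?:\d{4})-(\d{2})', '2026-03-15')
print(m2.groups())  # ('03',)
```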

Compiled patterns

# Compile a pattern that is used repeatedly, then call its methods
email_pattern = re.compile(
    r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}'
)

emails = email_pattern.findall('Contact: user@example.com or admin@test.co.jp')
print(emails)  # ['user@example.com', 'admin@test.co.jp']
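A compiled pattern carries its flags with it and offers the module-level operations as methods. The sketch below assumes a case-insensitive search is wanted. The word `python` and the sample text are only illustrative.

```python
import re

# Flags are given once, when the pattern is compiled
word = re.compile(r'python', re.IGNORECASE)
text = 'Python, PYTHON and python'

# The compiled object offers the same operations as methods
found = word.findall(text)
print(found)  # ['Python', 'PYTHON', 'python']

replaced = word.sub('Py', text)
print(replaced)  # Py, Py and Py

# finditer yields match objects one at a time, including their positions
positions = [m.start() for m in word.finditer(text)]
print(positions)  # [0, 8, 19]
```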

Practical patterns

# Phone numbers (hyphenated form such as 03-1234-5678)
phone = re.compile(r'0\d{1,4}-\d{1,4}-\d{4}')
print(phone.findall('Phone: 03-1234-5678, 090-1234-5678'))  # ['03-1234-5678', '090-1234-5678']
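findall() and search() extract a phone number from inside a longer string. To check that a whole input is a phone number, `fullmatch()` is stricter. Below is a minimal sketch that reuses the phone pattern above. `is_phone` is a hypothetical helper name, and only hyphenated numbers are assumed to be valid.

```python
import re

phone = re.compile(r'0\d{1,4}-\d{1,4}-\d{4}')

def is_phone(s: str) -> bool:
    """Hypothetical validator: True only if the entire string fits the pattern."""
    return phone.fullmatch(s) is not None

print(is_phone('03-1234-5678'))   # True
print(is_phone('03-1234-56789'))  # False: one digit too many at the end

# search() still finds '03-1234-5678' inside the longer string
print(phone.search('03-1234-56789').group())  # 03-1234-5678
```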

# URLs
url = re.compile(r'https?://[\w/:%#\$&\?\(\)~\.=\+\-]+')
print(url.findall('See https://example.com/path?q=1 for details'))  # ['https://example.com/path?q=1']
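Because `\w` matches Unicode letters in str patterns, the URL pattern can run into surrounding text. This happens when Japanese text follows the URL with no space. The sketch below reuses the URL pattern and shows that adding `re.ASCII` stops the match at the end of the URL. The sample string is illustrative.

```python
import re

pattern = r'https?://[\w/:%#\$&\?\(\)~\.=\+\-]+'
text = 'https://example.com/pathを参照'  # Japanese text directly after the URL

# \w matches Japanese letters, so the match absorbs 'を参照'
loose = re.findall(pattern, text)
print(loose)  # ['https://example.com/pathを参照']

# re.ASCII limits \w to [a-zA-Z0-9_], so the match ends with the URL
strict = re.findall(pattern, text, flags=re.ASCII)
print(strict)  # ['https://example.com/path']
```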

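# Extract HTML tag names and their contents; \1 refers back to the first group,
# so the closing tag must match the opening one (attributes and nesting are not handled)
html = '<h1>Title</h1><p>Body</p>'
tags = re.findall(r'<(\w+)>(.*?)</\1>', html)
print(tags)  # [('h1', 'Title'), ('p', 'Body')]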
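The `?` in `.*?` makes the repetition non-greedy, so it matches as little as possible. The minimal comparison below uses an illustrative string with two paragraphs.

```python
import re

html = '<p>first</p><p>second</p>'

# Greedy .* runs to the LAST '</p>', swallowing the tags in between
greedy = re.findall(r'<p>(.*)</p>', html)
print(greedy)  # ['first</p><p>second']

# Non-greedy .*? stops at the FIRST '</p>' that completes the match
lazy = re.findall(r'<p>(.*?)</p>', html)
print(lazy)  # ['first', 'second']
```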

Summary

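re.search() finds the first match anywhere in a string
re.findall() returns all matches as a list
re.sub() replaces the parts of a string that match a pattern
Named groups make patterns easier to read
Compile frequently used patterns with re.compile()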
