【自然语言处理篇】--Chatterbot聊天机器人
一、前述
ChatterBot是一个基于机器学习的聊天机器人引擎,构建在python上,主要特点是可以自可以从已有的对话中进行学(jiyi)习(pipei)。
二、具体
1、安装
是的,安装超级简单,用pip就可以啦
pip install chatterbot
2、流程
大家已经知道chatterbot的聊天逻辑和输入输出以及存储,是由各种adapter来限定的,我们先看看流程图,一会软再一起看点例子,看看怎么用。
3、每个部分都设计了不同的“适配器”(Adapter)。
机器人应答逻辑 => Logic Adapters
Closest Match Adapter 字符串模糊匹配(编辑距离)
Closest Meaning Adapter 借助nltk的WordNet,近义词评估
Time Logic Adapter 处理涉及时间的提问
Mathematical Evaluation Adapter 涉及数学运算
存储器后端 => Storage Adapters
Read Only Mode 只读模式,当有输入数据到chatterbot的时候,数
据库并不会发生改变
Json Database Adapter 用以存储对话数据的接口,对话数据以Json格式
进行存储。
Mongo Database Adapter 以MongoDB database方式来存储对话数据
输入形式 => Input Adapters
Variable input type adapter 允许chatter bot接收不同类型的输入的,如strings,dictionaries和Statements
Terminal adapter 使得ChatterBot可以通过终端进行对话
HipChat Adapter 使得ChatterBot 可以从HipChat聊天室获取输入语句,通过HipChat 和 ChatterBot 进行对话
Speech recognition 语音识别输入,详见chatterbot-voice
输出形式 => Output Adapters
Output format adapter支持text,json和object格式的输出
Terminal adapter
HipChat Adapter
Mailgun adapter允许chat bot基于Mailgun API进行邮件的发送
Speech synthesisTTS(Text to speech)部分,详见chatterbot-voice
4、代码
基础版本
# -*- coding: utf-8 -*-
from chatterbot import ChatBot
# 构建ChatBot并指定Adapter
bot = ChatBot(
\'Default Response Example Bot\',
storage_adapter=\'chatterbot.storage.JsonFileStorageAdapter\',#存储的Adapter
logic_adapters=[
{
\'import_path\': \'chatterbot.logic.BestMatch\'#回话逻辑
},
{
\'import_path\': \'chatterbot.logic.LowConfidenceAdapter\',#回话逻辑
\'threshold\': 0.65,#低于置信度,则默认回答
\'default_response\': \'I am sorry, but I do not understand.\'
}
],
trainer=\'chatterbot.trainers.ListTrainer\'#给定的语料是个列表
)
# 手动给定一点语料用于训练
bot.train([
\'How can I help you?\',
\'I want to create a chat bot\',
\'Have you read the documentation?\',
\'No, I have not\',
\'This should help get you started: http://chatterbot.rtfd.org/en/latest/quickstart.html\'
])
# 给定问题并取回结果
question = \'How do I make an omelette?\'
print(question)
response = bot.get_response(question)
print(response)
print("\n")
question = \'how to make a chat bot?\'
print(question)
response = bot.get_response(question)
print(response)
结果:
How do I make an omelette?
I am sorry, but I do not understand.
how to make a chat bot?
Have you read the documentation?
处理时间和数学计算的Adapter
# -*- coding: utf-8 -*-
from chatterbot import ChatBot
bot = ChatBot(
"Math & Time Bot",
logic_adapters=[
"chatterbot.logic.MathematicalEvaluation",
"chatterbot.logic.TimeLogicAdapter"
],
input_adapter="chatterbot.input.VariableInputTypeAdapter",
output_adapter="chatterbot.output.OutputAdapter"
)
# 进行数学计算
question = "What is 4 + 9?"
print(question)
response = bot.get_response(question)
print(response)
print("\n")
# 回答和时间相关的问题
question = "What time is it?"
print(question)
response = bot.get_response(question)
print(response)
结果:
What is 4 + 9?
( 4 + 9 ) = 13
What time is it?
The current time is 05:08 PM
导出语料到json文件
# -*- coding: utf-8 -*-
from chatterbot import ChatBot
\'\'\'
如果一个已经训练好的chatbot,你想取出它的语料,用于别的chatbot构建,可以这么做
\'\'\'
chatbot = ChatBot(
\'Export Example Bot\',
trainer=\'chatterbot.trainers.ChatterBotCorpusTrainer\'
)
# 训练一下咯
chatbot.train(\'chatterbot.corpus.english\')
# 把语料导出到json文件中
chatbot.trainer.export_for_training(\'./my_export.json\')
反馈式学习聊天机器人
# -*- coding: utf-8 -*-
from chatterbot import ChatBot
import logging
"""
反馈式的聊天机器人,会根据你的反馈进行学习
"""
# 把下面这行前的注释去掉,可以把一些信息写入日志中
# logging.basicConfig(level=logging.INFO)
# 创建一个聊天机器人
bot = ChatBot(
\'Feedback Learning Bot\',
storage_adapter=\'chatterbot.storage.JsonFileStorageAdapter\',
logic_adapters=[
\'chatterbot.logic.BestMatch\'
],
input_adapter=\'chatterbot.input.TerminalAdapter\',#命令行端
output_adapter=\'chatterbot.output.TerminalAdapter\'
)
DEFAULT_SESSION_ID = bot.default_session.id
def get_feedback():
from chatterbot.utils import input_function
text = input_function()
if \'Yes\' in text:
return True
elif \'No\' in text:
return False
else:
print(\'Please type either "Yes" or "No"\')
return get_feedback()
print(\'Type something to begin...\')
# 每次用户有输入内容,这个循环就会开始执行
while True:
try:
input_statement = bot.input.process_input_statement()
statement, response = bot.generate_response(input_statement, DEFAULT_SESSION_ID)
print(\'\n Is "{}" this a coherent response to "{}"? \n\'.format(response, input_statement))
if get_feedback():
bot.learn_response(response,input_statement)
bot.output.process_response(response)
# 更新chatbot的历史聊天数据
bot.conversation_sessions.update(
bot.default_session.id_string,
(statement, response, )
)
# 直到按ctrl-c 或者 ctrl-d 才会退出
except (KeyboardInterrupt, EOFError, SystemExit):
break
使用Ubuntu数据集构建聊天机器人
from chatterbot import ChatBot
import logging
\'\'\'
这是一个使用Ubuntu语料构建聊天机器人的例子
\'\'\'
# 允许打日志
logging.basicConfig(level=logging.INFO)
chatbot = ChatBot(
\'Example Bot\',
trainer=\'chatterbot.trainers.UbuntuCorpusTrainer\'
)
# 使用Ubuntu数据集开始训练
chatbot.train()
# 我们来看看训练后的机器人的应答
response = chatbot.get_response(\'How are you doing today?\')
print(response)
借助微软的聊天机器人
# -*- coding: utf-8 -*-
from chatterbot import ChatBot
from settings import Microsoft
\'\'\'
关于获取微软的user access token请参考以下的文档
https://docs.botframework.com/en-us/restapi/directline/
\'\'\'
chatbot = ChatBot(
\'MicrosoftBot\',
directline_host = Microsoft[\'directline_host\'],
direct_line_token_or_secret = Microsoft[\'direct_line_token_or_secret\'],
conversation_id = Microsoft[\'conversation_id\'],
input_adapter=\'chatterbot.input.Microsoft\',
output_adapter=\'chatterbot.output.Microsoft\',
trainer=\'chatterbot.trainers.ChatterBotCorpusTrainer\'
)
chatbot.train(\'chatterbot.corpus.english\')
# 是的,会一直聊下去
while True:
try:
response = chatbot.get_response(None)
# 直到按ctrl-c 或者 ctrl-d 才会退出
except (KeyboardInterrupt, EOFError, SystemExit):
break
HipChat聊天室Adapter
# -*- coding: utf-8 -*-
from chatterbot import ChatBot
from settings import HIPCHAT
\'\'\'
炫酷一点,你可以接到一个HipChat聊天室,你需要一个user token,下面文档会告诉你怎么做
https://developer.atlassian.com/hipchat/guide/hipchat-rest-api/api-access-tokens
\'\'\'
chatbot = ChatBot(
\'HipChatBot\',
hipchat_host=HIPCHAT[\'HOST\'],
hipchat_room=HIPCHAT[\'ROOM\'],
hipchat_access_token=HIPCHAT[\'ACCESS_TOKEN\'],
input_adapter=\'chatterbot.input.HipChat\',
output_adapter=\'chatterbot.output.HipChat\',
trainer=\'chatterbot.trainers.ChatterBotCorpusTrainer\'
)
chatbot.train(\'chatterbot.corpus.english\')
# 没错,while True,会一直聊下去!
while True:
try:
response = chatbot.get_response(None)
# 直到按ctrl-c 或者 ctrl-d 才会退出
except (KeyboardInterrupt, EOFError, SystemExit):
break
邮件回复的聊天系统
# -*- coding: utf-8 -*-
from chatterbot import ChatBot
from settings import MAILGUN
\'\'\'
这个功能需要你新建一个文件settings.py,并在里面写入如下的配置:
MAILGUN = {
"CONSUMER_KEY": "my-mailgun-api-key",
"API_ENDPOINT": "https://api.mailgun.net/v3/my-domain.com/messages"
}
\'\'\'
# 下面这个部分可以改成你自己的邮箱
FROM_EMAIL = "mailgun@salvius.org"
RECIPIENTS = ["gunthercx@gmail.com"]
bot = ChatBot(
"Mailgun Example Bot",
mailgun_from_address=FROM_EMAIL,
mailgun_api_key=MAILGUN["CONSUMER_KEY"],
mailgun_api_endpoint=MAILGUN["API_ENDPOINT"],
mailgun_recipients=RECIPIENTS,
input_adapter="chatterbot.input.Mailgun",
output_adapter="chatterbot.output.Mailgun",
storage_adapter="chatterbot.storage.JsonFileStorageAdapter",
database="../database.db"
)
# 简单的邮件回复
response = bot.get_response("How are you?")
print("Check your inbox at ", RECIPIENTS)
一个中文的例子
注意chatterbot,中文聊天机器人的场景下一定要用python3.X,用python2.7会有编码问题。
#!/usr/bin/python
# -*- coding: utf-8 -*-
#手动设置一些语料
from chatterbot import ChatBot
from chatterbot.trainers import ListTrainer
Chinese_bot = ChatBot("Training demo")
Chinese_bot.set_trainer(ListTrainer)
Chinese_bot.train([
\'你好\',
\'你好\',
\'有什么能帮你的?\',
\'想买数据科学的课程\',
\'具体是数据科学哪块呢?\'
\'机器学习\',
])
# 测试一下
question = \'你好\'
print(question)
response = Chinese_bot.get_response(question)
print(response)
print("\n")
question = \'请问哪里能买数据科学的课程\'
print(question)
response = Chinese_bot.get_response(question)
print(response)
结果:
你好
你好
请问哪里能买数据科学的课程
具体是数据科学哪块呢?
利用已经提供好的小中文语料库
#!/usr/bin/python
# -*- coding: utf-8 -*-
from chatterbot import ChatBot
from chatterbot.trainers import ChatterBotCorpusTrainer
chatbot = ChatBot("ChineseChatBot")
chatbot.set_trainer(ChatterBotCorpusTrainer)
# 使用中文语料库训练它
chatbot.train("chatterbot.corpus.chinese")
# 开始对话
while True:
print(chatbot.get_response(input(">")))