glom Beginner Tutorial
1. Introduction to glom
To pull values out of dictionaries and JSON, we usually index one level at a time, like this:
>>> data = {'a': {'b': {'c': 'd'}}}
>>> data['a']['b']['c']
'd'
This looks simple, but as soon as the structure of the data changes, it fails:
>>> data2 = {'a': {'b': None}}
>>> data2['a']['b']['c']
Traceback (most recent call last):
...
TypeError: 'NoneType' object is not subscriptable
glom takes two main arguments:
- target: the dict, JSON data, list, or other object to extract from.
- spec: the shape of the output we want.
>>> from glom import glom
>>> target = {'galaxy': {'system': {'planet': 'jupiter'}}}
>>> spec = 'galaxy.system.planet'
>>> glom(target, spec)
'jupiter'
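2. Installing glom
pip install glom
Then, in Python (pprint is used by the later examples):
>>> from glom import *
>>> from pprint import pprint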
3. Basic path extraction
A path can be written in three ways:
- a string
- a Path object
- T
>>> target = {'galaxy': {'system': {'planet': 'jupiter'}}}
>>> spec = 'galaxy.system.planet'
>>> glom(target, spec)
'jupiter'
Now the data structure changes: planets becomes a list.
>>> target = {'system': {'planets': [{'name': 'earth'}, {'name': 'jupiter'}]}}
>>> glom(target, ('system.planets', ['name']))
['earth', 'jupiter']
Now the requirements change: the data gains a field, and the output needs more than one field (multiple paths, single match).
>>> target = {'system': {'planets': [{'name': 'earth', 'moons': 1},
...                                  {'name': 'jupiter', 'moons': 69}]}}
>>> spec1 = ('system.planets', ['name'])
>>> spec2 = ('system.planets', ['moons'])
>>> pprint(glom(target, spec1))
['earth', 'jupiter']
>>> pprint(glom(target, spec2))
[1, 69]
Writing it this way is tedious. glom lets you merge the two specs by using a dictionary to shape the output:
>>> target = {'system': {'planets': [{'name': 'earth', 'moons': 1},
...                                  {'name': 'jupiter', 'moons': 69}]}}
>>> spec = {'names': ('system.planets', ['name']),
...         'moons': ('system.planets', ['moons'])}
>>> pprint(glom(target, spec))
{'moons': [1, 69], 'names': ['earth', 'jupiter']}
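Now it gets more complicated: not only are there more fields, but some of the data keys have changed as well (multiple paths, multiple matches).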
>>> target1 = {'system': {'dwarf_planets': [{'name': 'pluto', 'moons': 5},
...                                         {'name': 'ceres', 'moons': 0}]}}
>>> target2 = {'system': {'planets': [{'name': 'earth', 'moons': 1},
...                                   {'name': 'jupiter', 'moons': 69}]}}
>>> spec = {'names': (Coalesce('system.planets', 'system.dwarf_planets'), ['name']),
...         'moons': (Coalesce('system.planets', 'system.dwarf_planets'), ['moons'])}
>>> pprint(glom(target2, spec))
{'moons': [1, 69], 'names': ['earth', 'jupiter']}
>>> # Path reaches keys a dotted string cannot express: integers and keys containing dots
>>> target = {'a': {'b': 'c', 'd.e': 'f', 2: 3}}
>>> glom(target, Path('a', 2))
3
>>> glom(target, Path('a', 'd.e'))
'f'
Path objects can be joined:
>>> Path(T['a'], T['b'])
T['a']['b']
>>> Path(Path('a', 'b'), Path('c', 'd'))
Path('a', 'b', 'c', 'd')
Path objects can be indexed and sliced:
>>> path = Path('a', 'b', 1, 2)
>>> path[0]
Path('a')
>>> path[-2:]
Path(1, 2)
In practice, you simply put a Path object wherever a string path would otherwise go.
>>> spec = T['a']['b']['c']
>>> target = {'a': {'b': {'c': 'd'}}}
>>> glom(target, spec)
'd'
What T extracts is the corresponding Python object (the exact usage is still to be verified).
>>> from glom import T
>>> target = {'system': {'planets': [{'name': 'earth', 'moons': 1},
...                                  {'name': 'jupiter', 'moons': 69}]}}
>>> spec = (T['system']['planets'][-1].values(), list)  # list() turns the dict view into a list
>>> glom(target, spec)
['jupiter', 69]
>>> target = {'a': {'b': {'c': 'd'}}}
>>> spec = ('a', (T['b'].items(), list))  # call .items() on target['a']['b'], then build a list
>>> glom(target, spec)
[('c', 'd')]
>>> target = {'system': {'planets': [{'name': 'earth', 'moons': 1},
...                                  {'name': 'jupiter', 'moons': 69}]}}
>>> # a callable at the end of a tuple spec receives the output of the previous step
>>> pprint(glom(target, ('system.planets', ['moons'], sum)))
70
>>> # a callable inside a list spec is applied to each item
>>> pprint(glom(target, ('system.planets', ['moons'], [lambda x: x*2])))
[2, 138]
>>> target = {'a': {'b': {}}}
>>> val = glom(target, Inspect('a.b'))  # wrapping a spec
---
path:   ['a.b']
target: {'a': {'b': {}}}
output: {}
---