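Technical Field:
The present invention belongs to the technical field of bioengineering and gene editing, and specifically relates to a cytosine single-base editor tool and applications thereof.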
Background Art:
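The development and application of the CRISPR/Cas9 system have contributed greatly to biology and medicine. As the most classic gene-editing system, it consists of two parts: a gRNA (guide RNA) sequence that recognizes the target site in the target genome, and Cas9, which has endonuclease activity. Guided by the gRNA, Cas9 uses its PID (PAM-interacting domain) to recognize the protospacer adjacent motif of the target genome (PAM; specific bases on the target genome, for example NGG), and then cleaves the double-stranded DNA through its RuvC and HNH domains, which carry the endonuclease activity (Anders, C., et al., Structural basis of PAM-dependent target DNA recognition by the Cas9 endonuclease. Nature, 2014. 513(7519): p. 569-73). The organism's DNA damage repair response then rejoins the broken sequence through homologous recombination repair or non-homologous repair, achieving knockout or insertion of the target gene in the cell (Lin, S., et al., Enhanced homology-directed human genome engineering by controlled timing of CRISPR/Cas9 delivery. eLife, 2014. 3: p. e04766). The CRISPR/Cas9 system edits efficiently; however, cleaving double-stranded DNA causes relatively many insertions and deletions of gene fragments, and the repaired bases cannot be controlled precisely.
Scientists therefore invented single-base editing tools on the basis of the CRISPR/Cas9 system, including the single-base editor ABE (adenine base editor), which replaces A/T with G/C at the target site (A/T-to-G/C) (Gaudelli, N.M., et al., Programmable base editing of A•T to G•C in genomic DNA without DNA cleavage. Nature, 2017. 551(7681): p. 464-471), and the single-base editor CBE (cytidine base editor), which replaces C/G with T/A (C/G-to-T/A) (Komor, A.C., et al., Programmable editing of a target base in genomic DNA without double-stranded DNA cleavage. Nature, 2016. 533(7603): p. 420-4). A single-base editing tool is mainly a fusion of a cytidine deaminase or adenosine deaminase, among others, with Cas9 nickase (D10A mutation, which inactivates the catalytic RuvC domain), forming a base-editing tool that precisely replaces a single base.
CBE was first reported by the David Liu laboratory at Harvard University, which fused a rat-derived cytosine deaminase (APOBEC1) with nCas9 to form the single-base editor BE3 (APOBEC-XTEN-Cas9n(D10A)-UGI) (Komor, A.C., et al., Programmable editing of a target base in genomic DNA without double-stranded DNA cleavage. Nature, 2016. 533(7603): p. 420-4). BE3 edits efficiently and achieves precise C-to-T conversion, and its proportion of deletions and insertions is far lower than that of the CRISPR/Cas9 knockout system. BE4 (APOBEC-XTEN-Cas9n(D10A)-2*UGI-SV40 NLS) was subsequently evolved from BE3 by adding one more uracil DNA glycosylase inhibitor (UGI), adding a nuclear localization signal (NLS), and changing the linker length (Komor, A.C., et al., Improved base excision repair inhibition and bacteriophage Mu Gam protein yields C:G-to-T:A base editors with higher efficiency and product purity. Sci Adv, 2017. 3(8): p. eaao4774). On the basis of BE4, replacing the deaminase APOBEC1 with Anc689 APOBEC and adding one more nuclear localization signal yielded AncBE4max (SV40 NLS-Anc689 APOBEC-XTEN-Cas9n(D10A)-2*UGI-SV40 NLS) (Koblan, L.W., et al., Improving cytidine and adenine base editors by expression optimization and ancestral reconstruction. Nat Biotechnol, 2018. 36(9): p. 843-846), among others, giving it the highest editing efficiency and the lowest proportion of by-products. The PAM recognized by AncBE4max is NGG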
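, and the corresponding editing window is positions 4-8 from the 5' end of the gRNA range; its Cas9n is derived from Streptococcus pyogenes (SpCas9; 1369 amino acids in total). However, the targeting window and PAM restriction of AncBE4max (it mainly recognizes PAMs with the NGG sequence) greatly limit the portion of the genome that can be targeted. To address this, scientists have developed a series of SpCas9 protein mutants through protein engineering and directed evolution, for example SpCas9-NG, SpRY (Walton, R.T., et al., Unconstrained genome targeting with near-PAMless engineered CRISPR-Cas9 variants. Science, 2020. 368(6488): p. 290-296), SpCas9-VQR, SpCas9-EQR (Kleinstiver, B.P., et al., Engineered CRISPR-Cas9 nucleases with altered PAM specificities. Nature, 2015. 523(7561): p. 481-5), and SpCas9-HF1 (Kleinstiver, B.P., et al., High-fidelity CRISPR-Cas9 nucleases with no detectable genome-wide off-target effects. Nature, 2016. 529(7587): p. 490-5). At the same time, scientists have sought new Cas9 protein homologs, for example Nme2Cas9 (Edraki, A., et al., A compact, high-accuracy Cas9 with a dinucleotide PAM for in vivo genome editing. Mol Cell, 2019. 73(4): p. 714-726.e4), SaCas9 (Nishimasu, H., et al., Crystal structure of Staphylococcus aureus Cas9. Cell, 2015. 162(5): p. 1113-26), St1Cas9 (Zhang, Y., et al., Catalytic-state structure and engineering of Streptococcus thermophilus Cas9. Nature Catalysis, 2020. 3(10): p. 813-823), and xCas9 (Hu, J.H., et al., Evolved Cas9 variants with broad PAM compatibility and high DNA specificity. Nature, 2018. 556(7699): p. 57-63). On this basis, a series of base editors with various targeting properties and PAM recognition have been developed from these variants. The editing windows of these classic editors are mainly positions 4-8, and every type of editor shows PAM preference or low targeting efficiency at some sites. In addition, the expression plasmids of classic base editors far exceed the packaging capacity of adenovirus, which hinders clinical research and application. Developing new base editors with different editing windows, different PAM recognition, and smaller expression plasmids is therefore key to current applied research and clinical use of gene editing.
Summary of the Invention:
In view of the problems of existing base editors, the present invention aims to provide a cytosine single-base editor tool and applications thereof that efficiently induce C-to-T conversion at positions 8-14 from the 5' end of the editing window and recognize NNAAAG as the PAM. This expands the genomic targeting range of base editing and reduces the size of the gene-editing tool so that it better fits the packaging capacity of adenovirus, giving it good application prospects.
To achieve the above object, the present invention adopts the following technical solutions:
In a first aspect, the present invention provides a fusion protein comprising a codon-optimized Cas9 nickase homolog derived from Streptococcus gordonii, a cytosine deaminase, and a uracil glycosylase inhibitor protein; wherein the amino acid sequence of the Cas9 nickase homolog is:
a): amino acids 2 to 1136 from the N-terminus of the SgoCas9 D9A nickase, as shown in SEQ ID NO. 1; or
b): a sequence having at least 90% sequence identity with SEQ ID NO. 1 and having the function of the amino acid sequence shown in SEQ ID NO. 1; or
c): a sequence which, relative to SEQ ID NO. 1, obtains the function of the complete fusion protein through cleavage and splicing of the amino acid sequence by means of intron sequences, or an amino acid sequence partially identical to SEQ ID NO. 1, and which has the function of the amino acid sequence shown in SEQ ID NO. 1.
The DNA coding sequence corresponding to the codon-optimized Cas9 nickase homolog derived from Streptococcus gordonii is:
i): the DNA coding sequence of the SgoCas9 D9A nickase shown in SEQ ID NO. 2, which has been codon-optimized for expression in eukaryotes; or
ii): a DNA coding sequence for an amino acid sequence that has at least 90% sequence identity with the amino acid sequence shown in SEQ ID NO. 1 and has the function of the amino acid sequence shown in SEQ ID NO. 1; or
iii): a DNA sequence that differs from the DNA sequence shown in SEQ ID NO. 2 by synonymous codons.
Further, the fusion protein also comprises an N-terminal bpNLS-AncAPOBEC1 polypeptide and a C-terminal 2*UGI-bpNLS polypeptide; wherein the bpNLS-AncAPOBEC1 polypeptide is a fusion of a bpNLS polypeptide and an AncAPOBEC1 polypeptide, and the 2*UGI-bpNLS polypeptide is a fusion of UGI polypeptides and a bpNLS polypeptide.
The amino acid sequence of the bpNLS-AncAPOBEC1 polypeptide is:
d): the amino acid sequence shown in SEQ ID NO. 3; or
e): a sequence having at least 90% sequence identity with SEQ ID NO. 3 and having the function of the amino acid sequence shown in SEQ ID NO. 3.
The amino acid sequence of the 2*UGI-bpNLS polypeptide is:
f): the amino acid sequence shown in SEQ ID NO. 4; or
g): an amino acid sequence having at least 90% sequence identity with SEQ ID NO. 4 and having the function of the amino acid sequence shown in SEQ ID NO. 4.
Further, from the N-terminus to the C-terminus the fusion protein comprises, in order, the bpNLS polypeptide, the AncAPOBEC1 polypeptide, a 32aa linker, amino acids 2 to 1136 from the N-terminus of the SgoCas9 D9A nickase, a 10aa linker, the 2*UGI polypeptide, and the bpNLS polypeptide. The full amino acid sequence of the fusion protein is:
h): the amino acid sequence shown in SEQ ID NO. 5; or
i): an amino acid sequence in which the six elements (the bpNLS polypeptide, the AncAPOBEC1 polypeptide, the 32aa linker, amino acids 2 to 1136 from the N-terminus of the SgoCas9 D9A nickase, the 10aa linker, and the 2*UGI polypeptide) are rearranged, added to, or reduced while retaining the function of editing cytosine to thymine; or
j): an amino acid sequence having at least 90% sequence identity with the amino acid sequence shown in SEQ ID NO. 5 and having the functions of recognizing NNAAAG as the PAM and editing cytosine to thymine.
Further, the fusion protein recognizes NNAAAG as the PAM, where N denotes any base; the fusion protein edits cytosine to thymine at positions 8-14 of the editing window.
Further, the fusion protein also comprises a nuclear localization signal polypeptide fragment whose amino acid sequence is shown in SEQ ID NO. 8.
In a second aspect, the present invention provides a polynucleotide sequence that encodes the above fusion protein; the polynucleotide sequence is shown in SEQ ID NO. 6.
In a third aspect, the present invention provides a cytosine single-base editor obtained by integrating the polynucleotide sequence encoding the above fusion protein into an expression vector.
Further, the expression vector is an expression vector carrying a gRNA scaffold built from the tandem repeat sequence derived from Streptococcus gordonii; the nucleotide sequence of the expression vector is shown in SEQ ID NO. 7.
In a fourth aspect, the present invention provides a cell expression system containing the cytosine single-base editor of claim 7 or 8; the cell is a host cell, and the host cell is a eukaryotic cell or a prokaryotic cell.
Further, the cell is a mouse neuroblastoma cell, a human embryonic kidney cell, or a human colon cancer cell.
Further, the mouse neuroblastoma cell is an N2a cell, the human embryonic kidney cell is an HEK293T cell, and the human colon cancer cell is an HCT116 cell.
Compared with the prior art, the beneficial effects of the present invention are as follows:
On the basis of AncBE4max, the present invention replaces the Cas9n protein with SgoCas9n, whose potential recognized PAM is NNAAAG, and replaces the scaffold of the gRNA expression vector with a gRNA scaffold designed from the tandem repeat sequence of Streptococcus gordonii. Together these form a new, efficient single-base editor, Sgo-AncBE4max. Its editing range is the cytosines at positions 8-14 from the 5' end of the target sequence, and the editing system converts cytosine to thymine (C-to-T), which broadens the targeting range of base editing. In addition, the protein size of the base-editing tool of the present invention is compatible with the packaging requirements of adenovirus.
Brief Description of the Drawings
Fig. 1 is a schematic of the domains of the Sgo protein;
Fig. 2 is a schematic of the domains of the Sgo-AncBE4max protein;
Fig. 3 is a schematic of the Sgo-AncBE4max plasmid structure;
Fig. 4 is a schematic of the plasmid structure of the gRNA of the Sgo-AncBE4max system;
Fig. 5 is a schematic of the experimental results of Example 3 of the present invention, in which Sgo-AncBE4max and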
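gRNA were co-transfected into HEK293T cells and the results were read by Sanger sequencing; the upper part of the figure shows a schematic of the target DNA sequence and the PAM sequence, and the right part shows the editing-efficiency statistics for the 4 corresponding target sites;
Fig. 6 is a heat map of the editing efficiency of the Sgo-AncBE4max editing system in HEK293T cells;
Fig. 7 is the chart obtained by normalizing the bar charts of editing efficiency at a total of 10 sites for the Sgo-AncBE4max editing system in HEK293T cells; the dashed box indicates the editing window;
Fig. 8 shows the editing efficiency of the Sgo-AncBE4max editing system in HCT116 cells;
Fig. 9 shows the editing efficiency of the SgoCas9-AncBE4max system in N2a cells.
Detailed Description
To better illustrate the objects, technical solutions, and advantages of the present invention, the invention is further described below with reference to specific examples. Those skilled in the art should understand that the specific examples described here serve only to explain the present invention and do not limit it.
Unless otherwise specified, the test methods used in the examples are conventional methods, and the materials, reagents, and the like are commercially available.
Example 1 Construction of the SgoCas9-AncBE4max plasmid
This example provides a method for constructing the SgoCas9-AncBE4max plasmid. The specific steps are as follows:
First, using the amino acid sequence alignment tool ClustalW2, SgoCas9, the Cas9 protein homolog from Streptococcus gordonii (Streptococcus_gordonii_str._Challis_substr._CH1; abbreviated Sgo), was aligned with SpCas9 at the amino acid level to delineate the functional domains of SgoCas9 (Fig. 1) and to identify the functional sites of the RuvC domains of SgoCas9 and SpCas9. The RuvC functional site of SpCas9 is the aspartate at position 10 (D10), whereas the corresponding functional site of SgoCas9 is the aspartate at position 9 (D9). Mutating D9 to alanine (A) gives the SgoCas9 D9A nickase; amino acids 2 to 1136 of the SgoCas9 D9A nickase are shown in SEQ ID NO. 1.
Second, the prokaryotic codons of Streptococcus gordonii SgoCas9 D9A were optimized for eukaryotes to obtain a coding DNA sequence of SgoCas9 D9A suited to expression in eukaryotic cells, shown in SEQ ID NO. 2. The optimized SgoCas9 D9A was produced by whole-gene synthesis at a commercial company. The construction strategy was to replace the SpCas9 D10A of AncBE4max with SgoCas9 D9A, AncBE4max itself also being produced by whole-gene synthesis at a commercial company. To reduce point mutations introduced by PCR, the 32aa linker-SpCas9 D10A-10aa linker-UGI portion of AncBE4max was excised by digestion with the endonuclease BamHI, and the excised portion was restored when SgoCas9 D9A was synthesized commercially, i.e. as 32aa linker-SgoCas9 D9A-10aa linker-UGI. The nucleotide sequence of 32aa linker-SgoCas9 D9A-10aa linker-UGI is shown in SEQ ID NO. 9, and the ends of the sequence contain BamHI restriction sites.
The AncBE4max plasmid (vector: pCMV) was digested with the restriction endonuclease BamHI (R0136L) in a 37 °C water bath for 2 h. The digestion system (50 μl) was: 10x buffer: 5 μl; vector: 5 μg; BamHI: 3 μl; ddH2O: to 50 μl. Complete digestion was verified by gel electrophoresis; once digestion was complete, the linearized vector was purified with a cleanup kit (AxyPrep PCR Cleanup Kit) and eluted with 15 μl ddH2O.
The synthesized 32aa linker-SgoCas9 D9A-10aa linker-UGI was amplified by PCR, with protective bases introduced outside the restriction sites at both ends, using PCR primers synthesized by GENEWIZ (金唯智生物科技有限公司). The Sgo PCR forward primer sequence is: agcggaggatcctctggcagcgagacacca; the Sgo PCR reverse primer sequence is: cctccggatcctccgctcagcatcttgatctta. The product was purified with the cleanup kit (AxyPrep PCR Cleanup Kit). The purified PCR product was digested with BamHI using the digestion system described above.
The purified 32aa linker-SgoCas9 D9A-10aa linker-UGI was ligated with the BamHI-linearized vector pCMV_AncBE4max to give the initial ligation product. Ligation system (10 μl): purified linearized vector pCMV_AncBE4max: 1 μl (50 ng); BamHI-digested 32aa linker-SgoCas9 D9A-10aa linker-UGI: 1 μl (100 ng); T4 DNA ligase buffer: 1 μl; T4 DNA ligase: 1 μl; ddH2O: 6 μl. Ligation conditions: 16 °C for 2 h. The ligation product was transformed and plated, and single colonies were picked, cultured, sequenced, and verified, yielding the SgoCas9-AncBE4max protein, i.e. the fusion protein, whose full amino acid sequence is shown in SEQ ID NO. 5 and whose DNA sequence is shown in SEQ ID NO. 6. The structure of the resulting fusion protein is shown schematically in Fig. 2: the fusion protein comprises, in order, bpNLS, the AncAPOBEC1 polypeptide fragment, the 32aa linker, the polypeptide fragment consisting of amino acids 2 to 1136 from the N-terminus of the SgoCas9 D9A nickase, the 10aa linker, the 2*UGI polypeptide, and the bpNLS polypeptide sequence. The SgoCas9-AncBE4max plasmid map is shown in Fig. 3 and comprises the plasmid domains (i.e. the fusion protein) and the ampicillin resistance polypeptide sequence.
Single-colony cultures verified as positive were expanded, and the plasmid was extracted following the kit instructions (TIANGEN: TIANpure Midi Plasmid Kit) and its concentration measured, to ensure that enough plasmid was available for transfection and that it was free of salt, protein, and other contaminants.
Example 2 Construction of the gRNA plasmid of the SgoCas9-AncBE4max system
This example provides a method for constructing the gRNA plasmid of the SgoCas9-AncBE4max system, comprising the following steps:
2.1 Vector construction for the gRNA plasmid of the SgoCas9-AncBE4max system
Using pGL3-U6-sgRNA (Addgene #51133) as the expression backbone, a gRNA expression vector suited to the SgoCas9 gRNA editing system was constructed. A scaffold sequence suited to the SgoCas9 gRNA system was designed from the tandem repeat sequence of Streptococcus gordonii, and the scaffold of pGL3-U6-sgRNA (Addgene #51133), which is suited to SpCas9, was replaced with the SgoCas9 gRNA scaffold. The complete plasmid is shown in SEQ ID NO. 7 and is named pGL3-U6-SgogRNA; its plasmid structure is shown schematically in Fig. 4. The target gRNA sequence is inserted at two BsaI restriction sites. The plasmid was produced by whole-gene synthesis at a commercial company.
2.2 Construction of targeting gRNA plasmids for the SgoCas9-AncBE4max system
A gRNA was designed and two complementary oligos were synthesized. The upstream sequence is 5'-accg-24nt-3' and the downstream sequence is 5'-aaac-24nt-3' (the 24 nt is the replaceable sequence and is complementary to the upstream sequence); in the genome the upstream sequence reads 24nt-NNAAAG (the DNA strand carrying the PAM). The upstream and downstream sequences were annealed with the program (95 °C, 5 min; 95 °C to 85 °C at -2 °C/s; 85 °C to 25 °C at -0.1 °C/s; hold at 4 °C) and ligated into the pGL3-U6-SgogRNA vector linearized with BsaI (NEB: R0539L). The linearization digestion system was: pGL3-U6-SgogRNA 2 μg; buffer (NEB: R0539L) 6 μl; BsaI 2 μl; ddH2O to 60 μl. Digestion was carried out overnight at 37 °C. The ligation system was: T4 ligation buffer (NEB: M0202L) 1 μl; linearized vector 20 ng; annealed oligo fragment (10 μM) 5 μl; T4 DNA ligase (NEB: M0202L) 0.5 μl; ddH2O to 10 μl. Ligation was carried out overnight at 16 °C. The ligated vector was transformed, and colonies were picked and verified. Positive clones were expanded, and the plasmid was extracted (Axygen: AP-MN-P-250G) and its concentration measured.
Example 3 Gene-editing test of the gene-editing tool SgoCas9-AncBE4max
The human endogenous genes EMX1, FANCF, CDKN2A, CFTR, DNMT1, DYRK1A, RUNX1, VEGFA, and others were selected; a total of 10 gRNAs were designed and 20 oligos were synthesized. The sequences are given in Table 1.
Table 1 The 20 oligos synthesized on the basis of EMX1, FANCF, CDKN2A, CFTR, DNMT1, DYRK1A, RUNX1, and VEGFA
The base-editing system consisting of the SgoCas9-AncBE4max plasmid and the pGL3-U6-SgogRNA plasmids constructed in Example 1 and Example 2 (the gRNA plasmids built by annealing the 20 oligos of Table 1 and ligating them into the linearized pGL3-U6-SgogRNA vector: sgSgo-1, sgSgo-2, sgSgo-3, sgSgo-4, sgSgo-5, sgSgo-6, sgSgo-7, sgSgo-8, sgSgo-9, and sgSgo-10) was transfected into HEK293T cells as follows:
3.1 HEK293T cells (from ATCC) were thawed and cultured in 10 cm dishes (Corning, 430167) in DMEM (HyClone, SH30243.01) supplemented with 10% fetal bovine serum (HyClone, SV30087). The culture temperature was 37 °C and the carbon dioxide concentration was 5%. After several passages, when the cells reached 90% confluence, they were split into 24-well plates (Jet Biofil, 杰特生物).
3.2 Three passages after thawing, the condition of the HEK293T cells was checked and cells in good condition were plated in 24-well plates. After 18-24 h of culture, when the cells reached 80% confluence, they were transfected. The amounts used per transfection were: SgoCas9-AncBE4max plasmid 1 μg; pGL3-U6-SgogRNA plasmid 0.5 μg; EZ Trans transfection reagent (Life-iLab, 李记生物) 4.5 μl.
3.3 The specific transfection steps (the same as the protocol for the Shanghai Life-iLab EZ Trans transfection reagent, high-efficiency version) were:
3.3.1 Prepare reagent A: for each well of cells, 1.5 μg of plasmid DNA (1 μg SgoCas9-AncBE4max plasmid +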
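0.5 μg pGL3-U6-SgogRNA plasmid) is diluted in 50 μl of serum-free, antibiotic-free high-glucose DMEM (or Opti-MEM) and mixed.
3.3.2 Prepare reagent B: for each well of cells, 4.5 μl of EZ Trans transfection reagent (EZ Trans : plasmid DNA = 2:1) is diluted in 50 μl of serum-free, antibiotic-free high-glucose DMEM (or Opti-MEM I) and mixed gently. (Serum-containing medium must not be used to dilute the plasmid or the EZ Trans transfection reagent, because serum contains large amounts of negatively charged proteins that may interfere with adsorption of the nucleic acid by the transfection reagent and thereby reduce transfection efficiency.)
3.3.3 Reagent A and reagent B are left to stand for 5 min at the same time, then all of reagent B is added to reagent A as quickly as possible and mixed gently. (The order of mixing must not be reversed.)
3.3.4 The mixture is left to stand at room temperature for 15 min to form the EZ Trans-DNA complex. All of the prepared EZ Trans-DNA transfection complex is added dropwise and evenly to the dish containing the cells, and the dish is gently swirled or lightly shaken to disperse the complex evenly.
3.3.5 The cells are incubated at 37 °C in a 5% CO2 incubator for 4-6 h; the medium containing the EZ Trans-DNA complex is then removed and replaced with fresh medium, and the cells are cultured for 3 days.
3.4 After 3 days of culture, the transfected cells were trypsinized and harvested, and GFP-positive cells were collected by flow sorting (top 15% of FITC fluorescence intensity). Genomic DNA was extracted from the collected cells by the phenol-chloroform method.
3.5 PCR primers were designed and synthesized 100-130 bp upstream and downstream of each selected endogenous target site and diluted with water to 10 μM. Each genomic target fragment was amplified by PCR with the Vazyme high-fidelity polymerase kit (Vazyme, P501-D2). The PCR products were gel-purified with the AxyPrep DNA Gel Extraction Kit (Axygen, AP-GX-250G) to remove nonspecific bands. The PCR primer sequences are given in Table 2.
Table 2 PCR primer sequences
3.6 Gel electrophoresis was used as an initial check that the target fragments had amplified; successfully amplified fragments were Sanger-sequenced, and the sequencing results were analyzed for specific point mutations (C-to-T or G-to-A) at the target site. The sequencing results are shown in Fig. 5. In the left panel, the first row is a schematic of the target DNA sequence and the second row is the experimental result of targeted editing, with arrows marking the C-to-T editing positions; the right panel shows the C-to-T editing efficiency at different positions within the gRNA range. The figure shows the editing results for 4 sites. As Fig. 5 shows, the gene-editing tool SgoCas9-AncBE4max obtained by the present invention produces efficient C-to-T conversion.
Example 4
4.1 The base-editing system consisting of the SgoCas9-AncBE4max plasmid and the pGL3-U6-SgogRNA plasmids constructed in Example 1 and Example 2 was transfected into HEK293T cells, and the editing efficiency of SgoCas9-AncBE4max was measured at 10 human genomic sites. The editing-efficiency statistics are shown in Fig. 6. The results show that the SgoCas9-AncBE4max base-editing system achieved C-to-T conversion to varying degrees at positions 1-24 from the 5' end.
4.2 The editing efficiencies within each gRNA were normalized, i.e. the most efficiently edited position was set to 1 and the editing efficiencies of the other cytosines were expressed relative to it. All positions within the gRNA range were then pooled to determine the high-efficiency editing window of the SgoCas9-AncBE4max base-editing system within the gRNA range, as shown in Fig. 7. The results show that the editing range of the SgoCas9-AncBE4max base-editing system obtained in this example is positions 8-14 from the 5' end of the gRNA. This demonstrates the feasibility of building new base-editing tools and systems by searching for Cas9 homologs, and yields a cytosine base editor with a different targeting range (PAM: NNAAAG) and editing window that works efficiently for endogenous gene editing.
Example 5
The base-editing system consisting of the SgoCas9-AncBE4max plasmid and the pGL3-U6-SgogRNA plasmids constructed in Example 1 and Example 2 was transfected into human colon cancer HCT116 cells and mouse neuroblastoma N2a cells as follows:
N2a cells (from ATCC) were thawed and cultured in 10 cm dishes (Corning, 430167) in DMEM (HyClone, SH30243.01) supplemented with 10% fetal bovine serum (HyClone, SV30087). The culture temperature was 37 °C and the carbon dioxide concentration was 5%. After several passages, when the cells reached 90% confluence, they were split into 24-well plates.
HCT116 cells (from ATCC) were thawed and cultured in 10 cm dishes (Corning, 430167) in RPMI 1640 medium (Gibco, 11875093) supplemented with 10% fetal bovine serum (HyClone, SV30087). The culture temperature was 37 °C and the carbon dioxide concentration was 5%. After several passages, when the cells reached 90% confluence, they were split into 24-well plates.
The cell transfection, cell sorting, and editing-efficiency analysis protocols were the same as in Example 3 and Example 4.
The editing efficiency of the SgoCas9-AncBE4max system in HCT116 cells is shown in Fig. 8; the target sequences were sgo-1/-2/-3/-6/-8 (see Table 1). The editing efficiency of the SgoCas9-AncBE4max system in N2a cells is shown in Fig. 9; the target sequence was ggaactcgatcgcatcattgcatg. As Fig. 8 and Fig. 9 show, the SgoCas9-AncBE4max base-editing system, in HCT116 and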
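N2a cells, likewise produces efficient C-to-T conversion. In Fig. 8 the horizontal axis shows the most efficiently edited base position at each of the 5 human genomic sites and the vertical axis shows the C-to-T conversion efficiency; in Fig. 9 the arrow marks the edited position in N2a cells, and the chromatogram shows efficient C-to-T conversion.
In summary, the present invention effectively overcomes the limits on the applicability of prior-art base-editing tools, including the PAM and editing-window restrictions. In addition, the cytosine base editor of the present invention is smaller than the classic SpCas9-mediated base editor (1710 amino acids in total), may be suitable for packaging and application in lentivirus or adenovirus, and has high industrial value.
Finally, it should be noted that the above examples serve only to illustrate the technical solutions of the present invention and do not limit its scope of protection. Although the present invention has been described in detail with reference to preferred examples, those of ordinary skill in the art should understand that the technical solutions of the present invention may be modified or replaced with equivalents without departing from the essence and scope of the technical solutions of the present invention.
Sequence Listing
<110> Guangzhou University
<120> A cytosine single-base editor tool and applications thereof
<130> pzs215311-
<160> 9
<170> PatentIn version 3.5
<210> 1
<211> 1135
<212> PRT
<213> sgocas9d9anickase
<400> 1
asnglyleuvalleuglyleualaileglyilealaservalglyval151015glyileleuglulysaspthrglylysileilehisalaserserarg202530leupheproalaalathralaaspasnasnvalgluargargserasn354045argglnglyargargleuasnargarglyslyshisargservalarg505560leuglnaspleuphegluglytyrglyleuleuthrasppheserlys65707580valsermetasnleuasnprotyrglnleuargvalglnglymetglu859095asnglnleuthrasnglugluleuphevalalaleulysasnileval100105110lysargargglyilesertyrleuaspaspalasergluaspglygly115120125thrvalserserasptyrglylysalavalglugluasnarglysleu130135140leualaglulysthrproglyglnileglnleugluargpheglulys145150155160tyrglyglnleuargglyaspphethrvalglugluasnglyglulys165170175hisargleuileasnvalpheserthrseralatyrarglysgluala180185190gluargileleuarglysglnglnglupheasnserlysilethrasp195200205glupheilegluasptyrleuileileleuthrglylysarglystyr210215220tyrhisglyproglyasnglulysserargthrasptyrglyargphe225230235240argthraspglythrthrleuaspasnilepheglyileleuilegly245250255lyscysthrphetyrthrgluglutyrargalaserlysalasertyr260265270thralaglnglupheasnleuleuasnaspleuasnasnleuthrval275280285prothrgluthrlyslysleuserglugluglnlyslysleuileile290295300glutyralalysseralalysthrleuglyalaserthrleuleulys305310315320tyrilealalysmetileaspalaservalaspglnileargglytyr325330335argvalaspvalasnasnlysproglumethisthrphegluvaltyr340345350arglysmetglnserleugluthrilelysvalglugluleuproarg355360365lysvalleuaspgluleualahisileleuthrleuasnthrgluarg370375380gluglyilegluglualaileasnserlysleulysaspilepheasn385390395400argaspglnvalleugluleuvalglnphearglysasnasnserser405410415leupheserlysglytrphisasnpheseri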
lelysleumetmetglu420425430leuileprogluleutyrgluthrserglugluglnmetthrileleu435440445thrargleuglylysglnargserlysgluthrserlysargthrlys450455460tyrileaspglulysgluleuthrglugluiletyrasnprovalval465470475480alalysservalargglnalailelysileileasnglualathrlys485490495lystyrglyilepheaspasnilevalileglumetalaarggluasn500505510asnglugluaspalalyslysasptyrilelysargglnlysalaasn515520525glnaspglulysasnalaalametglulysalaalapheglntyrasn530535540glylyslysgluleuproaspasnilephehisglyhislysgluleu545550555560thrthrlysileargleutrphisglnglnglyglulyscysleutyr565570575thrglylysasnileproileseraspleuilehisasnglntyrlys580585590tyrgluileasphisileleuproleuserleuserpheaspaspser595600605leuserasnlysvalleuvalleualathralaasnglnglulysgly610615620glnargthrpropheglnalaleuaspsermetaspaspalatrpser625630635640tyrarggluphelyssertyrvallysaspserlysleuleuserasn645650655lyslyslysasptyrleuleuthrglugluaspileserlysileglu660665670vallysglnlyspheilegluargasnleuvalaspthrargtyrser675680685serargvalvalleuasnalaleuglnaspphetyrlysserhisgln690695700leuaspthrthrileservalvalargglyglnphethrserglnleu705710715720argarglystrpglyileglulysserarggluthrtyrhishishis725730735alavalaspalaleuileilealaalaserserglnleuargleutrp740745750lyslyshisserasnproleuilealatyrlysgluglyglnpheval755760765aspsergluthrglygluilevalserleuseraspgluglutyrlys770775780gluleuvalphelysalaprotyrasphisphevalaspthrleuarg785790795800serlyslysphegluaspserileleuphesertyrglnvalaspser805810815lystyrasnarglysileseraspalathriletyralathrarglys820825830alalysleuasplysglulyslysglutyrthrtyrthrleuglylys835840845ilelysaspiletyralaleuglythrlysthrproserlysthrgly850855860phetyrlyspheleuaspleutyrlysthrasplysserglnpheleu865870875880mettyrglnlysasparglysthrtrpaspgluvalileglulysile885890895ilegluglntyrargprophelysglutyrasplysasnglylysglu900905910valasppheasnpropheglulystyrargileglyasnglyproile915920925arglystyrserlyslysglyasnglyprogluilelysserleulys930935940tyrtyraspileleuleuglylyshislysasnilethrproaspgly945950955960serargasnthrvalalaleuleuserleuasnprotrpargthrasp965970975valtyrtyrasnser
gluthrlyslystyrglupheleuglyleulys980985990tyralaaspleucyspheglugluglyglyalatyrglyileserglu99510001005vallystyrlyslysileargglulysgluglyileglylysasn101010151020sergluphelysphethrleutyrlysasnaspleuileleuile102510301035lysaspthrgluthrasncysglnglnphepheargphetrpser104010451050argthrglylysaspasnprolysserpheglulyshislysile105510601065gluleulysprotyrglulysalalyspheglulysglygluglu107010751080leulysvalleuglylysvalproproserserasnglnphegln108510901095lysasnmetglnilegluasnleuseriletyrlysvallysthr110011051110aspileleuglyasnlyshispheilelyslysgluglyaspglu111511201125prolysleulysphelyslys11301135<210>2<211>3405<212>dna<213>sgocas9d9anickase<400>2aacggcctggtgctgggcctggccatcggcatcgcctctgtgggcgtgggcatcctggag60aaagacactggcaagatcattcacgcttcgagcagactgttcccagccgccacagccgac120aacaatgtggagagacggagcaatagacagggcagacggctaaaccggcggaaaaagcac180agatccgtgcggctgcaggacctgtttgaaggatacggcctgctgacagacttcagcaag240gtgtccatgaacctgaatccctaccagctgcgggtgcagggaatggaaaaccagctgacc300aacgaggagctgttcgtggccctgaagaatatcgtgaagagaagaggcatcagctacctg360gacgatgccagcgaggacggcggcaccgtgagtagcgactacggcaaggctgtggaagaa420aacagaaaactgctggcggaaaagacgcccggccaaatccagctggaacgcttcgagaag480tatggccagctgagaggcgacttcaccgtggaagaaaatggcgagaagcatagactgatc540aacgtgttcagcaccagcgcctacagaaaggaagctgaacggatcctgcggaagcagcag600gagttcaacagcaagatcacagacgagtttattgaggactacctgatcatcctgacagga660aaacggaagtactaccacggacctggcaacgagaagagcagaaccgactacggcagattc720agaaccgacggcaccaccctggacaacatcttcggcatcctgattggaaagtgtacattc780tacaccgaagagtatcgggcctctaaggccagctacacagcccaggagttcaacctgctc840aacgatctgaacaacctgaccgtgcctaccgagacaaagaaactgagcgaggagcagaag900aagctgatcatcgagtacgccaaatctgccaagaccctcggcgccagcaccctgctgaaa960tatatcgccaaaatgatcgacgccagcgtcgaccagatcagaggctaccgggtggacgtg1020aacaacaagcccgagatgcacaccttcgaggtctaccgaaagatgcagagcctggaaaca1080atcaaggtggaagaactgcctagaaaggtcctggatgaactggcccacatcctcaccctg1140aataccgagagagagggcatcgaggaggccatcaacagcaagctgaaggacatcttcaac1200cgcgaccaggtgctggagctggtgcagttcagaaagaacaacagcagtctgttctccaag1260ggatggcacaacttcagcatcaagctgatgatggaa
ctgatcccagagctgtatgaaaca1320tccgaagaacagatgaccatcctgacaagactgggcaaacagcgttctaaggagacctct1380aagcggaccaaatacatcgatgagaaagaactgaccgaggagatctataaccccgtggtg1440gccaaaagcgtccggcaggccatcaagatcatcaacgaggccactaagaagtacggcatt1500ttcgacaacatcgtgatcgagatggccagagaaaacaacgaagaagatgccaagaaagat1560tatattaaaaggcaaaaagctaatcaagatgaaaagaacgccgccatggaaaaggctgca1620ttccagtacaatggcaagaaggaactgcctgataatatctttcacggccacaaggagctg1680acaacaaaaattcggctgtggcaccagcagggagaaaagtgcctgtacaccggaaagaat1740atccctatctctgatcttattcacaaccagtacaagtacgagatcgaccacatcctgccc1800ctgtccctgagctttgacgactctctgagcaacaaggttctggttctggccaccgccaac1860caggagaagggccaaagaactcctttccaggccctggacagcatggacgacgcctggagc1920tacagagagttcaagagctacgtgaaagactctaaactgctgtctaacaagaagaaagac1980tacctgttgacagaggaggatatctccaagatcgaggtcaagcagaaattcatcgagaga2040aatctggtggataccagatacagctccagagtggttctgaatgcccttcaagacttctac2100aagagccaccagctggacaccaccatctcagtggtgcggggccagtttaccagccagctg2160cggagaaagtggggcatcgagaaaagcagggaaacctaccaccaccatgccgtagacgct2220cttatcattgctgcctctagccagctgcggctgtggaagaagcacagcaaccctctgatc2280gcctataaggagggccagtttgtggacagcgagacaggcgagatcgtgtctctgtccgac2340gaagaatacaaggaactggtgtttaaggccccttacgatcactttgtggataccctgaga2400agcaagaaattcgaagatagcatcctgtttagctatcaagtggattctaagtacaacaga2460aagatctccgatgcaacaatctacgcgaccaggaaggctaagctggataaggaaaagaag2520gagtacacatacaccctcggaaagatcaaagatatctacgccctgggcacaaagacccct2580tccaagaccggattctacaagttcctggacctgtacaagaccgataagagccagttcctg2640atgtaccaaaaggatagaaagacctgggacgaggtgatcgagaaaatcatcgagcagtac2700cggccttttaaggagtacgacaagaacggcaaagaggtggatttcaaccccttcgagaag2760tacagaatcggcaatggccccatccggaaatacagcaagaagggcaacggacctgagatc2820aagagtctgaaatattacgacatcctgctgggcaaacacaagaacatcactcctgacgga2880tctagaaacaccgtggccctgctgagcctgaacccttggagaacagacgtgtactacaac2940agcgaaacaaagaagtacgagttcctgggactcaagtacgccgacctgtgcttcgaagag3000ggcggagcctacggcatcagcgaggtgaagtacaagaagatcagagaaaaggagggcatc3060ggcaagaatagcgagttcaagttcaccctgtacaagaacgacctgattctgatcaaggac3120accgaaaccaactgccagcagttcttcagattctggagcagaaccggtaagg
acaaccct3180aaatctttcgaaaagcataagatcgagctgaagccttacgagaaagccaagttcgagaaa3240ggcgaggagctaaaagtgctgggcaaggtgccaccttcttccaaccagtttcagaagaac3300atgcaaatcgagaacttgagcatctacaaggtcaagacagacatcctgggtaacaaacac3360tttatcaaaaaggagggagatgaacccaagctcaagttcaagaag3405<210>3<211>235<212>prt<213>bpnls-ancapobec1<400>3prolyslyslysarglysvalsersergluthrglyprovalalaval151015aspprothrleuargargargilegluprohisgluphegluvalphe202530pheaspproarggluleuarglysgluthrcysleuleutyrgluile354045lystrpglythrserhislysiletrparghisserserlysasnthr505560thrlyshisvalgluvalasnpheileglulysphethrsergluarg65707580hisphecysproserthrsercysserilethrtrppheleusertrp859095serprocysglyglucysserlysalailethrglupheleusergln100105110hisproasnvalthrleuvaliletyrvalalaargleutyrhishis115120125metaspglnglnasnargglnglyleuargaspleuvalasnsergly130135140valthrileglnilemetthralaproglutyrasptyrcystrparg145150155160asnphevalasntyrproproglylysglualahistrpproargtyr165170175proproleutrpmetlysleutyralaleugluleuhisalaglyile180185190leuglyleuproprocysleuasnileleuargarglysglnprogln195200205leuthrphephethrilealaleuglnsercyshistyrglnargleu210215220proprohisileleutrpalathrglyleulys225230235<210>4<211>197<212>prt<213>2*ugi-bpnls<400>4thrasnleuseraspileileglulysgluthrglylysglnleuval151015ileglngluserileleumetleuproglugluvalglugluvalile202530glyasnlysprogluseraspileleuvalhisthralatyraspglu354045serthraspgluasnvalmetleuleuthrseraspalaproglutyr505560lysprotrpalaleuvalileglnaspserasnglygluasnlysile65707580lysmetleuserglyglyserglyglyserglyglyserthrasnleu859095seraspileileglulysgluthrglylysglnleuvalileglnglu100105110serileleumetleuproglugluvalglugluvalileglyasnlys115120125progluseraspileleuvalhisthralatyraspgluserthrasp130135140gluasnvalmetleuleuthrseraspalaproglutyrlysprotrp145150155160alaleuvalileglnaspserasnglygluasnlysilelysmetleu165170175serglyglyserlysargthralaaspglysergluphegluprolys180185190lyslysarglysval195<210>5<211>1609<212>prt<213>sgocas9-acnbe4max<400>5prolyslyslysarglysvalsersergluthrglyprovalalaval151015aspprothrleuargargargilegluprohis
gluphegluvalphe202530pheaspproarggluleuarglysgluthrcysleuleutyrgluile354045lystrpglythrserhislysiletrparghisserserlysasnthr505560thrlyshisvalgluvalasnpheileglulysphethrsergluarg65707580hisphecysproserthrsercysserilethrtrppheleusertrp859095serprocysglyglucysserlysalailethrglupheleusergln100105110hisproasnvalthrleuvaliletyrvalalaargleutyrhishis115120125metaspglnglnasnargglnglyleuargaspleuvalasnsergly130135140valthrileglnilemetthralaproglutyrasptyrcystrparg145150155160asnphevalasntyrproproglylysglualahistrpproargtyr165170175proproleutrpmetlysleutyralaleugluleuhisalaglyile180185190leuglyleuproprocysleuasnileleuargarglysglnprogln195200205leuthrphephethrilealaleuglnsercyshistyrglnargleu210215220proprohisileleutrpalathrglyleulysserglyglyserser225230235240glyglyserserglysergluthrproglythrsergluseralathr245250255progluserserglyglyserserglyglyserasnglyleuvalleu260265270glyleualaileglyilealaservalglyvalglyileleuglulys275280285aspthrglylysileilehisalaserserargleupheproalaala290295300thralaaspasnasnvalgluargargserasnargglnglyargarg305310315320leuasnargarglyslyshisargservalargleuglnaspleuphe325330335gluglytyrglyleuleuthrasppheserlysvalsermetasnleu340345350asnprotyrglnleuargvalglnglymetgluasnglnleuthrasn355360365glugluleuphevalalaleulysasnilevallysargargglyile370375380sertyrleuaspaspalasergluaspglyglythrvalserserasp385390395400tyrglylysalavalglugluasnarglysleuleualaglulysthr405410415proglyglnileglnleugluargpheglulystyrglyglnleuarg420425430glyaspphethrvalglugluasnglyglulyshisargleuileasn435440445valpheserthrseralatyrarglysglualagluargileleuarg450455460lysglnglnglupheasnserlysilethraspglupheilegluasp465470475480tyrleuileileleuthrglylysarglystyrtyrhisglyprogly485490495asnglulysserargthrasptyrglyargpheargthraspglythr500505510thrleuaspasnilepheglyileleuileglylyscysthrphetyr515520525thrgluglutyrargalaserlysalasertyrthralaglngluphe530535540asnleuleuasnaspleuasnasnleuthrvalprothrgluthrlys545550555560lysleuserglugluglnlyslysleuileileglutyralalysser565570575alalysthrleuglyalaserthrleuleulys
tyrilealalysmet580585590ileaspalaservalaspglnileargglytyrargvalaspvalasn595600605asnlysproglumethisthrphegluvaltyrarglysmetglnser610615620leugluthrilelysvalglugluleuproarglysvalleuaspglu625630635640leualahisileleuthrleuasnthrgluarggluglyilegluglu645650655alaileasnserlysleulysaspilepheasnargaspglnvalleu660665670gluleuvalglnphearglysasnasnserserleupheserlysgly675680685trphisasnpheserilelysleumetmetgluleuileprogluleu690695700tyrgluthrserglugluglnmetthrileleuthrargleuglylys705710715720glnargserlysgluthrserlysargthrlystyrileaspglulys725730735gluleuthrglugluiletyrasnprovalvalalalysservalarg740745750glnalailelysileileasnglualathrlyslystyrglyilephe755760765aspasnilevalileglumetalaarggluasnasnglugluaspala770775780lyslysasptyrilelysargglnlysalaasnglnaspglulysasn785790795800alaalametglulysalaalapheglntyrasnglylyslysgluleu805810815proaspasnilephehisglyhislysgluleuthrthrlysilearg820825830leutrphisglnglnglyglulyscysleutyrthrglylysasnile835840845proileseraspleuilehisasnglntyrlystyrgluileasphis850855860ileleuproleuserleuserpheaspaspserleuserasnlysval865870875880leuvalleualathralaasnglnglulysglyglnargthrprophe885890895glnalaleuaspsermetaspaspalatrpsertyrarggluphelys900905910sertyrvallysaspserlysleuleuserasnlyslyslysasptyr915920925leuleuthrglugluaspileserlysilegluvallysglnlysphe930935940ilegluargasnleuvalaspthrargtyrserserargvalvalleu945950955960asnalaleuglnaspphetyrlysserhisglnleuaspthrthrile965970975servalvalargglyglnphethrserglnleuargarglystrpgly980985990ileglulysserarggluthrtyrhishishisalavalaspalaleu99510001005ileilealaalaserserglnleuargleutrplyslyshisser101010151020asnproleuilealatyrlysgluglyglnphevalaspserglu102510301035thrglygluilevalserleuseraspgluglutyrlysgluleu104010451050valphelysalaprotyrasphisphevalaspthrleuargser105510601065lyslysphegluaspserileleuphesertyrglnvalaspser107010751080lystyrasnarglysileseraspalathriletyralathrarg108510901095lysalalysleuasplysglulyslysglutyrthrtyrthrleu110011051110glylysilelysaspiletyralaleuglythrlysthrproser111511201125lysthrglyphetyrlysphe
leuaspleutyrlysthrasplys113011351140serglnpheleumettyrglnlysasparglysthrtrpaspglu114511501155valileglulysileilegluglntyrargprophelysglutyr116011651170asplysasnglylysgluvalasppheasnpropheglulystyr117511801185argileglyasnglyproilearglystyrserlyslysglyasn119011951200glyprogluilelysserleulystyrtyraspileleuleugly120512101215lyshislysasnilethrproaspglyserargasnthrvalala122012251230leuleuserleuasnprotrpargthraspvaltyrtyrasnser123512401245gluthrlyslystyrglupheleuglyleulystyralaaspleu125012551260cyspheglugluglyglyalatyrglyilesergluvallystyr126512701275lyslysileargglulysgluglyileglylysasnsergluphe128012851290lysphethrleutyrlysasnaspleuileleuilelysaspthr129513001305gluthrasncysglnglnphepheargphetrpserargthrgly131013151320lysaspasnprolysserpheglulyshislysilegluleulys132513301335protyrglulysalalyspheglulysglyglugluleulysval134013451350leuglylysvalproproserserasnglnpheglnlysasnmet135513601365glnilegluasnleuseriletyrlysvallysthraspileleu137013751380glyasnlyshispheilelyslysgluglyaspgluprolysleu138513901395lysphelyslysserglyglyserglyglyserglyglyserthr140014051410asnleuseraspileileglulysgluthrglylysglnleuval141514201425ileglngluserileleumetleuproglugluvalglugluval143014351440ileglyasnlysprogluseraspileleuvalhisthralatyr144514501455aspgluserthraspgluasnvalmetleuleuthrseraspala146014651470proglutyrlysprotrpalaleuvalileglnaspserasngly147514801485gluasnlysilelysmetleuserglyglyserglyglysergly149014951500glyserthrasnleuseraspileileglulysgluthrglylys150515101515glnleuvalileglngluserileleumetleuproglugluval152015251530glugluvalileglyasnlysprogluseraspileleuvalhis153515401545thralatyraspgluserthraspgluasnvalmetleuleuthr155015551560seraspalaproglutyrlysprotrpalaleuvalileglnasp156515701575serasnglygluasnlysilelysmetleuserglyglyserlys158015851590argthralaaspglysergluphegluprolyslyslysarglys159516001605val<210>6<211>4827<212>dna<213>sgocas9-acnbe4max<400>6ccaaagaagaagcggaaagtcagcagtgaaaccggaccagtggcagtggacccaaccctg60aggagacggattgagccccatgaatttgaagtgttctttgacccaagggagctgaggaag120gagacatgcctgctgtac
gagatcaagtggggcacaagccacaagatctggcgccacagc180tccaagaacaccacaaagcacgtggaagtgaatttcatcgagaagtttacctccgagcgg240cacttctgcccctctaccagctgttccatcacatggtttctgtcttggagcccttgcggc300gagtgttccaaggccatcaccgagttcctgtctcagcaccctaacgtgaccctggtcatc360tacgtggcccggctgtatcaccacatggaccagcagaacaggcagggcctgcgcgatctg420gtgaattctggcgtgaccatccagatcatgacagccccagagtacgactattgctggcgg480aacttcgtgaattatccacctggcaaggaggcacactggccaagatacccacccctgtgg540atgaagctgtatgcactggagctgcacgcaggaatcctgggcctgcctccatgtctgaat600atcctgcggagaaagcagccccagctgacatttttcaccattgctctgcagtcttgtcac660tatcagcggctgcctcctcatattctgtgggctacaggcctgaagtctggaggatctagc720ggaggatcctctggcagcgagacaccaggaacaagcgagtcagcaacaccagagagcagt780ggcggcagcagcggcggcagcaacggcctggtgctgggcctggccatcggcatcgcctct840gtgggcgtgggcatcctggagaaagacactggcaagatcattcacgcttcgagcagactg900ttcccagccgccacagccgacaacaatgtggagagacggagcaatagacagggcagacgg960ctaaaccggcggaaaaagcacagatccgtgcggctgcaggacctgtttgaaggatacggc1020ctgctgacagacttcagcaaggtgtccatgaacctgaatccctaccagctgcgggtgcag1080ggaatggaaaaccagctgaccaacgaggagctgttcgtggccctgaagaatatcgtgaag1140agaagaggcatcagctacctggacgatgccagcgaggacggcggcaccgtgagtagcgac1200tacggcaaggctgtggaagaaaacagaaaactgctggcggaaaagacgcccggccaaatc1260cagctggaacgcttcgagaagtatggccagctgagaggcgacttcaccgtggaagaaaat1320ggcgagaagcatagactgatcaacgtgttcagcaccagcgcctacagaaaggaagctgaa1380cggatcctgcggaagcagcaggagttcaacagcaagatcacagacgagtttattgaggac1440tacctgatcatcctgacaggaaaacggaagtactaccacggacctggcaacgagaagagc1500agaaccgactacggcagattcagaaccgacggcaccaccctggacaacatcttcggcatc1560ctgattggaaagtgtacattctacaccgaagagtatcgggcctctaaggccagctacaca1620gcccaggagttcaacctgctcaacgatctgaacaacctgaccgtgcctaccgagacaaag1680aaactgagcgaggagcagaagaagctgatcatcgagtacgccaaatctgccaagaccctc1740ggcgccagcaccctgctgaaatatatcgccaaaatgatcgacgccagcgtcgaccagatc1800agaggctaccgggtggacgtgaacaacaagcccgagatgcacaccttcgaggtctaccga1860aagatgcagagcctggaaacaatcaaggtggaagaactgcctagaaaggtcctggatgaa1920ctggcccacatcctcaccctgaataccgagagagagggcatcgaggaggccatcaacagc1980aagctgaaggacatcttcaaccgcgaccaggtgctggagctggtgcag
ttcagaaagaac2040aacagcagtctgttctccaagggatggcacaacttcagcatcaagctgatgatggaactg2100atcccagagctgtatgaaacatccgaagaacagatgaccatcctgacaagactgggcaaa2160cagcgttctaaggagacctctaagcggaccaaatacatcgatgagaaagaactgaccgag2220gagatctataaccccgtggtggccaaaagcgtccggcaggccatcaagatcatcaacgag2280gccactaagaagtacggcattttcgacaacatcgtgatcgagatggccagagaaaacaac2340gaagaagatgccaagaaagattatattaaaaggcaaaaagctaatcaagatgaaaagaac2400gccgccatggaaaaggctgcattccagtacaatggcaagaaggaactgcctgataatatc2460tttcacggccacaaggagctgacaacaaaaattcggctgtggcaccagcagggagaaaag2520tgcctgtacaccggaaagaatatccctatctctgatcttattcacaaccagtacaagtac2580gagatcgaccacatcctgcccctgtccctgagctttgacgactctctgagcaacaaggtt2640ctggttctggccaccgccaaccaggagaagggccaaagaactcctttccaggccctggac2700agcatggacgacgcctggagctacagagagttcaagagctacgtgaaagactctaaactg2760ctgtctaacaagaagaaagactacctgttgacagaggaggatatctccaagatcgaggtc2820aagcagaaattcatcgagagaaatctggtggataccagatacagctccagagtggttctg2880aatgcccttcaagacttctacaagagccaccagctggacaccaccatctcagtggtgcgg2940ggccagtttaccagccagctgcggagaaagtggggcatcgagaaaagcagggaaacctac3000caccaccatgccgtagacgctcttatcattgctgcctctagccagctgcggctgtggaag3060aagcacagcaaccctctgatcgcctataaggagggccagtttgtggacagcgagacaggc3120gagatcgtgtctctgtccgacgaagaatacaaggaactggtgtttaaggccccttacgat3180cactttgtggataccctgagaagcaagaaattcgaagatagcatcctgtttagctatcaa3240gtggattctaagtacaacagaaagatctccgatgcaacaatctacgcgaccaggaaggct3300aagctggataaggaaaagaaggagtacacatacaccctcggaaagatcaaagatatctac3360gccctgggcacaaagaccccttccaagaccggattctacaagttcctggacctgtacaag3420accgataagagccagttcctgatgtaccaaaaggatagaaagacctgggacgaggtgatc3480gagaaaatcatcgagcagtaccggccttttaaggagtacgacaagaacggcaaagaggtg3540gatttcaaccccttcgagaagtacagaatcggcaatggccccatccggaaatacagcaag3600aagggcaacggacctgagatcaagagtctgaaatattacgacatcctgctgggcaaacac3660aagaacatcactcctgacggatctagaaacaccgtggccctgctgagcctgaacccttgg3720agaacagacgtgtactacaacagcgaaacaaagaagtacgagttcctgggactcaagtac3780gccgacctgtgcttcgaagagggcggagcctacggcatcagcgaggtgaagtacaagaag3840atcagagaaaaggagggcatcggcaagaatagcgagttcaagttcaccctgtacaagaac3900
gacctgattctgatcaaggacaccgaaaccaactgccagcagttcttcagattctggagc3960agaaccggtaaggacaaccctaaatctttcgaaaagcataagatcgagctgaagccttac4020gagaaagccaagttcgagaaaggcgaggagctaaaagtgctgggcaaggtgccaccttct4080tccaaccagtttcagaagaacatgcaaatcgagaacttgagcatctacaaggtcaagaca4140gacatcctgggtaacaaacactttatcaaaaaggagggagatgaacccaagctcaagttc4200aagaagagcggcgggagcggcgggagcggggggagcactaatctgagcgacatcattgag4260aaggagactgggaaacagctggtcattcaggagtccatcctgatgctgcctgaggaggtg4320gaggaagtgatcggcaacaagccagagtctgacatcctggtgcacaccgcctacgacgag4380tccacagatgagaatgtgatgctgctgacctctgacgcccccgagtataagccttgggcc4440ctggtcatccaggattctaacggcgagaataagatcaagatgctgagcggaggatccgga4500ggatctggaggcagcaccaacctgtctgacatcatcgagaaggagacaggcaagcagctg4560gtcatccaggagagcatcctgatgctgcccgaagaagtcgaagaagtgatcggaaacaag4620cctgagagcgatatcctggtccataccgcctacgacgagagtaccgacgaaaatgtgatg4680ctgctgacatccgacgccccagagtataagccctgggctctggtcatccaggattccaac4740ggagagaacaaaatcaaaatgctgtctggcggctcaaaaagaaccgccgacggcagcgaa4800ttcgagcccaagaagaagaggaaagtc4827<210>7<211>4941<212>dna<213>pgl3-u6-sgogrnainsertsite-scaffold<400>7gagggcctatttcccatgattccttcatatttgcatatacgatacaaggctgttagagag60ataattggaattaatttgactgtaaacacaaagatattagtacaaaatacgtgacgtaga120aagtaataatttcttgggtagtttgcagttttaaaattatgttttaaaatggactatcat180atgcttaccgtaacttgaaagtatttcgatttcttggctttatatatcttgtggaaagga240cgaaacaccgtgagaccgagagagggtctcagtttttgtactctcaaggaaacttgcaga300agctacaaagataaggcttcatgccgaattcaacaccctgtcatttatggcggggtgttt360ttttttttaaagaattctcgacctcgagacaaatggcagtattcatccacaattttaaaa420gaaaaggggggattggggggtacagtgcaggggaaagaatagtagacataatagcaacag480acatacaaactaaagaattacaaaaacaaattacaaaaattcaaaattttcgggtttatt540acagggacagcagagatccactttggccgcggctcgagggggttggggttgcgccttttc600caaggcagccctgggtttgcgcagggacgcggctgctctgggcgtggttccgggaaacgc660agcggcgccgaccctgggactcgcacattcttcacgtccgttcgcagcgtcacccggatc720ttcgccgctacccttgtgggccccccggcgacgcttcctgctccgcccctaagtcgggaa780ggttccttgcggttcgcggcgtgccggacgtgacaaacggaagccgcacgtctcactagt840accctcgcagacggacagcgccagggagcaatggcagcgcgccgaccgcgatgggctgtg
900gccaatagcggctgctcagcagggcgcgccgagagcagcggccgggaaggggcggtgcgg960gaggcggggtgtggggcggtagtgtgggccctgttcctgcccgcgcggtgttccgcattc1020tgcaagcctccggagcgcacgtcggcagtcggctccctcgttgaccgaatcaccgacctc1080tctccccagggggatccatggtgagcaagggcgaggagctgttcaccggggtggtgccca1140tcctggtcgagctggacggcgacgtaaacggccacaagttcagcgtgtccggcgagggcg1200agggcgatgccacctacggcaagctgaccctgaagttcatctgcaccaccggcaagctgc1260ccgtgccctggcccaccctcgtgaccaccctgacctacggcgtgcagtgcttcagccgct1320accccgaccacatgaagcagcacgacttcttcaagtccgccatgcccgaaggctacgtcc1380aggagcgcaccatcttcttcaaggacgacggcaactacaagacccgcgccgaggtgaagt1440tcgagggcgacaccctggtgaaccgcatcgagctgaagggcatcgacttcaaggaggacg1500gcaacatcctggggcacaagctggagtacaactacaacagccacaacgtctatatcatgg1560ccgacaagcagaagaacggcatcaaggtgaacttcaagatccgccacaacatcgaggacg1620gcagcgtgcagctcgccgaccactaccagcagaacacccccatcggcgacggccccgtgc1680tgctgcccgacaaccactacctgagcacccagtccgccctgagcaaagaccccaacgaga1740agcgcgatcacatggtcctgctggagttcgtgaccgccgccgggatcactctcggcatgg1800acgagctgtacaagtaaagcggccgcgactctagatcataatcagccataccacatttgt1860agaggttttacttgctttaaaaaacctcccacacctccccctgaacctgaaacataaaat1920gaatgcaattgttgttgttaacttgtttattgcagcttataatggttacaaataaagcaa1980tagcatcacaaatttcacaaataaagcatttttttcactgcattctagttgtggtttgtc2040caaactcatcaatgtatcttagtcgaccgatgcccttgagagccttcaacccagtcagct2100ccttccggtgggcgcggggcatgactatcgtcgccgcacttatgactgtcttctttatca2160tgcaactcgtaggacaggtgccggcagcgctcttccgcttcctcgctcactgactcgctg2220cgctcggtcgttcggctgcggcgagcggtatcagctcactcaaaggcggtaatacggtta2280tccacagaatcaggggataacgcaggaaagaacatgtgagcaaaaggccagcaaaaggcc2340aggaaccgtaaaaaggccgcgttgctggcgtttttccataggctccgcccccctgacgag2400catcacaaaaatcgacgctcaagtcagaggtggcgaaacccgacaggactataaagatac2460caggcgtttccccctggaagctccctcgtgcgctctcctgttccgaccctgccgcttacc2520ggatacctgtccgcctttctcccttcgggaagcgtggcgctttctcatagctcacgctgt2580aggtatctcagttcggtgtaggtcgttcgctccaagctgggctgtgtgcacgaacccccc2640gttcagcccgaccgctgcgccttatccggtaactatcgtcttgagtccaacccggtaaga2700cacgacttatcgccactggcagcagccactggtaacaggattagcagagcgaggtatgta2760ggcggtgctacaga
gttcttgaagtggtggcctaactacggctacactagaagaacagta2820tttggtatctgcgctctgctgaagccagttaccttcggaaaaagagttggtagctcttga2880tccggcaaacaaaccaccgctggtagcggtggtttttttgtttgcaagcagcagattacg2940cgcagaaaaaaaggatctcaagaagatcctttgatcttttctacggggtctgacgctcag3000tggaacgaaaactcacgttaagggattttggtcatgagattatcaaaaaggatcttcacc3060tagatccttttaaattaaaaatgaagttttaaatcaatctaaagtatatatgagtaaact3120tggtctgacagttaccaatgcttaatcagtgaggcacctatctcagcgatctgtctattt3180cgttcatccatagttgcctgactccccgtcgtgtagataactacgatacgggagggctta3240ccatctggccccagtgctgcaatgataccgcgggacccacgctcaccggctccagattta3300tcagcaataaaccagccagccggaagggccgagcgcagaagtggtcctgcaactttatcc3360gcctccatccagtctattaattgttgccgggaagctagagtaagtagttcgccagttaat3420agtttgcgcaacgttgttgccattgctacaggcatcgtggtgtcacgctcgtcgtttggt3480atggcttcattcagctccggttcccaacgatcaaggcgagttacatgatcccccatgttg3540tgcaaaaaagcggttagctccttcggtcctccgatcgttgtcagaagtaagttggccgca3600gtgttatcactcatggttatggcagcactgcataattctcttactgtcatgccatccgta3660agatgcttttctgtgactggtgagtactcaaccaagtcattctgagaatagtgtatgcgg3720cgaccgagttgctcttgcccggcgtcaatacgggataataccgcgccacatagcagaact3780ttaaaagtgctcatcattggaaaacgttcttcggggcgaaaactctcaaggatcttaccg3840ctgttgagatccagttcgatgtaacccactcgtgcacccaactgatcttcagcatctttt3900actttcaccagcgtttctgggtgagcaaaaacaggaaggcaaaatgccgcaaaaaaggga3960ataagggcgacacggaaatgttgaatactcatactcttcctttttcaatattattgaagc4020atttatcagggttattgtctcatgagcggatacatatttgaatgtatttagaaaaataaa4080caaataggggttccgcgcacatttccccgaaaagtgccacctgacgcgccctgtagcggc4140gcattaagcgcggcgggtgtggtggttacgcgcagcgtgaccgctacacttgccagcgcc4200ctagcgcccgctcctttcgctttcttcccttcctttctcgccacgttcgccggctttccc4260cgtcaagctctaaatcgggggctccctttagggttccgatttagtgctttacggcacctc4320gaccccaaaaaacttgattagggtgatggttcacgtagtgggccatcgccctgatagacg4380gtttttcgccctttgacgttggagtccacgttctttaatagtggactcttgttccaaact4440ggaacaacactcaaccctatctcggtctattcttttgatttataagggattttgccgatt4500tcggcctattggttaaaaaatgagctgatttaacaaaaatttaacgcgaattttaacaaa4560atattaacgcttacaatttgccattcgccattcaggctgcgcaactgttgggaagggcga4620tcggtgcgggcctcttcgctattacgccag
cccaagctaccatgataagtaagtaatatt4680aaggtacgggaggtacttggagcggccgcaataaaatatctttattttcattacatctgt4740gtgttggttttttgtgtgaatcgatagtactaacatacgctctccatcaaaacaaaacga4800aacaaaacaaactagcaaaataggctgtccccagtgcaagtgcaggtgccagaacatttc4860tctatcgataggtaccgattagtgaacggatctcgacggtatcgatcacgagactagcct4920cgagcggccgcccccttcacc4941<210>8<211>7<212>prt<213>核酸定位信号多肽<400>8prolyslyslysarglysval15<210>9<211>3785<212>dna<213>32aalinker-sgocas9d9a-10aalinker-ugi<400>9agcggaggatcctctggcagcgagacaccaggaacaagcgagtcagcaacaccagagagc60agtggcggcagcagcggcggcagcaacggcctggtgctgggcctggccatcggcatcgcc120tctgtgggcgtgggcatcctggagaaagacactggcaagatcattcacgcttcgagcaga180ctgttcccagccgccacagccgacaacaatgtggagagacggagcaatagacagggcaga240cggctaaaccggcggaaaaagcacagatccgtgcggctgcaggacctgtttgaaggatac300ggcctgctgacagacttcagcaaggtgtccatgaacctgaatccctaccagctgcgggtg360cagggaatggaaaaccagctgaccaacgaggagctgttcgtggccctgaagaatatcgtg420aagagaagaggcatcagctacctggacgatgccagcgaggacggcggcaccgtgagtagc480gactacggcaaggctgtggaagaaaacagaaaactgctggcggaaaagacgcccggccaa540atccagctggaacgcttcgagaagtatggccagctgagaggcgacttcaccgtggaagaa600aatggcgagaagcatagactgatcaacgtgttcagcaccagcgcctacagaaaggaagct660gaacggatcctgcggaagcagcaggagttcaacagcaagatcacagacgagtttattgag720gactacctgatcatcctgacaggaaaacggaagtactaccacggacctggcaacgagaag780agcagaaccgactacggcagattcagaaccgacggcaccaccctggacaacatcttcggc840atcctgattggaaagtgtacattctacaccgaagagtatcgggcctctaaggccagctac900acagcccaggagttcaacctgctcaacgatctgaacaacctgaccgtgcctaccgagaca960aagaaactgagcgaggagcagaagaagctgatcatcgagtacgccaaatctgccaagacc1020ctcggcgccagcaccctgctgaaatatatcgccaaaatgatcgacgccagcgtcgaccag1080atcagaggctaccgggtggacgtgaacaacaagcccgagatgcacaccttcgaggtctac1140cgaaagatgcagagcctggaaacaatcaaggtggaagaactgcctagaaaggtcctggat1200gaactggcccacatcctcaccctgaataccgagagagagggcatcgaggaggccatcaac1260agcaagctgaaggacatcttcaaccgcgaccaggtgctggagctggtgcagttcagaaag1320aacaacagcagtctgttctccaagggatggcacaacttcagcatcaagctgatgatggaa1380ctgatcccagagctgtatgaaacatccgaagaacagatgaccatcctgacaagactgggc1440aaacagcgttctaaggagacctctaagcggacca
aatacatcgatgagaaagaactgacc1500gaggagatctataaccccgtggtggccaaaagcgtccggcaggccatcaagatcatcaac1560gaggccactaagaagtacggcattttcgacaacatcgtgatcgagatggccagagaaaac1620aacgaagaagatgccaagaaagattatattaaaaggcaaaaagctaatcaagatgaaaag1680aacgccgccatggaaaaggctgcattccagtacaatggcaagaaggaactgcctgataat1740atctttcacggccacaaggagctgacaacaaaaattcggctgtggcaccagcagggagaa1800aagtgcctgtacaccggaaagaatatccctatctctgatcttattcacaaccagtacaag1860tacgagatcgaccacatcctgcccctgtccctgagctttgacgactctctgagcaacaag1920gttctggttctggccaccgccaaccaggagaagggccaaagaactcctttccaggccctg1980gacagcatggacgacgcctggagctacagagagttcaagagctacgtgaaagactctaaa2040ctgctgtctaacaagaagaaagactacctgttgacagaggaggatatctccaagatcgag2100gtcaagcagaaattcatcgagagaaatctggtggataccagatacagctccagagtggtt2160ctgaatgcccttcaagacttctacaagagccaccagctggacaccaccatctcagtggtg2220cggggccagtttaccagccagctgcggagaaagtggggcatcgagaaaagcagggaaacc2280taccaccaccatgccgtagacgctcttatcattgctgcctctagccagctgcggctgtgg2340aagaagcacagcaaccctctgatcgcctataaggagggccagtttgtggacagcgagaca2400ggcgagatcgtgtctctgtccgacgaagaatacaaggaactggtgtttaaggccccttac2460gatcactttgtggataccctgagaagcaagaaattcgaagatagcatcctgtttagctat2520caagtggattctaagtacaacagaaagatctccgatgcaacaatctacgcgaccaggaag2580gctaagctggataaggaaaagaaggagtacacatacaccctcggaaagatcaaagatatc2640tacgccctgggcacaaagaccccttccaagaccggattctacaagttcctggacctgtac2700aagaccgataagagccagttcctgatgtaccaaaaggatagaaagacctgggacgaggtg2760atcgagaaaatcatcgagcagtaccggccttttaaggagtacgacaagaacggcaaagag2820gtggatttcaaccccttcgagaagtacagaatcggcaatggccccatccggaaatacagc2880aagaagggcaacggacctgagatcaagagtctgaaatattacgacatcctgctgggcaaa2940cacaagaacatcactcctgacggatctagaaacaccgtggccctgctgagcctgaaccct3000tggagaacagacgtgtactacaacagcgaaacaaagaagtacgagttcctgggactcaag3060tacgccgacctgtgcttcgaagagggcggagcctacggcatcagcgaggtgaagtacaag3120aagatcagagaaaaggagggcatcggcaagaatagcgagttcaagttcaccctgtacaag3180aacgacctgattctgatcaaggacaccgaaaccaactgccagcagttcttcagattctgg3240agcagaaccggtaaggacaaccctaaatctttcgaaaagcataagatcgagctgaagcct3300tacgagaaagccaagttcgagaaaggcgaggagctaaaagtgctgggcaa
ggtgccacct3360tcttccaaccagtttcagaagaacatgcaaatcgagaacttgagcatctacaaggtcaag3420acagacatcctgggtaacaaacactttatcaaaaaggagggagatgaacccaagctcaag3480ttcaagaagagcggcgggagcggcgggagcggggggagcactaatctgagcgacatcatt3540gagaaggagactgggaaacagctggtcattcaggagtccatcctgatgctgcctgaggag3600gtggaggaagtgatcggcaacaagccagagtctgacatcctggtgcacaccgcctacgac3660gagtccacagatgagaatgtgatgctgctgacctctgacgcccccgagtataagccttgg3720gccctggtcatccaggattctaacggcgagaataagatcaagatgctgagcggaggatcc3780ggagg3785
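The sequence listing above is flattened: the `<210>`/`<211>`/`<212>`/`<213>`/`<400>` tags, the residues, and the running position counters (60, 120, ...) are fused into one stream. The following is a minimal Python sketch of how such a stream might be split back into records. It assumes that digits inside a `<400>` body are only position counters, that protein entries use three-letter residue codes (as in SEQ ID NO. 8), and that no other tags occur; `parse_listing` and the toy input are hypothetical names and data introduced here for illustration.

```python
import re

# One record: <210> id, <211> declared length, <212> type, <213> name, <400> id + body.
RECORD = re.compile(
    r"<210>\s*(?P<id>\d+)\s*<211>\s*(?P<length>\d+)\s*<212>\s*(?P<type>\w+?)\s*"
    r"<213>\s*(?P<name>.*?)\s*<400>\s*(?P=id)(?P<body>.*?)(?=<210>|$)",
    re.S,
)


def parse_listing(text):
    """Split a flattened listing into records and strip position counters."""
    records = []
    for match in RECORD.finditer(text):
        # Assumption: every digit in the body is a position counter, not data.
        residues = re.sub(r"[\d\s]+", "", match.group("body")).lower()
        declared = int(match.group("length"))
        # Assumption: protein entries use three-letter codes, DNA one letter per base.
        unit = 3 if match.group("type").lower() == "prt" else 1
        records.append({
            "id": int(match.group("id")),
            "type": match.group("type").lower(),
            "name": match.group("name"),
            "residues": residues,
            "length_ok": len(residues) // unit == declared,
        })
    return records


if __name__ == "__main__":
    toy = ("<210>8<211>7<212>prt<213>nls<400>8prolyslyslysarglysval15"
           "<210>9<211>12<212>dna<213>demo<400>9acgtac6acgtac12")
    for record in parse_listing(toy):
        print(record["id"], record["type"], record["length_ok"])
```

Comparing the stripped residue count against the declared `<211>` length is a cheap way to detect a record that was truncated or damaged during extraction.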
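Technical features: 1. A fusion protein, characterized in that it comprises a codon-optimized Cas9 nickase homolog derived from Streptococcus gordonii, a cytosine deaminase, and a uracil glycosylase inhibitor protein;
the amino acid sequence of the Cas9 nickase homolog is:
a) the amino acid sequence of residues 2 to 1136, counted from the N-terminus, of the SgoCas9 D9A nickase shown in SEQ ID NO. 1;
or b) an amino acid sequence having at least 90% sequence identity to SEQ ID NO. 1 and having the function of the amino acid sequence shown in SEQ ID NO. 1;
or c) an amino acid sequence that, relative to SEQ ID NO. 1, is cut and spliced by means of intron sequences to yield the function of the complete fusion protein, or that is partially identical to SEQ ID NO. 1, and that has the function of the amino acid sequence shown in SEQ ID NO. 1.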
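2. The fusion protein according to claim 1, characterized in that the fusion protein further comprises a bpNLS-AncAPOBEC1 polypeptide at the N-terminus and a 2*UGI-bpNLS polypeptide at the C-terminus; wherein the bpNLS-AncAPOBEC1 polypeptide is a fusion of a bpNLS polypeptide and an AncAPOBEC1 polypeptide, and the 2*UGI-bpNLS polypeptide is a fusion of a UGI polypeptide and a bpNLS polypeptide;
the amino acid sequence of the bpNLS-AncAPOBEC1 polypeptide is:
d) the amino acid sequence shown in SEQ ID NO. 3;
or e) an amino acid sequence having at least 90% sequence identity to SEQ ID NO. 3 and having the function of the amino acid sequence shown in SEQ ID NO. 3;
the amino acid sequence of the 2*UGI-bpNLS polypeptide is:
f) the amino acid sequence shown in SEQ ID NO. 4;
or g) an amino acid sequence having at least 90% sequence identity to SEQ ID NO. 4 and having the function of the amino acid sequence shown in SEQ ID NO. 4.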
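3. The fusion protein according to claim 1 or 2, characterized in that it comprises, in order from the N-terminus to the C-terminus, a bpNLS polypeptide, an AncAPOBEC1 polypeptide, a 32aa linker, the amino acid sequence of residues 2 to 1136 (counted from the N-terminus) of the SgoCas9 D9A nickase, a 10aa linker, a 2*UGI polypeptide, and a bpNLS polypeptide;
the full amino acid sequence of the fusion protein is:
h) the amino acid sequence shown in SEQ ID NO. 5;
or i) an amino acid sequence obtained by rearranging, adding, or removing elements among the following six elements: the bpNLS polypeptide, the AncAPOBEC1 polypeptide, the 32aa linker, residues 2 to 1136 (counted from the N-terminus) of the SgoCas9 D9A nickase, the 10aa linker, and the 2*UGI polypeptide, which achieves the function of editing a cytosine base to thymine;
or j) an amino acid sequence having at least 90% sequence identity to the amino acid sequence shown in SEQ ID NO. 5 and having the functions of recognizing NNAAAG as the PAM and editing a cytosine base to thymine.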
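4. The fusion protein according to claim 3, characterized in that the fusion protein recognizes NNAAAG as the PAM, where N denotes any base; and the fusion protein edits cytosine bases to thymine within an editing window spanning positions 8 to 14.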
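5. The fusion protein according to claim 4, characterized in that the fusion protein further comprises a nuclear localization signal polypeptide fragment, the amino acid sequence of which is shown in SEQ ID NO. 8.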
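6. A polynucleotide sequence, characterized in that the polynucleotide sequence encodes the fusion protein according to any one of claims 1 to 4; the polynucleotide sequence is shown in SEQ ID NO. 6.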
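7. A cytosine base editor, characterized in that the cytosine base editor is obtained by integrating a polynucleotide sequence encoding the fusion protein according to any one of claims 1 to 4 into an expression vector.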
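8. The cytosine base editor according to claim 7, characterized in that the expression vector includes an expression vector for a gRNA scaffold composed of tandem repeat sequences derived from Streptococcus gordonii; the nucleotide sequence of the expression vector is shown in SEQ ID NO. 7.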
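9. A cell expression system, characterized in that it contains the cytosine base editor according to claim 7 or 8; the cell is a host cell, and the host cell is a eukaryotic cell or a prokaryotic cell.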
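10. The cell expression system according to claim 9, characterized in that the cell is a mouse neuroblastoma cell, a human embryonic kidney cell, or a human colon cancer cell.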
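Several of the claims above rely on a threshold of at least 90% sequence identity. The claims do not state how identity is computed, so the following Python sketch is only one possible reading: it assumes the two sequences have already been aligned by an external tool (gaps written as `-`) and defines identity as identical columns divided by alignment columns. `percent_identity` and `meets_threshold` are hypothetical helper names, not part of the patent.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity of two pre-aligned sequences of equal length.

    Assumption: identity = identical columns / alignment columns,
    ignoring columns in which both sequences carry a gap.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("alignment has no informative columns")
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def meets_threshold(aligned_a, aligned_b, threshold=90.0):
    """True when identity reaches the threshold (90% by default)."""
    return percent_identity(aligned_a, aligned_b) >= threshold


if __name__ == "__main__":
    # Toy example: the 7-residue sequence of SEQ ID NO. 8 plus an invented tail.
    reference = "PKKKRKVAAA"
    variant = "PKKKRKVAAG"
    print(percent_identity(reference, variant), meets_threshold(reference, variant))
```

Other denominators (shorter sequence length, ungapped columns only) are also in common use and can shift a borderline case across the threshold, so the definition should be fixed before any comparison is made.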
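Technical summary: The present invention belongs to the technical field of bioengineering and gene editing, and specifically relates to a cytosine base editor tool and its applications. The cytosine base editor of the invention contains a polynucleotide sequence encoding a fusion protein, and the fusion protein comprises SgoCas9 D9A nickase, a codon-optimized Cas9 nickase homolog derived from Streptococcus gordonii. The cytosine base editor recognizes NNAAAG as the PAM, its editing window covers cytosines at positions 8 to 14 within the gRNA target sequence, and it achieves C-to-T conversion of specific bases, which broadens the targeting range and application scope of base editing.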
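The PAM and editing-window rules stated above can be sketched as a simple target-site scan in Python. This is a hedged illustration, not the inventors' pipeline: it assumes a 20-nt protospacer located immediately 5' of the PAM, numbers protospacer positions from 1 at the PAM-distal end, scans only the strand supplied, and converts every cytosine in the window, whereas real editing efficiency varies by position. `find_cbe_sites` is a hypothetical name.

```python
import re

# NNAAAG, with N = any base; the lookahead reports overlapping candidates too.
PAM = re.compile(r"(?=([ACGT]{2}AAAG))")


def find_cbe_sites(dna, protospacer_len=20, window=(8, 14)):
    """List candidate sites with an NNAAAG PAM and their editable cytosines.

    Assumptions (not stated in the source): 20-nt protospacer directly 5' of
    the PAM, and position 1 is the PAM-distal end of the protospacer.
    """
    first, last = window
    if last > protospacer_len:
        raise ValueError("editing window exceeds protospacer length")
    dna = dna.upper()
    sites = []
    for match in PAM.finditer(dna):
        pam_start = match.start()
        start = pam_start - protospacer_len
        if start < 0:
            continue  # not enough sequence upstream of this PAM
        protospacer = dna[start:pam_start]
        positions = [
            pos for pos in range(first, last + 1) if protospacer[pos - 1] == "C"
        ]
        # Idealized outcome: every C inside the window becomes T.
        edited = "".join(
            "T" if base == "C" and first <= pos <= last else base
            for pos, base in enumerate(protospacer, start=1)
        )
        sites.append({
            "protospacer": protospacer,
            "pam": match.group(1),
            "editable_positions": positions,
            "edited": edited,
        })
    return sites


if __name__ == "__main__":
    toy = "A" * 7 + "C" + "A" * 12 + "TGAAAG"
    for site in find_cbe_sites(toy):
        print(site["pam"], site["editable_positions"], site["edited"])
```

To cover both strands, the same function could be run a second time on the reverse complement of the input.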
Inventors: 乔云波; 李丽平
Protected technology user: 广州大学 (Guangzhou University)
Technology development date: 2021.05.12
Technology publication date: 2021.08.03