| Task Category | Examples |
|---|---|
| Sequence Operations | Create, modify, translate DNA/RNA/protein sequences |
| File Format Handling | Parse or convert FASTA, GenBank, FASTQ, PDB, mmCIF |
| NCBI Database Access | Query GenBank, PubMed, Protein, Gene, Taxonomy |
| Similarity Searches | Execute BLAST locally or via NCBI, parse results |
| Alignment Work | Pairwise or multiple sequence alignments |
| Structural Analysis | Parse PDB files, compute distances, DSSP assignment |
| Tree Construction | Build, manipulate, visualize phylogenetic trees |
| Motif Discovery | Find and score sequence patterns |
| Sequence Statistics | GC content, molecular weight, melting temperature |
| Module | Purpose | Reference |
|---|---|---|
| Bio.Seq / Bio.SeqIO | Sequence objects and file I/O | |
| Bio.Align / Bio.AlignIO | Pairwise and multiple alignments | |
| Bio.Entrez | NCBI database programmatic access | |
| Bio.Blast | BLAST execution and result parsing | |
| Bio.PDB | 3D structure manipulation | |
| Bio.Phylo | Phylogenetic tree operations | |
| Bio.motifs, Bio.SeqUtils, etc. | Motifs, utilities, restriction sites | |

    uv pip install biopython

    # Identify yourself to NCBI before making any Entrez request
    from Bio import Entrez
    Entrez.email = "researcher@institution.edu"
    Entrez.api_key = "your_ncbi_api_key"  # Optional: raises the rate limit to 10 requests/s

    # Parse a FASTA file and report each record's length
    from Bio import SeqIO
    records = SeqIO.parse("data.fasta", "fasta")
    for rec in records:
        print(f"{rec.id}: {len(rec)} bp")

    # Translate a DNA sequence into protein
    from Bio.Seq import Seq
    dna = Seq("ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG")
    protein = dna.translate()  # stop codons appear as "*"

    # Search the NCBI nucleotide database
    from Bio import Entrez
    Entrez.email = "researcher@institution.edu"
    handle = Entrez.esearch(db="nucleotide", term="insulin[Gene] AND human[Organism]")
    results = Entrez.read(handle)
    handle.close()

    # Run a remote BLAST search and parse the XML result
    from Bio.Blast import NCBIWWW, NCBIXML
    result = NCBIWWW.qblast("blastp", "swissprot", "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRVGDGTQDNLSGAEKAVQVKVKALPDAQFEVVHSLAKWKRQQIAAALEHHHHHH")
    record = NCBIXML.read(result)

    # Parse a PDB file and list atom coordinates
    from Bio.PDB import PDBParser
    parser = PDBParser(QUIET=True)
    structure = parser.get_structure("protein", "structure.pdb")
    for atom in structure.get_atoms():
        print(atom.name, atom.coord)

    # Build a neighbor-joining tree from an existing alignment
    from Bio import AlignIO, Phylo
    from Bio.Phylo.TreeConstruction import DistanceCalculator, DistanceTreeConstructor
    alignment = AlignIO.read("aligned.fasta", "fasta")
    calc = DistanceCalculator("identity")
    dm = calc.get_distance(alignment)
    tree = DistanceTreeConstructor().nj(dm)
    Phylo.draw_ascii(tree)

| File | Contents |
|---|---|
| | Bio.Seq objects, SeqIO parsing/writing, large file handling, format conversion |
| | Pairwise alignment, BLOSUM matrices, AlignIO, external aligners |
| | NCBI Entrez API, esearch/efetch/elink, batch downloads, search syntax |
| | Remote/local BLAST, XML parsing, result filtering, batch queries |
| | Bio.PDB, SMCRA hierarchy, DSSP, superimposition, spatial queries |
| | Tree I/O, distance matrices, tree construction, consensus, visualization |
| | Motifs, SeqUtils, restriction enzymes, population genetics, GenomeDiagram |

    # Fetch a GenBank record and summarize it
    from Bio import Entrez, SeqIO
    from Bio.SeqUtils import gc_fraction
    Entrez.email = "researcher@institution.edu"
    handle = Entrez.efetch(db="nucleotide", id="NM_001301717", rettype="gb", retmode="text")
    record = SeqIO.read(handle, "genbank")
    handle.close()
    print(f"Organism: {record.annotations['organism']}")
    print(f"Length: {len(record)} bp")
    print(f"GC: {gc_fraction(record.seq):.1%}")

    # Keep FASTA records at least 200 bp long with GC content above 40%
    from Bio import SeqIO
    from Bio.SeqUtils import gc_fraction
    output_records = []
    for record in SeqIO.parse("input.fasta", "fasta"):
        if len(record) >= 200 and gc_fraction(record.seq) > 0.4:
            output_records.append(record)
    SeqIO.write(output_records, "filtered.fasta", "fasta")

    # BLAST a protein against nr and report strong hits (E-value below 1e-10)
    from Bio.Blast import NCBIWWW, NCBIXML
    query = "MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSH"
    result_handle = NCBIWWW.qblast("blastp", "nr", query, hitlist_size=20)
    record = NCBIXML.read(result_handle)
    for alignment in record.alignments:
        for hsp in alignment.hsps:
            if hsp.expect < 1e-10:
                identity_pct = (hsp.identities / hsp.align_length) * 100
                print(f"{alignment.accession}: {identity_pct:.1f}% identity, E={hsp.expect:.2e}")

    # Build, root, and plot a neighbor-joining tree from a Clustal alignment
    from Bio import AlignIO, Phylo
    from Bio.Phylo.TreeConstruction import DistanceCalculator, DistanceTreeConstructor
    import matplotlib.pyplot as plt
    alignment = AlignIO.read("sequences.aln", "clustal")
    calculator = DistanceCalculator("blosum62")
    dm = calculator.get_distance(alignment)
    constructor = DistanceTreeConstructor()
    tree = constructor.nj(dm)
    tree.root_at_midpoint()
    tree.ladderize()
    fig, ax = plt.subplots(figsize=(12, 8))
    Phylo.draw(tree, axes=ax, do_show=False)  # do_show=False skips plt.show() so the figure can be saved
    fig.savefig("phylogeny.png", dpi=150)

    from Bio import SeqIO, Entrez
    from Bio.Seq import Seq

    with open("sequences.fasta") as f:
        for record in SeqIO.parse(f, "fasta"):
            process(record)  # process() stands in for your own per-record function

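**Error Handling**: Wrap network operations in `try`/`except` so a failed request does not crash the script.
```python
from urllib.error import HTTPError
from Bio import Entrez, SeqIO

try:
    # rettype="gb" requests GenBank flat-file text, which matches the "genbank" parser
    handle = Entrez.efetch(db="nucleotide", id=accession, rettype="gb", retmode="text")
    record = SeqIO.read(handle, "genbank")
    handle.close()
except HTTPError as e:
    print(f"Fetch failed: {e.code}")
```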
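
Transient failures (HTTP 429 or a 5xx status) are often worth retrying before giving up. The following is a minimal sketch, assuming a hypothetical helper named `fetch_with_retry` that is not part of Biopython; the retry count, the delays, and the choice of which status codes count as transient are illustrative assumptions. It uses only the standard library and accepts any zero-argument callable, so an `Entrez.efetch` call can be wrapped in a `lambda`.

```python
import time
from urllib.error import HTTPError


def fetch_with_retry(fetch, retries=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch() and retry transient HTTP errors with exponential backoff.

    fetch      -- zero-argument callable that performs the network request
    retries    -- total number of attempts (assumed default, tune as needed)
    base_delay -- delay in seconds before the second attempt; doubles each time
    sleep      -- injectable sleep function, which makes the helper easy to test
    """
    if retries < 1:
        raise ValueError("retries must be at least 1")
    for attempt in range(retries):
        try:
            return fetch()
        except HTTPError as err:
            # Treat "too many requests" and server-side errors as transient
            transient = err.code == 429 or 500 <= err.code < 600
            if not transient or attempt == retries - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

For example, `fetch_with_retry(lambda: Entrez.efetch(db="nucleotide", id=accession, rettype="gb", retmode="text"))` returns the handle once a request succeeds, while non-transient errors such as HTTP 400 are re-raised immediately.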

| Issue | Resolution |
|---|---|
| "No handlers could be found for logger 'Bio.Entrez'" | Set |
| HTTP 400 from NCBI | Verify accession/ID format is correct |
| "ValueError: EOF" during parse | Confirm file format matches format string |
| Alignment length mismatch | Sequences must be pre-aligned for AlignIO |
| Slow BLAST queries | Use local BLAST for large-scale searches |
| PDB parser warnings | Use |
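
The parse-error row above usually comes down to passing a format string that does not match the file. Below is a rough sketch of a pre-check, assuming a hypothetical helper `guess_seq_format` (not a Biopython function) that inspects the first non-blank line of the text. The heuristics cover only FASTA, FASTQ, and GenBank flat files, and unusual input can fool them.

```python
from typing import Optional


def guess_seq_format(text: str) -> Optional[str]:
    """Guess a SeqIO format string from the first non-blank line (heuristic only)."""
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip leading blank lines
        if line.startswith(">"):
            return "fasta"    # FASTA headers start with ">"
        if line.startswith("@"):
            return "fastq"    # FASTQ records start with "@"
        if line.startswith("LOCUS"):
            return "genbank"  # GenBank flat files start with a LOCUS line
        return None           # first real line is not recognized
    return None               # empty or whitespace-only input
```

The returned string can be passed as the second argument to `SeqIO.parse`; a `None` result means the format should be checked by hand.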