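Basic skills
JavaScript documentation
Based on the latest JavaScript standard, it explains JavaScript from the basics to advanced topics through simple but sufficiently detailed content.
Java documentation
C/C++ documentation
Node.js documentation
Go documentation
Crawling skills
URL handling modules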
urllib3 is a powerful, user-friendly HTTP client for Python
A comprehensive HTTP client library.
HTTP for Humans
Asynchronous HTTP Client/Server for asyncio and Python.
Official documentation for the PySpider crawler framework
Official documentation for the Scrapy crawler framework
This library intends to make parsing HTML as simple and intuitive as possible.
Unofficial Python port of puppeteer JavaScript (headless) chrome/chromium browser automation library.
Selenium is an umbrella project for a range of tools and libraries that enable and support the automation of web browsers.
Splash is a JavaScript rendering service
Everything is done in 100% pure Python so it's extremely easy to install and use
Run JavaScript code from Python.
asyncio is a library for writing concurrent code using the async/await syntax.
gevent is a coroutine-based Python networking library that uses greenlet to provide a high-level synchronous API on top of the libev or libuv event loop.
Tornado is a Python web framework and asynchronous networking library, originally developed at FriendFeed.
Twisted is an event-driven networking engine written in Python
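As a small illustration of the async/await style mentioned in the asyncio entry above, the following is a minimal sketch that runs several simulated requests concurrently. The names fake_fetch and crawl are hypothetical, and the network call is replaced by asyncio.sleep so the example needs nothing beyond the standard library.

```python
import asyncio


async def fake_fetch(url: str, delay: float) -> str:
    """Hypothetical stand-in for a network request: wait, then return a fake body."""
    await asyncio.sleep(delay)  # yields control so other coroutines can run
    return f"body of {url}"


async def crawl(urls: list) -> list:
    """Run all simulated fetches concurrently; results keep the order of `urls`."""
    tasks = [fake_fetch(url, 0.01) for url in urls]
    return await asyncio.gather(*tasks)


if __name__ == "__main__":
    pages = asyncio.run(crawl(["https://example.com/a", "https://example.com/b"]))
    print(pages)
```

In a real crawler the sleep would be replaced by an asynchronous HTTP client call; which client to use is left open here.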
Parsing skills
Official Python regular expression documentation
The lxml XML toolkit is a Pythonic binding for the C libraries libxml2 and libxslt.
Beautiful Soup is a Python library for pulling data out of HTML and XML files. It works with your favorite parser to provide idiomatic ways of navigating, searching, and modifying the parse tree. It commonly saves programmers hours or days of work.
cssselect2 is a straightforward implementation of CSS3 Selectors for markup documents (HTML, XML, etc.) that can be read by ElementTree-like parsers (including cElementTree, lxml, html5lib, etc.)
html5lib is a pure-python library for parsing HTML. It is designed to conform to the WHATWG HTML specification, as is implemented by all major web browsers.
pyquery allows you to make jQuery queries on XML documents. The API is as similar to jQuery as possible. pyquery uses lxml for fast XML and HTML manipulation.
Universal Feed Parser is a Python module for downloading and parsing syndicated feeds.
goose3
Article scraping & curation
OCRmyPDF adds an optical character recognition (OCR) text layer to scanned PDF files, allowing them to be searched.
Pdfminer.six is a python package for extracting information from PDF documents.
Manipulate audio with a simple and easy high level interface
PyYAML is a YAML parser and emitter for Python.
Measure the readability of a given text using surface characteristics
A pure-python HTML screen-scraping library
untangle is a tiny Python library which converts an XML document to a Python object.
convert xml file to python native dict object
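To illustrate the regular expression entry in this list, here is a minimal sketch that pulls link targets out of an HTML fragment with the standard re module. The pattern is an assumption for illustration: it only matches double-quoted href attributes, and the helper name extract_links is hypothetical. For real pages, one of the HTML parsers listed above is likely to be more robust.

```python
import re

# Assumed pattern: matches double-quoted href attributes only.
HREF_RE = re.compile(r'href="([^"]+)"')


def extract_links(html: str) -> list:
    """Return every double-quoted href value, in document order."""
    return HREF_RE.findall(html)


if __name__ == "__main__":
    sample = '<a href="/page/1">one</a> <a class="x" href="/page/2">two</a>'
    print(extract_links(sample))
```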
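Data cleaning skills
NumPy scientific computing: official Chinese documentation
pandas structured data analysis: official Chinese documentation
Jieba Chinese word segmentation
Matplotlib 2D plotting library: official Chinese documentation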
Gensim is a FREE Python library
A simple Python module for parsing human names into their individual components.
NLTK is a leading platform for building Python programs to work with human language data.
Python port of Google's libphonenumber
PyNLPIR is a Python wrapper around the NLPIR/ICTCLAS Chinese segmentation software.
SnowNLP is a library written in Python that makes it easy to process Chinese text.
An Efficient Lexical Analyzer for Chinese
Translate Chinese hanzi to pinyin with Python, inspired by flyerhzm’s chinese_pinyin gem
Storage skills
MongoDB API documentation
PyMongo is a Python distribution containing tools for working with MongoDB, and is the recommended way to work with MongoDB from Python
Redis API documentation
The Python interface to the Redis key-value store.
MySQL documentation
A simple database interface for Python that builds on top of FreeTDS to provide a Python DB-API (PEP-249) interface to Microsoft SQL Server.
Python MySQL client
cx_Oracle is a Python extension module that enables access to Oracle Database.
Python Elasticsearch Client
JSON (JavaScript Object Notation), specified by RFC 7159 (which obsoletes RFC 4627) and by ECMA-404, is a lightweight data interchange format inspired by JavaScript object literal syntax
A fast yet powerful Python Markdown parser with renderers and plugins, compatible with sane CommonMark rules.
Python adapter for PostgreSQL
Py2neo is a client library and toolkit for working with Neo4j from within Python applications and from the command line.
Python ODBC bridge
A Pure-Python library built as a PDF toolkit.
The Apache Thrift software framework, for scalable cross-language services development, combines a software stack with a code generation engine to build services that work efficiently and seamlessly between C++, Java, Python, PHP, Ruby, Erlang, Perl, Haskell, C#, Cocoa, JavaScript, Node.js, Smalltalk, OCaml and Delphi and other languages.
This package is for reading data and formatting information from older Excel files
xlwt is a library for writing data and formatting information to older Excel files (i.e., .xls)
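As an illustration of the JSON entry in this list, the following minimal sketch serializes a scraped record to a single line of JSON and reads it back with the standard json module. The record fields and the helper names dump_record and load_record are hypothetical.

```python
import json


def dump_record(record: dict) -> str:
    """Serialize a record to one line; ensure_ascii=False keeps non-ASCII text readable."""
    return json.dumps(record, ensure_ascii=False, sort_keys=True)


def load_record(line: str) -> dict:
    """Parse one line of JSON back into a dict."""
    return json.loads(line)


if __name__ == "__main__":
    # Hypothetical scraped item, used only for illustration.
    item = {"url": "https://example.com/", "title": "示例", "tags": ["a", "b"]}
    line = dump_record(item)
    print(line)
    print(load_record(line) == item)
```

One record per line is a convenient intermediate format before loading data into any of the databases listed above.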
Tools for countering anti-crawling measures
AST explorer
JavaScript AST visualizer
js-code-to-svg-flowchart
An online image OCR application from Alibaba
Convert curl syntax to Python, Ansible URI, MATLAB, Node.js, R, PHP, Strest, Go, Dart, JSON, Elixir, Rust
Baidu online font editor
奇Q online font editor
A simple HTTP Request & Response Service.
Acceleration skills
Redis-based components for Scrapy.
Python client for the Apache Kafka distributed stream processing system. kafka-python is designed to function much like the official java client, with a sprinkling of pythonic interfaces (e.g., consumer iterators).
Celery is a simple, flexible, and reliable distributed system to process vast amounts of messages, while providing operations with the tools required to maintain such a system.
multiprocessing is a package that supports spawning processes using an API similar to the threading module.
The subprocess module allows you to spawn new processes, connect to their input/output/error pipes, and obtain their return codes.
This module constructs higher-level threading interfaces on top of the lower level _thread module. See also the queue module.
Doing subprocess in Python should be easy
a lightweight alternative.
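RabbitMQ is open-source message broker software (also called message-oriented middleware) that implements the Advanced Message Queuing Protocol (AMQP).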
RQ (Redis Queue) is a simple Python library for queueing jobs and processing them in the background with workers.
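The threading entry above points to the queue module; a minimal sketch of how the two can be combined into a small worker pool follows. The function names worker and run_pool are hypothetical, and squaring a number stands in for real work such as downloading a page.

```python
import queue
import threading


def worker(tasks: queue.Queue, results: list, lock: threading.Lock) -> None:
    """Consume items until a None sentinel arrives."""
    while True:
        item = tasks.get()
        if item is None:  # sentinel: no more work for this thread
            tasks.task_done()
            break
        with lock:  # protect the shared results list
            results.append(item * item)  # placeholder for real work
        tasks.task_done()


def run_pool(items, num_workers: int = 4) -> list:
    """Process `items` with a fixed number of threads and return sorted results."""
    tasks = queue.Queue()
    results = []
    lock = threading.Lock()
    threads = [
        threading.Thread(target=worker, args=(tasks, results, lock))
        for _ in range(num_workers)
    ]
    for thread in threads:
        thread.start()
    for item in items:
        tasks.put(item)
    for _ in threads:
        tasks.put(None)  # one sentinel per worker
    tasks.join()  # wait until every queued entry has been marked done
    for thread in threads:
        thread.join()
    return sorted(results)


if __name__ == "__main__":
    print(run_pool([1, 2, 3, 4]))
```

Results are sorted because worker threads may finish in any order.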
Deployment skills
Learn how Docker helps developers bring their ideas to life by conquering the complexity of app development.
Kubernetes is an open-source system for automating the deployment, scaling, and management of containerized applications.
Red Hat OpenShift is an open source container application platform based on the Kubernetes container orchestrator for enterprise app development and deployment.
Scrapyd is an application for deploying and running Scrapy spiders. It enables you to deploy (upload) your projects and control their spiders using a JSON API.
Scrapyd-client is a client for scrapyd.
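python-scrapyd-api is a very simple Python wrapper for working with Scrapyd's API; it allows a Python application to talk to, and therefore control, the Scrapy Daemon.
A web application for Scrapyd cluster management, with support for Scrapy log analysis and visualization.
A distributed crawler management platform: a tailor-made, enterprise-grade product that makes managing crawlers easy.
Crawling tools
AnyProxy is an open HTTP proxy server.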
Mobile App Automation Made Awesome.
Charles is an HTTP proxy / HTTP monitor / Reverse Proxy that enables a developer to view all of the HTTP and SSL / HTTPS traffic between their machine and the Internet.
Google Chrome web browser
Fiddler is a free web debugging tool which logs all HTTP(S) traffic between your computer and the Internet. Inspect traffic, set breakpoints, and fiddle with incoming or outgoing data.
mitmproxy is a free and open source interactive HTTPS proxy.
Wireshark is a network packet analyzer. A network packet analyzer presents captured packet data in as much detail as possible.
Browser extensions
EditThisCookie is a cookie manager. You can add, delete, edit, search, protect and block cookies!
Tampermonkey is the most popular userscript manager, with over 10 million weekly users. It's available for Microsoft Edge, Chrome, Safari, Opera Next, and Firefox.
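ReRes can change the content of page request responses. By specifying rules, you can map requests to other URLs, or to local files or directories. ReRes supports both single-URL mapping and directory mapping.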
Extract, edit, and evaluate XPath queries with ease.
Manage and switch between multiple proxy settings quickly and easily.
Makes JSON easy to read. Open source.
Python documentation