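IDF 2014: Hands-on with the Intel RealSense 3D Camera (Yesky Notebook Channel, 06:00)
At the 2014 Intel Developer Forum (IDF), Intel showed its latest RealSense 3D camera. RealSense 3D is no longer a concept but a finished product that can already be found on the market. 3D cameras used to be fairly bulky; Intel has now shrunk the camera so that it can be built directly into a device. It can detect 70 points on a human face, so software developed with it can, for example, tell whether you are happy or sad, and many other novel applications become possible.
At this year's IDF we saw quite a few products and applications built on the RealSense 3D camera. Let's take a look.
The new RealSense 3D camera is smaller and can be integrated very easily into laptops and other products. It does more than offer the functions of Kinect-style products: its built-in sensors can recognize gestures and even facial expressions very precisely, with results comparable to human vision. It can also analyze an image by the differences in its background, which enables features such as background removal in video.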
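The final release of Windows 10 is out, and one of its features, a quick sign-in method called Windows Hello (popularly known as "face swiping"), has caught a lot of attention. According to Microsoft, only Intel's RealSense 3D camera currently supports this feature. So what exactly is RealSense 3D, and why is it the only hardware that can handle Windows Hello? This article gives a quick overview.
Figure 01: Why your own webcam can't do Windows Hello (image source: the Internet)
1. Windows Hello demystified
Many people assume the principle behind Windows Hello is simple: take a "photo" of the user in advance, then at sign-in compare it against whoever is sitting in front of the computer. In fact, the details are far more complicated than you might imagine.
First, face sign-in has to be secure. The simplest example: can an attacker holding a photo of the owner get through? Second, as a mature system, Windows 10 has to recognize the owner correctly not only in daylight but also at night. Both requirements are hard for ordinary webcams to meet.
Figure 02: Many people assume the principle behind Windows Hello is simple, but the details may surprise you (image source: the Internet)
2. Intel RealSense 3D
Intel RealSense 3D is an integrated solution made up of several hardware and software components. Its defining feature is that, in addition to a high-precision camera, it carries a depth sensor. For example, it can build a "3D face model" of your face with up to 70 sample points, in which every high and low point is plain to see. That shuts out "photo attacks", and even twins who look very much alike will be turned away by the camera because of small differences in their faces.
Figure 03: Intel's introduction of RealSense 3D at IDF 2015 (image source: the Internet)
In addition, the high-precision 1080p camera can easily perform iris sampling, and the "parallax" effect from multiple cameras allows high-precision detection similar to human vision, giving Windows Hello a more secure sign-in experience.
Intel RealSense 3D has other highlights as well. With its front and rear dual cameras (both on the same plane, which is a different concept from the front and rear cameras on a conventional phone), the computer can sense whether your current mood is good or bad, which opens the door to more interesting applications. It can also use its depth sensor to measure the length, width, and height of objects, for instance to check whether a piece of furniture will fit or whether clothes are the right size. Most interesting of all is "light field photography", commonly called "shoot first, focus later": you just press the shutter, then pick the focus point afterwards as needed. That makes spur-of-the-moment shots easy, and makes it easier to "tune" a better-looking picture.
Figure 04: Beyond 3D face sign-in, Intel RealSense 3D also makes "shoot first, focus later" easy (image source: the Internet)
3. How to get RealSense 3D
Of course, none of this matters if you can't use it yourself. Intel currently offers two options. One is to buy a new computer with an Intel RealSense 3D camera built in. That option is painful: a new computer is expensive to begin with, and adding an Intel RealSense 3D camera puts the price well above that of an ordinary PC. The other is to buy an Intel RealSense 3D camera separately. It is not cheap either, generally around 1,000 yuan online, which you can think of as the price of trying it out.
Figure 05: The Intel RealSense 3D camera module can be bought separately, but price is clearly still a hurdle
Closing thoughts
For now, Windows Hello looks sound on security, and videos published by foreign media show that unlocking is fast as well. But because Intel RealSense 3D is almost the only hardware compatible with Windows Hello at the moment, overall prices are on the high side. With Windows 10 now officially released, costs should fall quickly, and before long we may truly enter a new era of face sign-in.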
The soon-to-be-released Intel RealSense 3D camera now has two working open source drivers. Intel’s adherence to the USB Video Class standard meant that drivers didn’t need to be written from scratch, just tweaked to handle features that aren’t yet standardized in order to get basic functionality. The camera also has a proprietary interface with unknown capabilities that aren’t critical for its use as a depth capture device.
Pixel Formats
For the first two months of working with the camera I was only able to get one video format, despite the USB descriptors clearly enumerating video formats with different bits per pixel, and the Linux kernel driver listing the same 7 formats. I read almost all of the USB Video Class (UVC) standard, and checked the USB logs for every value of every struct as I came to them in the documentation. I was mostly looking for anything weird when setting up the infrared video stream. It seemed that infrared was just another video format. Suspecting that something was amiss with setting the format using v4l2-ctl, I added code to my depthview program to enumerate and select a format. What I found was that although v4l2-ctl let me pick a pixel format by index, at the kernel level the only way to pick a format was via a fourcc code that the kernel already knew about. The only way to select those other formats was to patch the kernel.
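For readers unfamiliar with fourcc codes, here is a minimal Python sketch of how a four-character code is packed into the 32-bit integer that V4L2 uses to identify a pixel format. It mirrors the kernel's v4l2_fourcc() macro; the helper names are illustrative and are not part of depthview or the kernel patch.

```python
def v4l2_fourcc(code: str) -> int:
    """Pack a 4-character code the way the V4L2 v4l2_fourcc() macro does."""
    if len(code) != 4:
        raise ValueError("a fourcc code must be exactly 4 characters")
    a, b, c, d = code.encode("ascii")
    # The first character ends up in the lowest byte of the integer.
    return a | (b << 8) | (c << 16) | (d << 24)


def fourcc_to_str(value: int) -> str:
    """Inverse of v4l2_fourcc: recover the 4 ASCII characters."""
    return value.to_bytes(4, "little").decode("ascii")


if __name__ == "__main__":
    for name in ("YUYV", "INVZ"):
        print(name, hex(v4l2_fourcc(name)))
```

A format the driver has no table entry for cannot be requested by this integer, which is consistent with the behaviour described above.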
Kernel Patch
Though I’ve read through Linux Device Drivers, 3rd edition, and even had some patches accepted to the kernel before, they were just code cleanup. When I went to read the kernel source, I was extremely lucky that the most recent change to the uvcvideo driver was to add a video pixel format.
All it took to get 6 formats was adding them and reloading the driver on the running system. With one of the updates I made a mistake that caused a kernel segfault, but that wasn’t enough to crash the system. It’s more resilient than I thought. I haven’t sent my patch upstream yet because I was trying to figure out some details of the depth formats first, specifically the scale of the values and whether they are linear or exponential. Patch is available
Format Details
You may have noticed that I said 7 formats at the beginning, but in the last paragraph I said 6. The YUYV format listed first in the descriptor has an index of 2 instead of 1, so when drivers are asked for the YUYV format they return the second format instead. I haven’t tried hacking in a quirk to select index 1 to see if there really is an unselectable YUYV format. This is assumed to be a bug, but it may have been key in my first day of trying to get the camera working with Linux, because it is the only format that the kernel recognized. Indexes in UVC are 1-based.
All of the fourcc codes that are officially part of the UVC standard are also the first 4 characters of the GUID in ASCII. I will refer to the formats by their 4-character ASCII names as derived from the GUIDs.
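As an illustration of that naming rule, the sketch below reads the fourcc out of a 16-byte format GUID as it appears in a UVC format descriptor. The suffix constant is the one used by the standard YUY2 GUID; that the camera's own GUIDs share this suffix is an assumption made only to build an example, and fourcc_from_guid is a hypothetical helper name.

```python
# Trailing 12 bytes of the standard YUY2 format GUID
# {32595559-0000-0010-8000-00AA00389B71}, in descriptor byte order.
UVC_GUID_SUFFIX = bytes.fromhex("00001000800000aa00389b71")


def fourcc_from_guid(guid: bytes) -> str:
    """Return the first 4 bytes of a 16-byte format GUID as ASCII text."""
    if len(guid) != 16:
        raise ValueError("a GUID is exactly 16 bytes")
    return guid[:4].decode("ascii")


if __name__ == "__main__":
    yuy2 = b"YUY2" + UVC_GUID_SUFFIX
    # Assumption: built by hand for illustration, not captured from the camera.
    invz = b"INVZ" + UVC_GUID_SUFFIX
    print(fourcc_from_guid(yuy2), fourcc_from_guid(invz))
```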
Index 2 INVZ
INVZ is the default format because of the indexing bug and so it’s the format described in the first 2 blog posts.
Index 3 INZI is a 24 bit format.
The first 16 bits are depth data, and the last 8 are infrared.
Index 4 INVR is another 16 bit depth format.
The difference from INVZ is currently unknown.
Index 5 INRI is another 24 bit 16:8 format.
Presumably there are two combined formats so that either depth format can be used with infrared at the same time. The first two formats have Z, and the second two have R, this might be a pattern indicating how the four formats are grouped.
Index 6 INVI is the infrared stream by itself.
Index 7 RELI
A special infrared stream where alternating frames have the projector on, or off. Useful for distinguishing ambient lighting from projector lighting.
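To make the 24-bit combined layout concrete, here is a small Python sketch that splits an INZI-style buffer into a depth list and an infrared list. It assumes each pixel is 3 bytes, a 16-bit depth value followed by an 8-bit infrared value, and that the depth value is little-endian; the byte order is a guess, and split_inzi is a hypothetical name.

```python
import struct


def split_inzi(frame: bytes):
    """Split packed 16-bit depth + 8-bit infrared pixels into two lists."""
    if len(frame) % 3 != 0:
        raise ValueError("frame length must be a multiple of 3 bytes")
    depth, infrared = [], []
    # "<HB" = little-endian unsigned 16-bit, then unsigned 8-bit, no padding.
    for z, ir in struct.iter_unpack("<HB", frame):
        depth.append(z)
        infrared.append(ir)
    return depth, infrared


if __name__ == "__main__":
    sample = bytes([0x34, 0x12, 0xFF, 0x00, 0x01, 0x10])
    print(split_inzi(sample))
```

If the depth bytes turn out to be big-endian, only the format string needs to change, to ">HB".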
Second Driver
Before I patched the kernel to add pixel format support I looked for other drivers that might work without modification. I didn’t find any that worked out of the box, but I did find that libuvc was close. It has the additional advantage that it is based on libusb, and thus cross platform by default. The author, Ken Tossell, helped me out by telling me what needed to be modified to add the RealSense pixel formats. It turned out that wasn’t sufficient, as it didn’t have support for two cameras on the same device yet. I decided to put it off until after I got calibration working, and this was fortunate because someone already had that part working. Steve McGuire contacted me on March 26 to let me know that my blog posts had been useful in getting libuvc working with the camera. I tested it today on Linux, and can confirm that it can produce color, infrared, and depth video. The example app regularly stopped responding, so it might not be 100% ready, but it’s just a patch on an already stable driver, so anything that needs to be fixed is likely to be minor. Libuvc is a cross platform driver, so this probably means there is now open source driver support for MacOS and Windows, but I haven’t been able to test this yet.
Three Ways to Calibrate
The Dorabot team was the first to achieve a practical calibration on Linux. Their methodology was to write a tool for the Windows SDK that copied all of the mappings between color and depth to a file, and then a counterpart for Linux to use that lookup table. They first reported success on January 27. Their software was released on March 22.
I used the Robot Operating System (ROS) camera_calibration tool to calibrate using a checkerboard pattern. My first success was March 23.
Steve McGuire wrote that he used a tool called
Amazon Picking Challenge
The Dorabot team, Steve McGuire’s team, and one of my crowdfunding sponsors are all participants. In all cases the Intel RealSense F200 was picked because it functions at a closer range than other depth cameras. The challenge takes place in Seattle, Washington, May 26-30. Maybe I’ll have a chance to go to Seattle and meet them.
Robot Operating System (ROS)
I converted my demo program depthview to be part of the ROS build system so that I could use both the ROS camera_calibration tool and the rviz visualization tool to view generated point clouds. In the process I learned a lot about ROS, and decided it was an ideal first target platform for a Linux and MacOS SDK. The core of ROS is glue between various open source tools that are useful with robotics. Often these tools were started years ago separately, and don’t have compatible data formats. Besides providing common formats and conversion tools, it provides ways to visualize data on a running system. Parts of a running ROS system can be in the same process, separate executables, or even on different machines connected via a network. The modular approach with standard data formats is ideal for rapid prototyping. People can work together and independently at the same time without concern for breaking other people’s projects. It is also better documented than most of the individual packages that it glues together. They have their own Q and A system that acts a lot like Stack Overflow and has had an answer ready for almost all of my problems. The info page for every package is a wiki, so anyone can easily improve documentation. The biggest bonus is that many of the people doing research in computer vision are doing it for robotics projects using ROS.
It took me a long time to get the build system working right. There are many things to learn that have nothing to do with depth image processing. Integrating modules is easiest via networking, but that has a high serialization and deserialization overhead. There are options for moving data between modules without network or even copying overhead, but they take longer to learn. The networking has neither encryption nor authentication support. If you are using that feature across a network you will need some combination of a firewall and a VPN or IPsec.
Point Clouds
Pretty much every 3d imaging device with an open source driver is supported by ROS, and that provided a lot of example code for getting a point cloud display from the Intel RealSense 3d camera. ROS is a pubsub system. From depthview I had to publish images and camera calibration information. Processing a depth image into a point cloud typically has two steps: getting rid of camera distortion, commonly called rectifying, and projecting the points from the 2d image into 3 dimensions. Using a point cloud with rviz requires an extra step of publishing a 3d transformation between world, or map, space and the coordinate frame of the camera. There are modules for all of these things, which can be started as separate commands in any order, or they can be started with a launch file. The first time I started up this process I skipped the rectification step to save time, and because the infrared video has no noticeable distortion even when holding a straight edge to the screen. I imagine it’s possible that the processor on the camera is rectifying the image as part of the range finding pipeline, and so that step is actually redundant. When writing my first launch file I added the rectifying step, and it started severely changing the range of many points, leading to a visible cone shape from the origin to the correctly positioned points. My guess is that the image rectifying module can’t handle the raw integer depth values, and I need to convert them to calibrated floats first.
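The projection step can be sketched in a few lines of Python using the standard pinhole camera model. This is not the ROS module's code. The intrinsics (fx, fy, cx, cy) would come from calibration, and the linear scale factor from raw integer depth to metric depth is an assumption, since the scale and linearity of the camera's depth values are still unknown; treating a raw value of 0 as "no reading" is also an assumption.

```python
def depth_to_points(depth, width, height, fx, fy, cx, cy, scale):
    """Project a row-major depth image into a list of (x, y, z) points.

    depth  : sequence of raw integer depth values, length width * height
    fx, fy : focal lengths in pixels; cx, cy : principal point in pixels
    scale  : assumed linear factor converting raw units to metric depth
    """
    if len(depth) != width * height:
        raise ValueError("depth length does not match width * height")
    points = []
    for v in range(height):
        for u in range(width):
            raw = depth[v * width + u]
            if raw == 0:  # assumed to mean "no reading"
                continue
            z = raw * scale  # convert to float depth before projecting
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


if __name__ == "__main__":
    print(depth_to_points([0, 4, 4, 4], 2, 2, 2.0, 2.0, 0.0, 0.0, 0.5))
```

Converting to floats before any rectification or projection keeps every later stage working in one unit, which is the ordering the guess above points toward.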
Exporting Point Clouds
ROS has a tool for recording an arbitrary list of published data into “bag” files. To get a scan of my head I started recording, and tried to get an image of my head from every angle. Then I dumped all the point clouds with “rosrun pcl_ros bag_to_pcd”. It was surprisingly difficult to find any working tool to convert from .pcd to a 3d format that any other tool accepted. An internet search found 5 years of forum posts of frustrated people. There happened to also be an ascii version of the .pcd file format, and it looks fairly similar to the .obj file format. I converted all the files with a bash one liner.
for i in *.pcd; do pcl_convert_pcd_ascii_binary "$i" "${i%.*}.ascii" 0; grep -v nan "${i%.*}.ascii" | sed -e '1,11d;s/^/v /' > "${i%.*}.obj"; rm "${i%.*}.ascii"; done
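The same conversion can be written as a short Python function, which may be easier to adapt than the one-liner. This is a sketch under a few assumptions: the input is already an ASCII .pcd file, the first three fields of each point are x, y and z, and any extra fields (such as color) can be dropped. Instead of deleting a fixed 11 header lines it looks for the DATA line. pcd_ascii_to_obj is a hypothetical name.

```python
def pcd_ascii_to_obj(pcd_text: str) -> str:
    """Convert the text of an ASCII .pcd file into .obj vertex lines."""
    lines = pcd_text.splitlines()
    start = None
    for i, line in enumerate(lines):
        if line.startswith("DATA"):
            if line.split()[1:] != ["ascii"]:
                raise ValueError("only ASCII .pcd data is supported")
            start = i + 1  # points begin on the line after DATA
            break
    if start is None:
        raise ValueError("no DATA line found in .pcd header")
    vertices = []
    for line in lines[start:]:
        fields = line.split()
        if len(fields) < 3:
            continue
        xyz = fields[:3]
        # Skip points the sensor could not measure.
        if any("nan" in f.lower() for f in xyz):
            continue
        vertices.append("v " + " ".join(xyz))
    return "\n".join(vertices) + "\n"


if __name__ == "__main__":
    import sys
    with open(sys.argv[1]) as src:
        sys.stdout.write(pcd_ascii_to_obj(src.read()))
```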
Converting Point Clouds to Meshes
This step is much harder. The only decent open source tool I’ve found is called MeshLab. It’s really designed to work with meshes more than point clouds, but there don’t seem to be any good user friendly tools for point clouds specifically. It displays and manipulates point clouds so fast that I suspect there is no inherent reason why rviz was getting 15 FPS at most. MeshLab was getting FPS in the hundreds with the same data. I had to go through a bunch of tutorials to do anything useful, but its workflow is okay once you get past the learning step, with one major exception. It crashes a lot, usually 5 minutes or more into processing data, with no save points. There was lots of shaking of fists in the air in frustration. Building a 3d model from point clouds requires aligning clouds from multiple depth frames precisely, and connecting the points into triangles. The solver for aligning clouds needs human help to get a rough match, and the surface finder is the part that crashes.
At the moment I don’t have a good solution, but I think ROS has the needed tools. It has multiple implementations of a technique for stitching 3d maps together called Simultaneous Localization And Mapping (SLAM). It could even use an accelerometer attached to the camera to help stitch point clouds together.
has functions to do all the needed things with point clouds it just needs a user friendly interface.
Planned Features
There are three primary features that I want to achieve with an open source depth camera SDK. The obvious 3d scanning part is started. Face tracking should be easy because
already does that. Gesture support is the hardest. There is an open source tool called Skeltrack that can find the joints of a human body in depth images, which could be used as a starting point. The founder of Skeltrack, Joaquim Rocha, works at CERN and has offered to help get it working on the RealSense camera.
Next Steps
One downside of prototyping at the same time as figuring out how something works is that it leads to technical debt at an above average pace. Since knowing how to do things right depends on having a working prototype, refactoring is around 50% of all coding time. The primary fix that needs to be done here is setting up conditional compilation so that features with major dependencies can be disabled at compile time. This will let me merge the main branches back together. The driver interface should be split off from Qt. It should interface to both working drivers so programmers don’t have to pick which one to support. It needs to pick the right calibration file based on camera serial number as part of multiple camera support. There needs to be a calibration tool that doesn’t require ROS, because ROS hasn’t been ported to Windows. The calibration tool I used was just using two features of OpenCV, so that should be easy.
I’ve been posting minor updates on my crowdfunding page. If you sponsor my project for any amount of money you’ll get regular updates directly to your inbox.
Business opportunities
The companies contacting me include one of the biggest companies that exist, and one of the most valued startups ever. The interest has convinced me that I'm working on something important. Most of them want me to sign an NDA before I find out what the deal is. I can't currently afford to pay a lawyer to tell me if it's a good deal or not. One of the companies even wants it to be a secret that they have talked to me at all. I would rather keep a secret because I want to maintain a good relationship than because of the force of law punishing me. I don't work well with negative reinforcement. Work I have done in the past that was under NDA has made it difficult to get work because I had nothing to show for it. This project, on the other hand, is very open, and it has resulted in people calling me regularly to see if I can be a part of their cool startup. That has never happened before. The one with the craziest idea for something that has never been done before will probably win. In the meantime I'll be sleeping on my dad's living room couch, where the cost of living is low, unless people crowdfund me enough money to afford first month's rent.
Intel RealSense 3D Camera Management Software
This package provides the software for the Intel RealSense 3D Camera Management and is supported on the Inspiron 2350 running the Windows 8.1 64-bit operating system.
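Fixes: - Fixed the path to the executable not being enclosed in quotes in services.
Enhancements: - Enhanced system security.
File format: Update Package for Microsoft(R) Windows(R)
Format description: Dell Update Packages (DUP) in Microsoft Windows 32-bit format are designed to run on Microsoft Windows 64-bit operating systems. Dell Update Packages in Microsoft Windows 64-bit format run only on Microsoft Windows 64-bit operating systems. When selecting a device driver update, be sure to choose the one that suits your operating system.
By starting the software download, you accept the terms (English version).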
Windows 8.1, 64-bit
Intel RealSense 3D Camera Management
Dell Update Package (DUP) Instructions
Download
1. Click Download File to download the file.
2. When the File Download window appears, click Save to save the file to your hard drive.
Installation
1. Browse to the location where you downloaded the file and double-click the new file.
2. Read over the release information presented in the dialog window.
3. Download and install any prerequisites identified in the dialog window before proceeding.
4. Click the Install button.
5. Follow the remaining prompts to perform the update.
